Phishing attacks rely on deception, pure and simple. Using realistic-looking but fake Web sites was one of the first techniques phishers employed. Eventually, that approach lost much of its effectiveness: the sites didn’t look exactly right, or the URL was wrong, alerting us to the deception.

The real thing

Phishers still use fake Web sites, but they have built a better mousetrap by altering legitimate ones. How, you ask? It’s simple: phishers leverage the same vulnerabilities used for Web site defacement and various other attacks. From their perspective it’s efficient, since there’s nothing to create, only existing content to alter. Better still, it’s the perfect deception: the site obviously looks right, and the correct URL is displayed.

The how and why of Web site exploitation is well documented; leveraging weaknesses in PHP applications to gain a foothold on the Web server is one of the more popular methods. An example is the vulnerability described in National Cyber Alert System entry CVE-2008-3239:

“Unrestricted file upload vulnerability in the writeLogEntry function in system/v_cron_proc.php in PHPizabi 0.848b C1 HFP1, when register_globals is enabled, allows remote attackers to upload and execute arbitrary code via a filename in the CONF[CRON_LOGFILE] parameter and file contents in the CONF[LOCALE_LONG_DATE_TIME] parameter.”
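To see why that advisory matters, here is a Python analogue of the flawed pattern — a hypothetical sketch, with only the CONF key names and the script’s behavior taken from the advisory text:

```python
import os
import tempfile

def write_log_entry(conf, message):
    """Append a 'timestamped' line to the configured log file.
    Like PHPizabi's writeLogEntry, both the file name and the date
    prefix come straight from CONF, completely unvalidated."""
    with open(conf["CRON_LOGFILE"], "a") as log:
        log.write(conf["LOCALE_LONG_DATE_TIME"] + " " + message + "\n")

# With register_globals enabled, request parameters could overwrite CONF,
# so an attacker picks both the file name and part of its contents:
evil_conf = {
    "CRON_LOGFILE": os.path.join(tempfile.mkdtemp(), "shell.php"),
    "LOCALE_LONG_DATE_TIME": "<?php system($_GET['c']); ?>",
}
write_log_entry(evil_conf, "cron ran")

with open(evil_conf["CRON_LOGFILE"]) as f:
    print(f.read())  # the "log" now begins with executable PHP
```

On a real server the attacker would aim the file at the Web root; requesting it then executes the injected PHP, which is exactly the foothold a phisher needs.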

What makes this vulnerability unique is the developer’s insistence that there’s nothing wrong with the code, so they aren’t going to change anything:

“Tough [sic] we do not intend to release a security fix for this issue at this time, we want to remind our users of the importance of disabling the “REGISTER_GLOBALS” option of their system. This option will not only enable this vulnerability to be exploited but will also open multiple breaches into your system. Note that if your system is configured properly (with “REGISTER_GLOBALS” disabled), this vulnerability does not apply to your website.”

Kind of a strange statement from a vendor, but it’s exactly what the bad guys like to see. As proof, I did a simple search and found several Web sites advertising exploit code for this vulnerability. I’ve linked one example that’s published at the Milw0rm site.

Current research

I’ve just finished reading a paper by researchers Tyler Moore (CRCS, Harvard University) and Richard Clayton (Computer Laboratory, University of Cambridge) titled “Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing” (PDF). Don’t let the title put you off; the paper is a good read that sheds light on the effectiveness of Web sites altered to steal sensitive information. For example, one interesting statistic was the mix of compromised Web sites versus fake ones:

“By far the most common way to host a phishing Web site is to compromise a Web server and load the fraudulent HTML into a directory under the attacker’s control. This method accounts for 75.8% of phishing.

A simpler, though less popular approach, is to load the phishing web page onto a ‘free’ web host, where anyone can register and upload pages. Approximately 17.4% of phishing web pages are hosted on free web space.”

Locating vulnerable Web sites

OK, we now know that phishers prefer to alter real Web sites, and how they do it. The obvious next question is how they find vulnerable Web sites. In reality, phishers don’t have much trouble: they use readily available scanners designed to check for PHP weaknesses. One example is the Web Vulnerability Scanner by Acunetix:

“The best way to check whether your web site & applications are vulnerable to PHP security attacks is by using a Web Vulnerability Scanner. A Web Vulnerability Scanner crawls your entire website and automatically checks for vulnerabilities to PHP attacks. It will indicate which scripts are vulnerable so that you can fix the vulnerability easily.”
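A toy sketch of what such a scanner does under the hood: probe a site for paths belonging to known-vulnerable software. The path list here is an illustrative sample I made up from the CVE above, not Acunetix’s actual check database:

```python
# Minimal sketch of a vulnerability scanner's probing step: turn a base
# URL into the list of URLs worth requesting. Illustrative only.

KNOWN_VULNERABLE_PATHS = [
    "system/v_cron_proc.php",   # PHPizabi (CVE-2008-3239)
]

def paths_to_probe(base_url):
    """Yield the full URLs a scanner would request against a site."""
    base = base_url.rstrip("/")
    for path in KNOWN_VULNERABLE_PATHS:
        yield f"{base}/{path}"

for url in paths_to_probe("http://example.com"):
    print(url)  # a real scanner would fetch each and inspect the response
```

A real scanner crawls the whole site first and tests every discovered script, which is why, as noted below, the approach is slow.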

Still, most would admit that this type of scanning is slow and very inefficient, especially considering the number of Web sites in existence. Moore and Clayton’s paper again sheds light on what phishers are using to make the locating process easier:

“An alternative approach to scanners, that will also locate vulnerable websites, is to ask an Internet search engine to perform carefully crafted searches. This leverages the scanning which the search engine has already performed, a technique that was dubbed ‘Google hacking’ by Long.

He was interested not only in how compromisable systems might be located, but also in broader issues such as the discovery of information that was intended to be kept private. Long called the actual searches ‘googledorks’, since many of them rely upon extended features of the Google search language, such as ‘inurl’ or ‘intitle’.”

The article that the above quote refers to was written by Johnny Long and is titled “Google Hacking Mini-Guide”. It’s a treasure trove of information on how to leverage Google’s advanced search operators to uncover sensitive details about Web sites.

Let’s see if it works. Recall the PHP vulnerability described by CVE-2008-3239; the key search phrase would be “PHPizabi 0.848b C1 HFP1”. I entered that phrase into Google and, after some digging to get past all the entries referring to the exploit itself, found results that would definitely be of interest to phishers:
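The query construction involved can be sketched in a few lines of Python. The helper and the operator usage below are illustrative, not taken from any real attack toolkit:

```python
# Sketch: assembling "googledork" query strings from a software
# version fingerprint, optionally pinned to a known script path
# with the inurl: operator.

def build_dork_queries(fingerprint, paths=None):
    """Combine a version banner with Google search operators."""
    queries = [f'"{fingerprint}"']  # exact-phrase match on the banner
    for path in paths or []:
        queries.append(f'"{fingerprint}" inurl:{path}')
    return queries

for q in build_dork_queries("PHPizabi 0.848b C1 HFP1",
                            paths=["v_cron_proc.php"]):
    print(q)
```

Each printed line is a ready-made search; the engine has already done the crawling, which is precisely the efficiency gain Moore and Clayton describe.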

Sidebar: It’s not Google’s fault

In researching this article, I quizzed some of my friends and walked away a bit surprised: a few remarked that Google is partially to blame for this. I totally disagree with that attitude and hope you do as well.

Google provides a service that makes finding and retrieving data a whole lot easier. As you know, I get on Google’s case about storing that information safely, but I fully acknowledge that their search engine is the best, bar none. In my opinion, the problem lies elsewhere.

Nothing new

Using search engines to find vulnerable Web sites isn’t new. What is new is that Moore and Clayton were able to statistically link Web search activity to the probability of a specific site becoming compromised. They accomplished this using Webalizer, a program that creates reports from Web server logs. Of special interest to the researchers were the recorded search terms used to locate each site:

“In particular, one of the individual sub-reports that Webalizer creates is a list of search terms that have been used to locate the site. It can learn these if a visitor has visited a search engine, typed in particular search terms and then clicked on one of the search results.”

In the following slide (courtesy of Moore and Clayton) you can see that several of the Webalizer entries are related to the search shown in the browser window:
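The recovery step Webalizer performs can be sketched in Python: when a visitor arrives from a search engine, the query they typed rides along in the Referer header. This is a simplified illustration; Webalizer recognizes many engines and parameter names:

```python
from urllib.parse import urlparse, parse_qs

def search_terms_from_referer(referer):
    """Return the query a visitor typed into a search engine,
    recovered from the Referer header (as Webalizer's sub-report
    does). Simplified: only Google's 'q' parameter is handled."""
    parsed = urlparse(referer)
    if "google." not in parsed.netloc:   # crude engine check
        return None
    return parse_qs(parsed.query).get("q", [None])[0]

ref = "http://www.google.com/search?q=%22PHPizabi+0.848b%22"
print(search_terms_from_referer(ref))   # prints: "PHPizabi 0.848b"
```

A site owner scanning these recovered terms for version-number phrases would see exactly the suspicious searches the researchers were counting.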

Key points of the report

So what does it all mean? Moore and Clayton have convincingly pulled the important data together and assembled it in a usable format, and doing so turned up some interesting results. Two of the more notable points:

  • 90% of the Web sites in the study group were compromised almost immediately after suspicious search terms were found in the Webalizer report.
  • One surprising statistic was the rate of repeat compromise. The report showed that almost 20% of infected Web servers were likely to become re-infected, but when Webalizer found suspicious search terms directed at a particular Web site, the chance of re-infection jumped to 48%.

That servers are being compromised multiple times is something I don’t understand at all; it needs to be fixed. To that end, let’s look at what the researchers suggest Web hosts do to reduce their risk.

Room for improvement

I hope Web hosting services take what the researchers learned seriously, especially the following suggestions:

  • Obfuscating targeted details: Suspicious searches would be less effective if identifying information such as version numbers of the software being used by the Web server were not publicized.
  • Suspicious search penetration testing: Motivated defenders could run searches to locate Web sites that appear vulnerable, warning their owners of the potential risk.
  • Blocking suspicious search queries: An alternative approach is for the search engines to detect suspicious searches and suppress the results.
  • Lower the reputation of previously phished hosts: In addition to flagging active phishing URLs, mark previously compromised hosts as risky due to the high likelihood of being compromised again.
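The first two suggestions could be approximated with a simple self-audit script that scans your own pages for version banners a crafted search could key on. The patterns below are illustrative examples I chose, not an exhaustive or official list:

```python
import re

# Sketch of the "suspicious search penetration testing" idea: scan
# your own pages for version-revealing strings before phishers search
# for them. Pattern list is illustrative only.

BANNER_PATTERNS = [
    r"PHPizabi\s+[\d.]+\w*",       # the package from CVE-2008-3239
    r"[Pp]owered by \w+ [\d.]+",   # generic "Powered by X 1.2.3" footer
]

def exposed_banners(html):
    """Return any version-revealing strings found in a page."""
    hits = []
    for pattern in BANNER_PATTERNS:
        hits.extend(re.findall(pattern, html))
    return hits

page = "<footer>Powered by PHPizabi 0.848b C1 HFP1</footer>"
print(exposed_banners(page))
```

Any hit is a candidate for the obfuscation the researchers recommend: remove or genericize the banner so a version-specific search no longer finds the site.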

What can we do

There are a few things that we as Internet users can do to protect ourselves. I’ve been suggesting that everyone use McAfee SiteAdvisor; even Moore and Clayton mention it in their report. It works as a browser add-on:

“With SiteAdvisor software installed, your browser will look a little different than before. We add small site rating icons to your search results as well as a browser button and optional search box. Together, these alert you to potentially risky sites and help you find safer alternatives.”

An alternative that’s not as user-friendly is to visit the PhishTank Web site if there’s any question as to whether a particular Web site is real, fake, or possibly compromised:

“PhishTank is a collaborative clearing house for data and information about phishing on the Internet. Also, PhishTank provides an open API for developers and researchers to integrate anti-phishing data into their applications at no charge.”
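Developers could use that open API for automated lookups. The sketch below only builds the request rather than sending it; the endpoint and parameter names reflect PhishTank’s developer documentation as best I recall, so verify them against the current docs before relying on this:

```python
import base64
from urllib.parse import urlencode

def build_phishtank_request(url, app_key=None):
    """Return (endpoint, POST body) for a PhishTank URL lookup.
    Endpoint/parameters assumed from PhishTank's developer docs."""
    params = {
        "url": base64.b64encode(url.encode()).decode(),  # docs accept base64
        "format": "json",
    }
    if app_key:
        params["app_key"] = app_key   # optional registered key
    return "https://checkurl.phishtank.com/checkurl/", urlencode(params)

endpoint, body = build_phishtank_request("http://example.com/login")
print(endpoint)
print(body)
```

POSTing that body to the endpoint returns JSON indicating whether the URL is in the tank and whether the report has been verified.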

The Anti-Phishing Working Group has a Web site that’s full of good information and specifics as to what’s going on in the world of phishing:

“The Anti-Phishing Working Group (APWG) is the global pan-industrial and law enforcement association focused on eliminating the fraud and identity theft that result from phishing, pharming and email spoofing of all types.”

Final thoughts

All of us, businesses and individual users alike, are becoming very reliant on the Internet. So when something like phishing disrupts that trust, I tend to take it personally. Finding out that Web sites get exploited a second and third time just adds to the frustration. It’s just not right. I’m not sure what to do about it, though. Do you have any ideas?