A few days ago I was subjected to an interesting new type of malicious code, and I wanted to share the experience.

A friend of mine had asked me to help him with a little PHP coding. He understands a bit about programming, but has never done anything complex and does not do it very often. He sent me the script he had written, and I started tinkering with it; the more I played with it, the more it turned into a ground-up rewrite of his code.

The script itself had fairly simple logic: parse the Apache access log, find the pages that had referred visitors to this particular page, and build a links page showing the number of referrals from each. While debugging, though, something odd happened. I was dumping the raw log output to the browser, and all of a sudden I was redirected to another web site!
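For context, the core of that logic can be sketched in a few lines of PHP. This is a reconstruction, not my friend's actual code; the function name and the assumption of Apache's standard "combined" log format are mine:

```php
<?php
// Sketch: count referring pages for one target page in an Apache
// "combined"-format access log. Illustrative reconstruction only.
function countReferrers(array $logLines, string $targetPage): array
{
    $counts = [];
    foreach ($logLines as $line) {
        // Combined format ends with: "REQUEST" status bytes "REFERER" "USER-AGENT"
        if (!preg_match('/"(?:GET|POST) (\S+)[^"]*" \d+ \S+ "([^"]*)"/', $line, $m)) {
            continue;
        }
        // $m[1] is the requested path, $m[2] the Referer header.
        if ($m[1] === $targetPage && $m[2] !== '' && $m[2] !== '-') {
            $counts[$m[2]] = ($counts[$m[2]] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent referrers first, keys preserved
    return $counts;
}

// Usage:
// $counts = countReferrers(file('/var/log/apache2/access.log'), '/page.php');
```

Note that this innocently trusts whatever is in the log file, which is exactly where the trouble started.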

Digging through the log file, I found the culprit: a site spider had sent, as its User-Agent header, a chunk of JavaScript inside <SCRIPT> tags that performed a redirect. Obviously, someone had figured out that many people use Web-based log analysis tools, which display user agents. I am grateful that the site I was redirected to did not contain any malicious code of its own.

What made this attack extremely interesting to me is that it did not target a particular piece of software, nor did it care what OS I used or anything else. All it needed was for someone to run code that did not validate its data. Indeed, it is a very common developer misconception that data, once it is in a database, is clean and does not need to be validated on its way out. That is the real lesson here. I can put all the input validation I want into my program, but if someone else's software also accesses the same database and does not properly validate data, and I assume the data is valid when I use it, I might as well not be doing validation at all.
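The output-side fix is simple in PHP: escape every piece of data at the moment it is written to the browser, no matter where it came from. A minimal sketch; the User-Agent string below is an illustrative reconstruction of the kind of payload I found, not the actual one from my log:

```php
<?php
// A poisoned User-Agent of the kind the spider sent (reconstruction).
$userAgent = '<script>window.location = "http://attacker.example/";</script>';

// Dumping it raw would let the browser execute the redirect:
//     echo $userAgent;   // dangerous!

// Escaping at the point of output turns it into harmless visible text:
echo htmlspecialchars($userAgent, ENT_QUOTES, 'UTF-8');
```

The same call belongs around every value echoed into HTML, whether it came from a form, a database, or a log file.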

This is another example of how ignorance or laziness on the programmer's part can become a major catastrophe. Imagine a piece of software written before JavaScript was introduced that is still in use today: its programmer could not even have known about this kind of attack, let alone prevented it.

This is yet another reason why I am down on Web applications: it is the only system I can think of in which input from one user is presented to another user in a way that the second user's computer will parse, interpret, and maybe even execute, outside of the control of the developer. In thin client and desktop application computing, the programmer has complete control over the presentation layer and what occurs there. In a Web application, the presentation layer is a no-man's land; there is no telling what will happen there. Data that is good today may become dangerous tomorrow if some new technology is added to the browser and creates a browser issue. One example would be allowing users to post videos online: if there is a buffer overflow in a user's media player of choice, then you (the programmer) are handing malicious users a tool to attack other users.

Web services are just as bad, particularly when using an AJAX approach that takes your software out of the loop. In those situations, you do not even control the third-party website. It could be riddled with problems, and you would not know it until users start contacting you and asking why your software infected their computers with malware or crashed them completely.

At the end of the day, I was able to complete the script. Naturally, I made sure to strip any and all HTML, JavaScript, and so on from the input as it was being read. But it was a great reminder that no matter how many external parsers, validators, and other tools a piece of data passes through, they may not be providing the validation my application requires. Input that is healthy, acceptable, and possibly even desirable for one program is not necessarily so for another.
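For completeness, the input-side stripping I added amounted to something like this sketch (the helper name is mine; note that strip_tags() removes only the tags themselves, and any text between them survives, which is one more reason escaping on output is still worthwhile):

```php
<?php
// Drop any HTML/JavaScript tags from a log field before storing
// or displaying it. Note: text *between* the tags is kept; only
// the executable <script>...</script> wrapper is removed.
function cleanField(string $field): string
{
    return trim(strip_tags($field));
}
```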