Fight spam and bots in Web applications with CAPTCHAs

CAPTCHA is used to discern humans from computers to make sure someone (and not automated software or a bot) is really using the Web application. Here are implementations for use within your applications.

 

Differentiating between a human user and a computer is a common task for Web applications. The need for such differentiation is due to spam and automated software or bots. One approach to testing for a human user is CAPTCHA, which makes a user type what they see in an image to use some functionality in a Web application.

What is a CAPTCHA?

CAPTCHA is an acronym for the rather lengthy phrase "Completely Automated Public Turing test to tell Computers and Humans Apart." A CAPTCHA is a program that generates a test, typically an image of distorted letters and numbers, that most humans can pass but current computer systems cannot. The user types the characters into a corresponding text box to pass the test. It provides simple yet practical security for various areas of a Web application.
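The generate-and-check cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the character set are hypothetical, rendering the text as a distorted image is left out entirely, and the challenge is assumed to be stored on the server (for example, in the user's session) between the two calls.

```python
import hmac
import secrets

# Assumed character set: easily confused characters (0/O, 1/I/L) are left out.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"

def new_challenge(length=6):
    """Return a random string that would then be rendered as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def check_answer(expected, typed):
    """Compare the typed text to the stored challenge, ignoring case and outer spaces."""
    cleaned = typed.strip().upper()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), cleaned.encode())
```

A real implementation would also expire each challenge after one attempt so a bot cannot keep guessing against the same image.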

Why use CAPTCHA?

The main goal of CAPTCHA is preventing automated software or bots from performing certain actions on a site. Sure, the automated code may access the site, but you don't want it posting comments (spam), creating user accounts, or placing orders. There are a variety of situations targeted by CAPTCHA, which include the following:

  • Site registration: A site can limit access to its registration system or page by using CAPTCHA as a gate to accessing it.
  • Comments: Sites that allow comments often receive automatically generated content that was obviously not entered by a human. For instance, sites such as Blogger use CAPTCHA to control access to posting comments.
  • Polls: CAPTCHA can help maintain the integrity of a poll by letting only humans participate.
  • Passwords: A common way to attack a site is via a dictionary attack, where brute force is applied to a password field in an attempt to guess the password. CAPTCHA can be used to control access to the password field, thus leaving bots out in the cold.

Availability

CAPTCHA has been around for some time, so there are various implementations for use within your applications.

If you want to write your own code for using CAPTCHA in an application, the CAPTCHA project site provides a set of guidelines that explain how to make it accessible and how to secure the image and script. Here is a good example using ASP.NET.

Accessibility

A main drawback of CAPTCHA, and one of the earliest complaints about it, is the inaccessibility of such images to users with visual impairments. Some systems gain accessibility by offering a spoken version of the image text via an audio file. Any site that uses a technology like CAPTCHA needs a way to recognize users with disabilities as human. The W3C offers a great paper on the issues of accessibility in a technology like CAPTCHA.

Not bulletproof

CAPTCHA provides a security solution that relies on problems artificial intelligence cannot yet solve reliably, but it is not a perfect solution. It offers an easy way to thwart most attacks, but a determined programmer may be able to develop code to break through such a hurdle to gain access to site features. For instance, a recent ZDNet story describes how spammers attacked Microsoft's CAPTCHA -- again.

Programmers are persistent, so OCR software may be used to identify the text in an image and pass through the CAPTCHA gate. To counter such moves, new approaches to CAPTCHA are being developed, which include distorting images in a way that makes them unreadable by OCR software. It is a never-ending battle that goes back and forth because spammers are determined to circumvent the system.

Another way around a security mechanism like CAPTCHA is via social engineering. There have been many reports of sites offering free porn to users who key in the solution to a CAPTCHA, which is then used elsewhere to access a site. This is one example of employing humans to crack the code.

Developers continue to push technology to thwart attacks; one example is BaffleText, which offers an improved CAPTCHA.

Conclusion

A Web application is a funny beast -- you want users to visit and use the site, but you only want a certain type of user. For one thing, automated code or bots are usually not welcome -- especially in areas of a site that collect information. This is where a security technology like CAPTCHA is used to discern humans from computers to make sure someone is really using the application.

Have you used the CAPTCHA technology in your applications? If so, did you create your own solution or use a free or commercial offering? Has using the technology been successful in keeping unwanted users away?

Tony Patton began his professional career as an application developer earning Java, VB, Lotus, and XML certifications to bolster his knowledge.


About

Tony Patton has worn many hats over his 15+ years in the IT industry while witnessing many technologies come and go. He currently focuses on .NET and Web Development while trying to grasp the many facets of supporting such technologies in a productio...

13 comments
tgharrison

It doesn't work. Spammers and bot producers have found a way around or through CAPTCHAs already. Check out Yahoo Messenger: it is no longer a functional IM, and they put in CAPTCHAs about 6 months ago.

Edmund

CAPTCHA is a neat idea; unfortunately, Google's was broken by the spammers (as per this link...) Oct 2 at Slashdot, but apparently simple code isn't working. It still beats not doing anything...

Michael Kassner

I'm by no means an expert, but I've heard that this approach is totally ineffective. I'd love to know if that is the consensus of the members.

billmez

I believe most bots are not submitting the actual form on the web site but gleaning the fields and action URL to submit automated responses. One method I use that has been pretty successful is to make the form page dynamic (ASP, for example) and set a session variable that the form handler then tests for before accepting the submission. If it is successful, the session variable is also deleted so the back button can't be used to continually resubmit the form without reloading the page. One caveat is to use no-store and no-cache meta tags so the page is not cached on search engines, where the submission would not be validated. While again not foolproof, it does weed out most crap.
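The session-variable check described in this comment might look roughly like the sketch below. It is written in Python rather than ASP, the `session` argument is assumed to be a dict-like per-visitor store supplied by the web framework, and the function and key names are hypothetical. One deliberate difference from the comment: the value is deleted on every attempt, not only on a successful one.

```python
import hmac
import secrets

def issue_form_token(session):
    """Call while rendering the form page; stores a fresh one-time value."""
    token = secrets.token_urlsafe(16)
    session["form_token"] = token
    return token  # would be embedded in the form as a hidden input

def accept_submission(session, submitted_token):
    """Accept a post only if the form page was loaded first, and only once."""
    # pop() removes the value, so the back button cannot resubmit the form.
    expected = session.pop("form_token", None)
    if expected is None or submitted_token is None:
        return False
    return hmac.compare_digest(expected.encode(), submitted_token.encode())
```

A bot that posts straight to the action URL never loads the form page, so its session holds no value and the submission is rejected.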

john.morse

Sad to see the advocacy of such an outdated and intrusive solution. Even when it's implemented properly, CAPTCHA is the bane of users, and the accessibility and usability issues alone should persuade any developer or designer to avoid the process at all costs. Use reverse CAPTCHA (as described in other comments). Blatant blog plug: http://blog.eduserv-psg.net/post/2008/08/Simple-IA---Captcha.aspx

walldorff

Why bother with a captcha at all? When I make a form I put in 2 or 3 more fields with labels (and id and name in the INPUT tag) like "information" or "infotext" or whatever, provided those are not used in the visible form elements. With CSS I hide those fields from the user, but the spam bot definitely will see them. Then the bot will put some text in it, and BINGO: we have a spammer. The validation routine on the server will then discard those forms. The more (hidden) fields you put in the form, the more likely the bot will fill at least one of those fields, especially when the id and name are very tempting :) Roland
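For illustration, the server-side half of this hidden-field trap could be as small as the following Python sketch. The field names are borrowed from the comment; the `form_data` dict of posted values and the function name are assumptions, and the CSS that hides the fields is not shown.

```python
# Decoy inputs that CSS hides from human visitors (names taken from the comment).
HONEYPOT_FIELDS = ("information", "infotext")

def looks_like_bot(form_data):
    """Return True if any hidden decoy field came back with text in it."""
    return any(form_data.get(name, "").strip() for name in HONEYPOT_FIELDS)
```

The validation routine would call `looks_like_bot` first and quietly discard the post when it returns True, since a human never sees the decoy fields and so leaves them empty.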

Justin James

There are a lot of CAPTCHAs which get cracked. Part of the problem is that PCs have very powerful CPUs in them now; making a CAPTCHA which can't be "deduced" by software writers typically involves one that is nearly unreadable to a person too. The "trick" (if you can call it that) is to use one of the less popular ones out there that no one has bothered writing a crack for. Alternatively, there are CAPTCHAs that don't use the "squiggly text" approach which can be more effective. Asirra, for example, asks you to look at 10 pictures of cats and dogs and make a pick as to what kind of animals are in the picture. That is MUCH more effective than the squiggly text approach. J.Ja

dixiedi

I have used it as a developer but only when requested by the client. Personally I cringe when I scroll down the form and see one. I don't even have visual problems other than age and I spend more time trying to guess what that squiggly bunch of crap is than I do anything else on the site using it. I would be happy if I never saw another.

chris

I just add a field and ask the user to type in a word that I display. It has eliminated all spam from my sites. I like the hidden method though as an idea. Would be nice to make the interface super clean.

Justin James

Roland - That is an interesting idea! I am wondering how screen reader technologies would interpret it, but I definitely think (offhand) that it is a good one and it sounds like it should be effective. J.Ja

walldorff

Well, in your approach the user has to perform an extra action (typing in a displayed word). Besides: the spam bot can read that word... A simple spam trap like hidden fields does the same trick.

walldorff

@Justin: as Jaqui pointed out, screen readers don't see hidden content. :) And yes, it's very effective! I tested this for some time, where I flushed the posted data of all the forms into a MySQL table, forwarding only the valid ones, of course. It gave me a good impression regarding the effectiveness and the valid/spam ratio. This ratio varies per website and depends on the content, time of year, etc. Hope this was helpful :)

Jaqui

Have never had a problem with hidden content, since screen readers only act on visible content.
