CAPTCHA

From Computing and Software Wiki

Revision as of 07:16, 9 April 2009 by Dangelsm (Talk)

CAPTCHA is an acronym for Completely Automated Public Turing Test to Tell Computers and Humans Apart. Commonly, these tests take the form of images of scrambled text that a human is able to read, but current optical character recognition software cannot decipher. The most common use of a CAPTCHA is to protect web-accessible services from being abused by "bots".

Background

The term CAPTCHA was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford. Von Ahn, Blum and Hopper were all from Carnegie Mellon University, while Langford was from IBM. In their paper "CAPTCHA: Using Hard AI Problems for Security" [4], they introduced the theoretical concept of a CAPTCHA and gave some examples of how one could be used. They described a CAPTCHA as "a cryptographic protocol whose underlying hardness assumption is based on an AI problem" [4]. This can be compared to public-key cryptography such as RSA, where the "underlying hardness assumption" is that factoring large numbers is hard. Further, they concluded that a CAPTCHA is a win-win situation: either the CAPTCHA remains unsolvable by computers and security is maintained, or it is cracked by a computer program and the field of artificial intelligence is advanced.

Applications

In their paper [4], von Ahn et al. gave several examples of how a CAPTCHA could be used.

Online Polls

In order to trust the results of any online poll, at the very least, only humans should be able to vote. Requiring a CAPTCHA before submitting a vote would ensure this.

Free e-mail Services

The free e-mail service offered by Yahoo! was one of the first to use a CAPTCHA developed by von Ahn et al. Free e-mail is just one example of an online service that is attractive to bots, which try to sign up for as many accounts as possible in order to send spam anonymously. Using a CAPTCHA during the sign-up process prevents bots from signing up for accounts en masse.

Search Engine Bots

Sometimes one does not want a particular page indexed by a search engine. Although web pages can include a "noindex" value in a meta tag, this can easily be ignored by malicious indexers. If a page is only accessible via a CAPTCHA, search indexing bots would not be able to view the content.

Preventing Dictionary Attacks

A CAPTCHA can be used in a login system alongside a traditional password to prevent a bot from guessing the password by brute force.
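The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries, the `attempt_login` function and the session identifiers are all hypothetical, and passwords are kept in plain text purely to keep the example short (a real system would store salted hashes). The point is that the CAPTCHA answer is checked first and each challenge is single-use, so a bot cannot reuse one solved CAPTCHA for many password guesses.

```python
import hmac

# Hypothetical in-memory stores, for illustration only.
PASSWORDS = {"alice": "correct horse"}        # username -> password (plain text for brevity)
EXPECTED_CAPTCHA = {"session-1": "w7kq"}      # session id -> expected CAPTCHA answer


def attempt_login(session_id, username, password, captcha_answer):
    """Return True only if both the CAPTCHA and the password check pass."""
    # Each challenge is single-use: remove it so that one solved CAPTCHA
    # cannot be replayed across many password guesses.
    expected = EXPECTED_CAPTCHA.pop(session_id, None)
    if expected is None:
        return False

    # Compare case-insensitively and ignore surrounding whitespace.
    given = captcha_answer.strip().lower()
    if not hmac.compare_digest(expected.encode(), given.encode()):
        return False

    stored = PASSWORDS.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored.encode(), password.encode())
```

With this arrangement every password guess costs the attacker one freshly solved CAPTCHA, which is what makes a dictionary attack by a bot impractical.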

Weaknesses

A particularly poor CAPTCHA. This test is too difficult for the average user to solve in a reasonable amount of time.
A CAPTCHA that has been successfully solved by a computer

Poorly Made CAPTCHA

A CAPTCHA can be described as poor in one of two ways. Either the test fails to be human-solvable in a reasonable amount of time, or it can be solved by a computer using current AI techniques.

Two of the CAPTCHA shown here fall under the first category. The first requires the user to solve a difficult calculus problem in order to proceed. While this may successfully thwart a bot, it also prevents many legitimate users from using the web service. Likewise, the second is simply unreadable by humans due to poor contrast.

The second category of poor CAPTCHA consists of those that can be solved by a computer, as such a test fails to tell computers and humans apart. The example shown was solved by a program written by Casey Chesnut, which successfully posted spam to 94 blogs in 10 minutes [2].

Although these can be called examples of poor CAPTCHA, by the very definition of a CAPTCHA they are not CAPTCHA at all. If a test is either not solvable by humans or solvable by computers, it is no longer a test that tells computers and humans apart.

Another poor CAPTCHA. The text in this image is unreadable by a human.

Accessibility

Many CAPTCHA also suffer from poor accessibility. For example, each of the CAPTCHA shown here would be unusable by a blind user, as they require the user to decipher text from a bitmap image. Text-only CAPTCHA are similarly inaccessible to users with disabilities such as dyslexia. Some websites now provide an audio CAPTCHA as well, though these are sometimes just as difficult for humans to understand, or easier to crack with speech-recognition software [6]. As such, the W3C has recommended that low-volume, low-resource websites (such as blogs protecting against comment spam) replace CAPTCHA with spam-filtering heuristics [6].
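As a rough illustration of what spam-filtering heuristics for a small blog might look like, the Python sketch below combines three simple checks. The specific checks (a hidden "honeypot" form field, a minimum time to submit, and a cap on the number of links) and every name and threshold in the code are assumptions chosen for this example; they are not taken from the W3C note.

```python
import re

# Matches the start of an http or https URL, in any letter case.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def looks_like_spam(comment, honeypot_value="", seconds_to_submit=60.0,
                    max_links=2, min_seconds=3.0):
    """Return True if a comment trips any of three illustrative heuristics."""
    # 1. Honeypot: a form field hidden from humans should stay empty.
    if honeypot_value:
        return True
    # 2. Timing: forms submitted almost instantly are unlikely to be human.
    if seconds_to_submit < min_seconds:
        return True
    # 3. Link count: comments stuffed with links are typical of spam.
    return len(LINK_PATTERN.findall(comment)) > max_links
```

None of these checks requires the visitor to read an image or listen to audio, which is why heuristics of this kind avoid the accessibility problems described above. They are also easier for a determined attacker to work around than a well-made CAPTCHA.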

Implementation

An example of reCAPTCHA, the currently recommended CAPTCHA implementation

The currently recommended CAPTCHA implementation is reCAPTCHA, developed at Carnegie Mellon University. It includes both a standard bitmap text CAPTCHA and an audio CAPTCHA for accessibility. At present, the image distortion techniques used by reCAPTCHA have not been solved by computer.

An interesting feature of reCAPTCHA is that the human-provided solutions are used to digitize old texts. Each word displayed in the CAPTCHA is taken from a scanned text. One of the two words could not be recognized by optical character recognition (OCR), while the other could. The user is asked to enter both words. The word that was recognized by OCR is used to grade the CAPTCHA, while the user's solution to the unrecognized word is used, together with other users' solutions for the same image, to digitize the text. reCAPTCHA is currently helping to digitize old books from the Internet Archive and old editions of the New York Times [1].
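The two-word scheme described above can be sketched in Python as follows. This is a simplified illustration, not reCAPTCHA's actual code: the function names, the in-memory vote store and the simple threshold of three matching answers are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical store: unknown-word image id -> tally of human transcriptions.
votes = defaultdict(Counter)


def normalize(word):
    """Ignore case and surrounding whitespace when comparing answers."""
    return word.strip().lower()


def grade(control_answer, control_guess, unknown_id, unknown_guess):
    """Grade on the control word; record the unknown-word guess only on a pass."""
    if normalize(control_guess) != normalize(control_answer):
        return False
    # The user has shown they are probably human, so their reading of the
    # word that OCR could not recognize counts as one vote.
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the leading transcription once it has min_votes votes, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

Here the user cannot tell which of the two words is the control word, so the only way to pass reliably is to transcribe both as accurately as possible.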

References

  1. Carnegie Mellon University. 2009. What is a CAPTCHA?
  2. Chesnut, Casey. 2005. Using AI to beat CAPTCHA and post comment spam.
  3. Luis von Ahn, Ben Maurer, Colin McMillen, David Abraham and Manuel Blum. 2008. reCAPTCHA: Human-Based Character Recognition via Web Security Measures. In Science.
  4. Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford. CAPTCHA: Using Hard AI Problems for Security. In Eurocrypt.
  5. Luis von Ahn, Manuel Blum and John Langford. 2004. Telling Humans and Computers Apart Automatically. In Communications of the ACM.
  6. W3C. 2005. Inaccessibility of CAPTCHA.
  7. Willis, John M. 2008. Top 10 Worst Captchas.

External Links

CAPTCHA

Using AI to beat CAPTCHA and post comment spam

Digitizing Books One Word at a Time

--Dangelsm 03:12, 9 April 2009 (EDT)
