CAPTCHA
From Computing and Software Wiki
(17 intermediate revisions not shown) | |||
Line 3: | Line 3: | ||
==Background== | ==Background== | ||
- | The term CAPTCHA was first coined by Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford in 2000. von Ahn, Blum and Hopper were all from Carnegie Mellon University, while Langford was from IBM. In their paper "CAPTCHA: Using Hard AI Problems for Security" [ | + | The term CAPTCHA was first coined by Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford in 2000. von Ahn, Blum and Hopper were all from Carnegie Mellon University, while Langford was from IBM. In their paper "CAPTCHA: Using Hard AI Problems for Security" [3], they introduced the theoretical concept of CAPTCHA and some examples of how they could be used. They described a CAPTCHA as "a cryptographic protocol whose underlying hardness assumption is based on an AI problem" [3]. This can be compared to standard key-based cryptography where the "underlying hardness assumption" is that factoring of large numbers is hard. Further, they concluded that a CAPTCHA is a win-win situation, as either the CAPTCHA remains unsolvable by computers and security is maintained, or it is cracked by a computer program and the field of artificial intelligence has been advanced. |
==Applications== | ==Applications== | ||
- | + | In von Ahn, et al.'s 2000 paper, they gave some examples of how a CAPTCHA could be used. | |
- | + | ===Online Polls=== | |
- | + | ||
- | + | In order to trust the results of any online poll, at the very least, only humans should be able to vote. Requiring a CAPTCHA before submitting a vote would ensure this. | |
- | + | ||
+ | ===Free e-mail Services=== | ||
+ | |||
+ | The free e-mail service offered by Yahoo! was one of the first uses of a CAPTCHA developed by von Ahn, et al. Free e-mail is just one example of an online service that is attractive to bots. As such, many bots try to sign up for as many of these accounts as possible to send spam anonymously. Using a CAPTCHA during the sign-up process prevents bots from signing up for accounts en masse. | ||
+ | |||
+ | ===Search Engine Bots=== | ||
+ | |||
+ | Sometimes one does not want a particular page indexed by a search engine. Although web pages can include a "noindex" value in a meta tag, this can easily be ignored by malicious indexers. If a page is only accessible via a CAPTCHA, search indexing bots would not be able to view the content. | ||
+ | |||
+ | ===Preventing Dictionary Attacks=== | ||
+ | |||
+ | A CAPTCHA can be used for a login system alongside a traditional password to avoid a bot trying to guess the password in a brute-force manner. | ||
+ | |||
+ | ==Weaknesses== | ||
+ | [[Image:Sin_captcha.jpg|thumb|A particularly poor CAPTCHA. This test is too difficult for the average user to solve in a reasonable amount of time. [5]]] | ||
+ | [[Image:Aicaptcha.jpg|thumb|A CAPTCHA that has been successfully solved by a computer [2]]] | ||
===Poorly Made CAPTCHA=== | ===Poorly Made CAPTCHA=== | ||
Line 18: | Line 33: | ||
A CAPTCHA can be described as ''poor'' in one of two ways. Either the test fails to be human-solvable in a reasonable amount of time, or it can be solved by a computer using current AI techniques. | A CAPTCHA can be described as ''poor'' in one of two ways. Either the test fails to be human-solvable in a reasonable amount of time, or it can be solved by a computer using current AI techniques. | ||
- | + | Presented are two CAPTCHA that fall under the first category. The first image displays a CAPTCHA that requires the user to solve a difficult calculus problem in order to proceed. While this may successfully thwart a bot, it also prevents many legitimate users from using the web service. Likewise, the second example to the left CAPTCHA is simply unreadable by humans due to poor contrast. | |
- | + | The second category of poor CAPTCHA are those that can be solved by a computer, as it then fails to be a test that can tell computers and humans apart. To the right is an example of a program written by Casey Chesnut that successfully posted spam to 94 blogs in 10 minutes [2]. | |
- | + | ||
- | The second category of poor CAPTCHA are those that can be solved by a computer, as it then fails to be a test that can tell computers and humans apart. To the | + | |
Although it can be said that these are examples of ''poor'' CAPTCHA, based on the very definition of a CAPTCHA these are not CAPTCHA at all. If a test is either not solvable by humans or solvable by computers, it is no longer a test that tells computers and humans apart. | Although it can be said that these are examples of ''poor'' CAPTCHA, based on the very definition of a CAPTCHA these are not CAPTCHA at all. If a test is either not solvable by humans or solvable by computers, it is no longer a test that tells computers and humans apart. | ||
- | + | [[Image:Unreadable_captcha.jpg|thumb|left|Another poor CAPTCHA. The text in this image is unreadable by a human. [5]]] | |
===Accessibility=== | ===Accessibility=== | ||
- | Many CAPTCHA also suffer from poor accessibility. For example, each of the CAPTCHA shown | + | Many CAPTCHA also suffer from poor accessibility. For example, each of the CAPTCHA shown here would be unusable by a blind user as they require the user to decipher text from a bitmap image. Text-only CAPTCHA are also equally inaccessible to users suffering from disabilities such as dyslexia. Some websites now provide an audio CAPTCHA as well, though these are sometimes equally difficult to understand by humans, or easier to crack with speech-recognition software [4]. As such, the W3C has recommended that low-volume, low-resource websites (such as blogs protecting against comment spam) replace CAPTCHA with spam-filtering heuristics [4]. |
+ | |||
+ | <div style="clear:both;"></div> | ||
+ | |||
+ | ==Implementation== | ||
+ | |||
+ | [[Image:Recaptcha-example.gif|thumb|300px|An example of reCAPTCHA, the currently recommended CAPTCHA implementation [1]]] | ||
+ | |||
+ | The currently recommended CAPTCHA implementation is reCAPTCHA, developed by Carnegie Mellon University. It includes both a standard bitmap text CAPTCHA as well as an audio CAPTCHA for accessibility. As of present, the image distortion techniques used by reCAPTCHA are not computer solvable. | ||
+ | |||
+ | An interesting feature of reCAPTCHA is that the human-provided solutions are used to digitize old texts. Each word displayed in the CAPTCHA is taken from a scanned text. One of the words was not recognizable by optical character recognition (OCR), while the other was. The user is then asked to enter both words. The word that was recognized by OCR is used to grade the CAPTCHA, while the user's solution to the unrecognized word is used, together with other user's solutions of the same image, to digitize the text. reCAPTCHA is currently helping to digitize old books from the Internet Archive and old editions of the New York Times [1]. | ||
+ | |||
+ | <div style="clear:both;"></div> | ||
==References== | ==References== | ||
# Carnegie Mellon University. 2009. [http://recaptcha.net/captcha.html What is a CAPTCHA?]. | # Carnegie Mellon University. 2009. [http://recaptcha.net/captcha.html What is a CAPTCHA?]. | ||
# Chesnut, Casey. 2005. [http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha Using AI to beat CAPTCHA and post comment spam] | # Chesnut, Casey. 2005. [http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha Using AI to beat CAPTCHA and post comment spam] | ||
- | |||
# Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford. [http://www.captcha.net/captcha_crypt.pdf CAPTCHA: Using Hard AI Problems for Security]. In ''Eurocrypt''. | # Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford. [http://www.captcha.net/captcha_crypt.pdf CAPTCHA: Using Hard AI Problems for Security]. In ''Eurocrypt''. | ||
- | |||
# W3C. 2005. [http://www.w3.org/TR/turingtest/ Inaccessibility of CAPTCHA]. | # W3C. 2005. [http://www.w3.org/TR/turingtest/ Inaccessibility of CAPTCHA]. | ||
# Willis, John M. 2008. [http://www.johnmwillis.com/other/top-10-worst-captchas/ Top 10 Worst Captchas]. | # Willis, John M. 2008. [http://www.johnmwillis.com/other/top-10-worst-captchas/ Top 10 Worst Captchas]. | ||
+ | ==See Also== | ||
+ | [[Electronic Voting Systems]] | ||
+ | |||
+ | [[Cryptography in Information Security]] | ||
+ | |||
+ | [[Digital Signatures]] | ||
+ | |||
+ | ==External Links== | ||
+ | [http://en.wikipedia.org/wiki/Captcha CAPTCHA] | ||
+ | |||
+ | [http://recaptcha.net/learnmore.html Digitizing Books One Word at a Time] | ||
+ | |||
+ | [http://www.cs.cmu.edu/~biglou/reCAPTCHA_Science.pdf reCAPTCHA: Human-Based Character Recognition via Web Security Measures] | ||
+ | |||
+ | [http://www.captcha.net/captcha_cacm.pdf Telling Humans and Computers Apart Automatically] | ||
- | --[[User:Dangelsm|Dangelsm]] | + | --[[User:Dangelsm|Dangelsm]] 03:20, 9 April 2009 (EDT) |
Current revision as of 07:33, 9 April 2009
CAPTCHA is an acronym for Completely Automated Public Turing Test to Tell Computers and Humans Apart. Commonly, these tests take the form of images of scrambled text that a human is able to read, but current optical character recognition software cannot decipher. The most common use of a CAPTCHA is to protect web-accessible services from being abused by "bots".
Background
The term CAPTCHA was coined by Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford in 2000. von Ahn, Blum and Hopper were all from Carnegie Mellon University, while Langford was from IBM. In their paper "CAPTCHA: Using Hard AI Problems for Security" [3], they introduced the theoretical concept of a CAPTCHA and gave some examples of how it could be used. They described a CAPTCHA as "a cryptographic protocol whose underlying hardness assumption is based on an AI problem" [3]. This can be compared to public-key cryptography such as RSA, where the "underlying hardness assumption" is that factoring large integers is hard. Further, they concluded that a CAPTCHA is a win-win situation: either the CAPTCHA remains unsolvable by computers and security is maintained, or it is cracked by a computer program and the field of artificial intelligence has been advanced.
Applications
In their 2000 paper, von Ahn et al. gave several examples of how a CAPTCHA could be used.
Online Polls
For the results of an online poll to be trusted, at the very least only humans should be able to vote. Requiring a CAPTCHA before a vote is submitted helps ensure this.
Free e-mail Services
One of the first uses of a CAPTCHA developed by von Ahn et al. was the free e-mail service offered by Yahoo!. Free e-mail is one example of an online service that is attractive to bots, which try to sign up for as many accounts as possible in order to send spam anonymously. Requiring a CAPTCHA during the sign-up process prevents bots from signing up for accounts en masse.
Search Engine Bots
Sometimes one does not want a particular page indexed by a search engine. Although web pages can include a "noindex" value in a meta tag, this can easily be ignored by malicious indexers. If a page is only accessible via a CAPTCHA, search indexing bots would not be able to view the content.
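The following is a minimal sketch, in Python, of why "noindex" offers no protection against a malicious indexer: honouring the directive is a decision made by the crawler, not something the page can enforce. The class RobotsMetaParser, the function may_index and its polite flag are hypothetical names invented for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so normalise to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(html, polite=True):
    """Return True if this (hypothetical) crawler will index the page.

    A polite crawler checks for "noindex"; a malicious one skips the check,
    and nothing in the page itself can stop it.
    """
    if not polite:
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(may_index(page, polite=True))   # the polite crawler declines to index
print(may_index(page, polite=False))  # the malicious crawler indexes anyway
```

A CAPTCHA, by contrast, is checked on the server before the content is sent, so it does not depend on the crawler's cooperation.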
Preventing Dictionary Attacks
A CAPTCHA can be added to a login system alongside a traditional password to stop a bot from guessing the password by brute force, for example by trying every word in a dictionary.
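One way such a login check might be arranged is sketched below in Python. It is an illustration under stated assumptions rather than a recommended implementation: the USERS store, the captcha_passed helper and the login function are all hypothetical, and the unsalted SHA-256 hash stands in for the slow, salted password hash a real system would use.

```python
import hashlib
import hmac

# Hypothetical in-memory user store: username -> SHA-256 hex digest of the password.
# Assumption for brevity only; a real system would use a salted, slow password hash.
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}


def captcha_passed(expected_answer, submitted_answer):
    """Hypothetical CAPTCHA check: case-insensitive match against the expected text."""
    return hmac.compare_digest(
        expected_answer.lower().encode(),
        submitted_answer.strip().lower().encode(),
    )


def login(username, password, expected_answer, submitted_answer):
    """Check the CAPTCHA first, so a bot cannot test passwords without solving it."""
    if not captcha_passed(expected_answer, submitted_answer):
        return "captcha_failed"
    stored = USERS.get(username)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    if stored is not None and hmac.compare_digest(stored, supplied):
        return "ok"
    return "bad_credentials"
```

Because the password is only examined after the CAPTCHA has been solved, each guess in a dictionary attack costs the attacker one solved CAPTCHA.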
Weaknesses
Poorly Made CAPTCHA
A CAPTCHA can be described as poor in one of two ways. Either the test fails to be human-solvable in a reasonable amount of time, or it can be solved by a computer using current AI techniques.
Two examples fall under the first category. The first is a CAPTCHA that requires the user to solve a difficult calculus problem in order to proceed [5]. While this may successfully thwart a bot, it also prevents many legitimate users from using the web service. Likewise, the second is a CAPTCHA whose text is simply unreadable by humans due to poor contrast [5].
The second category of poor CAPTCHA consists of those that can be solved by a computer, and which therefore fail to tell computers and humans apart. One example is a program written by Casey Chesnut that solved such a CAPTCHA and successfully posted spam to 94 blogs in 10 minutes [2].
Although these can be called examples of poor CAPTCHA, by the very definition of a CAPTCHA they are not CAPTCHA at all. If a test is either not solvable by humans or solvable by computers, it is no longer a test that tells computers and humans apart.
Accessibility
Many CAPTCHA also suffer from poor accessibility. For example, any CAPTCHA that requires the user to decipher text from a bitmap image is unusable by a blind user. Text-only CAPTCHA can be equally inaccessible to users with disabilities such as dyslexia. Some websites now provide an audio CAPTCHA as well, though these are sometimes just as difficult for humans to understand, or easier to crack with speech-recognition software [4]. As such, the W3C has recommended that low-volume, low-resource websites (such as blogs protecting against comment spam) replace CAPTCHA with spam-filtering heuristics [4].
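To give a rough idea of what spam-filtering heuristics for blog comments might look like, here is a small Python sketch. The particular signals (a hidden "honeypot" form field, submission speed, number of links), their weights and the threshold are assumptions chosen for illustration and are not taken from the W3C note; spam_score and is_probable_spam are hypothetical names.

```python
import re

# Matches the start of an http or https link in the comment body.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def spam_score(comment_text, honeypot_value, seconds_to_submit):
    """Add up illustrative penalty points; higher means more spam-like."""
    score = 0
    if honeypot_value:
        # Assumed hidden form field that humans never see, but naive bots fill in.
        score += 3
    if seconds_to_submit < 2.0:
        # Assumed: a human rarely reads the post and submits a comment this fast.
        score += 2
    if len(LINK_PATTERN.findall(comment_text)) >= 3:
        # Assumed: comments stuffed with links are more likely to be spam.
        score += 2
    return score


def is_probable_spam(comment_text, honeypot_value="", seconds_to_submit=30.0, threshold=3):
    """Flag the comment when its score reaches the (arbitrary) threshold."""
    return spam_score(comment_text, honeypot_value, seconds_to_submit) >= threshold
```

Unlike a CAPTCHA, checks of this kind ask nothing of the user, which is why they do not raise the accessibility problems described above; a flagged comment could be held for moderation rather than rejected outright.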
Implementation
The currently recommended CAPTCHA implementation is reCAPTCHA, developed at Carnegie Mellon University. It includes both a standard bitmap text CAPTCHA and an audio CAPTCHA for accessibility. At the time of writing, the image distortion techniques used by reCAPTCHA are not known to be solvable by computer.
An interesting feature of reCAPTCHA is that the human-provided solutions are used to digitize old texts. Each word displayed in the CAPTCHA is taken from a scanned text. One of the words was not recognizable by optical character recognition (OCR), while the other was. The user is asked to enter both words. The word that was recognized by OCR is used to grade the CAPTCHA, while the user's solution to the unrecognized word is used, together with other users' solutions for the same image, to digitize the text. reCAPTCHA is currently helping to digitize old books from the Internet Archive and old editions of The New York Times [1].
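The two-word scheme described above can be sketched in Python as follows. This is not reCAPTCHA's actual code: the function names, the rule that only users who pass the known word contribute a vote, and the min_votes threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical vote store: image id of an unrecognized word -> Counter of human answers.
votes = defaultdict(Counter)


def grade(control_expected, control_answer, unknown_image_id, unknown_answer):
    """Grade the CAPTCHA on the OCR-recognized (control) word only.

    Assumption: the answer for the unrecognized word is recorded as a vote
    only when the user got the control word right.
    """
    passed = control_answer.strip().lower() == control_expected.lower()
    if passed:
        votes[unknown_image_id][unknown_answer.strip().lower()] += 1
    return passed


def accepted_transcription(image_id, min_votes=3):
    """Return the most common answer once enough users agree, else None.

    min_votes is an arbitrary illustrative threshold.
    """
    if not votes[image_id]:
        return None
    word, count = votes[image_id].most_common(1)[0]
    return word if count >= min_votes else None
```

For example, after three users who each typed the control word correctly all enter "upon" for the same unrecognized image, accepted_transcription for that image would return "upon".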
References
1. Carnegie Mellon University. 2009. What is a CAPTCHA?
2. Chesnut, Casey. 2005. Using AI to beat CAPTCHA and post comment spam.
3. Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford. CAPTCHA: Using Hard AI Problems for Security. In Eurocrypt.
4. W3C. 2005. Inaccessibility of CAPTCHA.
5. Willis, John M. 2008. Top 10 Worst Captchas.
See Also
Cryptography in Information Security
External Links
Digitizing Books One Word at a Time
reCAPTCHA: Human-Based Character Recognition via Web Security Measures
Telling Humans and Computers Apart Automatically
--Dangelsm 03:20, 9 April 2009 (EDT)