DevilTyper: A Game for Quantifying the Usability of CAPTCHA Tests

ABSTRACT

The CAPTCHA test is an effective and widely used solution for preventing computer programs (i.e., bots) from performing automated but often malicious actions, such as registering thousands of free email accounts or posting advertisements on web blogs. To make CAPTCHA tests robust to automatic character recognition techniques, the text in the tests is often distorted, blurred, and obscured. At the same time, such robust tests may also prevent genuine users from reading the text easily and thus distribute the cost of crime prevention among all users. We therefore face a dilemma: a CAPTCHA test should be robust enough that it cannot be broken by programs, but it also needs to be easy enough that users need not repeatedly take tests because of wrong "guesses."

In this paper, we attempt to resolve this dilemma by proposing a human computation game for quantifying the usability of CAPTCHA tests. In our game, DevilTyper, players try to defeat as many devils as possible by solving CAPTCHA tests, and player behaviors in completing a CAPTCHA test are measured at the same time. Therefore, we can quantify CAPTCHA usability by analyzing the collected player inputs.

Since DevilTyper serves entertainment purposes itself, we can conduct a large-scale study of CAPTCHA test usability without the resource overhead required by traditional survey-based studies. In addition, we propose a consistent and reliable metric for assessing usability. Our evaluation results show that DevilTyper provides a fun and efficient platform for CAPTCHA designers to assess the usability of their CAPTCHAs and thus improve the CAPTCHA design.

General Terms

Design, Human Factors

Keywords

CAPTCHA, Games with a purpose, Human computation, Usability, Optical character recognition, Human perception

1. INTRODUCTION


Preventing computer programs from performing malicious automated tasks, such as registering thousands of free email accounts, has been one of the most challenging tasks for system administrators. For this issue, CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) [10] is known as an effective and widely used solution. Although there are many types of CAPTCHA tests [6, 5], text-based CAPTCHA, which asks users to recognize distorted text, is the most widely used and is adopted by most commercial companies, such as Google, Yahoo!, and Microsoft. For this reason, we focus on text-based CAPTCHA in this work. Text-based CAPTCHA tests are machine-generated images which contain obscured text. The common operations to generate such images include distortion, overlapping, clipping, and noise addition. The purpose of these operations is to make image recognition algorithms unable to resolve the text inside the images. However, the distortion of the text should be kept at a reasonable level so that humans can still read the text clearly.
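To make these operations concrete, the following is a minimal illustrative sketch in Python (using the Pillow imaging library); it is not the implementation of any scheme evaluated in this paper, and all names and parameter values are our own.

    # Illustrative only: render text, jitter each character vertically, and add
    # random noise lines -- the kinds of operations described above.
    import random
    from PIL import Image, ImageDraw, ImageFont  # assumes Pillow is installed

    def make_captcha(text, size=(160, 60), noise_lines=5):
        img = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(img)
        font = ImageFont.load_default()
        x = 10
        for ch in text:
            y = 20 + random.randint(-8, 8)   # crude per-character distortion
            draw.text((x, y), ch, fill="black", font=font)
            x += 25
        for _ in range(noise_lines):         # background noise lines
            start = (random.randint(0, size[0]), random.randint(0, size[1]))
            end = (random.randint(0, size[0]), random.randint(0, size[1]))
            draw.line([start, end], fill="gray", width=1)
        return img

    # make_captcha("BASIC").save("captcha.png")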

As many CAPTCHA tests have been broken by OCR (Optical Character Recognition) or other recognition techniques [7, 4, 1], it is necessary to increase the complexity of CAPTCHA tests to make them more robust against such attacks. However, some enhancement operations make the CAPTCHA tests too difficult for humans to recognize, e.g., because of an overly noisy background or too much text distortion. Therefore, we need to balance the tradeoff between human usability and the computational recognition challenge when designing or employing a CAPTCHA test. Since a great deal of work in the field of computer vision has discussed the computational recognition difficulty of the tests, we focus on quantifying the usability of CAPTCHA tests in this paper.

The most intuitive way to assess the usability of CAPTCHA tests is to ask numerous human subjects to solve assigned CAPTCHA tests repeatedly. However, this survey method may be cost-prohibitive if a large-scale study is required and the investigated CAPTCHA tests are constantly being updated. For example, investigating how different kinds of background noise affect user perception would require a large amount of user input data, which is either too tedious for volunteers or too expensive for user surveys.

For these reasons, a human computation game is an efficient and effective approach to harness a huge amount of human resources for a large-scale experiment. Several previous works, such as the ESP game [11] and Peekaboom [13], have successfully integrated the exploitation of human computation with their own experimental purposes in games. From the perspective of players, they play a game because they desire to be entertained; on the other hand, game designers collect game logs and results to accomplish their experimental tasks.

In this paper, we introduce DevilTyper, a web-based typing game for evaluating the human usability of CAPTCHA tests. From the evaluation results, we demonstrate that our proposed approach is reliable for quantifying the usability of CAPTCHA tests, and we also derive a ranked list of the CAPTCHA tests selected in this work.

2. RELATED WORK

In recent years, work in the field of CAPTCHA tests has mainly focused on robustness against automatic recognition techniques. Research on computer vision [7] and machine learning [4] has shown the vulnerability of CAPTCHA designs. New methods for generating CAPTCHAs have also been proposed to enhance the robustness of CAPTCHAs [5, 8].

However, only a few works have studied the usability issues of CAPTCHAs. In [15], the authors discussed a variety of factors that should be considered when designing CAPTCHAs. [3] examined the impact of different CAPTCHA techniques, including distortion and background noise. Through two user studies, consisting of 76 and 29 users respectively, they calculated the user accuracy rate to measure CAPTCHA usability under different distortion/noise settings. [14, 2] discussed human familiarity with words and degradation of images in CAPTCHA tests, and [3] evaluated usability and computer vision techniques in single-character recognition of CAPTCHA tests.

The above works only cover small-scale studies of CAPTCHA test usability or the tradeoff between usability and robustness against computer recognition techniques. In our work, we seamlessly integrate our research with an appealing web-based typing game, which helps us collect a large amount of data while gamers obtain pleasure from playing it. The concept of using games to collect large-scale datasets is called Games With A Purpose (GWAP) [9]. In GWAP, users play games for fun and help the developers accomplish their tasks at the same time. The concept of GWAP has been applied to many computationally hard problems, such as image annotation [11], locating objects in images [13], and commonsense knowledge collection [12].

3. CAPTCHA USABILITY EVALUATION

3.1 CAPTCHA Usability

CAPTCHA tests are designed to distinguish computer programs from human beings. While a CAPTCHA test should be hard enough for computers such that bots cannot break the test automatically, it should also be easy enough for humans so as not to stop genuine users from using the services. In an extreme case, a human-unsolvable CAPTCHA test would result in a totally unusable system.

In this paper, we focus mainly on text-based CAPTCHAs, which require users to recognize distorted text, since they are widely used by most major websites, such as Google, Yahoo!, and Microsoft. Text-based CAPTCHAs are favored for several reasons: 1) they are intuitive and have few localization issues (users need only type the corresponding keystrokes), and 2) they have a large problem space and well-studied OCR techniques (i.e., they provide strong security).

In text-based CAPTCHAs, usability is usually considered as readability. To assess readability, we propose an intuitive method: give users text-based CAPTCHAs and ask them to type the correct results as quickly as possible. By collecting the user inputs, we can calculate the accuracy and response time for solving a CAPTCHA test. However, this survey method may require a huge monetary cost if we would like to conduct a large-scale survey constantly. For this reason, we adopt the concept of human computation, which uses fun as the incentive to engage players to complete specified tasks.

We propose an interesting game, DevilTyper, to collect user actions/responses in solving a CAPTCHA test.

3.2 The DevilTyper Game

3.2.1 Game Concepts

Currently, DevilTyper is designed as a single-player game, in which players try to defeat devils appearing in the game by recognizing and typing the correct letters attached to each devil. While players enjoy the game, we collect each keystroke of the players along with its correctness for evaluating the human readability of CAPTCHA tests. In order to obtain as many subjects and game logs as possible for a large-scale analysis, the game should be interesting enough to attract people and encourage players to continue playing. In the following sections, we describe the design and implementation details of our game.

Figure 1: The CAPTCHA test is attached to the devil. Players can defeat the devils by typing the distorted text.

3.2.2 Game Description

In this game, the mission of a player is to destroy each generated devil as quickly as possible. The only way to destroy a devil is to type the 3- to 5-character word attached to it, as shown in Figure 1. Therefore, players need to recognize the text attached to the devils and type it out before the devils reach and hurt the player character.

Prior to a game

After entering a game, a player is asked to choose the skill level he would like to play. We provide three levels: Human, Hero, and Devil Buster, where the Human level is the easiest and the Devil Buster level is the most difficult. From the easiest to the hardest level, the number of letters in each CAPTCHA test increases from 3 to 5, and the maximum number of devils present at any time also increases.


Figure 2: A screenshot of the game. The devils move from the top toward the bottom. Players lose life points if the devils reach the bottom of the game screen.

Furthermore, each level is comprised of 10 mini-missions, each of which introduces a different number of devils that the player needs to destroy to finish the mission. As the player passes a mission, the number of devils he needs to defeat increases in the next mission.
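For illustration, the level structure described above can be captured by a configuration like the following sketch; only the word lengths (3 to 5 characters) and the ten missions per level are stated in the text, and every other value is an assumption.

    # Hypothetical configuration sketch of the skill levels and mission schedule.
    LEVELS = {
        "Human":        {"word_length": 3, "max_devils_on_screen": 3},  # cap assumed
        "Hero":         {"word_length": 4, "max_devils_on_screen": 5},  # cap assumed
        "Devil Buster": {"word_length": 5, "max_devils_on_screen": 7},  # cap assumed
    }
    MISSIONS_PER_LEVEL = 10

    def devils_in_mission(mission_index, base=5, step=2):
        """Devils to defeat grows with each mission (assumed schedule)."""
        return base + step * mission_index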

During a game

As shown in Fig. 1, devils move from the top toward the bottom, and if a devil is not defeated by the player, it keeps moving down the screen and eventually harms the player character. A devil is targeted when the player types the first letter of the CAPTCHA test attached to it. Once the player targets a devil, he is supposed to complete the remainder of the word to defeat it; otherwise, the player can release the target by pressing the SPACE key if he is unable to recognize the letters or wants to defeat other devils first.

In the game, the player character is associated with two important attributes: the score and the HP (health points), where the HP is represented by a rectangular life bar and must stay above zero to keep playing. If a devil reaches the player character before being destroyed, the player is penalized by a decrease in HP. On the other hand, the score accumulates as devils are defeated. After the player defeats all the devils in a mission, the current mission is completed and a new mission starts after a short pause.

After a game

To motivate the involvement of players, the names and scores of players with high scores are shown on the high score list. From the perspective of players, the ultimate goal of the game is to defeat as many devils as possible and thereby achieve a high ranking and score. Also, different skill levels introduce different levels of challenge, so that users continue playing at higher levels and feel accomplished. In [10], the authors mention that game features such as skill levels, score keeping, and high score lists significantly increase the enjoyment of game play. As a result, we can carry out our study of CAPTCHA test readability while keeping players entertained through our game.

Figure 3: A screenshot of the score board of DevilTyper.

3.2.3 Implementation

To facilitate the game deployment, we implement DevilTyper using the Flash technology. The game is now publicly available1. Currently we provide six different CAPTCHA types with different settings and use plain text as a reference.

The word beside each devil is randomly selected from one of these CAPTCHA types. Note that the player's performance on plain text serves as a baseline when evaluating the readability of the various tests. We also log all player actions, including each key pressed by the player, the time of each keystroke, the correctness of each keystroke, and the targeting and release of a devil, to perform further statistical analysis.
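As a rough sketch of the kind of per-event record such logging implies (the field names are ours, not the game's actual schema):

    # Hypothetical structure for one logged event in DevilTyper.
    from dataclasses import dataclass

    @dataclass
    class LogEvent:
        player_id: str
        captcha_type: str   # e.g. "AuthImage", "TgCAPTCHA", or "PlainText"
        target_word: str    # the text attached to the devil
        event: str          # "keystroke", "target", or "release"
        key: str            # key pressed (empty for target/release events)
        timestamp_ms: int   # time of the event
        correct: bool       # whether a keystroke matched the expected character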

4. EXPERIMENTS

4.1 Used CAPTCHAs

In our experiments, we provide six different types of CAPTCHA schemes in the evaluation and use a plain text test as a reference. As shown in Figure 4, the CAPTCHAs we provide are AuthImage, Captcher, Kiranvj, SecurImage, CoolCAPTCHA, and TgCAPTCHA.

These CAPTCHA schemes adopt different methods to generate CAPTCHA tests, including different background noise and distortion. In addition to analyzing the different CAPTCHA types, we also explore how noise and distortion affect CAPTCHA usability. We generate CAPTCHA tests in CoolCAPTCHA and TgCAPTCHA with different levels of distortion and noise and record how users' behaviors change. The analysis is discussed in Section 6.

4.2 Performance Metrics

In previous studies, there is no standard metric for quantifying CAPTCHA usability. One of the most intuitive and commonly used metrics is accuracy, i.e., the fraction of CAPTCHA tests that users finish successfully. In DevilTyper, we record all the actions taken by players and therefore have several performance metrics, including the following:

1http://mmnet.iis.sinica.edu.tw/proj/deviltyper


Figure 5: CAPTCHA type comparisons under different performance metrics: (a) Finish Time, (b) Rate of Typing Error, (c) Rate of Timeout, (d) Rate of Giving Up, (e) Rate of Repeatedly Typing. The notations stand for (A) AuthImage, (B) Captcher, (C) Kiranvj, (D) SecurImage, (E) Plain text, (F) CoolCAPTCHA, (G) TgCAPTCHA.

Figure 4: CAPTCHA tests used in this work: (a) AuthImage, (b) Captcher, (c) Kiranvj, (d) SecurImage, (e) CoolCAPTCHA, (f) TgCAPTCHA.

• Finish Time: the total time taken to finish the CAPTCHA test.

• Rate of Typing Error: the number of CAPTCHA tests with wrong keystrokes divided by the number of all tests.

• Rate of Timeout: the number of timed-out tests divided by the number of all tests. A test is regarded as "timeout" if the player does not type any character within 3 seconds.

• Rate of Giving Up: the number of tests given up divided by the number of all tests. Players can choose to give up a test before it times out.

• Rate of Repeatedly Typing: the number of repeated keystrokes divided by the number of all keystrokes. In the game, players can choose to type the same character repeatedly to avoid a timeout.

After collecting the player actions on the CAPTCHA tests, we quantify CAPTCHA usability according to the above performance metrics. To show the reliability of the metrics, we also conduct a small-scale user study on Amazon Mechanical Turk and adopt the traditional accuracy metric.
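Given per-test summaries derived from such logs, the metrics above can be computed roughly as follows (a sketch only; the per-test field names are assumed, the definitions follow the list above):

    # Sketch: compute the five usability metrics from per-test summaries.
    # Each test is assumed to carry: finish_time_ms, had_typing_error,
    # timed_out (no keystroke within 3 seconds), gave_up,
    # repeated_keystrokes, and keystrokes.
    from statistics import mean

    def usability_metrics(tests):
        n = len(tests)
        total_keys = sum(t["keystrokes"] for t in tests) or 1
        return {
            "finish_time":          mean(t["finish_time_ms"] for t in tests),
            "rate_typing_error":    sum(t["had_typing_error"] for t in tests) / n,
            "rate_timeout":         sum(t["timed_out"] for t in tests) / n,
            "rate_giving_up":       sum(t["gave_up"] for t in tests) / n,
            "rate_repeated_typing": sum(t["repeated_keystrokes"] for t in tests) / total_keys,
        }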

4.3 Experiment Setup

To encourage user participation, we published the game on the popular social platform PTT2 and held a four-week campaign.

2PTT has the largest social network in Taiwan (over 800,000 log-ins per day on average).


The campaign awards virtual money to the top 5 players in each level and to 5 randomly chosen players who complete the whole game each week. These players are awarded 1,000 to 10,000 PTT money, which is worth about US$0.1 to US$1. The total cost of the campaign is about US$30.

In the four-week campaign, the DevilTyper game has been played over 5,000 times, and 1,407,055 CAPTCHA tests have been collected.

5. CAPTCHA COMPARISONS

5.1 CAPTCHA Usability

First, we explore how different CAPTCHA types affect usability by calculating their corresponding performance metrics. The results are shown in Figure 5. It is clear that the relative ordering of CAPTCHA usability under different performance metrics is consistent. Therefore, we can establish a usability ranking among these CAPTCHAs. SecurImage and Captcher take players the longest time to solve, and their rates of typing error, timeout, giving up, and repeated typing are also the highest. These two are followed by AuthImage, TgCAPTCHA, CoolCAPTCHA, Kiranvj, and plain text.

5.2 Consistency of Performance Metrics

To demonstrate the consistency among different performance metrics, we normalize the results for each performance metric to the range between 0 and 1. For each performance metric, the highest value is set to 1, the lowest value is set to 0, and the others are interpolated linearly. The normalized results are plotted in Figure 6.
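This normalization is a per-metric min-max scaling; in code it reads roughly as follows (the example values are illustrative, not measured results):

    def min_max_normalize(values):
        """Scale one metric across CAPTCHA types: max -> 1, min -> 0, linear in between."""
        lo, hi = min(values.values()), max(values.values())
        span = (hi - lo) or 1.0
        return {captcha: (v - lo) / span for captcha, v in values.items()}

    # e.g. min_max_normalize({"AuthImage": 0.08, "Captcher": 0.19, "PlainText": 0.02})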

Figure 6: Type statistics with normalized results (finish time, rate of typing error, rate of timeout, rate of repeatedly typing, rate of giving up) for CAPTCHA types (A)-(G).

Since the results of the different performance metrics are consistent, for simplicity we use only the "Rate of Typing Error" as the evaluation metric for CAPTCHA usability in the following discussion.

5.3 Reliability of Performance Metrics

To evaluate the reliability of our performance metrics, we conduct a usability evaluation by crowdsourcing for comparison. We publish a task, which requires users to complete 10 CAPTCHA tests, on Amazon Mechanical Turk3. The set of CAPTCHA tests used in the Mechanical Turk experiment is the same as the set in DevilTyper. The task is completed 498 times by 44 workers, resulting in 4,980 completed CAPTCHA tests. After collecting these results, we calculate the typing error rate for each CAPTCHA type and compare it with the results in DevilTyper.

3Amazon Mechanical Turk is a crowdsourcing platform. Developers can publish tasks and pay a small amount of money, e.g., $0.01, to get their tasks done. Workers, anonymous internet users, try to complete the tasks with reasonable pay and difficulty. The official website is http://mturk.com

Figure 7: Normalized error rate of CAPTCHA tests (AuthImage, Captcher, Kiranvj, SecurImage, TgCAPTCHA) on Mechanical Turk compared with DevilTyper.

We also normalize the results for direct comparison. As shown in Figure 7, the relative ordering of the rate of typing error in DevilTyper is consistent with the results collected from Mechanical Turk.
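One simple way to check that the two orderings agree (a sketch of the idea, not the paper's exact procedure) is to sort the CAPTCHA types by error rate in each dataset and compare the resulting lists:

    def rank_order(error_rates):
        """CAPTCHA types sorted from lowest to highest typing-error rate."""
        return sorted(error_rates, key=error_rates.get)

    # The orderings are consistent if, for example:
    # rank_order(deviltyper_error_rates) == rank_order(mturk_error_rates)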

5.4 Effect of Characters in CAPTCHA

In addition to analyzing the usability of the different CAPTCHA schemes, we explore the effects of individual CAPTCHA characters in this section. Each CAPTCHA scheme has its own algorithm to distort the displayed text, which causes different recognition effort for different characters. For example, as shown in Figure 8, "i" is hardly recognizable in TgCAPTCHA since players are easily confused about whether it is a character or a random noise line. However, in CoolCAPTCHA, "i" is relatively easy to recognize. Understanding the effects of characters in different CAPTCHAs can help designers avoid confusing characters and improve the system usability.

Figure 8: Examples illustrating the character effects in CAPTCHAs ((a) "LITER", (b) "BASIC"). While "i" is clearly recognizable in CoolCAPTCHA (right-hand side), it is often confused with the short lines in the background noise of TgCAPTCHA (left-hand side).

To evaluate the usability of characters under different CAPTCHA schemes, we first show how players react to characters in plain text in Figure 9. As the graph shows, players have different typing times even when there is no problem in recognizing the characters. This result may be due to the keyboard layout or user typing habits. To eliminate these factors, we use the plain text as a baseline and examine the increase ratio for typing each character when different CAPTCHA schemes are applied. Assuming the error rate for character c in plain text is E_c, and the error rate for character c in CAPTCHA type X is E'_c, then the increase ratio is defined as (E'_c - E_c) / E_c. The effects of the CAPTCHA algorithms on different characters are shown in Figure 10.
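In code, the per-character increase ratio reads roughly as follows (the function and argument names are ours, and the example values are illustrative):

    def increase_ratio(error_in_captcha, error_in_plain_text):
        """Relative increase of the error rate for a character under a CAPTCHA
        scheme, with plain text as the baseline: (E'_c - E_c) / E_c."""
        return (error_in_captcha - error_in_plain_text) / error_in_plain_text

    # e.g. increase_ratio(0.12, 0.04) == 2.0, i.e. a 200% increase over the
    # plain-text baseline.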

6. CAPTCHA DESIGN FACTOR ANALYSIS

In this section, we present the analysis results of CAPTCHA usability under two common CAPTCHA techniques: distortion and noise. We generated CAPTCHA tests under different distortion/noise methods and levels. The player reactions are measured to quantify the usability.
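The analysis in this section amounts to grouping the collected tests by the generation parameter and computing the typing-error rate per parameter value; a sketch under that assumption (the field names are ours):

    from collections import defaultdict

    def error_rate_by_parameter(tests, param):
        """Group tests by one generation parameter (e.g. "x_period" or
        "short_arcs") and return the typing-error rate per parameter value."""
        groups = defaultdict(list)
        for t in tests:
            groups[t[param]].append(t["had_typing_error"])
        return {value: sum(errs) / len(errs) for value, errs in sorted(groups.items())}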

6.1 Distortion

We adopt CoolCAPTCHA, which is similar to the Google CAPTCHA, as the target for testing the distortion features. CoolCAPTCHA provides three kinds of parameters for distortion:

X Interval

"X Interval" is the distance between characters. In the experiment, we increase the "X Interval" parameter from 0.8 to 1.3. The larger the parameter, the closer the characters are to each other. Example CAPTCHA tests are shown in Figure 11. As the analysis result shows, the distance between characters only slightly influences the CAPTCHA usability.

Figure 11: Example and results of CAPTCHA tests with "X Interval" distortion: (a) raw text, (b) "X Interval" distortion, (c) error rate with changing "X Interval" (X_Space vs. Error Rate).

X Period

"X Period" is the parameter for sine-wave distortion along the x-axis, as shown in Figure 12. In the experiment, we increase the "X Period" parameter from 0.5 to 1.2. The larger the parameter, the stronger the distortion. The x-axis distortion does not make a significant difference in CAPTCHA usability. This might also be due to the limited range of parameter settings.

Figure 12: Example and results of CAPTCHA tests with "X Period" distortion: (a) raw text, (b) "X Period" distortion, (c) error rate with changing "X Period" (X_Period vs. Error Rate).

Y Period

Similar to "X Period", "Y Period" is the parameter for sine-wave distortion along the y-axis, as shown in Figure 13. We increase the "Y Period" parameter from 0.5 to 1.2. The results show that there is a difference in CAPTCHA usability when the "Y Period" distortion is applied to CAPTCHAs.

6.2 Noise

We adopt TgCAPTCHA, which is similar to a previous Microsoft CAPTCHA, as the target for testing the noise features. TgCAPTCHA provides three kinds of parameters for generating noise:

Long Arcs

The long arcs parameter is the number of long arcs added to the background noise. The start position and end position of each arc are randomly chosen. In the experiment, we increase the number of long arcs from 0 to 5. An example of the CAPTCHAs with long arcs is shown in Figure 14. There is no significant difference in CAPTCHA usability when long arcs are added to the background noise.

Short Arcs

The short arcs parameter is the number of short arcs added to the background noise, as shown in Figure 15. In the experiment, we increase the number of short arcs from 0 to 20. The usability analysis shows that the addition of short arcs significantly influences the CAPTCHA usability.


Figure 13: Example and results of CAPTCHA tests with "Y Period" distortion: (a) raw text, (b) "Y Period" distortion, (c) error rate with changing "Y Period" (Y_Period vs. Error Rate).

Figure 14: Example and results of CAPTCHA tests with "Long Arcs" noise: (a) "Long Arcs" noise, (b) error rate with changing number of long arcs (SweepArcs vs. Error Rate).

Figure 15: Example and results of CAPTCHA tests with "Short Arcs" noise: (a) "Short Arcs" noise, (b) error rate with changing number of short arcs (LittleArcs vs. Error Rate).

Short Lines

The short lines parameter is the number of short lines added to the background noise. The start position, end position, and length of each line are randomly chosen. In the experiment, we increase the number of short lines from 0 to 20. The result shows that the addition of short lines influences the CAPTCHA usability.

7. CONCLUSION

In this paper, we design a human computation game, DevilTyper, to evaluate the usability of CAPTCHA tests. DevilTyper is fun to play (over five thousand plays within four weeks) and collects large-scale data (over one million completed CAPTCHA tests). Players play DevilTyper for fun and, at the same time, help us quantify the usability of CAPTCHAs.

A consistent and reliable performance metric is proposed for DevilTyper to quantify usability. Evaluations have been conducted to rank different types of CAPTCHAs, analyze the effects of characters in different CAPTCHA schemes, and compare different distortion and noise methods.

Our method is especially useful when large-scale data is required, such as quantifying usability at the character level or examining the effects of different methods for generating CAPTCHAs. DevilTyper is now publicly available at http://deviltyper.iis.sinica.edu.tw. CAPTCHA designers are welcome to send us their CAPTCHAs and obtain usability results. We believe DevilTyper, as an open platform, can provide a testbed for CAPTCHA designers to assess their CAPTCHA usability and help develop more user-friendly CAPTCHAs.

8. REFERENCES


Figure 16: Example and results of CAPTCHA tests with "Short Lines" noise: (a) "Short Lines" noise, (b) error rate with changing number of short lines (LittleLines vs. Error Rate).

[1] PWNtcha: CAPTCHA decoder. http://libcaca.zoy.org/wiki/PWNtcha/.

[2] H. S. Baird and T. Riopka. ScatterType: a reading CAPTCHA resistant to segmentation attack. In Proceedings of the SPIE Document Recognition and Retrieval Conference, pages 16-20, 2005.

[3] K. Chellapilla, K. Larson, P. Simard, and M. Czerwinski. Designing human friendly human interaction proofs (HIPs). In CHI '05: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 711-720, 2005.

[4] K. Chellapilla and P. Simard. Using machine learning to break visual human interaction proofs (HIPs). In Advances in Neural Information Processing Systems 17 (NIPS 2004). MIT Press, 2004.

[5] R. Datta, J. Li, and J. Z. Wang. IMAGINATION: a robust image-based CAPTCHA generation system. In MULTIMEDIA '05: Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 331-334, 2005.

[6] G. Kochanski, D. Lopresti, and C. Shih. A reverse Turing test using speech. In Proceedings of the International Conference on Spoken Language Processing (ICSLP '02), pages 1357-1360, 2002.

[7] G. Mori and J. Malik. Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 134-141, 2003.

[8] A. Rusu and V. Govindaraju. Handwritten CAPTCHA: Using the difference in the abilities of humans and machines in reading handwritten words. In IWFHR '04: Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, pages 226-231, 2004.

[9] L. von Ahn. Games with a purpose. Computer, 39(6):92-94, 2006.

[10] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. In Proceedings of Eurocrypt, pages 294-311, 2003.

[11] L. von Ahn and L. Dabbish. Labeling images with a computer game. In CHI '04: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 319-326, 2004.

[12] L. von Ahn, M. Kedia, and M. Blum. Verbosity: a game for collecting common-sense facts. In CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 75-78. ACM Press, 2006.

[13] L. von Ahn, R. Liu, and M. Blum. Peekaboom: a game for locating objects in images. In CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 55-64, New York, NY, USA, 2006. ACM.

[14] S.-Y. Wang and J. L. Bentley. CAPTCHA challenge tradeoffs: Familiarity of strings versus degradation of images. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, pages 164-167, 2006.

[15] J. Yan and A. S. El Ahmad. Usability of CAPTCHAs or usability issues in CAPTCHA design. In SOUPS '08: Proceedings of the 4th Symposium on Usable Privacy and Security, pages 44-52, New York, NY, USA, 2008. ACM.


Figure 9: Player typing time for characters A-Z in plain text.

Figure 10: Single character analysis: the increase ratio of the error rate for characters A-Z under (a) AuthImage, (b) Captcher, (c) Kiranvj, (d) SecurImage, (e) CoolCAPTCHA, and (f) TgCAPTCHA.
