Captchas: Why We Need Them, How They’re Evolving, and How You Can Solve Them More Easily

As humans living in the modern world, we have to claim we’re not robots fairly often, and we’re not even living in some futuristic science-fiction dystopia. Whether you’re checking an “I am not a robot” box, teaching an AI what counts as a “sign” (hint: the “correct” answer is whatever most other users have picked, so try to think like the crowd), or solving math problems, the goal is always the same: stop bots from messing up websites, and maybe put the humans solving the captchas to work digitizing books, training image-recognition software, or generating ad revenue. But there’s more to captchas than meets the eye, and they’re far from foolproof.

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” which, aside from being a truly elegant acronym, tells you most of what you need to know. The idea is, as Google’s reCAPTCHA motto goes, to create a task that is “Easy for people, hard for bots.”

“Bot” generally refers to any program that is set to automatically complete some process, whether it’s posting news on Twitter or leaving spam in website comment sections. Used correctly, these programs are fairly useful, but they can also be used to generate useless/ad-ridden/malicious content, overwhelm a site with signups, rig online poll results, scrape email addresses, or do any number of other unpleasant things. It’s just best not to let them in.

If you’ve been around the Internet for a while, you’ll remember that for most of the 2000s, the most common captcha type was a strip of distorted text containing a string of alphanumeric characters. This is no longer a very secure form of captcha, but when Google acquired reCAPTCHA in 2009, it was still good enough to stop most bots. Since then, Google has switched to the more secure “I am not a robot” boxes (which actually monitor behavior like mouse movement and browser information to check if you’re a bot) and image-recognition challenges. Audio-based captchas are still around as well, and they are surprisingly easy to break with speech-recognition software.

The image-recognition captchas have had their own set of issues, as they can be a little ambiguous for human respondents. As mentioned above, though, there is no built-in right answer: since the computer doesn’t know which pictures are storefronts and which are schnauzers, it just accepts the majority human opinion as correct. If 75 out of 100 humans decide to label a blurry picture of a mop as a schnauzer, the computer will assume that the mop is a schnauzer and will mark you wrong if you don’t label it as such.
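
The majority rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real captcha service is implemented: the function names and the 60% agreement threshold are assumptions made up for the example.

```python
from collections import Counter


def consensus_label(votes, min_share=0.6):
    """Return the most common label if enough voters agree, else None.

    min_share is a hypothetical agreement threshold, not a documented value.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_share else None


def grade_answer(user_label, votes, min_share=0.6):
    """A user passes only by matching the crowd's label, right or wrong."""
    accepted = consensus_label(votes, min_share)
    return accepted is not None and user_label == accepted


# 75 of 100 people called the blurry mop a schnauzer.
votes = ["schnauzer"] * 75 + ["mop"] * 25
crowd_answer = consensus_label(votes)
mop_passes = grade_answer("mop", votes)
```

With these votes, the crowd's answer is "schnauzer", so a user who correctly answers "mop" fails the check.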

But there are plenty of other captcha options as well, and they can get pretty creative. These are just a few of the ideas that have made it onto various websites.

Examples include the slide-lock captcha, the math problem captcha, the drag-and-drop captcha, the image orientation captcha, and the logic/grammar captcha.
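
Of these, the math problem captcha is the simplest to picture in code. Below is a minimal Python sketch, assuming single-digit operands and three basic operators; the function names are made up for illustration. A plain-text question like this is trivial for a bot to parse, so treat it as a demonstration of the idea rather than a real defense.

```python
import operator
import random

# Supported operators for the challenge (an arbitrary choice for this sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_challenge(rng=random):
    """Return a (question, expected_answer) pair with single-digit operands."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    symbol = rng.choice(sorted(OPS))
    return f"What is {a} {symbol} {b}?", OPS[symbol](a, b)


def check_answer(submitted, expected):
    """Compare the user's text input with the expected integer answer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Anything that is not an integer counts as a failed attempt.
        return False


question, expected = make_challenge()
print(question)
```

The server would keep the expected answer in the user's session and call `check_answer` on whatever comes back from the form.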

There are also some captchas you never see, such as the honeypot captcha, which involves adding an invisible field to a webpage, waiting for a bot to fill it out (humans won’t, as they can’t see it), and then kicking the bot off. Then there’s Google’s “invisible captcha,” often paired with the “I am not a robot” checkbox, which watches how you browse around a webpage (mouse movements, scrolling, clicking, general behavior) to decide whether it should give you an image-recognition captcha as a double-check.
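
The server-side half of a honeypot is tiny. Here is a rough Python sketch, assuming the form arrives as a dictionary and that the hidden field is called `website` (a hypothetical name; any field that is hidden from human visitors with CSS would do):

```python
# Hypothetical name of the field that is hidden from human visitors.
HONEYPOT_FIELD = "website"


def is_probably_bot(form_data):
    """Flag a submission if the hidden honeypot field has any content."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


# A human never sees the field, so it comes back empty.
human = {"name": "Ada", "comment": "Nice post!", "website": ""}
# A naive bot fills in every field it finds.
bot = {"name": "x", "comment": "buy pills", "website": "http://spam.example"}

for submission in (human, bot):
    verdict = "rejected" if is_probably_bot(submission) else "accepted"
    print(submission["name"], verdict)
```

This only catches bots that blindly fill in every field; one that skips hidden inputs will walk straight past it.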

You may not know it, but the cumulative hours you’ve spent proving you’re not a robot may have actually made a difference. reCAPTCHA, now Google’s captcha service, was originally designed by Luis von Ahn (now better known as the co-founder of Duolingo) as a way to use wasted brainpower to digitize books. By presenting users with a scanned word from a book or newspaper, this system could both confirm that a user was human and take a sort of opinion poll on what the word was. If enough people agreed on the word, the digitization system would accept the answer into the ebook version.
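
Here is a rough Python sketch of that opinion poll. It assumes each challenge pairs the unknown scanned word with a control word the system already knows, so that only people who get the control word right have their vote counted. That pairing, the class name, and the thresholds are assumptions for the sake of the example, not details taken from this article.

```python
from collections import Counter


class WordPoll:
    """Collects guesses for one scanned word that OCR could not read."""

    def __init__(self, min_votes=3, min_share=0.75):
        # Both thresholds are hypothetical values chosen for illustration.
        self.min_votes = min_votes
        self.min_share = min_share
        self.votes = Counter()

    def submit(self, control_guess, control_truth, unknown_guess):
        """Return True if the user passes; count their vote only if so."""
        passed = control_guess.strip().lower() == control_truth.lower()
        if passed:
            self.votes[unknown_guess.strip().lower()] += 1
        return passed

    def accepted_word(self):
        """Return the agreed transcription, or None if there is no consensus."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= self.min_share else None


poll = WordPoll()
for guess in ["morning", "morning", "moming", "morning"]:
    poll.submit("upon", "upon", guess)
result = poll.accepted_word()
```

In this run, three of four counted votes agree, which meets the 75% threshold, so "morning" is accepted for the digitized text.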

Once this system was in place, it reportedly took only two years to digitize the Google Books library and the New York Times archive. By 2012, Google had switched over to having humans type in house numbers pulled from Google Street View.

In 2014, things took an ironic turn towards the robotic: image-recognition captchas. These work on the principle that machines aren’t very good at figuring out what’s in a picture, but, as described above, the captchas themselves have been pretty effective at training AIs to do just that. Since this type of captcha will eventually work itself out of a job, it is being phased out in favor of the less visible behavioral/tracking-oriented ones.

As artificial intelligence, deep learning, and a host of other advancements come down the pike in the next few decades, captchas are going to have to evolve as well. Most captchas in existence have already been cracked, and it’s only getting easier. Training a machine to read distorted-text captchas can now reportedly take as little as fifteen minutes. Perhaps the only thing left in the future will be biometric captchas (hope you like facial recognition scans!), or perhaps we’ll wake up and discover that the singularity has already been reached, and we were the bots the whole time.

Image credit: Chippee via bad google recaptcha house numbers
