March 31, 2011

Did you know that you're helping Google decipher old books? + A chat with Suzette.

Every day 200 million people fill out a “Captcha” to prove that they are not a computer. You know what I’m talking about: those boxes of warped, squiggly words you have to retype before a site lets you through.


Captcha stands for “completely automated public Turing test to tell computers and humans apart,” in reference to Alan Turing’s famous test. The New York Times has a fascinating article about how reCaptcha uses this technology to help Google “authenticate text in Google Books, its vast project to digitize and disseminate rare and out-of-print texts on the Internet.” The Optical Character Recognition (OCR) programs used to “read” and digitize books “mess up or miss 10 percent to 30 percent of the words. Only humans can fix the errors.”

That’s where you come in. When you’re asked to fill out a Captcha to prove you’re a human, you are shown two words. One is a “control” word whose answer is already known; this is the word used to prove you’re a human. The second is a word that has been “flagged” because different OCR programs have deciphered it differently or because their answer does not appear in the dictionary. By typing that word, you help the program figure out what it is. By drawing on this massive pool of human answers, reCaptcha “achieves an accuracy rate above 99 percent, which compares favorably with professional human transcribers.”
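For the curious, here is a rough Python sketch of the idea described above. It is not reCaptcha's actual code: the function names are made up for illustration, and the rule that a word counts as solved once three people agree (`min_votes=3`) is purely an assumption, since the article does not say how many matching answers are required.

```python
from collections import Counter


def flag_suspicious_words(ocr_a, ocr_b, dictionary):
    """Return the positions of words that two OCR readings disagree on,
    or whose reading does not appear in the dictionary."""
    flagged = []
    for i, (a, b) in enumerate(zip(ocr_a, ocr_b)):
        if a != b or a.lower() not in dictionary:
            flagged.append(i)
    return flagged


def check_response(control_answer, typed_control, typed_unknown, votes):
    """Treat the user as human only if the control word is typed correctly.
    Only then is their reading of the unknown word counted as a vote."""
    if typed_control.strip().lower() != control_answer.lower():
        return False
    votes[typed_unknown.strip().lower()] += 1
    return True


def resolve(votes, min_votes=3):
    """Return the most popular reading once enough people agree, else None.
    The threshold of 3 is an illustrative assumption."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


if __name__ == "__main__":
    dictionary = {"the", "quick", "brown", "fox"}
    reading_a = ["the", "quick", "brovvn", "fox"]
    reading_b = ["the", "qnick", "brovvn", "fox"]
    print(flag_suspicious_words(reading_a, reading_b, dictionary))

    votes = Counter()
    for control, unknown in [("morning", "brown"), ("Morning", "brown"),
                             ("mornlng", "zzz"), ("morning", "brown")]:
        check_response("morning", control, unknown, votes)
    print(resolve(votes))
```

The key design point is that the unknown word is never used to judge you. Only the control word decides whether you pass, and answers from people who fail it are thrown away rather than counted.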

And speaking of Turing tests, here’s a link to Suzette, the chatbot that won the 2010 Loebner Prize by convincing one judge (out of four) that “she” was a human. I asked for a book recommendation and got into a somewhat irritating but not completely inhuman conversation. It turns out Suzette has a very low opinion of physical books and is full of pop philosophy.

MobyLives