asp tutorials, asp.net tutorials, sample code, and Microsoft news from 15Seconds
Fighting Spambots with .NET and AI
By Adnan Masood

    Introduction


    This article explains how intelligent applications from Carnegie Mellon University and Berkeley researchers counter automated-registration spam programs, and how to build your own using ASP.NET and XML Web services.


    "We can only see a short distance ahead, but we can see plenty there that needs to be done."
    - A. M. Turing, Father of AI. British Mathematician (1912-1954)

    Scientific research in academia is tightly coupled with today's technological revolution. In this article we will discuss the design, development, and use of the CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart). We run into CAPTCHAs in everyday tasks such as signing up for an email account or performing a domain (whois) lookup, where a distorted image is used to differentiate between a person and a software program. Major vendors, Web portals, and email providers use CAPTCHAs to improve their quality of service. Search engines and Web directories use them to avoid skewed listings caused by autonomous rogue submission programs. Online polls use the technique to prevent multiple voting, since proxies and IP spoofing make it difficult to maintain a poll's integrity by IP address alone. This simple but effective practice also provides protection from brute-force and dictionary-based password attacks.

    First I'll give a short history of CAPTCHA and define Turing's test and machine vision. Then I'll show how Yahoo!, AltaVista, PayPal, and other portals use CAPTCHA approaches in various ways to protect their digital assets. Finally I'll explain how to write a program in ASP.NET to protect a Web application from autonomous bots. Apart from the theoretical discussion, I'll walk through code snippets for manipulating images in ASP.NET and C#. Three in-depth examples will cover dynamic image generation, dictionary-based CAPTCHA-style imaging, and Web services that return such images. Besides CAPTCHAs, this article covers the .NET imaging libraries, on-the-fly image generation, and serving binary data using XML Web services.

    CAPTCHA is an acronym for "Completely Automated Public Turing Test to Tell Computers and Humans Apart". As the name suggests, it's a test that tells humans and computer programs apart. As defined on the CAPTCHA home page at the Carnegie Mellon University School of Computer Science's Web site:

    CAPTCHA is a program that can generate and grade tests that
    • Most humans can pass.
    • Current computer programs can't pass.

    For instance, the following image, which is generated by the Web service we'll see later in this article, is difficult for a computer program to read. A seven-year-old, however, can easily figure it out.

    Figure 1.1: A visual CAPTCHA generated by captchaWebservice (listing at end of article).

    With the exponential growth of services and businesses over the Internet, online security has become a real concern for software developers, architects, managers, and vendors. Software programs are written to impersonate human beings, mimic their surfing patterns, and imitate their online activities. These "pretending to be human" programs are referred to as robots or virtual agents. Imagine a software program performing a brute-force attack (exhaustive search) on your e-mail account, an attack that tries every possible combination of password characters until the right one is found. Digital assets are at risk from spam bots on various fronts -- Web polls, Web registrations, automated services, and search engine submissions, to name a few.
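
    To get a feel for the scale of an exhaustive search, here is a minimal sketch that counts the candidate passwords a bot would have to try. The class and method names are my own for illustration, and the alphabet size and maximum length are assumptions you would adjust to your own password policy.

```csharp
using System;

public static class BruteForceEstimate
{
    // Counts every candidate password of length 1..maxLength drawn from an
    // alphabet of alphabetSize symbols: k + k^2 + ... + k^maxLength.
    public static long SearchSpace(int alphabetSize, int maxLength)
    {
        if (alphabetSize < 1) throw new ArgumentOutOfRangeException(nameof(alphabetSize));
        if (maxLength < 1) throw new ArgumentOutOfRangeException(nameof(maxLength));

        long total = 0;
        long power = 1;
        checked // raise OverflowException instead of silently wrapping around
        {
            for (int length = 1; length <= maxLength; length++)
            {
                power *= alphabetSize; // number of candidates of exactly this length
                total += power;
            }
        }
        return total;
    }
}
```

    For lowercase-only passwords of up to six letters, SearchSpace(26, 6) comes to 321,272,406 candidates. A script can submit that many guesses unattended, but not if each attempt also requires reading a fresh image.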

    What is the Turing Test?

    The Turing test was introduced by Alan M. Turing (1912-1954) as "the imitation game" in his 1950 paper "Computing Machinery and Intelligence." The test is the foundation for determining whether a computer program has intelligence or, more precisely, whether it can make an interrogator believe it is a human being when it is actually a machine.

    Turing describes the game as follows: "The new form of the problem can be described in terms of a game which we call the 'imitation game.' It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either 'X is A and Y is B' or 'X is B and Y is A.' The interrogator is allowed to put questions to A and B."

    In spite of much criticism (the Chinese Room argument (Searle, 1980), variant definitions of 'thinking' and 'intelligence'), the Turing test remains an essential foundation in AI, philosophy, and cognitive science.

    Courtesy CAPTCHA.NET. The CAPTCHA Project is a project of the School of Computer Science at Carnegie Mellon University. It is funded by the NSF Aladdin Center.

    This is where scientists felt automated testing to differentiate between humans and machines was necessary. Udi Manber, Yahoo's chief scientist, together with Manuel Blum and his graduate students at the School of Computer Science at Carnegie Mellon University, developed CAPTCHA support for Yahoo! so that a rogue routine using HTTP POST to resubmit a form over and over again would not go through. It was a challenging problem even for this group. As Manber put it, "If you're in academia, you're always looking for interesting problems. If you're in industry, like me, you've got too many interesting problems." Yahoo! protected several of its services, including Yahoo! Briefcase, Yahoo! Mail, and Yahoo! Groups, from automated registration abuse by introducing CAPTCHAs in them.

    An example of dictionary-based attack protection is shown below.

    Figure 1.2: Yahoo! Countering the brute force attack.

    As shown in the figure above, Yahoo! will not let a software program continue unless it fills in the value shown in the image. The image is dynamic and is different every time. Even if the user ID and password are correct, a legitimate user can't log in without typing the text in the image. This is how the site differentiates between a human being and a computer.

    An average service exploiter or brute-force attack bot (short for robot, the autonomous agent) can't read this image. To read it, the bot needs OCR (optical character recognition). Even with an OCR engine that has excellent image recognition capabilities, the bot gets no help from the image's filename in the HTML:

    http://reg.yimg.com/i/retcQ.dZFemtHS_cf_8Qk12i.XyVGZ2Ej2qW7dKNiIqt0C1AF6mlqmWnUuLe.jpg

    The filename is a long, random-looking string that contains the hash encoding of a challenge file, so that the text a user submits can be matched against it. We will discuss these details later.

    Also the HTML form contains the following hidden value regarding the challenge.

    <input type=hidden name=".challenge" value="c9gLhuwLilq7KGFDsNBjac2ZSvWL">
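
    The idea of carrying an opaque challenge value in a hidden field can be sketched as follows. This is not Yahoo!'s actual scheme, which isn't public; it is one plausible approach, assuming the server keeps a secret key and stores a keyed hash (HMAC-SHA256) of the expected answer in the form. The class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ChallengeToken
{
    // Builds the value for the hidden field: a keyed hash of the expected answer.
    // Without the server's secret key, the token cannot be turned back into the word.
    public static string Create(string answer, byte[] secretKey)
    {
        if (answer == null) throw new ArgumentNullException(nameof(answer));
        using (var hmac = new HMACSHA256(secretKey))
        {
            byte[] digest = hmac.ComputeHash(Encoding.UTF8.GetBytes(Normalize(answer)));
            return Convert.ToBase64String(digest);
        }
    }

    // Recomputes the token from what the user typed and compares it with the posted token.
    public static bool Verify(string submitted, string token, byte[] secretKey)
    {
        string expected = Create(submitted ?? string.Empty, secretKey);
        return FixedTimeEquals(expected, token ?? string.Empty);
    }

    // Assumption: answers are compared case-insensitively, ignoring outer whitespace.
    private static string Normalize(string text)
    {
        return text.Trim().ToLowerInvariant();
    }

    // Compares every character so timing does not reveal where the first mismatch is.
    private static bool FixedTimeEquals(string a, string b)
    {
        if (a.Length != b.Length) return false;
        int diff = 0;
        for (int i = 0; i < a.Length; i++) diff |= a[i] ^ b[i];
        return diff == 0;
    }
}
```

    A token built this way stays valid forever, so a bot could replay one solved challenge. A real deployment would presumably also bind the token to an expiry time or a one-time identifier tracked on the server.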
    

    Figure 1.3: Accessing Yahoo! CAPTCHA image.

    This image is stored on disk and can be accessed as shown in the figure above, but it's still difficult to read with an optical character recognizer. The affine transformations (skew, stretch, scale) have made this text difficult to read for an OCR engine, which weighs its neural output on the basis of pattern matching. That is entirely different from how humans read. Humans don't just read text; they bring contextual background to it, which lets them pick it out clearly. This is not the case with machine vision. With the letters "t" and "i" mingling together against the fuzzy background, blending gradient, and noise, the image is very difficult for an OCR engine to read.
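
    To make "affine transformation" concrete, here is a small sketch of the underlying math, independent of any imaging library. The Affine2D type and its members are my own illustrative names; in .NET imaging code the System.Drawing.Drawing2D.Matrix class offers comparable Scale and Shear operations.

```csharp
using System;

// A 2D affine map: x' = A*x + B*y + C and y' = D*x + E*y + F.
public struct Affine2D
{
    public readonly double A, B, C, D, E, F;

    public Affine2D(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    // Stretches by sx horizontally and sy vertically.
    public static Affine2D Scale(double sx, double sy)
    {
        return new Affine2D(sx, 0, 0, 0, sy, 0);
    }

    // Skews horizontally: each point shifts right in proportion to its y value.
    public static Affine2D ShearX(double k)
    {
        return new Affine2D(1, k, 0, 0, 1, 0);
    }

    // Composes two maps: the result applies this map first, then next.
    public Affine2D Then(Affine2D next)
    {
        return new Affine2D(
            next.A * A + next.B * D,
            next.A * B + next.B * E,
            next.A * C + next.B * F + next.C,
            next.D * A + next.E * D,
            next.D * B + next.E * E,
            next.D * C + next.E * F + next.F);
    }

    // Transforms a single point.
    public void Apply(double x, double y, out double xOut, out double yOut)
    {
        xOut = A * x + B * y + C;
        yOut = D * x + E * y + F;
    }
}
```

    For example, Affine2D.Scale(2, 1).Then(Affine2D.ShearX(0.5)) maps the point (1, 2) to (3, 2). Applying a different random combination to each glyph outline means a template matcher can no longer rely on characters sitting upright on a fixed grid.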

    Introducing images did not keep the bots from automated registration for long. Bot writers took it as a challenge: using optical character recognition techniques, they read such images, and the automation continued. But reading a simple text-based CAPTCHA image with a predictable grid was much easier than reading a skewed, twisted, and distorted image built to baffle the bots.

    It's difficult for machines to cope with noise, affine-transformed characters (especially mirror effects and xy-shearing), segmentation, gradients, occlusion, degradation, and so on. Machines don't think in a social context. The way they look at an image is like the blind men and the elephant: each arrives at a separate interpretation. To counter increasingly intelligent bots, CMU created various kinds of tests, namely Gimpy, Bongo, Pix, Sounds, and Byan.

    Gimpy
    CMU describes Gimpy as its most reliable system. Furthermore, "It was originally built for (and in collaboration with) Yahoo! to keep bots out of their chat rooms, to prevent scripts from obtaining an excessive number of their e-mail addresses, and to prevent computer programs from publishing classified ads."

    It chooses a certain number of words from a dictionary, as our Web service and application at the end of this article do, and displays them corrupted and distorted in an image. It then challenges users to type the text in the image, which humans can do and a bot can't.
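
    The word-selection step can be sketched as below. This is not Gimpy's own code, only a minimal illustration assuming the dictionary is already loaded into a list. The WordPicker class is a hypothetical name, and I use a cryptographic random number generator so that a bot cannot predict the next challenge.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class WordPicker
{
    // Returns count distinct words chosen at random from the dictionary.
    public static List<string> Pick(IList<string> dictionary, int count)
    {
        if (dictionary == null) throw new ArgumentNullException(nameof(dictionary));
        if (count < 0 || count > dictionary.Count) throw new ArgumentOutOfRangeException(nameof(count));

        var pool = new List<string>(dictionary); // copy so the caller's list is untouched
        var chosen = new List<string>(count);
        using (var rng = RandomNumberGenerator.Create())
        {
            for (int i = 0; i < count; i++)
            {
                int index = NextIndex(rng, pool.Count);
                chosen.Add(pool[index]);
                pool.RemoveAt(index); // removing the word prevents repeats
            }
        }
        return chosen;
    }

    // Uniform index in [0, exclusiveMax) using rejection sampling to avoid modulo bias.
    private static int NextIndex(RandomNumberGenerator rng, int exclusiveMax)
    {
        byte[] buffer = new byte[4];
        uint range = (uint)exclusiveMax;
        uint limit = uint.MaxValue - (uint.MaxValue % range); // largest usable multiple of range
        uint value;
        do
        {
            rng.GetBytes(buffer);
            value = BitConverter.ToUInt32(buffer, 0);
        } while (value >= limit);
        return (int)(value % range);
    }
}
```

    The chosen words would then be drawn into an image and distorted before being shown to the user.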

    The pictures below show CAPTCHAs generated by Gimpy.

    Figure 1.4: Images generated by Gimpy. (Courtesy CMU)

    Figure 1.5: A multiple word test by Gimpy. (Courtesy CMU)

    Bongo
    Bongo is a program that asks the user to solve a visual pattern recognition problem like the one below. Further details can be obtained from the CMU CAPTCHA home page.

    Pix
    This program selects random images with certain objects in common and asks users what is common among them. This novel and intelligent approach, I believe, will keep the bots baffled for quite some time. The figure below demonstrates how Pix works.

    Figure 1.6: A picture recognition test by Pix. (Courtesy CMU)

    Sounds
    As its name suggests, this is an audio-based version of Gimpy. It randomly selects a word or a sequence of numbers and generates a sound clip with degraded quality. To overcome an application like this, bots would have to be equipped not only with OCR but with voice recognition as well.

    Breaking a Visual CAPTCHA

    To improve their quality, developers and scientists kept developing, testing, and breaking CAPTCHAs, much like devising ever tougher automated tests for machines attempting the Turing test. Greg Mori and Jitendra Malik of the University of California at Berkeley have written a program that can solve ez-gimpy with 83% accuracy (see Breaking a Visual CAPTCHA for details). Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have written a program that achieves a 93% correct recognition rate against ez-gimpy, and Malik and Mori have matched their accuracy.

    The CMU CAPTCHA home page also reports that Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates have written a program that achieves 78% accuracy against gimpy-r, and adds: "We therefore consider the gimpy-r challenge to be broken. Congratulations to Gabriel, Nathan, Curt and Randy! More challenges will come soon."

    Before starting sections on industry usage and CAPTCHA's practical implementation, there is an interesting story about online polls on CMU's CAPTCHA home page that I'd like to share with readers.

    In November 1999, http://www.slashdot.com released an online poll asking which was the best graduate school in computer science (a dangerous question to ask over the Web!). As is the case with most online polls, IP addresses of voters were recorded in order to prevent single users from voting more than once. However, students at Carnegie Mellon found a way to stuff the ballots using programs that voted for CMU thousands of times. CMU's score grew rapidly. The next day, students at MIT wrote their own program and the poll became a contest between voting 'bots'. MIT finished with 21,156 votes, Carnegie Mellon with 21,032 and every other school with less than 1,000. Can the result of any online poll be trusted? Not!

    Industry Practice - CAPTCHAs in Action

    Below are some screenshots showing how CAPTCHAs are used throughout Web applications to protect against automated submissions and registrations.

    Yahoo! Signup

    Figure 1.7: Image: Yahoo Signup form using CAPTCHA

    Yahoo! Mail

    Figure 1.7b: Image: Yahoo mail sending form using CAPTCHA

    Yahoo! Briefcase

    Figure 1.8: Yahoo Briefcase: Yahoo briefcase enablement form using CAPTCHA

    AltaVista

    Figure 1.9: AltaVista: Search engine uses random characters image based protection to avoid automated spam registrations and therefore avoid skew in its listings.

    Hotmail

    Figure 1.10: Hotmail.com: Email service provider uses CAPTCHA to avoid automated registrations.

    In its FAQ section, under "Why do I need to type characters from a picture to register?", Hotmail says:

    "Typing the characters from a picture helps ensure that a person - not an automated program - is completing the registration form.

    This is important because attackers use harmful programs to try to register large numbers of accounts with Web services such as Passport. Attackers can use these accounts to cause problems for other users, such as sending junk e-mail messages or slowing down the service by repeatedly signing in to multiple accounts simultaneously.

    In most cases, an automated registration program can't recognize the characters in the picture."

    Hotmail goes a step further and offers audio verification for people with disabilities and for those who have trouble reading distorted images.

    Figure 1.11: Hotmail.com: Hotmail provides support for the disabled so they can listen to the word.

    PayPal

    Figure 1.12: paypal.com: Protection against service abuse at PayPal, the online transaction handling portal.

    Network Solutions

    Figure 1.13: Network Solutions Protection for Whois Lookup

    Figure 1.14. Network solutions whois check screenshot
    Network Solutions Protection for Whois Lookup - Entering the word Screen

    Figure 1.15: NetworkSolutions.com
    Whois lookup protection at Network Solutions, the domain name registrar and hosting services provider.

    Domain Name Registrar (godaddy.com)

    Figure 1.16: goDaddy.com
    Whois lookup protection at goDaddy, the domain name registrar and hosting services provider.

    MSN

    Figure 1.17: MSN.com
    The MSN portal protects its services from abuse by using CAPTCHAs in its signup forms.

    Parasoft Forums

    Figure 1.18: Parasoft.com
    Parasoft's portal protects its services by using CAPTCHAs in its signup forms.

    Next: Writing Your Own CAPTCHA Application
