Sunday, February 11, 2007

On the Use of Captchas

In an earlier post, PHP Form Signature, I developed a bit of code to render form-attacking bots harmless. On my own web sites I expanded the code to create a random hidden form field with the signature information, and on every site where I have implemented the signature methodology, the obviously scripted form attacks were completely thwarted. At the same time, there is no usability impact, since the mechanism works behind the scenes, completely hidden from the site visitor.

However, as I predicted, one of my forms was spammed by a bot that simply harvested the form signature and reposted it along with its spam payload. Looking at the contents of the spam, I immediately recognized that it was written to work in a blog. This was not a targeted attack by someone who ran across a form on the site and developed a script to try to exploit it. Instead, this was a bot programmed to roam the web looking for any form that has a text area in it, and to post there in the hope that the text will show up as a comment in someone's blog. This kind of attacker doesn't care whether any particular form submission works or fails; all that matters is that there are enough unprotected blogs and similar public forums that will instantly display anonymous posts.

This is exactly the kind of thing captchas were developed to prevent. However, as I have stated before, a captcha places a stumbling block in the way of innocent site visitors who truly do want to communicate. I have been on several sites that use a captcha on every form submission, which gets very annoying. The dilemma, then, is how one tells the difference between a robot and a human. After all, a robot can send HTTP headers that make its POST look like it is coming from an ordinary web browser.

The main difference, as far as I can tell, is that almost none of my human correspondents place links in their form submissions, but every one of the spam posts I have recorded contains one or more links. So, I have added to my form validation sequence a check for a link reference in posts. If it detects one, it re-presents the form with a captcha and prompts the visitor to fill in the letters they see in the image. A robot will never see that page, of course, but neither will its form POST be completely processed. The human visitor, who has likely seen captchas before, may still be annoyed by this interruption, but is much more likely to fill in the field as requested in order to complete their correspondence. In this way, I, as the web developer, protect my server and e-mail box, while providing a normal user experience to most of the visitors who use my forms.

For the purposes of this example, I am using Ed Eliot's Visual and Audio PHP CAPTCHA Generation Class. The particular example I am giving also involves a form that posts back to its own address ($_SERVER['SCRIPT_NAME']). This means that the model, or processing code, first checks whether the request is a GET (which simply results in the form being presented) or a POST (which initiates the validation cycle).
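
A minimal sketch of that GET/POST branch, assuming the method is read from $_SERVER['REQUEST_METHOD']. The helper name shouldValidate is hypothetical and only here for illustration:

```php
<?php
// Hypothetical helper: decide whether the validation cycle should run.
// Assumes the method string comes from $_SERVER['REQUEST_METHOD'].
function shouldValidate($requestMethod)
{
    // Only a POST starts validation; a GET (or anything else) just shows the form.
    return strtoupper($requestMethod) === 'POST';
}

// Fall back to GET when run outside a web server (e.g. from the command line).
$method = isset($_SERVER['REQUEST_METHOD']) ? $_SERVER['REQUEST_METHOD'] : 'GET';

if (shouldValidate($method)) {
    // ... run the validation cycle on $_POST ...
} else {
    // ... present the empty form ...
}
```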

At the beginning of the model section of code, then, before any processing takes place, I have the following lines of code:


$useCaptcha = 1; // flag to indicate captcha can be displayed if needed
$requireCaptcha = 0; // flag to indicate that a captcha condition has been met
require_once('php-captcha.inc.php'); // Ed Eliot's captcha class


Before I run the validation code, I fill in the $captchaTriggers array with any string that should cause a captcha to be displayed. The following will capture all links and images pasted in the form fields. You can add other rules, of course.


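$captchaTriggers = array(); // start with an empty list of trigger strings
$captchaTriggers[] = 'http'; // any link
$captchaTriggers[] = '<img'; // any image tag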


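As part of my validation, I loop through the expected field names (I will likely cover details of my validation procedure in a later post, but this should give you the general idea). I check all the text fields for any of the captcha triggers. If one is found, its position is added to the $requireCaptcha variable. A space is prepended to the field value so that a match at the very start of the field has position 1 rather than 0; when no match is found, strpos() returns false, which adds nothing.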


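if($_POST[$thisFieldName] > ''){
    // Lower-case once; the leading space keeps any match at position 1 or higher.
    $haystack = strtolower(' ' . $_POST[$thisFieldName]);
    for ($j = 0; $j < count($captchaTriggers); $j++) {
        // strpos() returns false when there is no match; the cast turns that into 0.
        $requireCaptcha += (int) strpos($haystack, strtolower($captchaTriggers[$j]));
    }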
...
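
An alternative sketch that does not rely on the prepended space: wrap the check in a small function that counts matches with stripos() and compares the result against false. The function name countCaptchaTriggers is hypothetical, not part of my validation code. Any result above zero means a captcha is required.

```php
<?php
// Hypothetical helper: count how many trigger strings occur in a field value.
function countCaptchaTriggers($fieldValue, array $captchaTriggers)
{
    $hits = 0;
    foreach ($captchaTriggers as $trigger) {
        // stripos() is case-insensitive and returns false when nothing is found,
        // so compare with !== false (a match at position 0 is still a match).
        if (stripos($fieldValue, $trigger) !== false) {
            $hits++;
        }
    }
    return $hits;
}

$captchaTriggers = array('http', '<img');

echo countCaptchaTriggers('Visit HTTP://example.com now', $captchaTriggers), "\n"; // 1
echo countCaptchaTriggers('Just saying hello', $captchaTriggers), "\n";            // 0
```

Counting matches instead of summing positions gives the same "greater than zero" test, and the number is easier to read when debugging.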


At the end of the validation phase, I have the following code. If any of the captcha trigger strings was found in the above code, the value of $requireCaptcha will be greater than zero, which is one of the two conditions that need to be met ($useCaptcha being the second). If the conditions are met, the code then checks whether $_POST['captchaCode'] has been set, and whether it validates against Eliot's captcha class. On the first pass, the captchaCode variable is not present, of course, so an entry is added to the errorMessages array. On subsequent passes, invalid captcha codes continue to fail, while a valid one completes the processing (unless there are other validation errors, of course).


if($requireCaptcha > 0 && !empty($useCaptcha)){ // isset() would pass even with the flag set to 0
    if(!isset($_POST['captchaCode'])) {
        $_POST['captchaCode'] = '';
    }
    if (!PhpCaptcha::Validate($_POST['captchaCode'])) {
        $errorMessages[] = "Please enter the letters you see in the graphic.";
        $errorFields[] = 'captchaCode';
    }
}


All of my validation checks use the $errorMessages array, so whether that array contains any entries determines whether the form is displayed again for corrections. In the form, I do another check of $requireCaptcha to decide whether to display the captcha image and field.


<?php if($requireCaptcha > 0) { ?>
        <div class="formRow">
            <div class="formLabel"><label for="captchaCode"><img src="/assets/images/visual-captcha.php" width="100" height="40" alt="Visual CAPTCHA" /></label></div>
            <div class="formField"><input type="text" id="captchaCode" name="captchaCode" value="" style="width:10em;" maxlength="10" accesskey="c" tabindex="5" /></div>
        </div>
<?php } ?>


Since I implemented this code, I have not received any robot-generated spam. One possible improvement is to set a cookie in the client browser when a captcha challenge is successfully answered. The cookie would essentially say "this user has already proven that they are human, so you do not need to challenge them in the future."
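
I have not built the cookie idea yet, so the following is only a sketch under assumptions. A plain "I am human" cookie could be forged by a bot, so this version signs an expiry time with hash_hmac(). The cookie name humanVerified, the secret constant, and both function names are hypothetical.

```php
<?php
// Hypothetical secret; in practice use a long random value kept out of the web root.
define('CAPTCHA_COOKIE_SECRET', 'change-me');

// Build a cookie value of the form "expiry:signature".
function makeHumanToken($expires, $secret)
{
    return $expires . ':' . hash_hmac('sha256', (string) $expires, $secret);
}

// Return true only if the token is well formed, correctly signed, and not expired.
function isHumanToken($token, $secret, $now)
{
    $parts = explode(':', (string) $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false;
    }
    $expected = hash_hmac('sha256', $parts[0], $secret);
    // hash_equals() compares in constant time to avoid leaking the signature.
    return hash_equals($expected, $parts[1]) && (int) $parts[0] > $now;
}

// After a captcha is answered correctly, the token could be stored for 30 days:
// $expires = time() + 2592000;
// setcookie('humanVerified', makeHumanToken($expires, CAPTCHA_COOKIE_SECRET), $expires, '/');

// On later requests, a valid token would allow skipping the challenge:
// if (isset($_COOKIE['humanVerified'])
//     && isHumanToken($_COOKIE['humanVerified'], CAPTCHA_COOKIE_SECRET, time())) {
//     ... skip the captcha challenge for this visitor ...
// }
```

Note that the token is not tied to a particular visitor, so anyone who solves one captcha could reuse the cookie until it expires. A shorter lifetime limits that exposure.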

Hope this helps you!