John's Technical Blog: 12/01/2006

In my previous post, I began developing the concept of creating a checksum for a form to help prevent spammers from abusing forms on web sites. A malicious spammer will first attempt to inject code into a form in an effort to send e-mails through the web site's host. Those attempts can usually be foiled by properly validating the form fields. However, a spammer may still inject their marketing materials into a form that appears to send an e-mail to someone, not really caring where that e-mail might wind up. If these e-mails are directed at a server administrator or the customer support staff, they can get very annoying, even if they are otherwise rendered harmless by the validation rules. The goal then is to try to ensure that a given form post if from a human being, and not from a robot. Visual and verbal "captchas" can be used, but that tends to annoy the human visitor, which we don't want to do.

The idea of using a checksum and a form timeout is one way a web developer might attempt to prevent a spammer from writing bots against forms. I am expanding on the idea of the checksum and will now call it a signature. This example is in PHP, and will work as-is, but I certainly encourage you to make your own variations.

In my previous post, I developed an example with two hidden fields which were used to validate the form post. In this example I do two things: 1) combine the timestamp and the hash to create one "signature" field, and 2) encrypt the timestamp to make it less obvious. These measures should make it more difficult for a robot to spoof posts to your forms. Note that with any encryption and hash scheme, it is possible to break the code, but the objective here is to make it so arduous that the spammer would rather look elsewhere for targets than your forms. My opinion is that it would probably be easier for someone to hack the server than this form validation technique, so one must not overlook hardening the rest of the server, of course.

Since you might use this code in several places, consider making it an include:

<?
// http://rc4crypt.devhome.org/
include_once 'cryptography.php';

function formSignatureCreate($formPassword){
    $formTime = array_sum(explode(' ', microtime()));
    $encryptedTime = encrypt($formTime, $formPassword);
    $hash = formSignatureHash($formTime, $formPassword . $encryptedTime);
    return $hash . $encryptedTime;
}

function formSignatureValidate($formSignature, $formPassword){
    $encryptedTime = substr($formSignature, 32);
    $formTime = decrypt($encryptedTime, $formPassword);
    $hash = formSignatureHash($formTime, $formPassword . $encryptedTime);
    if(!is_numeric($formTime)) {
        return false;
    }
    $currentTime = array_sum(explode(' ', microtime()));
    if($currentTime - $formTime > 1800) {
        return false;
    }
    if($_SERVER['HTTP_USER_AGENT'] == ''){
        return false;    
    }
    if($hash == substr($formSignature, 0, 32)) {
        return true;
    }
    return false;
}

function formSignatureHash($formTime, $formPassword){
    return md5($formTime . $_SERVER['REMOTE_ADDR'] . $_SERVER['HTTP_USER_AGENT'] . $_SERVER['SERVER_NAME'] . $formPassword);
}
?>

The encrypt and decrypt functions are wrappers I have placed around the rc4crypt code. You may use any encryption technique you want, of course, as I am only showing this as an example. These functions include a conversion of the encrypted string to base 64, so I do not have to deal with making the string web-friendly at this level.

For the time stamp ($formTime) , I have used the microtime() PHP function for simplicity, but you could use a formatted time as well, along with appropriate changes in the formSignatureValidate() function. This time stamp is encrypted with a password designated by the programmer. It is probably a best practice to have a unique password with each form, and since it is on the programming side, it can be a random sequence of characters.

The hash is generated in the formSignatureHash() function, which combines the following elements:

$formTime - The time the form was created, which allows us to validate that the form POST was submitted within a certain time frame.

REMOTE_ADDR - To tie the form post with the client's computer.

HTTP_USER_AGENT - To tie the form post with the client's browser.

SERVER_NAME - To tie the form to a given host server.

$formPassword - To tie the form POST to a given form.

Just to further confound any reverse-engineering, I combine the encrypted time with the given form password when the hash function is called.

Within the form, add a hidden field for the signature:

<? $signature = formSignatureCreate('thisFormPassword'); ?>
<input type="hidden" name="signature" value="<? print $signature; ?>" />

The way I have set this example up, a new signature would be created if other validation checks failed. This eliminates the need to validate the basic form (i.e., alpha-numeric) of the signature itself before printing it back into the form. We would not want someone attempting to inject code into an unvalidated signature variable!

In your validation section, add the signature check:

if(formSignatureValidate($signature, 'thisFormPassword')) {
    // other form validation
} else {
    // spawn error message
}

During validation, I make this my primary check. If the post does not pass the signature check, there is no reason to do any other validation, since it will always be rejected. In my error messaging system, I do not even display the input form again, forcing the visitor to click on a link to start the original form over. You may wish to restart a little more graciously with a blank form.

The formSignatureValidate() function first decrypts the time portion of the signature. In this example, the hash portion is always 32 characters long, so it is easy to pull the encrypted time off the end. Since I have used the microtime() function, it is easy to check to be sure the timestamp is 1) numeric and 2) within the 30 minute (1,800 second) time frame allowed.

I do an additional check on HTTP_USER_AGENT only because one of my sites was once attacked by a robot without a name. I don't bother checking during the signature creation phase, since it is really only important in the validation phase. If you know of specific rogue agents or IP addresses, those can be globally addressed in an .htaccess file or in the server configuration.

The final check is that the regenerated hash matches the hash at the beginning of the signature.

If all these checks pass, then it is considered a genuine post.

I have thought of a way to defeat this methodology. Someone could certainly develop a script to "screen scrape" the signature dynamically, but hopefully, a spammer will consider that too much effort. If the need arises, it is likely one could come up with a methodology to dynamically create the signature name as well. Time will tell. In the meantime, I hope this helps someone. I hope to develop a Java/JSP version of this in the near future.

John's Technical Blog

Saturday, December 30, 2006

PHP Form Signature