John's Technical Blog: 11/01/2006

As I have posted before, spammers are increasingly attempting to use form posts to send e-mail through unsuspecting web hosts. The most dangerous, of course, are posts through which the spammer injects code that runs invisibly on the server. This approach can often be foiled by validating form post data and properly escaping characters before they are displayed anywhere.

I have added validation to all the forms on the web site for the main company I work for, but one spammer is still attempting to use a form. However, the only destination is an administrative e-mail box, so the spammer has expended a lot of effort to annoy only a few people. We have been receiving about 100 spam messages a day in that box, so further action was needed to stop this spammer and any others who might follow.

One common solution has been the "captcha," a graphic with hard-to-read text that, in theory, only a human can interpret and relay back to the server to prove the post did not originate from a robot. For our purposes, however, this approach is undesirable since it makes forms less usable. I wanted to come up with a verification process that is invisible to humans and would be (nearly?) impossible for a robot to reproduce.

My solution is to create a checksum value built from data from the server, from the client, and from the form, which can be matched up before and after the post to "prove" the post is authentic. Taking a page from our login guidelines, I thought it also appropriate to time stamp the forms. This further reduces the likelihood that the checksum could be duplicated, and it prevents a "good" post from being reused at a later time.

This first solution is in ColdFusion since that is what this company uses the most right now. I plan to follow shortly with a PHP example followed by a Java/JSP example.

Creating the Checksum

The following code creates the checksum value. Any number of variables can be used to create the checksum; the only requirement is that their values at the time of validation are the same as they were when the form was created.

This code uses the following four values:

dateTime - The time the form was created, which allows us to validate that the form POST was submitted within a certain time frame.

remoteAddress - The client machine's address, which allows us to validate that the form POST arrived from the same machine that requested the original form. Specifically, this would thwart spammers who use Trojans to hijack unprotected computers and make them spam zombies.

serverName - This is the web server's domain name in this example. If your servers are inside a firewall, and/or if you have multiple, load balanced servers with "sticky persistence" (i.e., a client is guaranteed to always communicate with one particular server during its session), you might use either the server's IP address or a global variable specific to a given server.

formName - On sites that I program, each form is given a unique name. Originally, this was for statistical usage purposes, but it can also come in handy here. This is essentially a "password" for the form data, preventing a given set of validation code from attempting to validate data from a different form.
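As a rough sketch of the two alternatives mentioned for serverName, the lines below show one way each might look. The variable application.serverId is a made-up name for a value you would set yourself on each server (for example in Application.cfm), and the Java lookup assumes your host allows createObject for Java classes.

```cfml
<!--- Option 1: use the server's own IP address, looked up through Java --->
<cfset serverName = createObject("java", "java.net.InetAddress").getLocalHost().getHostAddress()>

<!--- Option 2: use a per-server value you define yourself.
      "application.serverId" is a hypothetical name, not a built-in variable. --->
<cfset serverName = application.serverId>
```

Whichever option you choose, the form page and the validation page must compute the value the same way, or the checksums will never match.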

Overall, this system attempts to guarantee that communication about one form is transferred between a known host and a known client within a known time span. Any detected violation prevents validation from proceeding. Hashing this data (ColdFusion's hash() function uses MD5 by default) creates a checksum that should be virtually impossible to reverse-engineer. Even though spammers are persistent, they are more likely to go after someone else's less protected site than to tackle this problem.

<cfparam name="form.dateTime" default="#DateFormat(Now(),'yyyy-mm-dd')# #TimeFormat(Now(),'HH:mm:ss')#">
<cfset remoteAddress = cgi.REMOTE_ADDR>
<cfset serverName = cgi.SERVER_NAME>
<cfset formName = "formSomeName">
<cfset form.checksum = hash("#form.dateTime#~#formName#~#remoteAddress#~#serverName#")>

The next piece of code is added into the form. The only two pieces of information in clear text are the dateTime and checksum values (now that I think about it, these two values could actually be encrypted and given different names to further enhance security - I will most likely do that in further form work). All of the other variables live on the server, and the spammer would have to know all this, including your separators and the exact sequence of variables used, in order to forge the checksum value (not likely without inside help or an outright takeover of your server).

<input type="hidden" name="dateTime" value="#form.dateTime#" />

<input type="hidden" name="checksum" value="#form.checksum#" />
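The idea mentioned above of encrypting the two hidden values and giving them different names might look something like the following. This is only a sketch under several assumptions: ColdFusion MX 7 or later (for the four-argument encrypt and decrypt), a made-up secretKey value, made-up field names "t" and "c", and a hypothetical validate.cfm page. The CFMX_COMPAT algorithm is weak, so treat this as obfuscation rather than real protection.

```cfml
<!--- Hypothetical shared secret; define it once, somewhere both pages can read it --->
<cfset secretKey = "replace-with-your-own-secret">

<!--- Form page: form.dateTime and form.checksum were created by the checksum code above --->
<cfoutput>
<form action="validate.cfm" method="post">
    <!--- "t" and "c" are made-up names standing in for dateTime and checksum --->
    <input type="hidden" name="t" value="#encrypt(form.dateTime, secretKey, 'CFMX_COMPAT', 'Hex')#" />
    <input type="hidden" name="c" value="#encrypt(form.checksum, secretKey, 'CFMX_COMPAT', 'Hex')#" />
    <input type="submit" value="Send" />
</form>
</cfoutput>

<!--- Validation page: restore the original field names before any other checks run --->
<cftry>
    <cfset form.dateTime = decrypt(form.t, secretKey, "CFMX_COMPAT", "Hex")>
    <cfset form.checksum = decrypt(form.c, secretKey, "CFMX_COMPAT", "Hex")>
    <cfcatch type="any">
        <!--- Missing or mangled fields: treat the request as a non-form visit --->
        <cflocation url="/" addtoken="No">
    </cfcatch>
</cftry>
```

Because the decrypted values are written back into form.dateTime and form.checksum, the rest of the validation code can stay as it is.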

On the validation side, you want to add the following code at the top.

<!--- Reject anything that is not a POST carrying both hidden fields --->
<cfif cgi.REQUEST_METHOD IS NOT "POST" OR NOT isDefined("form.checksum") OR NOT isDefined("form.dateTime")>
    <cflocation url="/" addtoken="No">
</cfif>
<!--- Rebuild the checksum from the same values, in the same order, used when the form was created --->
<cfset remoteAddress = cgi.REMOTE_ADDR>
<cfset serverName = cgi.SERVER_NAME>
<cfset formName = "formSomeName">
<cfset checksumTest = hash("#form.dateTime#~#formName#~#remoteAddress#~#serverName#")>
<cfif checksumTest IS NOT form.checksum>
    <cfset errorMessage = "The form data was corrupted. We are sorry, but we cannot send your request at this time.">
</cfif>
<!--- A valid checksum that is more than 30 minutes old is treated as expired --->
<cfif isDate(form.dateTime)>
    <cfif checksumTest IS form.checksum AND DateDiff("n", form.dateTime, now()) GT 30>
        <cfset errorMessage = "The form data has expired. We are sorry, but we cannot send your request at this time.">
    </cfif>
<cfelse>
    <cfset errorMessage = "The form date was corrupted. We are sorry, but we cannot send your request at this time.">
</cfif>

If errorMessage is set, you most likely want to halt all further processing so that harmful input cannot reach anything you have missed validating. At this point you already know that something is amiss! This should not be your only line of defense, however. You still want to validate every field to make sure it contains data of the right datatype and length, and is free from characters that could lead to other hacks.
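A minimal sketch of both steps, halting on a checksum failure and then validating individual fields, could look like this. The field names form.email and form.comments and the length limits are invented for illustration, and isValid assumes ColdFusion MX 7 or later.

```cfml
<!--- Stop immediately if the checksum or time stamp test failed --->
<cfif isDefined("errorMessage")>
    <cfoutput><p>#HTMLEditFormat(errorMessage)#</p></cfoutput>
    <cfabort>
</cfif>

<!--- Hypothetical fields; substitute the ones your form actually posts --->
<cfparam name="form.email" default="">
<cfparam name="form.comments" default="">

<!--- Check datatype and length before the values are used anywhere --->
<cfif NOT isValid("email", form.email) OR len(form.email) GT 100>
    <cfset errorMessage = "Please enter a valid e-mail address.">
<cfelseif len(trim(form.comments)) EQ 0 OR len(form.comments) GT 2000>
    <cfset errorMessage = "Please enter a comment of no more than 2000 characters.">
</cfif>
```

Escaping the message with HTMLEditFormat before displaying it follows the same rule as for any other value that ends up on a page.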

I discovered this quite by accident, and thought it might be helpful for those who find they need to set up different stylesheet rules for Internet Explorer 6 and IE 7. The crux of this solution is the CSS attribute selector (denoted by square brackets [] in CSS). It turns out that IE6 does not understand attribute selectors, and, in fact, will ignore the entire associated rule (including rules with more than one selector).

To leverage this difference, two rules can be defined: a normal rule that will be rendered in IE6, and a duplicate rule containing an attribute selector that will be rendered in IE7. In the example below, I use the IE-specific "if" statement to show how you can further differentiate between IE and "other browsers." The solution below was tested in the Windows environment using IE6, IE7, Opera 9, and Firefox 2.

If you use a different browser or environment, try the code out and let me know what you see.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>CSS hack to differentiate between IE 6 and 7</title>
    <meta name="author" content="John A. Marsh" />
    <meta name="copyright" content="&copy; 2006 ThreeLeaf.com" />
    <style type="text/css">
        P {
            background:green;
        }
    </style>
    <!--[if IE]>
        <style type="text/css">
            P {
                background:red;
            }
            P[],P {
                background:blue;
            }
        </style>
    <![endif]-->
</head>
<body>
    <p>Blue in IE 6, Red in IE 7, Green in non-IE browsers.</p>
</body>
</html>

Thursday, November 30, 2006

Thwarting form spammers in ColdFusion

CSS hack to differentiate between IE 6 and 7