At the height of its hype cycle, XML was supposed to solve the "interoperability problem," but in the end it proved only marginally more successful than any other file format. In much the same way, many legacy spam detection techniques promised to rid us of much or all spam. Instead, they fell short of their promise and, in many cases, simply did not work.

First, here are two detection methods whose weaknesses, once put into practice, simply outweighed their strengths.

Graylisting
Let's look first at graylisting. When graylisting is implemented, each incoming email is categorized upon receipt as known or unknown based on a triplet: sender IP, sender address and recipient address. An email with a previously unseen triplet is rejected with a temporary delivery error. Generally, a legitimate mail server will attempt to resend the rejected email, at which point the graylisting filter lets the message pass and it is delivered to its intended recipient. Graylisting relies heavily on this initial interaction, trusting that a legitimate mail server will act as a persistent suitor, while a spammer's delivery tool will give up and go hit on someone else.
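The triplet check described above can be sketched in a few lines of Python. This is a minimal illustration, not any product's implementation: the class name, the 300-second minimum retry delay, the reply strings and the injectable clock are all assumptions made for the example.

```python
import time


class Graylist:
    """Toy graylisting filter keyed on (sender IP, sender address, recipient address)."""

    def __init__(self, min_delay=300, now=time.time):
        self.min_delay = min_delay  # assumed: seconds a retry must wait before it is accepted
        self.now = now              # injectable clock, so the sketch can be tested
        self.first_seen = {}        # triplet -> time of the first delivery attempt

    def check(self, sender_ip, sender, recipient):
        triplet = (sender_ip, sender.lower(), recipient.lower())
        current = self.now()
        first = self.first_seen.get(triplet)
        if first is None:
            # Unknown triplet: remember it and answer with a temporary error.
            self.first_seen[triplet] = current
            return "450 try again later"
        if current - first < self.min_delay:
            # The retry came too quickly: still a temporary error.
            return "450 try again later"
        # Known triplet that waited long enough: let the message through.
        return "250 accepted"
```

The filter's entire judgment rests on whether the sender retries, which is exactly the behavior a spammer's tool can imitate.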

Herein lies the huge fault of graylisting: it assumes that the delivery tools used by spammers behave differently from legitimate email servers. It was only a short time before spammers adapted their tools to behave like legitimate mail servers (think Exchange, Domino, XMail, Sendmail). At that point, graylisting is rendered useless. Currently, a lot of spam is sent twice back-to-back, so that graylisting filters will allow the second message through. Another drawback of graylisting is the risk of over-blocking valid messages: if the sending mail server resends a message very quickly, the filter may be confused and reject it again.

Challenge-response
If you can't prohibit them, why not challenge them? Because spammers send spam in an automated fashion, the challenge-response method answers each questionable incoming message with a request for some action. If the action is not performed, the message is not delivered. The idea is to automatically send an email response that requires human interaction, which an automated spammer cannot provide.

When announced, this technology seemed to be a 100-percent cure for spam. In practice, it has some very severe drawbacks that prevent its everyday use:

· A lot of spam is sent from faked sender addresses. Legitimate users whose addresses have been forged in this way are harassed by challenge emails for messages they never sent.

· This method at least doubles email traffic, because a challenge has to be sent for every spam message.

· If the spoofed sender responds with an equivalent challenge, communications can end up in an endless loop of response emails or can block the system if there is no automatic detection of such automatically generated responses (which could also open a backdoor for spammers).

· Challenge messages may also resemble phishing messages. Fearful of responding to a phishing email, a legitimate sender may ignore the challenge, which prevents the original message from being delivered.

Let's also look at several detection techniques that prove valuable only as part of a complete multi-layered spam control and email security solution. Some of these techniques were once touted as the best and most effective available, but on their own they were quickly defeated, so they now work only in combination with other techniques.

Individually trained Bayes filters
Bayes filters are very powerful tools for detecting spam in the mail flow. These filters are based on a mathematical algorithm and, to be effective, must be trained over time with a combination of spam and legitimate email as it appears in an organization's environment. Their biggest strength, the customized tuning, is also their biggest disadvantage. The total cost of ownership can rapidly exceed the benefits. A messaging administrator must fine-tune the filter regularly (often daily), and simple oversights or miscalculations can result in dangerous over-blocking.
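To make the training dependency concrete, here is a minimal sketch of a word-based naive Bayes classifier in Python. It is an illustration under simplifying assumptions (whitespace tokenization, add-one smoothing, hypothetical class and method names), not the algorithm used by any specific product.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy spam filter: every judgment depends on the examples it was trained with."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}  # word counts per class
        self.docs = {"spam": 0, "ham": 0}                   # messages seen per class

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.words[label].values())
            # Class prior, smoothed so an untrained class never gets log(0).
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                count = self.words[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Nothing in the filter knows what spam is until someone feeds it labeled mail from the organization's own traffic, and a few mislabeled messages shift every later score. That is the maintenance burden in miniature.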

Pure keyword lists
Keyword searches provide immense value in supporting compliance and internal security policies. For example, keyword searches can be customized to alert managers to any foul language used in email. However, using keywords to block spam is old-fashioned and inefficient. Realistically, pure keyword lists are less effective than a Bayes filter but have similar drawbacks. Furthermore, a deep knowledge of regular expressions is needed to make them more effective.
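The gap between a plain keyword list and a regular-expression version can be shown with a short Python sketch. The keyword list and function names are hypothetical, chosen only for illustration.

```python
import re

KEYWORDS = ["lottery", "free money"]  # hypothetical block list


def plain_match(text):
    """Naive check: the keyword must appear verbatim."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def tolerant_pattern(keyword):
    """Build a regex that allows punctuation or spaces between the letters."""
    letters = [re.escape(ch) for ch in keyword if not ch.isspace()]
    return re.compile(r"[\W_]*".join(letters), re.IGNORECASE)


PATTERNS = [tolerant_pattern(keyword) for keyword in KEYWORDS]


def regex_match(text):
    """Check the text against the obfuscation-tolerant patterns."""
    return any(pattern.search(text) for pattern in PATTERNS)
```

With this list, plain_match misses a subject line such as "You won the L.O.T.T.E.R.Y" while regex_match catches it. Every new obfuscation trick, however, needs another hand-written pattern, and each looser pattern raises the risk of blocking legitimate mail.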

SPF/Sender ID/Caller ID/Reverse DNS
These techniques all attempt to compensate for the inherent weaknesses of SMTP, weaknesses that make it easy to forge a sender email address. The biggest handicap of these techniques is the absence of a commonly accepted and widely used standard. As long as a filter cannot count on an SPF (or similar) record being present, it cannot be used reliably. Moreover, a lot of spammers simply publish SPF records for their own domains and are thus free to pass this filter.

SMTP address-based block/allow lists
This method also has a high total cost of ownership because the administrator has to maintain the email address lists manually (for example, a list of all business-related contacts). With evolving spammer techniques such as social engineering, it has become futile.

RBL servers
RBL servers work like DNS servers and list the IP addresses of well-known spam servers. The use of RBL servers, while still valuable, carries the risk of over-blocking, since spammers increasingly use botnets to distribute spam. Often, whole IP ranges of service providers are listed on RBL servers because some of their customers are infected by such bots.
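A lookup against such a list is an ordinary DNS query: the octets of the connecting IP address are reversed and prepended to the list's zone name, and an answer means the address is listed. The Python sketch below assumes a hypothetical zone, dnsbl.example.org, and illustrative function names. The resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the list's zone, e.g. 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolver=socket.gethostbyname):
    """Return True if the list answers for this address, False if the name does not resolve."""
    try:
        resolver(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such record: the address is not on this list.
        return False
```

The lookup knows nothing about who is behind the address. If a provider's whole range is listed because of a few infected customers, every legitimate sender in that range gets the same answer.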

None of the techniques discussed above was able to solve the spam problem. All approaches that require administrative interaction or maintenance work have a high rate of failure because of fast-moving, targeted spam and the ingenuity of today's spammers. In reality, if a technique is easy to understand and implement, it is often just as easy for a spammer to circumvent.

Only a multi-layered, proactive approach to spam control, using a combination of spam detection techniques, can effectively fight the spam threat.

Carsten Dietrich is director of content security for IBM Internet Security Systems.