Showing posts with label filter. Show all posts

Friday, March 28, 2008

Post #1 on Why Spam Filters Suck "trickle blog" series



A Short History of Spam Protection

While methods have changed, spam continues to be the misuse of an open communication network for financial gain. What was once a harmless annoyance has led to serious conditions where high spam traffic can clog email servers to the detriment of legitimate mail.

How did we get here? And what can we change to solve the problem?

The first spam email ever was sent in 1978 to promote a seminar from Digital Equipment Corporation (DEC). I'd call it spam because it was a mass emailing to addresses harvested from a printed ARPAnet directory, sent to recipients who had not requested any contact.

Spam didn't become a huge problem until around 2002, when there were enough active email users worldwide to make spamming profitable. In response, the first commercial and open source spam filters arrived: Brightmail, PureMessage, and SpamAssassin, to name a few. The first generation of filters applied sets of rules to each message received, identifying features within the message that might indicate the likelihood of it being spam.
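To make the rule-based idea concrete, here is a minimal sketch in Python. The patterns, weights, and threshold are all made up for illustration and are not taken from Brightmail, PureMessage, or SpamAssassin; the point is only that each rule looks for one feature and contributes to an overall score.

```python
import re

# Hypothetical rules: (pattern, score). These are illustrative assumptions,
# not the rule set of any real filter.
RULES = [
    (re.compile(r"viagra", re.IGNORECASE), 2.5),
    (re.compile(r"100% free", re.IGNORECASE), 1.5),
    (re.compile(r"click here", re.IGNORECASE), 1.0),
]

THRESHOLD = 3.0  # assumed cutoff above which a message is treated as spam


def spam_score(message: str) -> float:
    """Add up the scores of every rule whose pattern appears in the message."""
    return sum(score for pattern, score in RULES if pattern.search(message))


def is_spam(message: str) -> bool:
    """Flag the message when its total score reaches the threshold."""
    return spam_score(message) >= THRESHOLD
```

A message that trips several rules crosses the threshold, while ordinary mail scores zero. Note that every rule has to run against every message, which is where the processing cost discussed later comes from.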

Spammers countered rule-based filters by obfuscating the content of their messages. Rather than sending a text message advertising Viagra, for example, the spammer might chop the message into small HTML pieces which, while unrecognizable to the spam filter, would still render into legible text for the message recipient. The rule-based filters added more rules to catch these obfuscations, causing the spammers to further innovate. This pattern of content obfuscation continues to the present day, the most recent example of which is probably MP3 spam (i.e. spam message contained in an audio file).
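As a rough illustration of the HTML trick, the sketch below shows a message whose raw source never contains the word a naive rule is looking for, even though the rendered text does. The sample message is invented, and the tag stripping uses Python's standard `html.parser` as a stand-in for what a mail client does when it renders HTML.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text a reader would see, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rendered_text(html: str) -> str:
    """Return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Invented example: empty tags split the keyword in the raw source.
raw = "Buy V<b></b>ia<i></i>gra today"

keyword_in_raw = "viagra" in raw.lower()
keyword_in_rendered = "viagra" in rendered_text(raw).lower()
```

Here `keyword_in_raw` is false and `keyword_in_rendered` is true, so a rule that matches on the raw source misses the message. Catching it means rendering or normalizing the content first, which is one more step of work per message.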

Anti-spam is one of those areas of IT where you're "damned if you don't." If email is flowing free of spam, you hear nothing. But when spam is getting through or emails are backlogged on the server, there's hell to pay.

Why is spam causing backlogs? Why is all mail treated equally? And do we need to keep adding what are effectively junk processing servers?

As the sophistication of spam has increased, so has the processing power needed to analyze those messages. Today, with email servers under high traffic loads, the ever-increasing computational cost of analyzing the content of every email often results in service disruptions for legitimate email. This has to change. IT infrastructure costs should be a function of legitimate activity, not spammer-driven loads.

Under the current method of spam filtering, all incoming email messages are accepted by the server and buffered in a common queue on a first-come, first-served basis. To solve the loading problem this imposes, there needs to be a shift away from a single queue of email traffic towards a prioritized system that can expedite legitimate mail first.
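The difference between a single queue and a prioritized one can be sketched in a few lines of Python. This is only an illustration under simple assumptions: "legitimate" is approximated by a hypothetical list of known senders, and there are just two priority levels.

```python
import heapq
from itertools import count

# Assumed priority levels: lower numbers are served first.
KNOWN, UNKNOWN = 0, 1


class PrioritizedMailQueue:
    """Serve mail from known senders before mail from unknown senders."""

    def __init__(self, known_senders):
        self._known = set(known_senders)
        self._heap = []
        self._order = count()  # tie-breaker keeps arrival order within a level

    def enqueue(self, sender, message):
        priority = KNOWN if sender in self._known else UNKNOWN
        heapq.heappush(self._heap, (priority, next(self._order), sender, message))

    def dequeue(self):
        _, _, sender, message = heapq.heappop(self._heap)
        return sender, message

    def __len__(self):
        return len(self._heap)
```

With this in place, a message from a known sender that arrives in the middle of a flood is processed ahead of the flood instead of waiting behind it, while mail within each level still goes out in arrival order.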

But there's more that needs to be considered...

UPDATE: On the subject of the history of spam, Christopher Nickson writes that the word "spam" to describe unsolicited commercial email recently celebrated its 15th anniversary.

NEXT: Post #2 Prohibition Induces "Botlegging"

Tuesday, February 26, 2008

Introducing "the Dip"

Our presentation at the recent MAAWG meetings focused on the effectiveness of Inbound Traffic Control in dealing with spam from unknown senders, which accounts for most of the drops seen in anti-spam effectiveness.

Two parts of the presentation really stood out with the audience. The second was a look at what a 98% capture rate really means to an anti-spam lab.




Despite 98% long-term capture rates, leading anti-spam systems experience significant drops in effectiveness when both sender and content are unknown. This happens most often with botnets, with targeted campaigns that do not pass through a central lab, and with new spam approaches.

Any anti-spam lab worth its salt has a display that looks something like this graph, showing its capture rate over time. Most of the time the capture rate is acceptably high, but once in a while (typically several times a day) the spam starts flooding through, and then it’s all hands on deck while the lab figures out where that mail is coming from and how to plug the dike this time.
Sometimes the fix is elegant and long lasting, and sometimes it’s not.
The spammers' new technique can be network oriented or content oriented, and in either case the dip is what results.


From an end user’s and a service provider’s perspective, you can flip this curve upside down and the dips become peak traffic loads, spam outbreaks, help desk calls and flooded inboxes.

Dips happen because anti-spam companies cannot have perfect insight into the spamming world.

  • It takes enormous visibility and time to turn a new attack into the actionable quantities of known content and known senders.
  • It takes the best filters 10 minutes to widely deploy a new filter rule capable of really making a dent in a new spam campaign.
  • The blacklists take between 15 and 30 minutes to set up and distribute a new IP block.
Wouldn’t it be great if we could make the unknown senders wait around for a while – at least until we’ve had a chance to set up a filter rule?

In fact, we can. This is one of the benefits of Inbound Traffic Control: messages from unknown senders are forced to wait for better anti-spam information, taking away the spammers’ head start.
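The idea of making unknown senders wait can be sketched as a simple holding pen. To be clear, this is not how Inbound Traffic Control is actually implemented, which this post does not describe. It is a toy model with assumed names (`HoldingPen`, `block`, `release`) and an assumed 30-minute hold, picked to cover the blacklist window mentioned above.

```python
from collections import deque

# Assumed hold time: long enough to cover the 15 to 30 minutes a blacklist
# needs to distribute a new IP block.
HOLD_SECONDS = 30 * 60


class HoldingPen:
    """Deliver known senders at once; make unknown senders wait."""

    def __init__(self, known_senders, hold_seconds=HOLD_SECONDS):
        self.known = set(known_senders)
        self.blocked = set()
        self.hold_seconds = hold_seconds
        self.held = deque()  # entries: (release_time, sender, message)

    def receive(self, sender, message, now):
        """Return the message for immediate delivery, or None if it is held."""
        if sender in self.known:
            return message
        self.held.append((now + self.hold_seconds, sender, message))
        return None

    def block(self, sender):
        """Record that a filter rule or blacklist now covers this sender."""
        self.blocked.add(sender)

    def release(self, now):
        """Deliver held mail whose wait is over, dropping blocked senders."""
        delivered = []
        while self.held and self.held[0][0] <= now:
            _, sender, message = self.held.popleft()
            if sender not in self.blocked:
                delivered.append(message)
        return delivered
```

In this model, a new campaign that starts at time zero is still sitting in the pen when the blacklist update lands, so it is dropped on release, while a legitimate first-time sender is only delayed.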