Showing posts with label content-filters. Show all posts

Wednesday, April 30, 2008

Post #7 on Why Spam Filters Suck "trickle blog" series



Slowing Things Down

The problem is that typical email systems work as a queue. This means that high spam traffic clogs your network and crowds out legitimate mail. Botnets pour messages into your network, and mail servers receive the messages as quickly as they can. Only then does the spam filter analyze each message and try to filter out those that appear to be spam.

Filters are effective at separating spam from legitimate email but do nothing to stop the rising volume of SMTP connections hammering the server. When spam traffic rises, the server becomes overloaded, which delays delivery of all email, much as a backlogged exit ramp can impede the flow of traffic on a highway during peak hours.

Today, Internet-facing email servers accept thousands of emails per minute. As spam volume increases, so too does the CPU time required to process all that mail. The blunt solution is to scale hardware to keep up with volume, but this is a one-to-one cost -- the more volume, the more servers are needed.

The fact is spam filters aren't getting a whole lot more accurate, and it certainly doesn't help that blocking spam is a reactive approach -- a sender needs to be identified before rules or signatures can be updated. Filters will always be playing catch-up with the spammers.

If you block based on reputation, what do you do when a new spam campaign breaks out and the sender has never been seen before?

What is needed is a way to get rid of the spam and prioritize legitimate mail without having to receive all the messages first or know who the bad senders are beforehand.

To use the highway analogy, what if you could put good senders in an express lane and the spammers in the slow lane so that legitimate email can be delivered first?
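To make the analogy concrete, here is a minimal sketch in Python comparing a single queue with a two-lane one. The lane labels, the message names and the `delivery_order` function are all made up for illustration, and the sketch simply takes each message's lane as given. How a sender would be assigned to a lane without knowing the bad senders in advance is exactly the open question, and this code does not answer it.

```python
import heapq

EXPRESS, SLOW = 0, 1  # assumed lane labels; the lower number is served first


def delivery_order(arrivals, use_lanes):
    """Return message ids in the order they would be delivered.

    arrivals: list of (message_id, lane) tuples in arrival order.
    use_lanes: False models a single queue; True serves EXPRESS first.
    """
    heap = []
    for seq, (message_id, lane) in enumerate(arrivals):
        # In a single queue every message has the same priority, so the
        # arrival sequence number alone decides the order.
        priority = lane if use_lanes else 0
        heapq.heappush(heap, (priority, seq, message_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


arrivals = [
    ("spam-1", SLOW),
    ("spam-2", SLOW),
    ("invoice", EXPRESS),
    ("spam-3", SLOW),
]
single_queue = delivery_order(arrivals, use_lanes=False)
two_lanes = delivery_order(arrivals, use_lanes=True)
```

In the single queue the invoice waits behind two spam messages; with lanes it goes out first, and the spam still gets processed, just later.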


NEXT: Post #8 Dealing a Blow to Spammers

PREVIOUS: Post #6 Blocking Spam in 2008

Monday, April 7, 2008

Post #3 on Why Spam Filters Suck “trickle blog” series



Once Promising Proposals for a Final Ultimate Solution to the Spam Problem (FUSSP)


"Two years from now, spam will be solved."


That was Bill Gates' famous pronouncement back in 2004. Microsoft, Yahoo and the open source community devised two techniques that they believed would eradicate spam. The first was sender authentication, which allowed email senders to provide a list of the servers permitted to send email for users within their domain. The idea was that sender authentication would eliminate spammers spoofing legitimate email addresses, and allow for the creation of a permanent, ironclad white list of trustworthy domains that never send spam, thus allowing recipients to simply block everything not on the white list and end spam forever.
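The core check behind sender authentication can be sketched in a few lines of Python. This is only an illustration: the `PERMITTED_SENDERS` table, the `is_authorized` function and the addresses are invented for the example, and a real deployment would look the list up in DNS rather than in a hard-coded dictionary.

```python
import ipaddress

# Hypothetical published policy: domain -> networks allowed to send its mail.
# A dict stands in for the DNS lookup a real receiver would perform.
PERMITTED_SENDERS = {
    "example.com": ["192.0.2.0/24", "198.51.100.25/32"],
}


def is_authorized(sender_domain: str, connecting_ip: str) -> bool:
    """Return True if connecting_ip is on the domain's permitted list."""
    networks = PERMITTED_SENDERS.get(sender_domain.lower(), [])
    ip = ipaddress.ip_address(connecting_ip)
    return any(ip in ipaddress.ip_network(net) for net in networks)
```

A message claiming to be from example.com passes if it arrives from 192.0.2.10 and fails from 203.0.113.5. A domain with no published list fails as well, which matches the "block everything not on the white list" idea.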


Another idea pitched in 2004 was the computational challenge. Senders would, upon connecting to a receiving email server, have to spend considerable CPU cycles computing the answer to a mathematical challenge provided by the receiving server. Bill Gates believed this approach would stop spam by making it cost too much to send the high volumes of email required to make spamming profitable.
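For a feel of how such a challenge could work, here is a hashcash-style sketch in Python. The puzzle itself is an assumption, since the 2004 proposals are not spelled out here: the sender must find a number that, hashed together with the server's challenge, gives a SHA-256 digest starting with a set number of zero bits. The names `solve`, `verify` and `difficulty` are made up for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Sender side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Receiver side: a single hash is enough to check the answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

The asymmetry is the point: the sender pays thousands of hashes per message while the receiver pays one. Multiply that by millions of messages and the sender's bill grows quickly, for spammers and for legitimate bulk senders alike.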


Unfortunately, neither sender authentication nor the computational challenge technique resolved the spam problem. Computational challenges were rejected as being too costly for legitimate bulk email senders (airlines, banks, open source mailing lists, etc.). And sender authentication, while eventually enjoying widespread adoption in the form of DKIM and Sender ID, proved prone to errors. As a result, it has remained useful mostly for the acceptance of legitimate email and phishing protection rather than the rejection of spam.


By 2005, what the anti-spam community was getting right was content filtering. Once spam filters passed the 90 per cent accuracy level, spam shifted from a problem of content to a problem of volume: the spammers simply send more spam. And they can do this because the recipient, rather than the spammer, pays the cost of content filtering.


The cost of a resource-consuming filtering system increases during high traffic loads. If you block spam content, spammers will find new ways to get around it. Bill Gates was right: the only way to stop them is to make spam too costly to send. If you do, spammers are left to find new targets that are easier to hit.


NEXT: Post #4 Spamonomics: The Economics of Spamming

PREVIOUS: Post #2 Prohibition Induces "Botlegging"

Thursday, November 22, 2007

Taking the text out of the Spam

For spammers, the trouble with image and video spam is that they have to ultimately give you information. Selling Viagra - you need to know where to buy it. Stock pump and dump - you need to know what stock you need to buy. So, leaving audio based spam aside for now, the text has to be given to you and it has to be readable, or it's of no use to the spammers. Text-based spam content-filtering has come a long way, so as long as we can extract the text from the images and video, we should be able to run that text through the existing text filters.
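As a rough sketch of that pipeline in Python: pull the text out of the image, then hand it to the same text filter used for ordinary messages. Everything named here is hypothetical. The `extract_text` step is whatever OCR (or other) routine you have, passed in as a function, and the three-phrase `text_filter` is only a stand-in for a real content filter.

```python
from typing import Callable

# Hypothetical phrase list; a real text filter would be far more elaborate.
SPAM_PHRASES = ("buy now", "hot stock", "viagra")


def text_filter(text: str) -> bool:
    """Stand-in for an existing text-based spam filter."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SPAM_PHRASES)


def is_image_spam(image_bytes: bytes,
                  extract_text: Callable[[bytes], str]) -> bool:
    """Run text recovered from an image through the existing text filter."""
    return text_filter(extract_text(image_bytes))
```

The hard part is hidden inside `extract_text`: if the distorted text cannot be recovered from the image, the filter has nothing to work on.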

Many top websites use "Captchas": distorted text designed to be unreadable by computers (the automation spammers rely on) but readable by humans. This is exactly what spammers are trying to do with image-spam. Whilst websites are using distorted image text to stop spammers, the spammers are using distorted image text to bypass email spam filters. The irony is that while spammers seem to be using social engineering and a little ingenuity to defeat captchas, image-based spam filters are still struggling. So what if we used spam images as captchas? Could we somehow use this to get spammers to unwittingly convert their images back to text?