Showing posts with label effectiveness. Show all posts

Tuesday, June 24, 2008

Day Zero Anti-Spam?


Many of you will already be familiar with the concept of a Day Zero Virus attack. Whenever a new vulnerability is discovered, it's likely that never-before-seen malware, without existing signatures, will start to appear. Given the danger of new attacks, AV vendors have developed various Day-Zero Anti-Virus solutions. For example, one e-mail security vendor delays messages with executable attachments for a number of hours to allow time for new AV signatures to be propagated.

The Anti-Virus companies are very aware that new virus campaigns will emerge, without signatures. They have solutions in place. However, in the world of Anti-Spam I don't hear much discussion of new spam campaigns and what companies are doing to help protect their customer base against these attacks. A dip in effectiveness occurs when a new spam campaign is launched and filters are not yet in place to block it, i.e. Day Zero Spam! In February, we discussed the idea of "The Dip" with regard to AS effectiveness and I thought it worth further discussion.

Anti-Spam rules can be pre-emptive or reactive. For example, heuristic rules look for generic spam indicators in a message and could catch a small percentage of spam e-mail from new campaigns. However, spammers can easily set up drop boxes at many ISPs to confirm successful delivery of the e-mail before commencing the campaign. Reactive rules respond to active campaigns with targeted rules, and writing those rules requires collected samples.

Typically, an Anti-Spam Operations center will have visibility into spam attacks via the use of honey pots to collect samples, as well as end user missed spam submissions. There's a delay in the spam sample being reported to the operations center as it may take some time for the end user to report it. Also, the honey pot may not detect the message until long after the campaign has commenced. As the number of submissions to the center is huge, there's a delay before the sample is prioritized to be processed by automated or human rule writers. Finally, after the rule has been created, there's a delay in propagating the rule set to customers.

The scenario above is an optimistic one. In some cases, it may not even be possible to create an effective rule that doesn't result in an increase in false positives. Think back to the crippling image spam attack over a year ago. So much legitimate corporate mail had images such as the company logo attached. It wasn't easy to create rules. Anti-Spam effectiveness took a hit. Another example could be a customer in the Middle East using a US-centric Anti-Spam product. The operations center may not have enough visibility into localized samples of spam appearing in Arabic or Hebrew. The same can be said for customers in Asia.

For the most part, Anti-Spam vendors seem to keep very tight-lipped about these deficiencies. Earlier this week, Cloudmark announced their new ActiveFilter. I should mention that they're a partner of ours and we ship Traffic Control with Cloudmark. It's pretty neat in that it keeps rescanning messages in the message store until they are retrieved, to see whether any of them subsequently receive a spam verdict. The interesting thing is that this was the first time I've heard a major player in the AS market openly discuss the problem with new spam attacks:

The messaging security landscape has always been an arms race between attackers and anti-spam providers. In an effort to penetrate the inbox and reach their target audience, spammers and hackers are deploying extremely sophisticated techniques to evade spam filters. A current trend is to use botnets to send out huge volumes of rapidly-changing messages as quickly as possible. These bots can send millions of messages in under a minute. Given the intensity and speed of attacks, it’s no surprise that spam now constitutes more than 95 percent of all e-mail traffic and even with the most effective e-mail filtering in place, a small amount of spam will still find its way into e-mail inboxes––these are the messages spammers are banking on.


I'd love to hear how other Anti-Spam vendors are dealing with Day-Zero Spam Attacks. In the case of Traffic Control, we throttle never-before-seen connections until they build up a good reputation. A sender is guilty until proven innocent. Traffic shaping is agnostic to the message content. It doesn't matter whether the spam message hides its content in images or Google Docs, or even if it is targeted in a language for a specific geographic region. I don't believe in a silver bullet to combat spam in the short term, but I do believe in a layered approach. Use Traffic Shaping up front to protect the MTA, and a good content filter to further reduce spam.
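To make the "guilty until proven innocent" idea concrete, here is a minimal sketch of content-agnostic throttling. It is not how Traffic Control is implemented. The class name `SenderThrottle`, its parameters, and the rule that each message later judged legitimate raises a sender's allowance are all assumptions made for illustration.

```python
class SenderThrottle:
    """Toy per-sender rate limiter: unknown senders start with a small allowance.

    Hypothetical sketch only. How a sender actually earns reputation is an
    assumption here, modelled by calls to record_good().
    """

    def __init__(self, base_limit=1, max_limit=100, window_seconds=60.0):
        self.base_limit = base_limit          # messages per window for a new sender
        self.max_limit = max_limit            # ceiling for a well-known sender
        self.window_seconds = window_seconds
        self._good = {}                       # sender -> count of good deliveries
        self._window = {}                     # sender -> (window start, accepted count)

    def limit_for(self, sender):
        """Allowance grows linearly with good history, up to max_limit."""
        earned = self.base_limit * (1 + self._good.get(sender, 0))
        return min(self.max_limit, earned)

    def allow(self, sender, now):
        """Return True to accept the connection, False to defer it."""
        start, count = self._window.get(sender, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0             # a new window begins
        if count >= self.limit_for(sender):
            self._window[sender] = (start, count)
            return False                      # e.g. answer with a temporary failure
        self._window[sender] = (start, count + 1)
        return True

    def record_good(self, sender):
        """Credit the sender with one message later judged legitimate."""
        self._good[sender] = self._good.get(sender, 0) + 1


throttle = SenderThrottle(base_limit=1)
ip = "192.0.2.1"                              # documentation-range address
first = throttle.allow(ip, now=0.0)           # new sender: one message accepted
second = throttle.allow(ip, now=1.0)          # same window: deferred
throttle.record_good(ip)
throttle.record_good(ip)
later = [throttle.allow(ip, now=60.0 + i) for i in range(4)]
print(first, second, later)
```

Note that nothing in `allow` looks at the message itself, which is the point of the approach: the decision depends only on who is connecting and how they have behaved so far.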

Wednesday, April 16, 2008

Why anti-spam effectiveness testing sucks


InfoWorld have released a review of various anti-spam systems and along with that a comparison chart of effectiveness based on their long-term (2 week) testing of each of the systems. The report ends with the common issue of how to determine which one is the best given that there are multiple variables involved. Terry Zink has taken the results a step further and attempted to resolve the capture rate and false positive results to a single value. I agree that a single figure would help compare but it makes it even more important to get the underlying data right and to measure the right things. I think we need to consider variation in effectiveness as an overall more important measure of spam protection than capture rate.

Anti-spam effectiveness tests suck because:
a) nobody seems to be able to analyze and report statistics these days and
b) they test the wrong thing. Outbreak response time is the issue, not long-term capture rates.

First, let's talk about statistics. Initially I was going to rant about the general poverty of meaning in statistical reporting, with no standard deviations and excessive significant digits, but then I realized that even the capture rate calculations are wrong. If you're going to go to all the effort of testing, at least put some quality into your statistical analysis.

Looking at these results, I see a wildly divergent volume of mail and spam being received by each of the anti-spam systems during their test period. The author reports that each of the systems received similar amounts of mail (13,000 to 14,000 messages) but that the systems varied in the number of messages they rejected at the connection level (using reputation filtering or DNSBLs) because they were spam. If that's true, the results of this test are reported incorrectly, because the dropped connections are not reported or factored into the spam capture rate.

If I'm Barracuda and I drop 10,000 spam messages at the connection level and then another 1,750 with content filtering, that's a capture rate of 98%, not 88%. It also means I'm doing a lot more to reduce load on the server, since those dropped messages are never received and scanned. So the results are wrong, which is especially annoying since these results are going to be quoted and used in sales calls for the next 3 years and will affect some people's lives, or at least their livelihoods.
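The arithmetic behind that 98% versus 88% can be checked with a short sketch. The 10,000 and 1,750 figures come from the example above. The 239 missed messages is an assumption, back-calculated so that 1,750 caught corresponds to roughly an 88% content-filter capture rate; the function name `capture_rate` is likewise just for illustration.

```python
def capture_rate(caught_at_connection, caught_by_content, missed):
    """Fraction of all spam stopped, counting connection-level drops."""
    caught = caught_at_connection + caught_by_content
    total_spam = caught + missed
    if total_spam == 0:
        raise ValueError("no spam observed")
    return caught / total_spam


# Assumed: about 239 spam messages slipped past the content filter.
MISSED = 239

content_only = capture_rate(0, 1750, MISSED)       # ignores dropped connections
overall = capture_rate(10_000, 1750, MISSED)       # counts dropped connections

print(f"content filter only: {content_only:.1%}")
print(f"including connection drops: {overall:.1%}")
```

Leaving the dropped connections out of the denominator and numerator penalizes exactly the systems that reject the most spam before it ever reaches the content filter.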

But I have a bigger concern with these tests, which are the same as every report on spam testing I've seen for the last 5 years of watching these things. The tests look at the wrong issue.

Spam is not a two-week issue, it is a NOW issue. What matters is how much spam I am getting right now: how much of it is getting through my filters, hammering my email servers, annoying my users and filling up my archiving system.

If we want a single number, or any measure, it needs to be useful, and long-term capture rates are not very meaningful, especially when they are based on medium-term tests.

What I want to know is what the spammers were doing during each of those tests. Which vendors were hit with big new spam campaigns, and which were sitting there during a lull in spam activity? Which were hit with a whole lot of new spam techniques during their test, and which received only stale old spam campaigns that anyone should detect?

We can't tell what was actually happening because all the data is rolled up into one nice neat number: 9x.xxx% spam detection. A real-world comparison of anti-spam effectiveness would measure the capture rate every 10 minutes, plot it, and look at how often the capture rate dropped below some threshold, say 80% for the sake of argument, and then measure how long it took to recover to a capture rate of 95% or so. The number of outbreaks that hit, together with the response time, gives us a measure of the resiliency of the anti-spam system to new campaigns and of the ability of the vendor's labs to respond to those issues.
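As a rough sketch of that measurement, the following takes a series of capture rates sampled at fixed intervals, counts the dips below the 80% threshold, and reports how many intervals each took to climb back to 95%. The names `Dip` and `find_dips` and the sample series are made up for illustration; they are not data from any real test.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dip:
    start: int                          # index of the first sample below the threshold
    recovery_intervals: Optional[int]   # samples until recovery, None if never recovered


def find_dips(rates, dip_below=0.80, recovered_at=0.95) -> List[Dip]:
    """Find each drop below dip_below and how long it took to reach recovered_at."""
    dips = []
    dip_start = None
    for i, rate in enumerate(rates):
        if dip_start is None:
            if rate < dip_below:
                dip_start = i           # an outbreak has started
        elif rate >= recovered_at:
            dips.append(Dip(dip_start, i - dip_start))
            dip_start = None            # filters have caught up
    if dip_start is not None:
        dips.append(Dip(dip_start, None))  # still degraded when the data ends
    return dips


# Hypothetical capture rates, one sample per 10-minute interval.
INTERVAL_MINUTES = 10
rates = [0.97, 0.96, 0.72, 0.78, 0.90, 0.96, 0.97, 0.60, 0.85]

dips = find_dips(rates)
for dip in dips:
    if dip.recovery_intervals is None:
        print(f"dip at sample {dip.start}: not recovered by end of data")
    else:
        minutes = dip.recovery_intervals * INTERVAL_MINUTES
        print(f"dip at sample {dip.start}: recovered after {minutes} minutes")
```

Using two thresholds rather than one avoids counting a single outbreak several times when the capture rate hovers around 80% while the filters catch up.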

The key element of anti-spam protection is how organizations respond to new outbreaks, the sorts of outbreaks that cause the noticeable dips in effectiveness that in turn result in server load peaks, help desk calls and significant spam impacts. These are the spam concerns an ISP or an IT manager needs to plan for, not the ongoing general spam level which most people just put up with.

If we are comparing anti-spam effectiveness, let's compare the systems' capability to deal with outbreaks, not their ability to deal with the everyday junk that most vendors catch 95+% of.