Showing posts with label anti-spam. Show all posts

Tuesday, June 24, 2008

Day Zero Anti-Spam?


Many of you will already be familiar with the concept of a Day Zero Virus attack. Whenever a new vulnerability is discovered, it's likely that never-before-seen malware, without existing signatures, will start to appear. Given the danger of new attacks, AV vendors have developed various Day-Zero Anti-Virus solutions. For example, one e-mail security vendor delays messages with executable attachments for a number of hours to allow time for new AV signatures to be propagated.

The Anti-Virus companies are very aware that new virus campaigns will emerge, without signatures. They have solutions in place. However, in the world of Anti-Spam I don't hear much discussion of new spam campaigns and what companies are doing to help protect their customer base against these attacks. A dip in effectiveness occurs when a new spam campaign is launched and filters are not yet in place to block it, i.e. Day Zero Spam! In February, we discussed the idea of "The Dip" with regard to AS effectiveness and I thought it worth further discussion.

Anti-Spam rules can be pre-emptive or reactive. For example, heuristic rules look for generic spam indicators in a message that could catch a small percentage of spam e-mail from new campaigns. However, spammers can easily set up drop boxes at many ISPs to confirm successful delivery of the e-mail before commencing the campaign. Reactive rules respond to active campaigns with targeted rules, and writing those rules requires collected samples.

Typically, an Anti-Spam Operations center will have visibility into spam attacks via the use of honey pots to collect samples, as well as end user missed spam submissions. There's a delay in the spam sample being reported to the operations center as it may take some time for the end user to report it. Also, the honey pot may not detect the message until long after the campaign has commenced. As the number of submissions to the center is huge, there's a delay before the sample is prioritized to be processed by automated or human rule writers. Finally, after the rule has been created, there's a delay in propagating the rule set to customers.

The scenario above is an optimistic one. In some cases, it may not even be possible to create an effective rule that doesn't result in an increase in false positives. Think back to the crippling image spam attack over a year ago. So much legitimate corporate mail had images such as the company logo attached. It wasn't easy to create rules. Anti-Spam effectiveness took a hit. Another example could be a customer in the Middle East using a US-centric Anti-Spam product. The operations center may not have enough visibility into localized samples of spam appearing in Arabic or Hebrew. The same can be said for customers in Asia.

For the most part, Anti-Spam vendors seem to keep very tight-lipped about these deficiencies. Earlier this week, Cloudmark announced their new ActiveFilter. I should mention that they're a partner of ours and we ship Traffic Control with Cloudmark. It's pretty neat in that it actually scans the message store until the message is retrieved to see if any messages subsequently receive a spam verdict. The interesting thing is that this was the first time I've heard a major player in the AS market openly discuss the problem with new spam attacks:

The messaging security landscape has always been an arms race between attackers and anti-spam providers. In an effort to penetrate the inbox and reach their target audience, spammers and hackers are deploying extremely sophisticated techniques to evade spam filters. A current trend is to use botnets to send out huge volumes of rapidly-changing messages as quickly as possible. These bots can send millions of messages in under a minute. Given the intensity and speed of attacks, it’s no surprise that spam now constitutes more than 95 percent of all e-mail traffic and even with the most effective e-mail filtering in place, a small amount of spam will still find its way into e-mail inboxes––these are the messages spammers are banking on.


I'd love to hear how other Anti-Spam vendors are dealing with Day-Zero Spam Attacks. In the case of Traffic Control, we throttle never-before-seen connections until they build up a good reputation. A sender is guilty until proven innocent. Traffic shaping is agnostic to the message content. It doesn't matter whether the spam message hides its content in images or Google Docs, or even if it is targeted in a language for a specific geographic region. I don't believe in a silver bullet to combat spam in the short term, but I do believe in a layered approach. Use Traffic Shaping up front to protect the MTA, and a good content filter to further reduce spam.

Friday, June 20, 2008

Post #10 on Why Spam Filters Suck "trickle blog" series

Challenges with Throttling

Slowing traffic from spammers works well to decrease volume and contain infrastructure costs. It allows you to deal effectively with the large proportion of senders that are not yet listed in any blacklist by making spammers give up. The problem with slowing down spammers is that it increases the number of TCP connections to your email server.

One customer dealt with 100 connections at a time, but after traffic shaping, now sees upward of 1000 concurrent connections. This ten-fold increase in the number of connections utterly destroys most email servers. To illustrate the problem, consider that it takes around two seconds to deliver an email message under normal circumstances. Slowing down a spam zombie causes the connection to last an average of 40 seconds. If a significant proportion of connections last many times longer than normal, then the number of connections in progress at any one time grows accordingly.
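The growth in concurrency follows from Little's law: the average number of connections in progress equals the arrival rate multiplied by the average connection duration. The sketch below is only a rough illustration. The arrival rate and the fraction of slowed connections are assumed values, chosen so that the result lines up with the 100 and 1000 figures above; they are not measurements.

```python
def concurrent_connections(arrivals_per_sec, slowed_fraction,
                           normal_secs=2.0, slowed_secs=40.0):
    """Little's law: average concurrency = arrival rate x average duration."""
    avg_duration = (slowed_fraction * slowed_secs
                    + (1.0 - slowed_fraction) * normal_secs)
    return arrivals_per_sec * avg_duration

# Assumed arrival rate: 50 connections per second, each lasting 2 seconds.
baseline = concurrent_connections(50, slowed_fraction=0.0)

# Assumption: just under half of all connections are slowed to 40 seconds.
shaped = concurrent_connections(50, slowed_fraction=18 / 38)

print(round(baseline), round(shaped))
```

Under these assumptions, slowing fewer than half of the connections is enough to turn 100 concurrent connections into roughly 1000.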



The graph above shows the number of SMTP connections being handled by a single server at a large university in the New York area. Noteworthy is that the number of concurrent connections hovers around 500. The red line represents the total number of connections. The green line indicates the number of connections that our traffic shaping is choosing to slow down.

Administrators running Sendmail or Postfix will note that 500 concurrent connections is a large number. The amount of memory required to handle 500 concurrent Sendmail processes, plus any associated spam filtering processes, is considerable. If we were passing this number of connections through to Sendmail, the email server would almost certainly become overloaded.

One approach to improve the scalability of email systems is to completely redesign your email server with a new highly scalable software architecture. But re-designing the email server is difficult, and changing the email system is a large commitment. An asymmetric SMTP proxy that we call real-time SMTP Multiplexing is designed to solve the scalability challenge posed by traffic shaping. The proxy accepts thousands of connections from the Internet and then multiplexes these connections onto a much smaller pool of connections with the existing email server. Unlike an email server, the Multiplexing proxy doesn't save messages to disk, which means it is a lot less complex and also doesn't consume much in the way of system resources.
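The multiplexing idea can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a description of how the actual proxy is built: it assumes the proxy absorbs each slow inbound transfer itself and only occupies a backend connection for the brief final hand-off. The function name `simulate`, the client count, pool size, and timings are all hypothetical.

```python
import asyncio

async def simulate(num_clients=200, pool_size=10,
                   slow_receive_secs=0.05, backend_secs=0.001):
    """Simulate slow inbound connections sharing a small backend pool."""
    backend_pool = asyncio.Semaphore(pool_size)
    stats = {"front": 0, "back": 0, "peak_front": 0, "peak_back": 0}

    async def handle_client():
        stats["front"] += 1
        stats["peak_front"] = max(stats["peak_front"], stats["front"])
        # Assumed behavior: the proxy absorbs the slow (shaped) transfer.
        await asyncio.sleep(slow_receive_secs)
        # Only the brief hand-off occupies a backend connection.
        async with backend_pool:
            stats["back"] += 1
            stats["peak_back"] = max(stats["peak_back"], stats["back"])
            await asyncio.sleep(backend_secs)
            stats["back"] -= 1
        stats["front"] -= 1

    await asyncio.gather(*(handle_client() for _ in range(num_clients)))
    return stats["peak_front"], stats["peak_back"]

peak_front, peak_back = asyncio.run(simulate())
print(peak_front, peak_back)
```

In this model the Internet-facing side holds all 200 connections at once, while the email server never sees more than the pool size, because the semaphore caps backend use.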



This graph shows the number of connections to the email server of the large university mentioned previously. The red line indicates that the average number of connections with the email server hovers around 50, which is well within the amount a typical email server can handle. By multiplexing the SMTP connections, the system can achieve a 5:1 or 10:1 reduction in the number of connections the email server has to deal with. Moreover, reducing the number of concurrent connections the email server has to handle makes it practical to slow down a large proportion of the incoming connections, getting rid of a great deal of spam traffic in the process.


Friday, May 2, 2008

Post #8 on Why Spam Filters Suck "trickle blog" series



Dealing a Blow to Spammers

ISPs have recently been getting a lot of criticism for traffic shaping P2P file sharers. While we can argue over whether this is excessive or not, they have been doing this primarily for a legitimate reason: to reduce the impact of resource-hogging users on the rest of their network.

The same technique can also have a positive impact on email. SMTP traffic shaping essentially puts shackles on email's heaviest users -- the spammers -- who have a voracious appetite for broadband capacity. Slowing down unknown senders causes the greatest harm for spammers, who need to circulate their messages as quickly as possible. In fact, during peak-load times, 90% of spammers go away after 10 seconds of being put in the slow lane.

Using traffic shaping, senders of spam are literally restricted from delivering packets to the network. This slowing-down approach works by shaping the TCP connection and is implemented in a way similar to that of a network load-balancing device.

Unlike other traffic based spam protection, traffic shaping is not about putting limits on the quantity of emails from a sender (spammers can get around this easily by sending fewer emails per zombie). In comparison, true "shaping" literally slows down suspicious email delivery to a trickle (like 3 kbps) -- effectively stopping spam from flooding in and eliminating processing delays. Then senders with good reputation can be dispatched on a fast connection and given higher service priority.
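To get a feel for what a trickle means in practice, the rough arithmetic below compares delivery times at different link rates. It is a sketch with assumed inputs: "3 kbps" is read as 3,000 bits per second, and the 20 kB message size and the 1 Mbps fast lane are hypothetical values picked for illustration.

```python
def transfer_seconds(message_bytes, bits_per_second):
    """Time to push a message through a link of the given rate."""
    return message_bytes * 8 / bits_per_second

MESSAGE_BYTES = 20_000      # hypothetical 20 kB spam message
SHAPED_BPS = 3_000          # the 3 kbps trickle, read as 3,000 bits/s
FAST_BPS = 1_000_000        # hypothetical 1 Mbps lane for trusted senders

print(transfer_seconds(MESSAGE_BYTES, SHAPED_BPS))   # roughly 53 seconds
print(transfer_seconds(MESSAGE_BYTES, FAST_BPS))     # 0.16 seconds
```

With these numbers, a sender that gives up after about ten seconds in the slow lane never finishes delivering the message, while a trusted sender is done almost immediately.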

The result is a clean mail stream of less than 25 per cent of its original volume.

NEXT: Post #9 Real World Scenarios
PREVIOUS: Post #7 Slowing Things Down

Wednesday, April 30, 2008

Post #7 on Why Spam Filters Suck "trickle blog" series



Slowing Things Down

The problem is, typical email systems work in a queue. This means that high spam traffic clogs your network and crowds out legitimate mail. Botnets pour messages into your network, and mail servers receive the messages as quickly as they can. Next, the spam filter analyzes and tries to filter out any messages that appear to be spam.

Filters are effective at separating spam from email but do nothing to stop the rising volume of SMTP connections hammering the server. When spam traffic rises, the server becomes overloaded, resulting in delivery delays for all email, similar to how a backlogged exit ramp can impede the flow of traffic on a highway during peak hours.

Today, Internet-facing email servers accept thousands of emails per minute. As spam volume increases, so too does the CPU required to process all that mail. The blunt solution is to scale hardware to keep up with volume, but this is a one-to-one cost -- the more volume, the more servers are needed.

The fact is spam filters aren't getting a whole lot more accurate, and it certainly doesn't help that blocking spam is a reactive approach -- a sender needs to be identified first before rules or signatures are updated. Filters will always be playing catch up with the spammers.

If you block based on reputation, what do you do when a new spam campaign breaks out and the sender has never been seen before?

What is needed is a way to get rid of the spam and prioritize legitimate mail without having to receive all the messages first or know who the bad senders are beforehand.

To use the highway analogy, what if you could put good senders in an express lane and the spammers in the slow lane so that legitimate email can be delivered first?


NEXT: Post #8 Dealing a Blow to Spammers

PREVIOUS: Post #6 Blocking Spam in 2008

Monday, April 21, 2008

Post #6 on Why Spam Filters Suck "trickle blog" series



Blocking Spam In 2008

Like a shepherd, the duty of a bot herder (botnet operator) is to keep his/her botnet army intact. Bot herders make money by amassing a botnet, then contracting out the botnet services to spammers. That's right, spammers employ bot herders to do the dirty work for them!

Bot herders only get paid by the spammer when a message is actually delivered to the receiving email server. For those readers familiar with SMTP protocol, this means that the bot herder only gets paid once the server has sent 250 Ok after the DATA phase. In order to make a lot of money, bot herders have to send as much as possible in the shortest possible time. If a zombie is being blocked, the bot herder doesn't make any money. The bot herder only makes money when a message is actually received by the receiving email server.

Spamming software is impatient. In programming terms, spamming software has a very low timeout. The SMTP RFC recommends that email servers wait at least three minutes for each chunk of data they send to be received by the receiving server and acknowledged via a TCP acknowledgement packet. Furthermore, the RFC recommends that senders wait at least ten minutes for the final message delivery acknowledgement.

These long timeouts were established because in the early days of the Internet, the infrastructure was slow and unreliable, and the machines were easily overloaded, leading to frequent message delivery delays. Today, email servers and our networks are much faster, processing incoming messages in a matter of seconds. Delays still occur, but the timeouts defined in the RFC are vastly higher than what is required in today's world.

Because bot herders don't get paid until they receive the 250 Ok, their software earns a higher profit by disconnecting after a few seconds and seeking out new victims whose servers respond more quickly. Bots can't afford to wait for a slow connection to go through, and they can't risk being discovered and put on a blacklist.
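The gap between RFC-style patience and spamware patience can be made concrete with a deliberately crude model that compares a single overall timeout against the total delivery time. The two RFC figures are the ones quoted above; the ten-second bot timeout and the 40-second slowed delivery are assumed, illustrative values.

```python
RFC_DATA_BLOCK_TIMEOUT = 3 * 60     # seconds, per-chunk wait quoted above
RFC_FINAL_ACK_TIMEOUT = 10 * 60     # seconds, wait for the final 250 reply
BOT_TIMEOUT = 10                    # assumed: "a few seconds" of patience

def completes_delivery(sender_timeout_secs, delivery_secs):
    """A sender finishes only if it waits at least as long as delivery takes."""
    return sender_timeout_secs >= delivery_secs

SLOWED_DELIVERY = 40                # hypothetical slowed-down delivery time

print(completes_delivery(RFC_FINAL_ACK_TIMEOUT, SLOWED_DELIVERY))  # True
print(completes_delivery(BOT_TIMEOUT, SLOWED_DELIVERY))            # False
```

A standards-following server sits comfortably inside its timeout, whereas the impatient bot disconnects long before the message is accepted and therefore never earns its fee.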

A few years ago, the MIT Spam Conference was a very interesting place. Each year, bright-eyed graduate students and intrepid industry types would present new filtering techniques that pushed the accuracy of spam filters to new levels. For the past three years, improvements in spam filter effectiveness have plateaued. A great result is now a paper that shows an accuracy improvement of half a percent. Spam filtering has essentially become maxed out as a technology, and there isn't much more we can do but tweak rules to avoid falling behind in the arms race with spammers.

Similarly, reputation systems which identify suspicious IP addresses have become asymptotic in their effectiveness. The spread of botnets has led to a virtually inexhaustible supply of new IP addresses that spam us a few times and then disappear forever. Most of the large anti-spam companies now have comprehensive blacklists that are updated every minute.

In other words, anti-spam systems worldwide are blocking everything they possibly can. And yet spam continues to grow as a problem -- it's unbelievable. So what can we do?

Bill Gates was right in 2004. He boldly posited that the way to solve the spam problem was to introduce a cost barrier that made spamming no longer profitable. Unfortunately, spammers created botnets, which have given them more computing power than most governments. One way to think of the problem is that the spammers have millions of computers. You only have a handful. And you have to pay for yours. Who's going to win? While we can't win the spam war with better filters or better blacklists, there are alternatives.

To deter spamming we must undermine spammers, not simply block messages. You can make botnets unprofitable by slowing down SMTP traffic from spammers. This not only gives the receiver control of each email connection, but it also consumes sender resources to reduce the spammer's sending rate significantly.

Imagine the chaos at an airport without air traffic controllers and you begin to see why mail servers need email traffic control.

NEXT: Post #7 Slowing Things Down
PREVIOUS: Post #5 Why Are Botnets So Difficult To Stop?

Friday, April 18, 2008

Post #5 on Why Spam Filters Suck "trickle blog" series


Why Are Botnets So Difficult To Stop?

Definition: a "botnet" is commonly known as a network of infected computers used to send spam (among other actions).

The largest botnets contain hundreds of thousands of "zombie" machines controlled by a "bot herder," who uses sophisticated encryption, infection and peer-to-peer (P2P) networking techniques to ensure the permanence and growth of the botnet. As the zombies are used, they become discovered and subsequently blocked. While individual zombies are constantly changing, the overall botnet and the people who control it remain the same.

Because of botnets, spam does not come from a predictable set of computers; rather, it comes from all over the place in a completely unpredictable manner. By leveraging the diversity of IP addresses available via botnets, spammers have rendered the blocking approach far less effective than it once was.

Further, as the number of broadband subscribers continues to grow -- most rapidly in developing economies such as China and Eastern Europe -- the number of computers available to exploit for participation in botnets is expanding. As botnets increase in size and sophistication, trying to identify where the "bad stuff" is coming from is becoming less and less worthwhile.

Indeed, researchers at Georgia Tech discovered in 2006 in a survey of data from the Spamhaus black list that only 5 per cent of botnet IP addresses ever end up listed in the Spamhaus database. In another paper, the same researchers found that 85 per cent of spam zombies sent fewer than ten email messages to their honeypot server over the course of about 18 months, as shown in the above graph.

Example: A Transient Zombie

In late 2007, the zombie at 201.21.174.207 (a Brazilian broadband subscriber address) began sending approximately three spams each day into one of our honey pot systems. It took 19 days for the first real-time blackhole list (RBL) to identify this IP address and cause it to be blocked. By sending only a very light trickle of email, zombies can evade detection.

While blocking continues to be a core component of the multi-layered anti-spam architecture, it makes little sense in 2008 to depend on filtering technology designed to block spam in 2001 before the advent of botnets. Approaches that seek to block spam fail to deal with the issue of unknown senders.

NEXT: Post #6 Blocking Spam in 2008
PREVIOUS: Post #4 Spamonomics: The Economics of Spamming

Wednesday, April 16, 2008

Why anti-spam effectiveness testing sucks


InfoWorld have released a review of various anti-spam systems and along with that a comparison chart of effectiveness based on their long-term (2 week) testing of each of the systems. The report ends with the common issue of how to determine which one is the best given that there are multiple variables involved. Terry Zink has taken the results a step further and attempted to resolve the capture rate and false positive results to a single value. I agree that a single figure would help compare but it makes it even more important to get the underlying data right and to measure the right things. I think we need to consider variation in effectiveness as an overall more important measure of spam protection than capture rate.

Anti-spam effectiveness tests suck because:
a) nobody seems to be able to analyze and report statistics these days and
b) they test the wrong thing. Outbreak response time is the issue not long-term capture rates.

First, let's talk about statistics. Initially I was going to rant about the general poverty of meaning in statistical reporting, in terms of no standard deviations and excessive significant digits, but then I realized that even the capture rate calculations are wrong. If you're going to go to all the effort of testing, at least put some quality into your statistical analysis.

Looking at these results I see a wildly divergent volume of mail and spam being received by each of the anti-spam systems during their test period. The author reports that each of the systems received similar amounts of mail (13000~14000 messages) but that systems varied in the number of messages they rejected as spam at the connection level (using reputation filtering or DNSBLs). If that's true, the results of this test are reported incorrectly because the dropped connections are not reported or factored into the spam capture rate.

If I'm Barracuda and I drop 10,000 spam messages at the connection level and then another 1750 with content filtering, that's a capture rate of 98% not 88%. It also means I'm doing a lot more to reduce load on the server since those dropped messages are never received and scanned. So the results are wrong, which is especially annoying since these results are going to be quoted and used in sales calls for the next 3 years and will affect some people's lives or at least livelihoods.
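The difference between the two figures is easy to reproduce. In the sketch below, the count of 250 missed messages is an assumption: it is the number implied by 1750 catches at a content-filter capture rate of roughly 88%, not a figure taken from the review.

```python
def capture_rate(dropped_at_connection, caught_by_filter, missed):
    """Fraction of all spam stopped, counting connection-level drops."""
    stopped = dropped_at_connection + caught_by_filter
    return stopped / (stopped + missed)

MISSED = 250  # assumed: implied by 1,750 caught at an ~88% content-filter rate

content_only = capture_rate(0, 1750, MISSED)
with_drops = capture_rate(10_000, 1750, MISSED)
print(f"{content_only:.1%} {with_drops:.1%}")
```

Ignoring the connection-level drops gives 87.5%, while counting them gives 97.9%, which is the gap between the reported and the actual capture rate in this example.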

But I have a bigger concern with these tests, which are the same as every report on spam testing I've seen for the last 5 years of watching these things. The tests look at the wrong issue.

Spam is not a two week issue, it is a NOW issue. What matters is the amount of spam I am getting right now: how much of it is getting through my filters, hammering my email servers, annoying my users and filling up my archiving system.

If we want a single number or any measure it needs to be useful and long term capture rates are not very meaningful, especially when they are based on medium term tests.

What I want to know is what the spammers were doing during the time of each of those tests. Which vendors were hit with big new spam campaigns, and which were sitting there during a lull in spam activity? Which were hit with a whole lot of new spam techniques during their test, and which received only stale old spam campaigns that anyone should detect?

We can't tell what was actually happening because all the data is rolled up into one nice neat number: 9x.xxx% spam detection. A real world comparison of anti-spam effectiveness would measure the capture rate every 10 minutes, plot it and look at how often the capture rate dropped below some threshold, say 80% for the sake of argument, and then measure how long it took to recover back up to a 95% or so capture rate. That measure of the number of outbreaks that hit and the response time gives us a measure of the resiliency of the anti-spam system to new campaigns and the ability of the vendors' labs to respond to those issues.
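The measurement described above could be sketched roughly as follows. The function name `outbreak_recovery_times` and the sample capture rates are hypothetical; the 10-minute interval and the 80% and 95% thresholds are the ones suggested in the text.

```python
def outbreak_recovery_times(rates, interval_min=10, dip=0.80, recovered=0.95):
    """Minutes from each dip below `dip` until the rate is back at `recovered`."""
    times = []
    dip_start = None
    for i, rate in enumerate(rates):
        if dip_start is None and rate < dip:
            dip_start = i                      # an outbreak begins
        elif dip_start is not None and rate >= recovered:
            times.append((i - dip_start) * interval_min)
            dip_start = None                   # the filter has recovered
    return times

# Hypothetical capture rates, one sample every 10 minutes.
samples = [0.97, 0.96, 0.72, 0.78, 0.90, 0.96, 0.97, 0.60, 0.95, 0.98]
print(outbreak_recovery_times(samples))
```

For these samples the function reports two outbreaks, with recovery times of 30 and 10 minutes. An outbreak still in progress at the end of the series is not reported, which a fuller implementation would probably want to flag.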

The key element of anti-spam protection is how organizations respond to new outbreaks, the sorts of outbreaks that cause the noticeable dips in effectiveness that in turn result in server load peaks, help desk calls and significant spam impacts. These are the spam concerns an ISP or an IT manager needs to plan for, not the ongoing general spam level which most people just put up with.

If we are comparing anti-spam effectiveness, let's compare the systems' capability to deal with the outbreaks, not their ability to deal with the everyday junk that most vendors catch 95+% of.

Monday, April 14, 2008

When "Reputation Filtering" Fails, How to Deal with "the Unknown"


As discussed in a previous post, spammers are increasingly sending spam using legitimate email servers operated by free email providers such as Gmail, Hotmail, and Yahoo!. While our data indicates that botnets remain the largest source of spam by a wide margin, the spam from legitimate servers nonetheless presents a difficult and significant challenge for so-called "reputation filtering" techniques.

When spammers abuse otherwise legitimate mail servers (such as those operated by Google, Yahoo, and Hotmail), the reputation of these servers declines. However, since a great deal of legitimate email is also sent from these servers, their reputation is rarely bad enough to be safely placed in the "known bad" category. In MailChannels' reputation system, legitimate servers like this fall into a grey area that we call, rather ominously, "the unknown." The unknown also includes servers about which we know nothing, or very little.

By increasing their use of legitimate mail servers, spammers are increasing the proportion of spam that originates from "the unknown" while consequently reducing the proportion that originates from "known bad" sources. As the volume of spam originating from "the unknown" grows, the effectiveness of spam filters that rely on blocking traffic from "known bad" sources will degrade. Since most anti spam filters rely on blocking for a substantial (typically 70% or higher) portion of their overall effectiveness, successful spam fighting will soon require either a large increase in filter accuracy (to get rid of the increased amount of spam making it past blocking systems), or something completely different.

But writing better filters is difficult. As we have discussed in our ongoing series of posts on "Why Spam Filters Suck," spam filtering requires substantial computational resources -- not to mention requiring the receipt and storage of email messages so that they can be scanned in the first place, and then more resources to archive filtered messages in a quarantine for later review by an unappreciative user base.

For these reasons, our preference is to reduce the need for filtering, and to place more emphasis on intelligently dealing with traffic from the unknown.

Our approach to the unknown is simple: slow it down. Slowing down connections from mail servers in which we have an "unknown" (or perhaps "ambiguous") trust level allows legitimate traffic to continue flowing, while limiting the potential for abuse. It's not perfect, but hopefully with time the unknown will either clean up their act (i.e. become "known good"), or decline to the point that they can be blocked. Until a clear reputation emerges, slowing down traffic is really the best we can do.

There are different ways to slow down email traffic, but as with the Jedi, we believe there is only one true way. MailChannels' flavor of slowing down is called traffic shaping. Traffic shaping restricts the bandwidth available to connections from "the unknown" by altering the characteristics of the TCP connection and slowing responses to SMTP commands. The effect of traffic shaping is akin to simulating a very poor network connection, such as the connection you might get if you were using an old acoustic-coupled analog modem such as the one shown here.

Anti-spam vendors often misuse the term traffic shaping, confusing it with simple techniques such as "rate limiting" (which involves reducing the number of new connections a sender is permitted to make in a given time period). Very few vendors actually offer traffic shaping, but most claim they do. Caveat emptor. To further the confusion, traffic shaping in the context of SMTP connections is sometimes referred to as "tar-pitting". I'll leave the Wikipedia perusal as an exercise for the reader.

MailChannels Traffic Control can safely shape SMTP connections down to as little as one byte per second. We track the effectiveness of Traffic Control around the clock via our reputation feedback service, and have collected substantial evidence that suggests that 90-97% of botnet originated spam disappears when connections are slowed to this extent. In fact, our data shows that fewer than 10% of botnet connections remain active after just 10 seconds of traffic shaping.

Traffic shaping has a dramatic effect on spam volume and consequently on overall mail system loads, because most of the spam never makes it into the filter. As if guided by Adam Smith's invisible hand, spambots are drawn to easier targets that will accept email quickly, and thus move on before completing message delivery.

Of course, Google's outbound mail servers are not operated by spammers. Most of Google's email users are legitimate, and therefore expect that Google's server will patiently persist, as instructed by the SMTP standard, until their email gets through. So when spammers start sending via Google's Gmail service, their spam gets the red-carpet treatment: patient outbound servers that will wait until even a very slow email server has finished receiving their payload.

Indeed, our data shows that legitimate sending servers will wait almost without exception a minimum of two minutes for message delivery to complete. Most will wait at least five minutes, and the standard recommends 10 minutes. That's an eternity when compared with the premature disconnection behavior of most spambots.

So is all lost for the effectiveness of traffic shaping?
No, for a couple of reasons.

First, legitimate senders such as Google, Yahoo, and Hotmail work very hard to kick spammers off of their networks. They do this to avoid having their mail servers traffic shaped or blocked - a situation which causes message delivery problems for their users and therefore severely degrades the usefulness and profitability of their email services. As spammers attempt to abuse these services more, we expect that they will place greater requirements on users to establish their own positive reputation before being allowed to send more than a very small amount of email.

Second, even though legitimate email servers are far more likely to complete message delivery over a traffic shaped connection, message delivery is nonetheless delayed by several minutes. This delay might not sound like much of an advantage to the receiver (delaying message delivery by a few minutes), but the advantage this delay provides to spam filters is substantial. Allow me to explain.

These days, spam filters almost universally rely on constant, around-the-clock updates from centralized databases that track the latest spam campaigns. The frequency of these updates varies greatly between vendors, with the very best vendors aiming for a two minute update frequency, and typical update frequency probably averaging around ten minutes.

By delaying the receipt of a spam message until the filter has been updated to recognize and reject that message, we gain a substantial filter effectiveness boost.

But what about truly legitimate traffic that is slowed down? Our position is that delays of a few minutes are seldom noticed by Internet users. If a few minutes' delay is the cost we must bear to gain a substantial improvement in spam filtering, most Internet users are willing to be patient -- at least, more patient than the spammers.

Friday, April 11, 2008

Post #4 on Why Spam Filters Suck "trickle blog" series



"Spamonomics": The Economics of Spamming

Spammers earn billions of dollars annually. The business is efficient, hierarchical, and organized. In much the same way that the global trade in narcotics involves every conceivable method of smuggling (from submarines to drug mules), the spam trade employs software engineers to develop increasingly sophisticated delivery technologies. Just as the drug trade will continue until the end of humanity, so too will the illegal delivery of spam.


To understand how spamming has become such an intractable problem, it serves to analyze the economics that drive spamming. Spammers make money if one in every 30,000 recipients makes a purchase. And given this response rate, a spammer advertising pharmaceutical products can expect to make roughly $5,000 per million email messages sent.


Finding out what it costs to send spam is not difficult: Botnet operators advertise their spamming services via online forums. One forum mentioned a price of $100 to send one million spam messages. If we assume that $100 is the cost per million spam messages, and $5,000 is the revenue, then the gross margin from spamming is approximately 98 percent.


Although some spam filters provide better accuracy than others, filter accuracy across the board is approximately 90 per cent, meaning that only one in ten spam messages reaches a recipient. If global anti-spam effectiveness could be improved from 90 to 95 per cent, earning $5,000 from spamming would require sending 2 million spam messages, rather than 1 million. This increase in volume would reduce the spammers’ profit margin from 98 per cent to 96 per cent, assuming sending costs remained constant. If global anti-spam accuracy reaches 99 per cent -- a figure that experts will tell you is nearly inconceivable given the innovative methods of spammers -- sending costs would reduce the spamming margin to 80 per cent. Google is one of the world’s most profitable advertising companies with a margin of 25 per cent -- imagine 80 per cent! This is a business that won’t be going away any time soon.
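The margin figures above can be checked with a few lines of arithmetic. The sketch below uses the numbers from this post ($5,000 revenue and $100 sending cost per million messages at 90 per cent filter accuracy) and assumes, as the text does, that spammers simply send more to keep the number of delivered messages constant. The function name `spam_margin` is illustrative.

```python
def spam_margin(filter_accuracy, baseline_accuracy=0.90,
                revenue=5000.0, cost_per_million=100.0):
    """Gross margin when enough spam is sent to keep delivered volume constant."""
    # At the baseline accuracy, 1 million sent messages earn `revenue`.
    millions_needed = (1 - baseline_accuracy) / (1 - filter_accuracy)
    cost = millions_needed * cost_per_million
    return (revenue - cost) / revenue

for accuracy in (0.90, 0.95, 0.99):
    print(f"{accuracy:.0%} filtering -> {spam_margin(accuracy):.0%} margin")
```

This reproduces the 98, 96 and 80 per cent margins, and makes it easy to see how slowly the margin erodes as filter accuracy improves.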


Before botnets arrived, spammers could be stopped by blocking their IP addresses. DNSBLs like Spamhaus and Habeas block between 60 and 70 per cent of spam. With the introduction of botnets, IP blocking alone no longer provides a sufficient solution to the spam problem.


NEXT: Post #5 Why Are Botnets So Difficult To Stop?

PREVIOUS: Post #3 Final Ultimate Solution to the Spam Problem (FUSSP)

Wednesday, April 9, 2008

Sender Authentication, Gmail abuse, IPv6 ... Discuss!

Lately, I've been thinking about several related issues:

  • The challenges and effectiveness of sender authentication and reputation filtering.
  • The rise of Gmail spam and MessageLabs' subsequent attempt to throttle it now that Gmail's Captcha is broken.
  • The issue of IPv6 reputation as raised by Cloudmark.
How are these issues related?

Anti-spam systems have steadily improved their ability to identify and block known spam senders. However, this is having a significant impact on the value of legitimate addresses.

Authentication, reputation systems, computational challenge, and traffic shaping share an “Achilles Heel”: they dramatically increase the value of hijacking legitimate servers. If the spammers hijack legitimate email servers or domains, their messages will get through because they are now coming from legitimate senders. We see this all the time with spam from all sorts of legitimate sites, but we've also seen a jump in spam from Gmail since their account creation Captcha mechanism was cracked. What if all my mail is hosted on Gmail? How do recipients distinguish all these hosted senders? Can centralized reputation systems be expanded to track reputation at the individual sender level? Do we want them to?

As Cloudmark suggests in the interview, if we ever get to IPv6, reputation will be compromised as far as spam protection goes. There will be so many addresses that we'll be back to every spammer being an unknown sender. Reputation filtering will fail unless hard authentication is also widely adopted to enable recipients to reject mail not coming from known legitimate senders.

Along with increasingly aggressive treatment for unknown senders, spam protections will need to implement greater restrictions and careful scrutiny of webmail providers offering free accounts, especially those with automated account creation. There will also be a greater need for IT administrators to protect their systems from hijacking.

Tuesday, April 8, 2008

Anti-Spam Technology Adoption


In his comments on Post #3 of our trickle blog, TZink notes that Bill Gates would have been right about the 2-year time frame for stopping spam if the computational challenge and sender authentication measures had been widely implemented.

They were probably right. These steps could have worked, but they were never widely adopted because they interfere with legitimate email delivery. Let's take a closer look at the challenges to implementing these and other anti-spam technologies by comparing the adoption of Sender Authentication and Reputation filtering.

To date, Sender Authentication has been limited in its deployment and usefulness by factors related to how legitimate organizations use email.

The first barrier to adoption is inconvenience. If we look at Sender Authentication today, most organizations implement soft-fail authentication. Soft-fail authentication basically says "Our email comes from these specific servers, OR it could come from anywhere else." This is not very different from the unauthenticated model of "Our email comes from anywhere" because I can't reject mail, even if it comes from somewhere other than the authenticated servers.

To enable rejection of messages, organizations should implement hard-fail authentication and state "All email from our domain originates at these servers." This is a good clear statement enabling the rejection of mail coming from elsewhere. Why don't more organizations do this?
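
To make the soft-fail versus hard-fail distinction concrete, here is a minimal sketch that assumes the sender publishes SPF-style records, where a trailing "~all" means soft-fail and "-all" means hard-fail. The records, the addresses, and the `evaluate_policy` helper are hypothetical, and the evaluator handles only `ip4:` terms and the final `all` term, so it is nowhere near a complete SPF implementation.

```python
import ipaddress

# Hypothetical records for illustration only.
SOFT_FAIL_RECORD = "v=spf1 ip4:192.0.2.0/24 ~all"
HARD_FAIL_RECORD = "v=spf1 ip4:192.0.2.0/24 -all"


def evaluate_policy(record, sender_ip):
    """Evaluate a simplified SPF-style record: ip4: terms and a final 'all'."""
    ip = ipaddress.ip_address(sender_ip)
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return "pass"
        elif term.endswith("all"):
            # "~all" is soft-fail (accept but mark); "-all" is hard-fail (reject).
            qualifiers = {"~": "softfail", "-": "fail", "?": "neutral"}
            return qualifiers.get(term[0], "pass")
    return "neutral"


print(evaluate_policy(SOFT_FAIL_RECORD, "198.51.100.7"))
print(evaluate_policy(HARD_FAIL_RECORD, "198.51.100.7"))
```

Only the hard-fail record gives the recipient a result it can safely act on by rejecting the message; the soft-fail result leaves the decision open.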

One reason appears to be that it's inconvenient for their users if they enforce the use of their servers. Many end users dislike authentication because they must now set up their email client to send only via those servers, rather than change the "from" address and send from anywhere. Why this process is perceived as such a burden on senders is unclear to me, but it remains one of the barriers to implementation. [Disclosure: Mailchannels sets our records to soft-fail authentication and it's unclear to me why]

A second barrier to adoption is the lack of incentive to do it. Unless I'm worried my email will get blocked when I send it, there is little value in configuring authentication for my servers. The value of authentication is mainly derived by the recipient - better and clearer information helps them decide whether or not to receive my email. However, people want to get their mail, so authentication checks are configured on the (pragmatic) assumption that most servers don't have authentication set up. In this case, without authentication data, the recipient will normally receive the mail anyway. Since I can still deliver my messages without setting up authentication, why bother doing so? If measures like this are to be effective, the default action needs to be enforcement. This would likely penalize most legitimate senders - hence adoption is slow. As Yahoo and others have become more aggressive in their requirement for authentication, however, adoption has improved.

Another barrier to the adoption of authentication is that the value in taking the time to authenticate is perceived as low. Knowing that a person is who they claim to be is not in itself helpful unless there is some measure determining whether or not that person is worth talking to. A driver's license is more useful to a police officer if they can also run your ID through a records search. It's not much good for me to know that yes, you are "Bob". If I want to do something with the information, it's better for me to know you are "Bob the known spammer."

But the number one reason for poor adoption is simple ... authentication on its own is useless for stopping spam.

Sender Authentication only solves one aspect of email abuse: address spoofing. With SMTP, any email can be sent from anywhere claiming to be from anybody. Sender Authentication enables the recipient to check whether a message was sent from a server belonging to the organization it claims to be from. The technique has proven effective against phishing attacks, but spammers aren't impersonating anyone, so sender authentication doesn't really help. What we get is mail from authenticated spammers.

I hate to sound like Ironport, but what is needed is reputation.

Sender Authentication could have stopped spam if everyone (or a large subset of everyone) agreed to register their servers or addresses with some central authority that could clearly identify the legitimate registered senders and be used to allow that mail through and block the rest. But who is going to be that authority? How will it be policed? Where will it operate? Can I trust it? What if there is more than one authority? Can I trust all of them? The internet was designed to avoid this sort of centralized control. It is pretty hard to get that cat back in the bag.

Instead of an agreed authority, what has arisen are third-party reputation systems that came along as an evolution of blacklists. These systems track the history of the senders they see in their traffic. They have been effective against spam because they identify the known bad addresses and block those. They also identify known good senders and allow those messages through. Each of these systems tries to be a central authority for email reputation. However, they don't work well with unknown senders because the senders don't have to register first. The systems don't have enough reputation information to stop the message. Each day, botnets exploit the fact that it takes time to see a new address and then give it a reputation score.
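
The unknown-sender weakness is easy to see in a toy model. The following is a rough sketch, not how any particular reputation service works: the class name, the thresholds, and the minimum message count are all assumptions chosen for illustration.

```python
from collections import defaultdict


class ReputationTracker:
    """Toy per-IP reputation built from observed spam and legitimate traffic."""

    def __init__(self, min_messages=20, spam_threshold=0.8, ham_threshold=0.2):
        self.counts = defaultdict(lambda: [0, 0])  # ip -> [spam, total]
        self.min_messages = min_messages
        self.spam_threshold = spam_threshold
        self.ham_threshold = ham_threshold

    def record(self, ip, is_spam):
        entry = self.counts[ip]
        entry[0] += 1 if is_spam else 0
        entry[1] += 1

    def verdict(self, ip):
        spam, total = self.counts.get(ip, (0, 0))
        if total < self.min_messages:
            # Not enough history yet: a fresh botnet zombie lands here.
            return "unknown"
        ratio = spam / total
        if ratio >= self.spam_threshold:
            return "block"
        if ratio <= self.ham_threshold:
            return "allow"
        return "unknown"


tracker = ReputationTracker()
for _ in range(25):
    tracker.record("203.0.113.5", is_spam=True)
print(tracker.verdict("203.0.113.5"))   # seen often enough to be scored
print(tracker.verdict("198.51.100.9"))  # never seen before
```

Until an address has sent enough mail to be scored, the tracker can only answer "unknown", and that window is exactly what a newly recruited zombie uses.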

Reputation has been widely adopted where Authentication has not. The differences between them in terms of adoption are clear. Reputation does not inconvenience end users. There is incentive to implement reputation because it reduces load on servers. The value is high because it can be used to make real decisions. Most importantly, it works to reduce a real pain.


Monday, April 7, 2008

Post #3 on Why Spam Filters Suck “trickle blog” series



Once Promising Proposals for a Final Ultimate Solution to the Spam Problem (FUSSP)


"Two years from now, spam will be solved."


That was Bill Gates' famous pronouncement back in 2004. Microsoft, Yahoo and the open source community devised two techniques that they believed would eradicate spam. The first was sender authentication, which allowed email senders to provide a list of the servers permitted to send email for users within their domain. The idea was that sender authentication would eliminate spammers spoofing legitimate email addresses, and allow for the creation of a permanent, ironclad white list of trustworthy domains that never send spam, thus allowing recipients to simply block everything not on the white list and end spam forever.


Another idea pitched in 2004 was the computational challenge. Senders would, upon connecting to a receiving email server, have to spend considerable CPU cycles computing the answer to a mathematical challenge provided by the receiving server. Bill Gates believed this approach would stop spam by making it cost too much to send the high volumes of email required to make spamming profitable.
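
For readers who haven't seen one, a computational challenge can be sketched as a hashcash-style proof of work. To be clear, this is an illustrative stand-in: the post does not say which challenge scheme was proposed, and the function names, the challenge string, and the difficulty setting here are all assumptions.

```python
import hashlib
from itertools import count


def _digest_value(challenge, nonce):
    data = f"{challenge}:{nonce}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve_challenge(challenge, difficulty_bits):
    """Sender side: search for a nonce whose hash has enough leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify_challenge(challenge, nonce, difficulty_bits):
    """Receiver side: a single hash is enough to check the sender's work."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


# Each extra difficulty bit roughly doubles the sender's expected work.
nonce = solve_challenge("mail.example:2008-04-07:abc123", 12)
print(verify_challenge("mail.example:2008-04-07:abc123", nonce, 12))
```

The asymmetry is the point: the sender pays thousands of hash computations per message while the receiver pays one, which is trivial for a single email but adds up quickly across millions.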


Unfortunately, neither sender authentication nor the computational challenge technique resolved the spam problem. Computational challenges were rejected as being too costly for legitimate bulk email senders (airlines, banks, open source mailing lists, etc.). And sender authentication, while eventually enjoying widespread adoption in the form of DKIM and SenderID, proved prone to errors. As a result, it has remained useful mostly for the acceptance of legitimate email and phishing protection rather than the rejection of spam.


By 2005, what the anti-spam community was getting right was content filtering. Once spam filters had passed the 90 per cent accuracy level, spam transitioned from a problem of content to a problem of volume: the spammers simply sent more spam. And they can do this because the recipient, rather than the spammer, pays the cost of content filtering.


The cost of a resource-consuming filtering system increases during high traffic loads. If you block spam content, spammers will find new ways to get around it. Bill Gates was right: the only way to stop them is to make spam too costly to send. If you do, spammers are left to find new targets that are easier to hit.


NEXT: Post #4 Spamonomics: The Economics of Spamming

PREVIOUS: Post #2 Prohibition Induces "Botlegging"

Thursday, April 3, 2008

Post #2 on Why Spam Filters Suck "trickle blog" series



Prohibition Induces "Botlegging"

Spamming is a "tragedy of the commons," in which a finite resource (our time and attention) is abused at low cost by a minority (the spammers). As with many such tragedies in human history, prohibition has been seen as the quick fix. Classic targets of prohibition include alcohol, drugs, and gambling. The idea is simple, really: stop spammers from profiting by making their actions illegal and the penalties costly to the culprit. However, this kind of law is difficult to enforce.

In 2003, American legislators passed the CAN-SPAM Act (Controlling the Assault of Non-Solicited Pornography And Marketing). CAN-SPAM made it illegal to send unsolicited bulk email with a deceptive subject line and forced legitimate senders to identify themselves with a full mailing address.

So why, then, does spam volume continue to rise despite increased adoption of spam blocking mechanisms worldwide?

Several years have passed and spam volume is higher than ever. While CAN-SPAM is rightly criticized for not ending the spam problem, its most significant side effect was to force spamming underground and out of the reach of law enforcement. Faced with service interruptions, spammers began in early 2004 to migrate their operations to a highly scalable distribution platform immune to law enforcement: the botnet.

By the end of the same year, the majority of spam was being delivered by decentralized networks such as "Phatbot" - and nowadays by Storm, Mega-D, and Srizbi - lending little hope to Bill Gates' famous pronouncement that spam would be beaten before the end of 2006.

The fact is that there are limitations with each anti-spam technique. Content filters are a core component of any anti-spam architecture and are very effective at separating spam from legitimate email once they receive and recognize it. DNSBLs can block bad senders from known IP addresses once they know the sender is bad. But what happens when a botnet harvests new zombies with IP addresses unknown to DNSBLs and uses those to send new spam campaigns, something that happens every day? Discarding spam after you receive it does nothing to decrease high spam traffic from new campaigns. What is needed is a combination of the best-of-breed elements suited to deal with each type of spam: known content, unknown content, known senders and, most importantly, the unknown sender.
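
As a reminder of why a DNSBL can only help with senders it already knows, here is how a lookup name is typically formed: the IPv4 octets are reversed and prepended to the list's DNS zone. The zone `dnsbl.example` and the helper name are placeholders, and this sketch only builds the query name without performing any DNS lookup.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # 192.0.2.99 in zone dnsbl.example becomes 99.2.0.192.dnsbl.example
    return ".".join(reversed(octets)) + "." + zone


print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

If resolving that name returns an address record, the sender is listed; if the name does not exist, the list has nothing on that IP, which is the case for every freshly harvested zombie.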

If you're doubling servers to deal with heavy spam loads, your infrastructure costs are under the control of the spammers, who can just keep sending more spam. What you need is a new solution that can block most spam without having to receive the message first, in order to get the costs and the load back under control and ensure your infrastructure is used to deliver legitimate mail first.

NEXT: Post #3 Once Promising Proposals for a Final Ultimate Solution to the Spam Problem (FUSSP)
PREVIOUS: Post #1 Short History on Spam Protection

Friday, March 28, 2008

Post #1 on Why Spam Filters Suck "trickle blog" series



A Short History of Spam Protection

While methods have changed, spam continues to be the misuse of an open communication network for financial gain. What was once a harmless annoyance has led to serious conditions where high spam traffic can clog email servers to the detriment of legitimate mail.

How did we get here? And what can we change to solve the problem?

The first spam email ever was used to promote a seminar from Digital Equipment Corporation (DEC) in 1978. I'd call it spam because it was a mass emailing, sent to addresses harvested from a printed ARPAnet directory, to recipients who had not requested any contact.

Spam didn't become a huge problem until around 2002 when there were enough active email users worldwide to make spamming profitable. In response, the first commercial and open source spam filters arrived in Brightmail, PureMessage, and SpamAssassin to name a few. The first generation of filters applied sets of rules to each message received, identifying features within messages which might indicate the likelihood of being spam.

Spammers countered rule-based filters by obfuscating the content of their messages. Rather than sending a text message advertising Viagra, for example, the spammer might chop the message into small HTML pieces which, while unrecognizable to the spam filter, would still render into legible text for the message recipient. The rule-based filters added more rules to catch these obfuscations, causing the spammers to further innovate. This pattern of content obfuscation continues to the present day, the most recent example of which is probably MP3 spam (i.e. a spam message contained in an audio file).
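
A first-generation rule-based filter, and the way HTML obfuscation slips past it, can be sketched roughly like this. The rules, weights, and threshold are invented for illustration and are not taken from Brightmail, PureMessage, or SpamAssassin.

```python
import re

# Hypothetical rules: (pattern, score). A message is spam at or above THRESHOLD.
RULES = [
    (re.compile(r"\bviagra\b", re.IGNORECASE), 2.5),
    (re.compile(r"\bfree\b.*\boffer\b", re.IGNORECASE), 1.5),
    (re.compile(r"click here", re.IGNORECASE), 1.0),
]
THRESHOLD = 3.0


def spam_score(message):
    """Sum the scores of every rule whose pattern appears in the message."""
    return sum(score for pattern, score in RULES if pattern.search(message))


def is_spam(message):
    return spam_score(message) >= THRESHOLD


plain = "Cheap Viagra, click here"
obfuscated = "Cheap V<b></b>iagra, click here"  # renders the same in an HTML client

print(is_spam(plain))
print(is_spam(obfuscated))
```

The empty HTML tags are invisible to the reader but break the keyword match, so the filter needs yet another rule, and the cycle repeats.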

Anti-spam is one of those areas of IT where you're "damned if you don't." If email is flowing free of spam, you hear nothing. But when spam is getting through or emails are backlogged on the server, there's hell to pay.

Why is spam causing backlogs? Why is all mail treated equally? And do we need to keep adding what are effectively junk processing servers?

As the sophistication of spam has increased, so has the need for processing power to analyze those messages. Today, with email servers under high traffic loads, the ever-increasing computational cost and processing overhead of analyzing the content of every email often results in service disruptions for legitimate email. This has to change. IT infrastructure costs should be a function of legitimate activity, not spammer-driven loads.

The current method of spam filtering accepts all incoming email messages at the server and buffers them in a common queue on a first-come, first-served basis. To solve the loading problem this imposes, there needs to be a shift away from a single queue of email traffic towards a prioritized system that can expedite legitimate mail first.
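
The difference between a single queue and a prioritized one can be sketched with a small priority queue. This is an illustration of the idea only: the class name and the three priority levels (0 for known good senders, 1 for unknown, 2 for suspected spam) are assumptions, and how a real system would assign those levels is left open.

```python
import heapq
from itertools import count


class PrioritizedMailQueue:
    """Deliver lower-numbered priorities first; FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def enqueue(self, message, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), message))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PrioritizedMailQueue()
queue.enqueue("bulk offer from unseen host", priority=2)
queue.enqueue("invoice from known customer", priority=0)
queue.enqueue("note from new contact", priority=1)

while queue:
    print(queue.dequeue())
```

In a first-come, first-served queue the bulk offer would have been processed first; here the known customer's mail is expedited no matter how much suspect traffic arrived ahead of it.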

But there's more that needs to be considered...

UPDATE: On the subject of the history of spam, Christopher Nickson writes that the word "spam" to describe unsolicited commercial email recently celebrated its 15th anniversary.

NEXT: Post #2 Prohibition Induces "Botlegging"