Wednesday, April 30, 2008

Post #7 on Why Spam Filters Suck "trickle blog" series



Slowing Things Down

The problem is, typical email systems work in a queue. This means that high spam traffic clogs your network and crowds out legitimate mail. Botnets pour messages into your network, and mail servers receive the messages as quickly as they can. Next, the spam filter analyzes and tries to filter out any messages that appear to be spam.

Filters are effective at separating spam from legitimate email but do nothing to stop the rising volume of SMTP connections hammering the server. When spam traffic rises, the server becomes overloaded, which delays delivery of all email, much as a backlogged exit ramp can impede the flow of traffic on a highway during peak hours.

Today, Internet-facing email servers accept thousands of emails per minute. As spam volume increases, so too does the CPU capacity required to process all that mail. The blunt solution is to scale hardware to keep up with volume, but this is a one-to-one cost -- the more volume, the more servers are needed.

The fact is spam filters aren't getting a whole lot more accurate, and it certainly doesn't help that blocking spam is a reactive approach -- a sender needs to be identified first before rules or signatures are updated. Filters will always be playing catch-up with the spammers.

If you block based on reputation, what do you do when a new spam campaign breaks out and the sender has never been seen before?

What is needed is a way to get rid of the spam and prioritize legitimate mail without having to receive all the messages first or know who the bad senders are beforehand.

To use the highway analogy, what if you could put good senders in an express lane and the spammers in the slow lane so that legitimate email can be delivered first?


NEXT: Post #8 Dealing a Blow to Spammers

PREVIOUS: Post #6 Blocking Spam in 2008

Friday, April 25, 2008

Update: Anti-spam technology adoption

It was pointed out to me that I had missed a key element of the difference in the adoption of Sender Authentication vs. Reputation filtering: the issue of single-party vs. multi-party technology adoption.

Sender Authentication requires multi-party adoption: both the sender and the receiver must act. The sender must define their authentication information and the receiver must implement technology to check these records as each email is received. Until there is a critical mass of senders publishing their records, or a large recipient (Yahoo's adoption of DKIM, for example) to drive the change, there is insufficient incentive to act, creating a chicken-and-egg situation.

Reputation on the other hand only requires single-party adoption. As soon as I have the reputation data to work with I can implement the technology on my systems and start benefiting immediately. No critical mass of adoption is required for the technology to succeed.

Many of the objections to FUSSP proposals are based on the difficulty presented by multi-party adoption. Getting hundreds of thousands of email servers and millions of email users to change technology and behavior in order to stop spam is a major undertaking, and will be a slow and likely incomplete process, but not necessarily a futile one. Sender Authentication provides a good example: it's a slow process, but authentication has been adopted relatively widely and has reached a point where it provides useful data. These sorts of multi-party changes can be effective eventually, and the difficulty of adoption should not be a reason for dismissing such initiatives.

Tuesday, April 22, 2008

Newsletter forwarding, thanks for your account info!


I recently received an HTML email newsletter forwarded to me by a friend. It contained a great offer on hotel deals, and my friend thought I might be interested.

I usually just throw these things away. But an "account" link at the bottom caught my eye. Curious, I took a look. The link very conveniently took me directly to my friend's account page: no login, no password, straight in! Once there I had access to his address, phone number, password, account settings and various other identity information that I'm sure my friend is not keen to disclose.

Including links like this that circumvent login pages is bad practice on the part of newsletter senders, but it should also serve as a warning for people. Be very careful when forwarding email, especially HTML email, as it may be a more personal email than you really want to send to all your friends.

Let's be careful out there!

Monday, April 21, 2008

Post #6 on Why Spam Filters Suck "trickle blog" series



Blocking Spam In 2008

Like a shepherd, the duty of a bot herder (botnet operator) is to keep his/her botnet army intact. Bot herders make money by amassing a botnet, then contracting out the botnet services to spammers. That's right, spammers employ bot herders to do the dirty work for them!

Bot herders only get paid by the spammer when a message is actually delivered to the receiving email server. For those readers familiar with the SMTP protocol, this means that the bot herder only gets paid once the server has sent 250 Ok after the DATA phase. In order to make a lot of money, bot herders have to send as much as possible in the shortest possible time. If a zombie is being blocked, the bot herder doesn't make any money.

Spamming software is impatient. In programming terms, spamming software has a very low timeout. The SMTP RFC recommends that email servers wait at least three minutes for each chunk of data they send to be received by the receiving server and acknowledged via a TCP acknowledgement packet. Furthermore, the RFC recommends that senders wait at least ten minutes for the final message delivery acknowledgement.

These long timeouts were established because in the early days of the Internet, the infrastructure was slow and unreliable, and the machines were easily overloaded, leading to frequent message delivery delays. Today, email servers and networks are much faster and process incoming messages in a matter of seconds. Delays still occur, but the timeouts defined in the RFC are vastly higher than what is required in today's world.

Because bot herders don't get paid until they receive the 250 Ok, their software earns a higher profit by disconnecting after a few seconds and seeking out new victims whose servers respond more quickly. Bots can't afford to wait for a slow connection to go through, and they can't risk being discovered and put on a blacklist.

A few years ago, the MIT Spam Conference was a very interesting place. Each year, bright-eyed graduate students and intrepid industry types would present new filtering techniques that pushed the accuracy of spam filters to new levels. For the past three years, improvements in spam filter effectiveness have plateaued. A great result is now a paper that shows an accuracy improvement of half a percent. Spam filtering has essentially become maxed out as a technology, and there isn't much more we can do but tweak rules to avoid falling behind in the arms race with spammers.

Similarly, reputation systems which identify suspicious IP addresses have become asymptotic in their effectiveness. The spread of botnets has led to a virtually inexhaustible supply of new IP addresses that spam us a few times and then disappear forever. Most of the large anti-spam companies now have comprehensive blacklists that are updated every minute.

In other words, anti-spam systems worldwide are blocking everything they possibly can. And yet spam continues to grow as a problem -- it's unbelievable. So what can we do?

Bill Gates was right in 2004. He boldly posited that the way to solve the spam problem was to introduce a cost barrier that caused spamming to be no longer profitable. Unfortunately, spammers created botnets, which have given them more computing power than most governments. One way to think of the problem is that the spammers have millions of computers. You only have a handful. And you have to pay for yours. Who's going to win? While we can't win the spam war with better filters or better blacklists, there are alternatives.

To deter spamming we must undermine spammers, not simply block messages. You can make botnets unprofitable by slowing down SMTP traffic from spammers. This not only gives the receiver control of each email connection, but it also consumes sender resources to reduce the spammer's sending rate significantly.

Imagine the chaos at an airport without air traffic controllers and you begin to see why mail servers need email traffic control.

NEXT: Post #7 Slowing Things Down
PREVIOUS: Post #5 Why Are Botnets So Difficult To Stop?

Friday, April 18, 2008

Post #5 on Why Spam Filters Suck "trickle blog" series


Why Are Botnets So Difficult To Stop?

Definition: a "botnet" is commonly known as a network of infected computers used to send spam (among other actions).

The largest botnets contain hundreds of thousands of "zombie" machines controlled by a "bot herder," who uses sophisticated encryption, infection and peer-to-peer (P2P) networking techniques to ensure the permanence and growth of the botnet. As the zombies are used, they become discovered and subsequently blocked. While individual zombies are constantly changing, the overall botnet and the people who control it remain the same.

Because of botnets, spam does not come from a predictable set of computers; rather, it comes from all over the place in a completely unpredictable manner. By leveraging the diversity of IP addresses available via botnets, spammers have rendered the blocking approach far less effective than it once was.

Further, as the number of broadband subscribers continues to grow -- most rapidly in developing economies such as China and Eastern Europe -- the number of computers available to exploit for participation in botnets is expanding. As botnets increase in size and sophistication, trying to identify where the "bad stuff" is coming from is becoming less and less worthwhile.

Indeed, researchers at Georgia Tech discovered in 2006 in a survey of data from the Spamhaus black list that only 5 per cent of botnet IP addresses ever end up listed in the Spamhaus database. In another paper, the same researchers found that 85 per cent of spam zombies sent fewer than ten email messages to their honeypot server over the course of about 18 months, as shown in the above graph.

Example: A Transient Zombie

In late 2007, the zombie at 201.21.174.207 (a Brazilian broadband subscriber address) began sending approximately three spams each day into one of our honeypot systems. It took 19 days for the first real-time blackhole list (RBL) to identify this IP address and cause it to be blocked. By sending only a very light trickle of email, zombies can evade detection.

While blocking continues to be a core component of the multi-layered anti-spam architecture, it makes little sense in 2008 to depend on filtering technology designed to block spam in 2001 before the advent of botnets. Approaches that seek to block spam fail to deal with the issue of unknown senders.

NEXT: Post #6 Blocking Spam in 2008
PREVIOUS: Post #4 Spamonomics: The Economics of Spamming

Wednesday, April 16, 2008

Why anti-spam effectiveness testing sucks


InfoWorld have released a review of various anti-spam systems and along with that a comparison chart of effectiveness based on their long-term (2 week) testing of each of the systems. The report ends with the common issue of how to determine which one is the best given that there are multiple variables involved. Terry Zink has taken the results a step further and attempted to resolve the capture rate and false positive results to a single value. I agree that a single figure would help compare but it makes it even more important to get the underlying data right and to measure the right things. I think we need to consider variation in effectiveness as an overall more important measure of spam protection than capture rate.

Anti-spam effectiveness tests suck because:
a) nobody seems to be able to analyze and report statistics these days and
b) they test the wrong thing. Outbreak response time is the issue not long-term capture rates.

First, let's talk about statistics. Initially I was going to rant about the general poverty of meaning in statistical reporting, in terms of no standard deviations and excessive significant digits, but then I realized that even the capture rate calculations are wrong. If you're going to go to all the effort of testing, at least put some quality into your statistical analysis.

Looking at these results I see a wildly divergent volume of mail and spam being received by each of the anti-spam systems during their test period. The author reports that each of the systems received similar amounts of mail (13000~14000 messages) but that systems varied in the amount of messages they rejected at the connection level (using reputation filtering or DNSBLs) because they were spam. If that's true, the results of this test are reported incorrectly because the dropped connections are not reported or factored into the spam capture rate.

If I'm Barracuda and I drop 10,000 spam messages at the connection level and then another 1750 with content filtering, that's a capture rate of 98%, not 88%. It also means I'm doing a lot more to reduce load on the server since those dropped messages are never received and scanned. So the results are wrong, which is especially annoying since these results are going to be quoted and used in sales calls for the next 3 years and will affect some people's lives or at least livelihoods.
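To make the arithmetic concrete, here is a minimal sketch of the two ways of computing the capture rate. The 10,000 and 1750 figures come from the example above; the count of 250 missed messages is my assumption, chosen only because it reproduces the 88% and 98% figures, and the function names are made up for illustration.

```python
def content_only_rate(caught_by_filter, missed):
    """Capture rate if connection-level rejections are ignored."""
    return caught_by_filter / (caught_by_filter + missed)


def full_capture_rate(dropped_at_connection, caught_by_filter, missed):
    """Capture rate counting spam rejected before it was ever received."""
    stopped = dropped_at_connection + caught_by_filter
    return stopped / (stopped + missed)


# 250 missed messages is an assumed figure, not taken from the review.
reported = content_only_rate(1_750, 250)
actual = full_capture_rate(10_000, 1_750, 250)
print(f"content filter only: {reported:.1%}")
print(f"including dropped connections: {actual:.1%}")
```

The same filter looks ten points worse when the connection-level rejections are left out of the denominator.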

But I have a bigger concern with these tests, which are the same as every report on spam testing I've seen for the last 5 years of watching these things. The tests look at the wrong issue.

Spam is not a two-week issue, it is a NOW issue. What matters is how much spam I am getting right now: how much of it is getting through my filters, hammering my email servers, annoying my users and filling up my archiving system.

If we want a single number or any measure it needs to be useful and long term capture rates are not very meaningful, especially when they are based on medium term tests.

What I want to know is what the spammers were doing during each of those tests. Which vendors were hit with big new spam campaigns, and which were sitting there during a lull in spam activity? Which were hit with a whole lot of new spam techniques during their test, and which received only stale old spam campaigns that anyone should detect?

We can't tell what was actually happening because all the data is rolled up into one nice neat number: 9x.xxx% spam detection. A real-world comparison of anti-spam effectiveness would measure the capture rate every 10 minutes, plot it and look at how often the capture rate dropped below some threshold, say 80% for the sake of argument, and then measure how long it took to recover back up to a 95% or so capture rate. That measure of the number of outbreaks that hit and the response time gives us a measure of the resiliency of the anti-spam system to new campaigns and the ability of the vendor's labs to respond to those issues.
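The measurement described above could be sketched roughly as follows. This is only an illustration: the function name, the sample capture rates, and the decision to report an unrecovered dip as `None` are my own assumptions; the 10-minute interval and the 80% and 95% thresholds are the ones suggested in the text.

```python
def find_outbreaks(rates, interval_minutes=10, dip=0.80, recovered=0.95):
    """Return (start_index, minutes_to_recover) for each dip below `dip`.

    `rates` is a list of capture rates, one per measurement interval.
    minutes_to_recover is None if the series ends before recovery.
    """
    outbreaks = []
    start = None  # index where the current outbreak began, if any
    for i, rate in enumerate(rates):
        if start is None:
            if rate < dip:
                start = i
        elif rate >= recovered:
            outbreaks.append((start, (i - start) * interval_minutes))
            start = None
    if start is not None:
        outbreaks.append((start, None))
    return outbreaks


# Made-up capture rates sampled every 10 minutes.
samples = [0.97, 0.96, 0.78, 0.70, 0.85, 0.93, 0.96, 0.97, 0.79, 0.96]
print(find_outbreaks(samples))  # [(2, 40), (8, 10)]
```

Two systems with the same long-term average can produce very different lists here, which is the point of measuring outbreaks rather than a single rolled-up number.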

The key element of anti-spam protection is how organizations respond to new outbreaks, the sorts of outbreaks that cause the noticeable dips in effectiveness that in turn result in server load peaks, help desk calls and significant spam impacts. These are the spam concerns an ISP or an IT manager needs to plan for, not the ongoing general spam level which most people just put up with.

If we are comparing anti-spam effectiveness, let's compare the systems' capability to deal with the outbreaks, not the ability to deal with the everyday junk that most vendors get 95+% of.

Tuesday, April 15, 2008

ClamAV Vulnerabilities



The US-CERT website posted an advisory in relation to multiple ClamAV vulnerabilities. In total, four vulnerabilities were discovered which could result in remote code execution or a denial of service attack.

Fortunately, ClamAV have released version 0.93 with fixes for these issues. The change log shows the following fixes:

Mon Apr 14 21:35:11 CEST 2008 (tk)
----------------------------------
* Check in 0.93 patches:
- libclamunrar: bb#541 (RAR - Version required to extract - Evasion)
- libclamav/spin.c: bb#876 (PeSpin Heap Overflow Vulnerability)
- libclamav/pe.c: bb#878 (Upack Buffer Overflow Vulnerability)
- libclamav/message.c: bb#881 (message.c: read beyond allocated region)
- libclamav/unarj.c: bb#897 (ARJ: Sample from CERT-FI hangs clamav)
- libclamunrar: bb#898 (RAR crashes on some fuzzed files from CERT-FI)

The update to ClamAV is available for download here

Monday, April 14, 2008

When "Reputation Filtering" Fails, How to Deal with "the Unknown"


As discussed in a previous post, spammers are increasingly sending spam using legitimate email servers operated by free email providers such as Gmail, Hotmail, and Yahoo!. While our data indicates that botnets remain the largest source of spam by a wide margin, the spam from legitimate servers nonetheless presents a difficult and significant challenge for so-called "reputation filtering" techniques.

When spammers abuse otherwise legitimate mail servers (such as those operated by Google, Yahoo, and Hotmail), the reputation of these servers declines. However, since a great deal of legitimate email is also sent from these servers, their reputation is rarely bad enough to be safely placed in the "known bad" category. In MailChannels' reputation system, legitimate servers like this fall into a grey area that we call, rather ominously, "the unknown." The unknown also includes servers about which we know nothing, or very little.

By increasing their use of legitimate mail servers, spammers are increasing the proportion of spam that originates from "the unknown" while consequently reducing the proportion that originates from "known bad" sources. As the volume of spam originating from "the unknown" grows, the effectiveness of spam filters that rely on blocking traffic from "known bad" sources will degrade. Since most anti-spam filters rely on blocking for a substantial (typically 70% or higher) portion of their overall effectiveness, successful spam fighting will soon require either a large increase in filter accuracy (to get rid of the increased amount of spam making it past blocking systems), or something completely different.

But writing better filters is difficult. As we have discussed in our ongoing series of posts on "Why Spam Filters Suck," spam filtering requires substantial computational resources -- not to mention requiring the receipt and storage of email messages so that they can be scanned in the first place, and then more resources to archive filtered messages in a quarantine for later review by an unappreciative user base.

For these reasons, our preference is to reduce the need for filtering, and to place more emphasis on intelligently dealing with traffic from the unknown.

Our approach to the unknown is simple: slow it down. Slowing down connections from mail servers in which we have an "unknown" (or perhaps "ambiguous") trust level allows legitimate traffic to continue flowing, while limiting the potential for abuse. It's not perfect, but hopefully with time the unknown will either clean up their act (i.e. become "known good"), or decline to the point that they can be blocked. Until a clear reputation emerges, slowing down traffic is really the best we can do.

There are different ways to slow down email traffic, but as with the Jedi, we believe there is only one true way. MailChannels' flavor of slowing down is called traffic shaping. Traffic shaping restricts the bandwidth available to connections from "the unknown" by altering the characteristics of the TCP connection and slowing responses to SMTP commands. The effect of traffic shaping is akin to simulating a very poor network connection, such as the connection you might get if you were using an old acoustic-coupled analog modem such as the one shown here.

Anti-spam vendors often misuse the term traffic shaping, confusing it with simple techniques such as "rate limiting" (which involves reducing the number of new connections a sender is permitted to make in a given time period). Very few vendors actually offer traffic shaping, but most claim they do. Caveat emptor. To further the confusion, traffic shaping in the context of SMTP connections is sometimes referred to as "tar-pitting". I'll leave the Wikipedia perusal as an exercise for the reader.

MailChannels Traffic Control can safely shape SMTP connections down to as little as one byte per second. We track the effectiveness of Traffic Control around the clock via our reputation feedback service, and have collected substantial evidence that suggests that 90-97% of botnet originated spam disappears when connections are slowed to this extent. In fact, our data shows that fewer than 10% of botnet connections remain active after just 10 seconds of traffic shaping.
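To give a feel for what one byte per second means, here is a toy sketch that drips an SMTP reply out a byte at a time. It is emphatically not how MailChannels Traffic Control works (which also alters the characteristics of the TCP connection); `drip_send` and its parameters are hypothetical names, and the sleep function is injectable so the example can run without actually waiting.

```python
import time


def drip_send(write, data, bytes_per_second=1.0, sleep=time.sleep):
    """Write `data` one byte at a time, pausing between bytes.

    `write` is any callable that accepts bytes (hypothetical interface).
    """
    delay = 1.0 / bytes_per_second
    for i in range(len(data)):
        write(data[i:i + 1])
        if i < len(data) - 1:  # no pause needed after the final byte
            sleep(delay)


# Record the writes and pauses instead of touching a real connection.
sent, pauses = [], []
drip_send(sent.append, b"250 OK\r\n", bytes_per_second=1.0, sleep=pauses.append)
print(b"".join(sent), "took", sum(pauses), "seconds of pauses")
```

With a real connection, `write` could be a socket's `sendall` method. Even this eight-byte reply takes seven seconds at one byte per second, which is already a long wait for software that gives up after a few seconds.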

Traffic shaping has a dramatic effect on spam volume and consequently on overall mail system loads, because most of the spam never makes it into the filter. As if guided by Adam Smith's invisible hand, spambots are drawn to easier targets that will accept email quickly, and thus move on before completing message delivery.

Of course, Google's outbound mail servers are not operated by spammers. Most of Google's email users are legitimate, and therefore expect that Google's server will patiently persist as instructed by the SMTP standard until their email gets through. So when spammers start sending via Google's Gmail service, their spam gets the red-carpet treatment: patient outbound servers that will wait until even a very slow email server has finished receiving their payload.

Indeed, our data shows that legitimate sending servers will wait almost without exception a minimum of two minutes for message delivery to complete. Most will wait at least five minutes, and the standard recommends 10 minutes. That's an eternity when compared with the premature disconnection behavior of most spambots.

So is all lost for the effectiveness of traffic shaping?
No, for a couple of reasons.

First, legitimate senders such as Google, Yahoo, and Hotmail work very hard to kick spammers off of their networks. They do this to avoid having their mail servers traffic shaped or blocked - a situation which causes message delivery problems for their users and therefore severely degrades the usefulness and profitability of their email services. As spammers attempt to abuse these services more, we expect that these providers will place greater requirements on users to establish their own positive reputation before being allowed to send more than a very small amount of email.

Second, even though legitimate email servers are far more likely to complete message delivery over a traffic shaped connection, message delivery is nonetheless delayed by several minutes. This delay might not sound like much of an advantage to the receiver (delaying message delivery by a few minutes), but the advantage this delay provides to spam filters is substantial. Allow me to explain.

These days, spam filters almost universally rely on constant, around-the-clock updates from centralized databases that track the latest spam campaigns. The frequency of these updates varies greatly between vendors, with the very best vendors aiming for a two minute update frequency, and typical update frequency probably averaging around ten minutes.

By delaying the receipt of a spam message until the filter has been updated to recognize and reject that message, we gain a substantial filter effectiveness boost.
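As a rough back-of-the-envelope model of that boost (my own simplification, not something we have measured): suppose filter updates arrive at a fixed interval and a message starts arriving at a random point in that cycle. The chance that at least one update lands before the delayed message finishes is then simply the delay divided by the interval, capped at 1. Whether that update actually covers the campaign in question is a separate matter this sketch ignores.

```python
def chance_update_lands(delay_minutes, update_interval_minutes):
    """Probability that a filter update arrives during the delivery delay.

    Assumes updates at a fixed interval and a uniformly random arrival
    time within the update cycle (both are simplifying assumptions).
    """
    if update_interval_minutes <= 0:
        raise ValueError("update interval must be positive")
    if delay_minutes < 0:
        raise ValueError("delay must not be negative")
    return min(1.0, delay_minutes / update_interval_minutes)


for interval in (2, 10):
    p = chance_update_lands(delay_minutes=5, update_interval_minutes=interval)
    print(f"5 minute delay, updates every {interval} minutes: {p:.0%}")
```

Under these assumptions, a five-minute delay always spans an update from a vendor on a two-minute cycle, and spans one about half the time on a ten-minute cycle.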

But what about truly legitimate traffic that is slowed down? Our position is that delays of a few minutes are seldom noticed by Internet users. If a few minutes' delay is the cost we must bear to gain a substantial improvement in spam filtering, most Internet users are willing to be patient -- at least, more patient than the spammers.

Friday, April 11, 2008

Post #4 on Why Spam Filters Suck "trickle blog" series



"Spamonomics": The Economics of Spamming

Spammers earn billions of dollars annually. The business is efficient, hierarchical, and organized. In much the same way that the global trade in narcotics involves every conceivable method of smuggling (from submarines to drug mules), the spam trade employs software engineers to develop increasingly sophisticated delivery technologies. Just as the drug trade will continue until the end of humanity, so too will the illegal delivery of spam.


To understand how spamming has become such an intractable problem, it serves to analyze the economics that drive spamming. Spammers make money if one in every 30,000 recipients makes a purchase. And given this response rate, a spammer advertising pharmaceutical products can expect to make roughly $5,000 per million email messages sent.


Finding out what it costs to send spam is not difficult: Botnet operators advertise their spamming services via online forums. One forum mentioned a price of $100 to send one million spam messages. If we assume that $100 is the cost per million spam messages, and $5,000 is the revenue, then the gross margin from spamming is approximately 98 percent.


Although some spam filters provide better accuracy than others, filter accuracy across the board is approximately 90 per cent, meaning that only one in ten spam messages reaches a recipient. If global anti-spam effectiveness could be improved from 90 to 95 per cent, earning $5,000 from spamming would require sending 2 million spam messages, rather than 1 million. This increase in volume would reduce the spammers’ profit margin from 98 per cent to 96 per cent, assuming sending costs remained constant. If global anti-spam accuracy reached 99 per cent -- a figure that experts will tell you is nearly inconceivable given the innovative methods of spammers -- sending costs would reduce the spamming margin to 80 per cent. Google is one of the world’s most profitable advertising companies, with a margin of 25 per cent -- imagine 80 per cent! This is a business that won’t be going away any time soon.
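The margin figures above can be reproduced with a few lines of arithmetic. This sketch uses only the numbers quoted in this post ($5,000 revenue per million messages at 90 per cent filter accuracy, $100 per million to send); the function name and the assumption that revenue is held constant by sending more mail are mine.

```python
def spam_margin(filter_accuracy, revenue=5_000.0, cost_per_million=100.0,
                baseline_accuracy=0.90):
    """Gross margin when more mail must be sent to keep revenue constant."""
    # Millions of messages needed so that as many get through the filters
    # as one million did at the baseline accuracy.
    millions_needed = (1 - baseline_accuracy) / (1 - filter_accuracy)
    cost = millions_needed * cost_per_million
    return (revenue - cost) / revenue


for accuracy in (0.90, 0.95, 0.99):
    print(f"filter accuracy {accuracy:.0%}: margin {spam_margin(accuracy):.0%}")
```

Each improvement in filter accuracy multiplies the sending cost, but because that cost starts out at only 2 per cent of revenue, the margin barely moves.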


Before botnets arrived, spammers could be stopped by blocking their IP addresses. DNSBLs like Spamhaus and Habeas block between 60 and 70% of spam. With the introduction of botnets, blocking no longer provides a sufficient solution to the spam problem.


NEXT: Post #5 Why Are Botnets So Difficult To Stop?

PREVIOUS: Post #3 Final Ultimate Solution to the Spam Problem (FUSSP)

Wednesday, April 9, 2008

Sender Authentication, Gmail abuse, IPv6 ... Discuss!

Lately, I've been thinking about several related issues:

  • The challenges and effectiveness of sender authentication and reputation filtering.
  • The rise of Gmail spam and MessageLabs' subsequent attempt to throttle it now that Gmail's Captcha is broken.
  • The issue of IPv6 reputation as raised by Cloudmark.
How are these issues related?

Anti-spam systems have steadily improved their ability to identify and block known spam senders. However, this is having a significant impact on the value of legitimate addresses.

Authentication, reputation systems, computational challenge, and traffic shaping share an “Achilles Heel.” They dramatically increase the value of hijacking legitimate servers. If the spammers hijack legitimate email servers or domains their messages will get through because they are now coming from legitimate senders. We see this all the time with spam from all sorts of legitimate sites but we've also seen a jump in spam from Gmail since their account creation Captcha mechanism has been cracked. What if all my mail is hosted on Gmail? How do recipients distinguish all these hosted senders? Can centralized reputation systems be expanded to track reputation at the individual sender level? Do we want them to?

As Cloudmark suggests in the interview, if we ever get to IPv6, reputation will be compromised as far as spam protection goes. There will be so many addresses we'll be back to every spammer being an unknown sender. Reputation filtering will fail unless hard authentication is also widely adopted to enable recipients to reject mail not coming from known legitimate senders.

Along with increasingly aggressive treatment for unknown senders, spam protections will need to implement greater restrictions and careful scrutiny of webmail providers offering free accounts, especially those with automated account creation. There will also be a greater need for IT administrators to protect their systems from hijacking.

Tuesday, April 8, 2008

Anti-Spam Technology Adoption


In his comments on Post #3 of our trickle blog, TZink notes that Bill Gates would have been right about the 2-year time frame for stopping spam if the computational challenge and sender authentication measures had been widely implemented.

They were probably right. These steps would have worked, but they weren't widely implemented because these techniques interfere with legitimate email delivery. Let's take a closer look at the challenges to implementing these and other anti-spam technologies by comparing the adoption of Sender Authentication and Reputation filtering.

To date, Sender Authentication has been limited in its deployment and usefulness by factors related to how legitimate organizations use email.

The first barrier to adoption is inconvenience. If we look at Sender Authentication today, most organizations implement soft-fail authentication. Soft-fail authentication basically says "Our email comes from these specific servers, OR it could come from anywhere else." This is not very different from the unauthenticated model of "Our email comes from anywhere" because I can't reject mail, even if it comes from somewhere other than the authenticated servers.

To enable rejection of messages, organizations should implement hard-fail authentication and state "All email from our domain originates at these servers." This is a good clear statement enabling the rejection of mail coming from elsewhere. Why don't more organizations do this?
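If the authentication mechanism in question is SPF (my assumption; the discussion above applies to sender authentication generally), the difference between the two policies comes down to the qualifier on the final `all` mechanism of the published record: `~all` for soft-fail and `-all` for hard-fail. The sketch below classifies a record accordingly. The function name is hypothetical, the records use documentation-only addresses, and real SPF evaluation (includes, redirects, DNS lookups) is far more involved than this.

```python
def spf_all_policy(record):
    """Classify an SPF record by the qualifier on its `all` mechanism."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "not-spf"
    qualifiers = {"+": "pass", "-": "hard-fail", "~": "soft-fail", "?": "neutral"}
    for term in terms[1:]:
        term = term.lower()
        if term == "all":
            return "pass"  # a bare `all` carries the default `+` qualifier
        if len(term) == 4 and term[1:] == "all" and term[0] in qualifiers:
            return qualifiers[term[0]]
    return "no-all"  # the record never states a catch-all policy


# Example records for illustration only (192.0.2.0/24 is a documentation range).
print(spf_all_policy("v=spf1 ip4:192.0.2.0/24 ~all"))  # soft-fail
print(spf_all_policy("v=spf1 ip4:192.0.2.0/24 -all"))  # hard-fail
```

Only the second record gives a receiver clear grounds to reject mail that arrives from somewhere else.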

One reason appears to be that it's inconvenient for their users if they enforce the use of their servers. Many end users dislike authentication because they must now set up their email client to send only via those servers, rather than change the "from" address and send from anywhere. Why this process is perceived as a burden on senders is unclear to me, but it remains one of the barriers to implementation. [Disclosure: MailChannels sets our records to soft-fail authentication and it's unclear to me why]

A second barrier to adoption is the lack of incentive to do it. Unless I'm worried my email will get blocked when I send it, there is little value in configuring authentication for my servers. The value of authentication is mainly derived by the recipient - better and clearer information helps them decide whether or not to receive my email. However, people want to get their mail, so receiving systems are configured on the (pragmatic) assumption that most servers don't have authentication set up. In this case, without authentication data, the recipient will normally receive the mail anyway. Since I can still deliver my messages without setting up authentication, why bother doing so? If measures like this are to be effective, the default action needs to be enforcement. This would likely penalize most legitimate senders, hence adoption is slow. As Yahoo and others have become more aggressive in their requirement for authentication, adoption has improved.

Another barrier to the adoption of authentication is that the value in taking the time to authenticate is perceived as low. Knowing that a person is who they claim to be is not in itself helpful unless there is some measure determining whether or not that person is worth talking to. A driver's license is more useful to a police officer if they can also run your ID through a records search. It's not much good for me to know that yes, you are "Bob". If I want to do something with the information, it's better for me to know you are "Bob the known spammer."

But the number one reason for poor adoption is simple ... authentication on its own is useless for stopping spam.

Sender Authentication solves only one aspect of email abuse: address spoofing. With SMTP, any email can be sent from anywhere claiming to be from anybody. Sender Authentication enables the recipient to check whether a message was sent from a server belonging to the organization it claims to be from. The technique has proven effective against phishing attacks, but spammers aren't impersonating anyone, so sender authentication doesn't really help. What we get is mail from authenticated spammers.

I hate to sound like Ironport, but what is needed is reputation.

Sender Authentication could have stopped spam if everyone (or a large subset of everyone) agreed to register their servers or addresses with some central authority that could clearly identify the legitimate registered senders and be used to allow that mail through and block the rest. But who is going to be that authority? How will it be policed? Where will it operate? Can I trust it? What if there is more than one authority? Can I trust all of them? The internet was designed to avoid this sort of centralized control. It is pretty hard to get that cat back in the bag.

Instead of an agreed authority, what has arisen are third-party reputation systems that came along as an evolution of blacklists. These systems track the history of the senders they see in their traffic. They have been effective against spam because they identify the known bad addresses and block those. They also identify known good senders and allow those messages through. Each of these systems tries to be a central authority for email reputation. However, they don't work well with unknown senders: the senders don't have to register first, so the systems don't have enough reputation information to stop the message. Each day, botnets exploit the fact that it takes time to see a new address and then give it a reputation score.

Reputation has been widely adopted where Authentication has not. The differences between them in terms of adoption are clear. Reputation does not inconvenience end users. There is incentive to implement reputation because it reduces load on servers. The value is high because it can be used to make real decisions. Most importantly, it works to reduce a real pain.

In his comments on Post #3 of our trickle blog, TZink notes that Bill Gates would have been right about the 2-year time frame for stopping spam if the computational challenge and sender authentication measures had been widely implemented.

He was probably right. These steps would have worked, but they were never widely enforced because these techniques interfere with legitimate email delivery. Let's take a closer look at the challenges to implementing these and other anti-spam technologies by comparing the adoption of Sender Authentication and Reputation filtering.

To date, Sender Authentication has been limited in its deployment and usefulness by factors related to how legitimate organizations use email.

The first barrier to adoption is inconvenience. If we look at Sender Authentication today, most organizations implement soft-fail sender authentication. Soft-fail authentication basically says "Our email comes from these specific servers, OR it could come from anywhere else." This is not really very different from the unauthenticated model of "Our email comes from anywhere", because as a recipient I can't reject mail even if it comes from somewhere other than the authenticated servers.

To enable rejection of messages, organizations should implement hard-fail authentication and state, "all email from our domain originates at these servers." This is a good clear statement enabling the rejection of mail coming from elsewhere. Why don't more organizations do this?
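To make the soft-fail versus hard-fail distinction concrete, here is a minimal sketch that assumes the policy is published as an SPF record. The post does not name a specific mechanism at this point, so SPF is only an illustration, and the address range below is a documentation placeholder rather than anyone's real configuration. In SPF, a record ending in "~all" is soft-fail and one ending in "-all" is hard-fail.

```python
# Illustrative records only: 192.0.2.0/24 is a reserved documentation range.
SOFT_FAIL_RECORD = "v=spf1 ip4:192.0.2.0/24 ~all"
HARD_FAIL_RECORD = "v=spf1 ip4:192.0.2.0/24 -all"

# SPF qualifiers and the result they produce when the term matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def default_policy(record: str) -> str:
    """Return what the record says about mail from servers it does not list."""
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        name = term.lstrip("+-~?")
        if name.lower() == "all":
            # A bare "all" with no qualifier means "+all".
            qualifier = term[0] if term[0] in QUALIFIERS else "+"
            return QUALIFIERS[qualifier]
    return "neutral"  # no "all" term: the default result is neutral


def can_reject(record: str) -> bool:
    """Only a hard-fail policy gives the recipient a clear basis to reject."""
    return default_policy(record) == "fail"


print(default_policy(SOFT_FAIL_RECORD), can_reject(SOFT_FAIL_RECORD))
print(default_policy(HARD_FAIL_RECORD), can_reject(HARD_FAIL_RECORD))
```

The one-character difference between the two records is the whole argument: with "~all" the recipient learns which servers are preferred but is never told it is safe to reject the rest.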

One reason appears to be that enforcing the use of their servers is inconvenient for their users. Many end users dislike authentication because they must now set up their email client to send only via those servers, rather than change the "from" address and send from anywhere. Why this is perceived as such a burden on senders is unclear to me, but it remains one of the barriers to implementation. [Disclosure: Mailchannels sets our records to soft-fail authentication and it's unclear to me why]

A second barrier to adoption is the lack of incentive to do it. Unless I'm worried my email will get blocked when I send it, there is little value in configuring authentication for my servers. The value of authentication goes mainly to the recipient: better, clearer information helps them decide whether or not to receive my email. However, people want to get their mail, so recipients configure their authentication checks on the (pragmatic) assumption that most servers don't have authentication set up. In this case, without authentication data, the recipient will normally receive the mail anyway. Since I can still deliver my messages without setting up authentication, why bother doing so? If measures like this are to be effective, the default action needs to be to enforce them, but that would penalize most legitimate senders, hence adoption is slow. That said, as Yahoo and others have become more aggressive in their requirement for Authentication, adoption has improved.

Another barrier to the adoption of authentication is that the value is low. The value of knowing that a person is who they claim to be is very low unless you have some measure of whether that person is worth talking to. A driver's license is more useful to a police officer if they can also run your ID through a records search. It's not much good for me to know that yes, you are "Bob". If I want to do something with the information, it's better for me to know that yes, you are "Bob the known spammer".

But the number one reason for poor adoption is simple ... authentication on its own is useless for stopping spam.

Sender Authentication solves only one aspect of email abuse: address spoofing. With SMTP, any email can be sent from anywhere claiming to be from anybody. Sender Authentication enables the recipient to check whether a message was sent from a server belonging to the organization it claims to be from. The technique has proven effective against phishing attacks, but spammers aren't impersonating anyone, so sender authentication doesn't really help. What we get is mail from authenticated spammers.

I hate to sound like Ironport, but what is needed is reputation.

Sender Authentication could have stopped spam if everyone (or a large subset of everyone) agreed to register their servers or addresses with some central authority that could clearly identify the legitimate registered senders and be used to allow that mail through and block the rest. But who is going to be that authority? How will it be policed? Where will it operate? Can I trust it? What if there is more than one? Can I trust all of them? The internet was designed to avoid this sort of centralized control, and it's pretty hard to get that cat back in the bag.

Instead of an agreed authority, what has arisen are third-party reputation systems that came along as an evolution of blacklists. These systems track the history of the senders they see in their traffic. They have been effective against spam because they identify the known bad addresses and block those, and identify known good senders and allow messages from those through. Each one of these systems tries to be a central authority for email reputation, but they don't work well with unknown senders: the senders don't have to register first, so the systems don't have enough reputation information to stop the message. Every day, botnets exploit the fact that it takes time to see a new address and give it a reputation score.

Reputation has been widely adopted where Authentication has not. The differences between them in terms of adoption are clear. Reputation does not inconvenience end users. There is incentive to implement it because it reduces load on my servers. The value is high because I can use it to make real decisions. Most importantly, it works to reduce a real pain.

Monday, April 7, 2008

Post #3 on Why Spam Filters Suck “trickle blog” series



Once Promising Proposals for a Final Ultimate Solution to the Spam Problem (FUSSP)


"Two years from now, spam will be solved."


That was Bill Gates' famous pronouncement back in 2004. Microsoft, Yahoo and the open source community devised two techniques that they believed would eradicate spam. The first was sender authentication, which allowed email senders to provide a list of the servers permitted to send email for users within their domain. The idea was that sender authentication would eliminate spammers spoofing legitimate email addresses, and allow for the creation of a permanent, ironclad white list of trustworthy domains that never send spam, thus allowing recipients to simply block everything not on the white list and end spam forever.


Another idea pitched in 2004 was the computational challenge. Senders would, upon connecting to a receiving email server, have to spend considerable CPU cycles computing the answer to a mathematical challenge provided by the receiving server. Bill Gates believed this approach would stop spam by making it cost too much to send the high volumes of email required to make spamming profitable.
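For readers who have not seen a computational challenge, here is a minimal sketch of the idea. The 2004 proposals are not specified in this post, so the hash-based puzzle below (in the general style of proof-of-work schemes such as hashcash) is an assumption, as are the function names and the difficulty setting.

```python
import hashlib
import itertools


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Sender side: search for a nonce whose SHA-256 hash falls below a target.

    On average this takes about 2**difficulty_bits hash computations.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Receiver side: one hash is enough to check the sender's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


# Hypothetical challenge string issued by the receiving server.
challenge = b"mx.example.org:session-1234"
nonce = solve(challenge, 12)
print(verify(challenge, nonce, 12))
```

Each extra difficulty bit doubles the sender's expected work while the receiver still performs a single hash. That asymmetry is what was supposed to make bulk spamming unprofitable, and it is also why legitimate bulk senders would pay the same per-message price.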


Unfortunately, neither sender authentication nor the computational challenge technique resolved the spam problem. Computational challenges were rejected as being too costly for legitimate bulk email senders (airlines, banks, open source mailing lists, etc.). And sender authentication, while eventually enjoying widespread adoption in the form of DKIM and SenderID, proved prone to errors. As a result, it has remained useful mostly for the acceptance of legitimate email and phishing protection rather than the rejection of spam.


By 2005, what the anti-spam community was getting right was content filtering. Once spam filters reached above the 90 per cent accuracy level, spam transitioned from a problem of content to a problem of volume: the spammers simply sent more spam. And they can do this because the recipient, rather than the spammer, pays the cost of content filtering.


The cost of a resource-consuming filtering system increases during high traffic loads. If you block spam content, spammers will find new ways to get around it. Bill Gates was right: the only way to stop them is to make spam too costly to send. If you do, spammers are left to find new targets that are easier to hit.


NEXT: Post #4 Spamonomics: The Economics of Spamming

PREVIOUS: Post #2 Prohibition Induces "Botlegging"

Thursday, April 3, 2008

Post #2 on Why Spam Filters Suck "trickle blog" series



Prohibition Induces "Botlegging"

Spamming is a "tragedy of the commons," in which a finite resource (our time and attention) is abused at low cost by a minority (the spammers). As with many such tragedies in human history, prohibition has been seen as the quick fix. Classic targets of prohibitionism include alcohol, drugs, and gambling. The idea is simple, really: stop spammers from profiting by making their actions illegal and enforcing penalties that make spamming a harmful choice for the culprit. However, this kind of law is difficult to enforce.

In 2003, American legislators passed the CAN-SPAM Act (Controlling the Assault of Non-Solicited Pornography And Marketing). CAN-SPAM made it illegal to send unsolicited bulk email with a deceptive subject line and forced legitimate senders to identify themselves with a full mailing address.

So why, then, does spam volume continue to rise despite increased adoption of spam-blocking mechanisms worldwide?

Several years have passed and spam volume is higher than ever. While CAN-SPAM is rightly criticized for not ending the spam problem, its most significant side effect was to force spamming underground and out of the reach of law enforcement. Faced with service interruptions, spammers began in early 2004 to migrate their operations to a highly scalable distribution platform immune to law enforcement: the botnet.

By the end of the same year, the majority of spam was being delivered by decentralized networks such as "Phatbot" - and nowadays by Storm, Mega-D, and Srizbi - lending little hope to Bill Gates' famous pronouncement that spam would be beaten before the end of 2006.

The fact is that there are limitations with each anti-spam technique. Content filters are a core component of any anti-spam architecture and are very effective at separating spam from email once they receive and recognize it. DNSBLs can block bad senders from known IP addresses once they know the sender is bad. But what happens when a botnet harvests new zombies with IP addresses unknown to DNSBLs and uses those to send new spam campaigns, something that happens every day? Discarding spam after you receive it does nothing to decrease high spam traffic from new campaigns. What is needed is a combination of the best-of-breed elements suited to deal with each type of spam: known content, unknown content, known senders and, most importantly, the unknown sender.
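The DNSBL blind spot is easier to see with the lookup itself in front of you. A DNSBL is queried over DNS: the IPv4 address is reversed octet by octet, the list's zone is appended, and an answer means "listed" while a failed lookup means "not listed". The sketch below assumes a made-up zone name (dnsbl.example.org) and a placeholder address; the helper names are mine, not part of any product.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist answers for this address.

    Any resolution failure is treated as "not listed", which is also
    exactly what a never-before-seen zombie looks like.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical zone and a documentation-range address.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Note that the lookup can only say what the list already knows. A freshly infected machine gets the same "not listed" answer as a legitimate first-time sender, so the message is accepted and the cost lands on the recipient.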

If you're doubling servers to deal with heavy spam loads, your infrastructure costs are under the control of the spammers, who can just keep sending more spam. What you need is a new solution that can block most spam without having to receive the message first. That gets the costs and the load back under control and ensures your infrastructure is used to deliver legitimate mail first.

NEXT: Post #3 Once Promising Proposals for a Final Ultimate Solution to the Spam Problem (FUSSP)
PREVIOUS: Post #1 Short History on Spam Protection