
Blocking Spam In 2008
Like a shepherd, the duty of a bot herder (botnet operator) is to keep his/her botnet army intact. Bot herders make money by amassing a botnet, then contracting out the botnet services to spammers. That's right, spammers employ bot herders to do the dirty work for them!
Bot herders only get paid by the spammer when a message is actually delivered to the receiving email server. For readers familiar with the SMTP protocol, this means that the bot herder gets paid only once the receiving server has replied "250 Ok" at the end of the DATA phase. To make a lot of money, bot herders have to send as many messages as possible in the shortest possible time. If a zombie is being blocked, it isn't delivering anything, and the bot herder doesn't make any money.
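To make the payment condition concrete, here is a minimal Python sketch that decides whether one SMTP transaction ended in delivery. It assumes a simplified, hypothetical transcript format (not any real library's) in which each entry pairs a client command with the server's three-digit reply code. The message body itself draws no reply, so a lone "." entry stands for the end-of-data marker.

```python
from typing import List, Tuple

# Hypothetical transcript format: (client command, server reply code).
# The message body gets no reply, so "." stands for the end-of-data marker.
Exchange = Tuple[str, int]


def message_delivered(transcript: List[Exchange]) -> bool:
    """True only if the server answered the end-of-data marker with 250.

    Assumes one message per session, for simplicity.
    """
    data_accepted = False
    for command, code in transcript:
        if command.upper() == "DATA":
            # 354 invites the client to start sending the message body.
            data_accepted = code == 354
        elif command == "." and data_accepted:
            # The only reply that counts as delivery (and therefore payment).
            return code == 250
    # Disconnected, rejected earlier, or never reached end-of-data.
    return False


delivered = [
    ("EHLO zombie.example", 250),
    ("MAIL FROM:<spam@example.com>", 250),
    ("RCPT TO:<victim@example.net>", 250),
    ("DATA", 354),
    (".", 250),
]
rejected = delivered[:-1] + [(".", 554)]
abandoned = delivered[:3]  # the bot hung up before DATA

print(message_delivered(delivered))  # True
print(message_delivered(rejected))   # False
print(message_delivered(abandoned))  # False
```

Every session that ends anywhere other than a 250 after the final "." is, from the bot herder's point of view, unpaid work.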
Spamming software is impatient. In programming terms, spamming software has a very low timeout. The SMTP RFC recommends that a sending server wait at least three minutes for each chunk of data it transmits to be received by the receiving server and acknowledged at the TCP level. It also recommends that senders wait at least ten minutes for the final acknowledgement that the message has been delivered.
These long timeouts were established because, in the early days of the Internet, the infrastructure was slow and unreliable and machines were easily overloaded, leading to frequent delays in message delivery. Today's email servers and networks are much faster and can typically process an incoming message in a matter of seconds. Delays still occur, but the timeouts defined in the RFC are far longer than anything required in today's world.
Because bot herders don't get paid until they receive the 250 Ok, their software earns a higher profit by disconnecting after a few seconds and seeking out new victims whose servers respond more quickly. Bots can't afford to wait for a slow connection to go through, and they can't risk being discovered and put on a blacklist.
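The gap between an RFC-compliant sender and an impatient bot can be sketched in a few lines of Python. The timeout table below uses the minimum client timeouts from RFC 5321, section 4.5.3.2 (RFC 2821, current in 2008, lists the same values). The bot's flat 10-second timeout and the 20-second server pause are hypothetical numbers chosen only for illustration.

```python
from typing import Dict, Optional

# Minimum client-side timeouts from RFC 5321, section 4.5.3.2 (RFC 2821
# lists the same values), in seconds.
RFC_MIN_TIMEOUTS: Dict[str, float] = {
    "initial 220 greeting": 5 * 60,
    "MAIL reply": 5 * 60,
    "RCPT reply": 5 * 60,
    "DATA initiation (354)": 2 * 60,
    "data block": 3 * 60,
    "DATA termination (250)": 10 * 60,
}

# Hypothetical bot: gives up on any stage after 10 seconds.
BOT_TIMEOUTS: Dict[str, float] = {stage: 10 for stage in RFC_MIN_TIMEOUTS}


def first_abandoned_stage(
    delays: Dict[str, float], timeouts: Dict[str, float]
) -> Optional[str]:
    """Return the first stage at which the client hangs up, or None if it never does.

    `delays` says how long the server takes to respond at each stage;
    stages missing from `delays` are treated as answered instantly.
    """
    for stage, timeout in timeouts.items():
        if delays.get(stage, 0) > timeout:
            return stage
    return None


# A receiver that pauses 20 seconds before sending its greeting.
slow_greeting = {"initial 220 greeting": 20}
print(first_abandoned_stage(slow_greeting, RFC_MIN_TIMEOUTS))  # None
print(first_abandoned_stage(slow_greeting, BOT_TIMEOUTS))      # initial 220 greeting
```

A legitimate server sits through the 20-second pause well inside its RFC budget, while the hypothetical bot hangs up before it ever reaches DATA, so it never collects its 250 Ok.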
A few years ago, the MIT Spam Conference was a very interesting place. Each year, bright-eyed graduate students and intrepid industry types would present new filtering techniques that pushed the accuracy of spam filters to new levels. For the past three years, improvements in spam filter effectiveness have plateaued. A great result today is a paper showing an accuracy improvement of half a percent. Spam filtering has essentially been maxed out as a technology, and there isn't much more we can do but tweak rules to avoid falling behind in the arms race with spammers.
Similarly, reputation systems that identify suspicious IP addresses have become asymptotic in their effectiveness. The spread of botnets has led to a virtually inexhaustible supply of new IP addresses that spam us a few times and then disappear forever. Most of the large anti-spam companies now have comprehensive blacklists that are updated every minute.
In other words, anti-spam systems worldwide are blocking everything they possibly can. And yet spam continues to grow as a problem. It's unbelievable. So what can we do?
Bill Gates was right in 2004. He boldly posited that the way to solve the spam problem was to introduce a cost barrier that would make spamming unprofitable. Unfortunately, spammers created botnets, which have given them more computing power than most governments. One way to think of the problem is that the spammers have millions of computers, while you have only a handful, and you have to pay for yours. Who's going to win? While we can't win the spam war with better filters or better blacklists, there are alternatives.
To deter spamming, we must undermine spammers, not simply block messages. You can make botnets unprofitable by slowing down SMTP traffic from spammers. This not only gives the receiver control over each email connection but also ties up the sender's resources, significantly reducing the spammer's sending rate.
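Why slowing a connection cuts a sender's output can be seen from Little's law: a sender's throughput is at most its number of concurrent sessions divided by how long each session lasts. The Python sketch below applies that formula; the 50-connection bot and the 2-second and 60-second session lengths are hypothetical, and it assumes one message per session. It illustrates the arithmetic only and does not describe any particular product's traffic-shaping algorithm.

```python
def messages_per_hour(concurrent_sessions: int, seconds_per_session: float) -> float:
    """Throughput ceiling for one sender, via Little's law (rate = concurrency / time).

    Each open SMTP session occupies one of the sender's connection slots
    until the receiver is done with it, so longer sessions mean fewer
    messages per hour from the same hardware.
    """
    if seconds_per_session <= 0:
        raise ValueError("seconds_per_session must be positive")
    return concurrent_sessions * 3600 / seconds_per_session


# Hypothetical bot with 50 simultaneous connections.
fast = messages_per_hour(50, 2)   # receiver answers promptly: 2 s per message
slow = messages_per_hour(50, 60)  # receiver stretches each session to 60 s
print(fast, slow, fast / slow)    # 90000.0 3000.0 30.0
```

Stretching each session thirtyfold cuts the same bot's ceiling thirtyfold. A bot with a short timeout fares even worse: rather than waiting, it hangs up and delivers nothing on that connection.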
Imagine the chaos at an airport without air traffic controllers and you begin to see why mail servers need email traffic control.
NEXT: Post #7 Slowing Things Down
PREVIOUS: Post #5 Why Are Botnets So Difficult To Stop?
Monday, April 21, 2008
Post #6 on Why Spam Filters Suck "trickle blog" series
Posted by Desmond Liao at 11:23 AM
Labels: 250 ok, anti-spam, bill gates, bot herders, botnets, data phase, MIT Spam Conference, profit, RFC, smtp, spam, spammers, trickle blog
9 comments:
An asymptote is not the same as a "plateau".
Filtering an additional half of one percent of the spam I receive would be a major achievement, assuming there was no increase in false positives.
The "law" of diminishing returns also applies.
richi.
Agreed, it would be more accurate to say that Reputation systems have plateaued in terms of effectiveness.
David, I disagree. Let's arbitrarily say the state of the art for effectiveness was 90% in 2005, 97% in 2006, and 98% in 2007. Just an example, but arguably not far from the truth.
Does that mean improvements are plateauing? No, not at all. It's a non-linear scale (it's asymptotic).
You might as well say that effectiveness should exceed 100% at some stage!
It depends on what you measure. How much spam you stop, x%, is just a percentage.
What is non-linear is the measure of how much effort you need to put in to reach x%. And that does increase significantly as you head toward 100%, and we soon hit the point of diminishing returns.
If we were to measure how much effort was required to maintain say a 98% capture rate as a function of time, we would see this value increasing with each new spam innovation and decreasing with each new anti-spam technique. How you would measure that I'm not sure. Perhaps # of man-hours in the anti-spam labs.
A plateau is reached when, no matter how much effort you put into improving a technique, its effectiveness or efficiency doesn't improve. I think reputation systems have hit this, although I suppose bigger traps and faster processing/updating could still make a marginal improvement.
So I guess we could just drop the fancy words and say they are reaching the practical limit for what they can block. Or as we said, they are blocking as much as they can.
The main point though is that, if you improve your catch rate from 98% to 99% all the spammer has to do is double their message volume to get back to the same delivery rate. Without an economic disincentive every improvement in filtering just results in higher spam volumes. This is why introducing an economic disincentive is so important.
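The arithmetic in the last paragraph can be checked with a few lines of Python. It uses exact fractions to avoid floating-point rounding, and the message volumes are made-up illustrative numbers. Note that the sketch bakes in the assumption of a fixed catch rate, which is exactly what the next comment disputes.

```python
from fractions import Fraction


def spam_delivered(volume: int, catch_rate: Fraction) -> Fraction:
    """Messages that get past a filter which stops `catch_rate` of them.

    Assumes the catch rate stays fixed as volume changes.
    """
    return volume * (1 - catch_rate)


before = spam_delivered(1_000_000, Fraction(98, 100))  # 2% of 1M gets through
after = spam_delivered(2_000_000, Fraction(99, 100))   # 1% of 2M gets through
print(before, after, before == after)  # 20000 20000 True
```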
Well, yes. But then again, no.
There's nothing magic about a percentage. Let's say a given filter achieves effectiveness e = 98% against a certain number of spam messages in week 1.
If the spammer then doubles his volume in week 2, there's nothing mathematical that forces e to remain at 98% -- it's a measurement, not a constant.
In fact, I'd expect e to increase, all else being equal, because it would have more data to work on.
That's why I fundamentally disagree with your first sentence in the final paragraph -- doubling the spam volume probably doesn't double the number of spam messages in the inbox, all else being equal. (FWIW, I neither agree nor disagree with the rest of the graf.)
Hi Richi,
I'm a developer with Mailchannels and I've been reading this discussion with interest. I thought I'd chime in with some comments, and I'm knowingly going off on a tangent here. You raised a couple of interesting points:
"..there's nothing mathematical that forces e to remain at 98% -- it's a measurement, not a constant."
Exactly! This relates to the discussion of The Dip, where measured effectiveness can fluctuate dramatically because of the time it takes to create new rules.
I also thought this comment was interesting:
"In fact, I'd expect e to increase, all else being equal, because it would have more data to work on."
If that's true, then conversely, wouldn't you expect e to decrease as new spam campaigns are launched with fresh data that has yet to be analyzed? Spammers can exploit this by running smaller campaigns from armies of bots, taking advantage of the dip in effectiveness during the window before filters respond.
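A toy model makes the response-time window concrete. The Python sketch below assumes, purely for illustration, that a filter misses every message of a new campaign until a rule is deployed and catches every message afterwards; real filters are not this all-or-nothing, and the sending rate and rule delay are hypothetical numbers.

```python
def delivered_in_window(campaign_size: int, msgs_per_minute: int, rule_delay_minutes: int) -> int:
    """Messages that land before a new filter rule is deployed.

    Simplifying assumptions: the filter misses every message of a new campaign
    until the rule is live, and catches every message afterwards.
    """
    return min(campaign_size, msgs_per_minute * rule_delay_minutes)


# Hypothetical numbers: a botnet sending 1,000 msgs/min; rules take 30 minutes.
for size in (10_000, 1_000_000):
    got_through = delivered_in_window(size, 1_000, 30)
    print(size, got_through, f"{got_through / size:.0%}")
# 10000 10000 100%
# 1000000 30000 3%
```

Under these assumptions a small campaign slips through entirely, and adding bots (a higher sending rate) increases what lands before the rule arrives.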
Absolutely so, and another reason why it's unrealistic to say, "improvements in spam filter effectiveness has plateaued."
It's war out there, and the vendors that continue to improve two-dimensional filtering accuracy are to be applauded. As an industry, we (tinw) shouldn't take the position that "there isn't much more we can do."
First off, I completely agree that it's a war out there and we as an industry can do more. This trickle blog points out the limitations of traditional anti-spam filtering such as DNSBL and content filtering.
Take for example, the following limitations:
* visibility - does the spam message even get reported to the spam lab?
* response time - if reported there is a delay to create and propagate rules
* false-positive risk - image spam introduced new content that made rules difficult to write
* unknown sources - zombies often go under the radar in DNSBLs
Due to these and other factors, there are limits to how far traditional techniques can improve effectiveness. Traffic Shaping is unaffected by many of these limitations because it is well suited to dealing with unknown sources while still adhering to the RFCs, which avoids false positives.
David (and David), don't get me wrong, I agree that your stuff, as well as techniques such as greetpause, Teergruben, and reject-on-pipeline are absolutely valuable. I'm on record as saying so for several years.
My point is simply that the mathematical analysis that confused an asymptote with a plateau was flawed.
Speaking of ways to mess with spammers' heads, here's a plug for Project Honeypot.