Showing posts with label botnets. Show all posts

Monday, April 21, 2008

Post #6 on Why Spam Filters Suck "trickle blog" series



Blocking Spam In 2008

Like a shepherd, the duty of a bot herder (botnet operator) is to keep his/her botnet army intact. Bot herders make money by amassing a botnet, then contracting out the botnet services to spammers. That's right, spammers employ bot herders to do the dirty work for them!

Bot herders only get paid by the spammer when a message is actually delivered to the receiving email server. For readers familiar with the SMTP protocol, this means that the bot herder gets paid only once the server has sent 250 Ok after the DATA phase. To make a lot of money, bot herders have to send as many messages as possible in the shortest possible time. A zombie that is being blocked earns the bot herder nothing.

Spamming software is impatient. In programming terms, spamming software has a very low timeout. The SMTP RFC recommends that email servers wait at least three minutes for each chunk of data they send to be received by the receiving server and acknowledged via a TCP acknowledgement packet. Furthermore, the RFC recommends that senders wait at least ten minutes for the final message delivery acknowledgement.
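To put rough numbers on that gap, here is a minimal Python sketch. The two RFC values are the ones quoted above; the bot timeout and the server delay are assumptions picked purely for illustration, and gives_up is a hypothetical helper, not part of any real mail software.

```python
# Sender-side timeouts recommended by the SMTP RFC, in seconds.
RFC_DATA_BLOCK_TIMEOUT = 3 * 60     # wait for each chunk of data to be acknowledged
RFC_FINAL_REPLY_TIMEOUT = 10 * 60   # wait for the final reply after the DATA phase

# Assumed value for impatient spamming software (illustration only).
ASSUMED_BOT_TIMEOUT = 10


def gives_up(sender_timeout: float, server_delay: float) -> bool:
    """Return True if the sender disconnects before the server replies."""
    return server_delay > sender_timeout


server_delay = 30  # seconds the receiving server takes to answer (assumed)
print("RFC-compliant sender gives up:", gives_up(RFC_FINAL_REPLY_TIMEOUT, server_delay))
print("Impatient bot gives up:", gives_up(ASSUMED_BOT_TIMEOUT, server_delay))
```

Any delay that falls between the two timeouts separates the patient sender from the impatient one.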

These long timeouts were established because in the early days of the Internet, the infrastructure was slow and unreliable, and the machines were easily overloaded, leading to frequent message delivery delays. Today, email servers and networks are much faster and typically process incoming messages in a matter of seconds. Delays still occur, but the timeouts defined in the RFC are vastly higher than what is required in today's world.

Because bot herders don't get paid until they receive the 250 Ok, their software earns a higher profit by disconnecting after a few seconds and seeking out new victims whose servers respond more quickly. Bots can't afford to wait for a slow connection to go through, and they can't risk being discovered and put on a blacklist.
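A back-of-the-envelope model shows why hanging up early pays. This is only a sketch: the function name, the delays, and the share of slow servers are all made-up assumptions, not measurements.

```python
def paid_deliveries_per_hour(timeout, fast_delay, slow_delay, slow_fraction):
    """Expected paid deliveries per hour for one connection slot.

    Assumes a share `slow_fraction` of servers answers after `slow_delay`
    seconds, the rest after `fast_delay` seconds, and timeout >= fast_delay.
    """
    fast_fraction = 1 - slow_fraction
    if timeout >= slow_delay:
        # Patient sender: waits for everyone, so every attempt is delivered.
        avg_seconds = fast_fraction * fast_delay + slow_fraction * slow_delay
        delivered_share = 1.0
    else:
        # Impatient sender: abandons slow servers once the timeout expires.
        avg_seconds = fast_fraction * fast_delay + slow_fraction * timeout
        delivered_share = fast_fraction
    return 3600 / avg_seconds * delivered_share


# Assumed scenario: 20% of servers take 120 s to reply, the rest take 2 s.
patient = paid_deliveries_per_hour(600, 2, 120, 0.2)
impatient = paid_deliveries_per_hour(5, 2, 120, 0.2)
print(f"patient: {patient:.0f}/hour, impatient: {impatient:.0f}/hour")
```

Under these assumed numbers the impatient sender completes roughly eight times as many paid deliveries per hour, even though it abandons every slow server.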

A few years ago, the MIT Spam Conference was a very interesting place. Each year, bright-eyed graduate students and intrepid industry types would present new filtering techniques that pushed the accuracy of spam filters to new levels. For the past three years, improvements in spam filter effectiveness have plateaued. A great result these days is a paper that shows an accuracy improvement of half a percent. Spam filtering has essentially become maxed out as a technology, and there isn't much more we can do but tweak rules to avoid falling behind in the arms race with spammers.

Similarly, reputation systems that identify suspicious IP addresses have hit a ceiling in their effectiveness. The spread of botnets has led to a virtually inexhaustible supply of new IP addresses that spam us a few times and then disappear forever. Most of the large anti-spam companies now have comprehensive blacklists that are updated every minute.

In other words, anti-spam systems worldwide are blocking everything they possibly can. And yet spam continues to grow as a problem -- it's unbelievable. So what can we do?

Bill Gates was right in 2004. He boldly posited that the way to solve the spam problem was to introduce a cost barrier that made spamming no longer profitable. Unfortunately, spammers created botnets, which have given them more computing power than most governments. One way to think of the problem is that the spammers have millions of computers. You only have a handful. And you have to pay for yours. Who's going to win? While we can't win the spam war with better filters or better blacklists, there are alternatives.

To deter spamming we must undermine spammers, not simply block messages. You can make botnets unprofitable by slowing down SMTP traffic from spammers. This not only gives the receiver control of each email connection, but it also consumes sender resources to reduce the spammer's sending rate significantly.

Imagine the chaos at an airport without air traffic controllers and you begin to see why mail servers need email traffic control.

NEXT: Post #7 Slowing Things Down
PREVIOUS: Post #5 Why Are Botnets So Difficult To Stop?

Friday, April 18, 2008

Post #5 on Why Spam Filters Suck "trickle blog" series


Why Are Botnets So Difficult To Stop?

Definition: a "botnet" is commonly known as a network of infected computers used to send spam (among other actions).

The largest botnets contain hundreds of thousands of "zombie" machines controlled by a "bot herder," who uses sophisticated encryption, infection and peer-to-peer (P2P) networking techniques to ensure the permanence and growth of the botnet. As the zombies are used, they are discovered and subsequently blocked. While individual zombies are constantly changing, the overall botnet and the people who control it remain the same.

Because of botnets, spam does not come from a predictable set of computers; rather, it comes from all over the place in a completely unpredictable manner. By leveraging the diversity of IP addresses available via botnets, spammers have rendered the blocking approach far less effective than it once was.

Further, as the number of broadband subscribers continues to grow (most rapidly in developing economies such as China and Eastern Europe), the number of computers available to exploit for participation in botnets is expanding. As botnets increase in size and sophistication, trying to identify where the "bad stuff" is coming from is becoming less and less worthwhile.

Indeed, researchers at Georgia Tech discovered in 2006 in a survey of data from the Spamhaus black list that only 5 per cent of botnet IP addresses ever end up listed in the Spamhaus database. In another paper, the same researchers found that 85 per cent of spam zombies sent fewer than ten email messages to their honeypot server over the course of about 18 months, as shown in the above graph.

Example: A Transient Zombie

In late 2007, the zombie at 201.21.174.207 (a Brazilian broadband subscriber address) began sending approximately three spams each day into one of our honey pot systems. It took 19 days for the first real-time blackhole list (RBL) to identify this IP address and cause it to be blocked. By sending only a very light trickle of email, zombies can evade detection.

While blocking continues to be a core component of the multi-layered anti-spam architecture, it makes little sense in 2008 to depend on filtering technology designed to block spam in 2001 before the advent of botnets. Approaches that seek to block spam fail to deal with the issue of unknown senders.

NEXT: Post #6 Blocking Spam in 2008
PREVIOUS: Post #4 Spamonomics: The Economics of Spamming

Friday, April 11, 2008

Post #4 on Why Spam Filters Suck "trickle blog" series



"Spamonomics": The Economics of Spamming

Spammers earn billions of dollars annually. The business is efficient, hierarchical, and organized. In much the same way that the global trade in narcotics involves every conceivable method of smuggling (from submarines to drug mules), the spam trade employs software engineers to develop increasingly sophisticated delivery technologies. Just as the drug trade will continue until the end of humanity, so too will the illegal delivery of spam.


To understand how spamming has become such an intractable problem, it helps to analyze the economics that drive it. Spammers make money if one in every 30,000 recipients makes a purchase. Given this response rate, a spammer advertising pharmaceutical products can expect to make roughly $5,000 per million email messages sent.


Finding out what it costs to send spam is not difficult: Botnet operators advertise their spamming services via online forums. One forum mentioned a price of $100 to send one million spam messages. If we assume that $100 is the cost per million spam messages, and $5,000 is the revenue, then the gross margin from spamming is approximately 98 percent.


Although some spam filters provide better accuracy than others, filter accuracy across the board is approximately 90 per cent, meaning that only one in ten spam messages reaches a recipient. If global anti-spam effectiveness could be improved from 90 to 95 per cent, earning $5,000 from spamming would require sending 2 million spam messages, rather than 1 million. This increase in volume would reduce the spammers’ profit margin from 98 per cent to 96 per cent, assuming sending costs remained constant. If global anti-spam accuracy reaches 99 per cent -- a figure that experts will tell you is nearly inconceivable given the innovative methods of spammers -- the spammer would have to send 10 million messages, and the sending costs would cut the margin to 80 per cent. Google is one of the world’s most profitable advertising companies with a margin of 25 per cent -- imagine 80 per cent! This is a business that won’t be going away any time soon.
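The arithmetic above is easy to check. The short Python sketch below uses only the figures from this post ($5,000 of revenue, $100 per million messages, a 90 per cent baseline); the function and constant names are mine, and it assumes, as the text does, that the spammer sends enough extra mail to keep revenue constant.

```python
REVENUE = 5000.0            # dollars earned per batch (figure from the post)
COST_PER_MILLION = 100.0    # dollars to send one million messages (forum price)
BASELINE_ACCURACY = 0.90    # filter accuracy at which 1 million messages earn REVENUE


def gross_margin(filter_accuracy: float) -> float:
    """Gross margin when volume is raised to keep revenue at REVENUE."""
    # The volume needed scales with the inverse of the share that gets through.
    millions_needed = (1 - BASELINE_ACCURACY) / (1 - filter_accuracy)
    cost = millions_needed * COST_PER_MILLION
    return (REVENUE - cost) / REVENUE


for accuracy in (0.90, 0.95, 0.99):
    print(f"{accuracy:.0%} of spam filtered: margin {gross_margin(accuracy):.0%}")
```

The loop prints the margin at each accuracy level; these come out at the 98, 96 and 80 per cent figures used in the text.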


Before botnets arrived, spammers could be stopped by blocking their IP addresses. DNSBLs like Spamhaus and Habeas block between 60 and 70 per cent of spam. With the introduction of botnets, blocking no longer provides a sufficient solution to the spam problem.


NEXT: Post #5 Why Are Botnets So Difficult To Stop?

PREVIOUS: Post #3 Final Ultimate Solution to the Spam Problem (FUSSP)

Tuesday, April 8, 2008

Anti-Spam Technology Adoption


In his comments on Post #3 of our trickle blog, TZink notes that Bill Gates would have been right about the 2-year time frame for stopping spam if the computational challenge and sender authentication measures had been widely implemented.

They were probably right. These steps could have worked, but they were never widely implemented because they interfere with legitimate email delivery. Let's take a closer look at the challenges to implementing these and other anti-spam technologies by comparing the adoption of Sender Authentication and Reputation filtering.

To date, Sender Authentication has been limited in its deployment and usefulness by factors related to how legitimate organizations use email.

The first barrier to adoption is inconvenience. If we look at Sender Authentication today, most organizations implement soft-fail authentication. Soft-fail authentication basically says "Our email comes from these specific servers, OR it could come from anywhere else." This is not very different from the unauthenticated model of "Our email comes from anywhere" because I can't reject mail, even if it comes from somewhere other than the authenticated servers.

To enable rejection of messages, organizations should implement hard-fail authentication and state "All email from our domain originates at these servers." This is a good clear statement enabling the rejection of mail coming from elsewhere. Why don't more organizations do this?

One reason appears to be that enforcing the use of their servers is inconvenient for their users. Many end users dislike authentication because they must now set up their email client to send only via those servers, rather than change the "from" address and send from anywhere. Why this is perceived as a burden on senders is unclear to me, but it remains one of the barriers to implementation. [Disclosure: MailChannels sets our records to soft-fail authentication and it's unclear to me why]
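For readers who want to see the soft-fail and hard-fail statements side by side: assuming the authentication scheme is SPF (the post does not name one), the difference comes down to a single character in front of the final "all" term of the published record. The Python sketch below classifies a record by that qualifier. The function name and the example records are hypothetical, and the addresses come from a range reserved for documentation.

```python
def spf_policy(record: str) -> str:
    """Classify an SPF record by the qualifier on its 'all' mechanism."""
    for term in record.lower().split():
        if term in ("all", "+all"):
            return "pass"        # mail may come from anywhere
        if term == "~all":
            return "soft-fail"   # "these servers, OR anywhere else"
        if term == "-all":
            return "hard-fail"   # "all email originates at these servers"
        if term == "?all":
            return "neutral"     # the domain makes no statement
    return "unspecified"         # record has no 'all' mechanism


soft = "v=spf1 ip4:192.0.2.0/24 ~all"   # example record, not a real domain's
hard = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_policy(soft), spf_policy(hard))
```

Only the hard-fail form gives a receiving server clear grounds to reject mail that arrives from somewhere else.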

A second barrier to adoption is the lack of incentive. Unless I'm worried my email will get blocked when I send it, there is little value in configuring authentication for my servers. The value of authentication is mainly derived by the recipient: better and clearer information helps them decide whether or not to accept my email. However, people want to get their mail, so authentication checking is configured on the (pragmatic) assumption that most sending servers don't have authentication set up. In that case, without authentication data, the recipient will normally receive the mail anyway. Since I can still deliver my messages without setting up authentication, why bother? If measures like this are to be effective, the default action needs to be enforcement, but that would penalize most legitimate senders, hence adoption is slow. That said, as Yahoo and others have become more aggressive in requiring authentication, adoption has improved.

Another barrier to the adoption of authentication is that the value of taking the time to authenticate is perceived as low. Knowing that a person is who they claim to be is not in itself helpful unless there is some measure of whether or not that person is worth talking to. A driver's license is more useful to a police officer if they can also run your ID through a records search. It's not much good for me to know that yes, you are "Bob". If I want to do something with the information, it's better for me to know you are "Bob the known spammer."

But the number one reason for poor adoption is simple ... authentication on its own is useless for stopping spam.

Sender Authentication solves only one aspect of email abuse: address spoofing. With SMTP, any email can be sent from anywhere claiming to be from anybody. Sender Authentication enables the recipient to check whether a message was sent from a server belonging to the organization it claims to be from. The technique has proven effective against phishing attacks, but spammers aren't impersonating anyone, so sender authentication doesn't really help. What we get is mail from authenticated spammers.

I hate to sound like IronPort, but what is needed is reputation.

Sender Authentication could have stopped spam if everyone (or a large subset of everyone) agreed to register their servers or addresses with some central authority that could clearly identify the legitimate registered senders and be used to allow that mail through and block the rest. But who is going to be that authority? How will it be policed? Where will it operate? Can I trust it? What if there is more than one authority? Can I trust all of them? The internet was designed to avoid this sort of centralized control. It is pretty hard to get that cat back in the bag.

Instead of an agreed authority, what has arisen are third-party reputation systems that came along as an evolution of blacklists. These systems track the history of the senders they see in their traffic. They have been effective against spam because they identify the known bad addresses and block those. They also identify known good senders and allow those messages through. Each of these systems tries to be a central authority for email reputation. However, they don't work well with unknown senders because the senders don't have to register first, so the systems don't have enough reputation information to stop the message. Every day, botnets exploit the fact that it takes time to see a new address and then give it a reputation score.
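The blind spot is easy to see in a toy lookup. This Python sketch is not how any particular reputation product works; the table, scores, thresholds and function name are all invented for illustration, and the addresses are from ranges reserved for documentation.

```python
from typing import Dict

# Hypothetical reputation table: IP address -> score from 0.0 (bad) to 1.0 (good).
REPUTATION: Dict[str, float] = {
    "192.0.2.10": 0.95,    # long history of good mail
    "198.51.100.7": 0.05,  # long history of spam
}


def decide(ip: str, block_below: float = 0.2, allow_above: float = 0.8) -> str:
    """Return an action for a connecting IP based on its recorded history."""
    score = REPUTATION.get(ip)
    if score is None:
        return "no history"   # a fresh zombie lands here
    if score < block_below:
        return "block"
    if score > allow_above:
        return "allow"
    return "filter"           # mixed history: hand over to content filtering


for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.99"):
    print(ip, "->", decide(ip))
```

The third address has never been seen before, so the lookup has nothing to say about it, which is exactly the gap a constantly rotating botnet exploits.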

Reputation has been widely adopted where Authentication has not. The differences between them in terms of adoption are clear. Reputation does not inconvenience end users. There is an incentive to implement reputation because it reduces load on servers. The value is high because it can be used to make real decisions. Most importantly, it works to reduce a real pain.

Tuesday, November 20, 2007

USENIX LISA Conference Report

I had the pleasure of speaking at the USENIX LISA conference last week in Dallas. My talk was entitled, "Using Throttling and Traffic Shaping to Combat Botnet Spam".

USENIX LISA is the annual conference for sysadmins of large systems (i.e. networks having more than 1,000 end users). LISA is a great conference: there's almost no marketing and sales presence, and the technical sessions are truly hands-on, if not entertaining. The BoFs (birds of a feather sessions) are like little nerd parties, and continue well into the night after the main conference is done.

About 100 people showed up to watch my talk (video will be available soon), which started with a brief history of spamming, then took the audience through my theory of spammer economics, and finished with some stats porn showing how well throttling works to get rid of botnet spam. Some of the more interesting statistics I presented were analyses of the make-up and behavior of zombies from the Storm botnet.

One of the unique things we do with Traffic Control is to track the operating system type of email senders, using a technique known as passive OS fingerprinting. Another thing we do is to track the ability of different senders to actually deliver email through to end-user recipients. By correlating the delivery success rate with the operating system type, we can draw some interesting conclusions about email senders.
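The correlation step itself is simple bookkeeping. Here is a minimal Python sketch of the idea, not the Traffic Control implementation: it assumes the fingerprinting has already produced an operating system label for each sender, and the sample records are invented purely to exercise the function.

```python
from collections import defaultdict


def delivery_rate_by_os(records):
    """Map each OS type to the share of its messages that were delivered.

    `records` is an iterable of (os_type, delivered) pairs.
    """
    totals = defaultdict(lambda: [0, 0])   # os_type -> [delivered, attempted]
    for os_type, delivered in records:
        totals[os_type][1] += 1
        if delivered:
            totals[os_type][0] += 1
    return {os_type: d / n for os_type, (d, n) in totals.items()}


# Invented sample records, not real measurements.
sample = [
    ("Linux", True), ("Linux", True),
    ("Windows", False), ("Windows", True),
    ("Windows", False), ("Windows", False),
]
rates = delivery_rate_by_os(sample)
for os_type, rate in sorted(rates.items()):
    print(f"{os_type}: {rate:.0%} delivered")
```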



The chart above summarizes the operating systems of email senders that are successfully able to deliver email through Traffic Control. Or, in other words, this chart summarizes the operating systems which are sending mostly good email -- because good email has a high chance of being delivered to end users. As you can see, Linux hosts do very well at delivering email. They are tolerant of throttling, generally have a good reputation, and rarely send spam. It's fair to say that a large proportion of the world's legitimate email servers are therefore running Linux.



The second chart summarizes the operating systems of email senders that are not successful at delivering through Traffic Control. Either they have a poor reputation that causes them to be blocked or severely throttled, or they send spam which is blocked by downstream filters. In either case, they aren't very good at getting their messages delivered to end users. The bulk of these senders are running Windows. This matches our understanding that the majority of spam originates from Windows machines which are participating in botnets.

I'll post more about USENIX LISA later. For now, please comment if you have questions.