When “Reputation Filtering” Fails, How to Deal with “the Unknown”

By Ken Simpson

As discussed in a previous post, spammers are increasingly sending spam through legitimate email servers operated by free email providers such as Gmail, Hotmail, and Yahoo!. While our data indicates that botnets remain the largest source of spam by a wide margin, the spam from legitimate servers nonetheless presents a difficult and significant challenge for so-called “reputation filtering” techniques.

When spammers abuse otherwise legitimate mail servers (such as those operated by Google, Yahoo!, and Hotmail), the reputation of these servers declines. However, since a great deal of legitimate email is also sent from these servers, their reputation is rarely bad enough for them to be safely placed in the “known bad” category. In MailChannels’ reputation system, legitimate servers like this fall into a grey area that we call, rather ominously, “the unknown.” The unknown also includes servers about which we know nothing, or very little.

By increasing their use of legitimate mail servers, spammers are increasing the proportion of spam that originates from “the unknown” while correspondingly reducing the proportion that originates from “known bad” sources. As the volume of spam originating from “the unknown” grows, the effectiveness of spam filters that rely on blocking traffic from “known bad” sources will degrade. Since most anti-spam filters rely on blocking for a substantial (typically 70% or higher) portion of their overall effectiveness, successful spam fighting will soon require either a large increase in filter accuracy (to get rid of the increased amount of spam making it past blocking systems), or something completely different.

But writing better filters is difficult.
As we have discussed in our ongoing series of posts on “Why Spam Filters Suck,” spam filtering requires substantial computational resources, not to mention the receipt and storage of email messages so that they can be scanned in the first place, and then more resources to archive filtered messages in a quarantine for later review by an unappreciative user base. For these reasons, our preference is to reduce the need for filtering, and to place more emphasis on intelligently dealing with traffic from the unknown.

Our approach to the unknown is simple: slow it down. Slowing down connections from mail servers in which we have an “unknown” (or perhaps “ambiguous”) trust level allows legitimate traffic to continue flowing, while limiting the potential for abuse. It’s not perfect, but hopefully with time senders in the unknown will either clean up their act (i.e. become “known good”), or decline to the point that they can be blocked. Until a clear reputation emerges, slowing down traffic is really the best we can do.

There are different ways to slow down email traffic, but as with the Jedi, we believe there is only one true way. MailChannels’ flavor of slowing down is called traffic shaping. Traffic shaping restricts the bandwidth available to connections from “the unknown” by altering the characteristics of the TCP connection and slowing responses to SMTP commands. The effect of traffic shaping is akin to simulating a very poor network connection, such as the connection you might get if you were using an old acoustic-coupled analog modem.

Anti-spam vendors often misuse the term traffic shaping, confusing it with simpler techniques such as “rate limiting” (which involves reducing the number of new connections a sender is permitted to make in a given time period). Very few vendors actually offer traffic shaping, but most claim they do. Caveat emptor.
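
To make the idea concrete, here is a minimal sketch of shaping a single SMTP reply at the application level. It is not MailChannels’ implementation: the names `policy_for`, `throttle`, and `send_shaped`, and the action labels `accept`, `shape`, and `block`, are hypothetical, and the sketch covers only the slowing of responses, not changes to TCP connection characteristics.

```python
import time
from typing import Callable, Iterator, Tuple


def policy_for(reputation: str) -> str:
    """Map a reputation category to an action (the action labels are assumptions)."""
    # Known-good senders pass, known-bad senders are blocked,
    # and everything else (the unknown) is slowed down.
    return {"known good": "accept", "known bad": "block"}.get(reputation, "shape")


def throttle(data: bytes, bytes_per_second: float,
             chunk_size: int = 1) -> Iterator[Tuple[float, bytes]]:
    """Yield (delay_seconds, chunk) pairs that average out to bytes_per_second."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield len(chunk) / bytes_per_second, chunk


def send_shaped(send: Callable[[bytes], object], data: bytes,
                bytes_per_second: float,
                sleep: Callable[[float], object] = time.sleep) -> float:
    """Send data through send(), pausing before each chunk; return the total delay."""
    total_delay = 0.0
    for delay, chunk in throttle(data, bytes_per_second):
        sleep(delay)   # the pause is what makes the link look slow to the sender
        send(chunk)
        total_delay += delay
    return total_delay
```

In a real server, `send` would be something like a connected socket’s `sendall`. Unlike rate limiting, nothing here counts or refuses connections: the sender is always accepted, and only the pace of the conversation changes.
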
To further the confusion, traffic shaping in the context of SMTP connections is sometimes referred to as “tar-pitting.” I’ll leave the Wikipedia perusal as an exercise for the reader.

MailChannels Traffic Control can safely shape SMTP connections down to as little as one byte per second. We track the effectiveness of Traffic Control around the clock via our reputation feedback service, and have collected substantial evidence suggesting that 90-97% of botnet-originated spam disappears when connections are slowed to this extent. In fact, our data shows that fewer than 10% of botnet connections remain active after just 10 seconds of traffic shaping. Traffic shaping has a dramatic effect on spam volume and consequently on overall mail system loads, because most of the spam never makes it into the filter. As if guided by Adam Smith’s invisible hand, spambots are drawn to easier targets that will accept email quickly, and thus move on before completing message delivery.

Of course, Google’s outbound mail servers are not operated by spammers. Most of Google’s email users are legitimate, and therefore expect that Google’s servers will patiently persist, as instructed by the SMTP standard, until their email gets through. So when spammers start sending via Google’s Gmail service, their spam gets the red-carpet treatment: patient outbound servers that will wait until even a very slow email server has finished receiving their payload. Indeed, our data shows that legitimate sending servers will, almost without exception, wait a minimum of two minutes for message delivery to complete. Most will wait at least five minutes, and the standard recommends 10 minutes. That’s an eternity when compared with the premature disconnection behavior of most spambots.

So is all lost for the effectiveness of traffic shaping? No, for a couple of reasons. First, legitimate senders such as Google, Yahoo!, and Hotmail work very hard to kick spammers off of their networks.
They do this to avoid having their mail servers traffic-shaped or blocked, a situation which causes message delivery problems for their users and therefore severely degrades the usefulness and profitability of their email services. As spammers attempt to abuse these services more, we expect that these providers will place greater requirements on users to establish their own positive reputation before being allowed to send more than a very small amount of email.

Second, even though legitimate email servers are far more likely to complete message delivery over a traffic-shaped connection, message delivery is nonetheless delayed by several minutes. Delaying message delivery by a few minutes might not sound like much of an advantage to the receiver, but the advantage this delay provides to spam filters is substantial. Allow me to explain.

These days, spam filters almost universally rely on constant, around-the-clock updates from centralized databases that track the latest spam campaigns. The interval between these updates varies greatly between vendors, with the very best vendors aiming for an update every two minutes, and the typical interval probably averaging around ten minutes. By delaying the receipt of a spam message until the filter has been updated to recognize and reject that message, we gain a substantial boost in filter effectiveness.

But what about truly legitimate traffic that is slowed down? Our position is that delays of a few minutes are seldom noticed by Internet users. If a few minutes’ delay is the cost we must bear to gain a substantial improvement in spam filtering, most Internet users are willing to be patient. At least, more patient than the spammers.
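
The interplay between delivery delay and filter update interval can be illustrated with a toy model. It assumes, purely for illustration, that updates arrive on a fixed schedule and that a message begins arriving at a uniformly random point within that schedule. The function name is hypothetical, and the model says nothing about whether a given update actually contains a signature for the message in question.

```python
def chance_update_lands_during_delay(delay_minutes: float,
                                     update_interval_minutes: float) -> float:
    """Probability that at least one scheduled filter update falls inside the delay.

    Toy model: the time until the next update is uniformly distributed
    between 0 and update_interval_minutes.
    """
    if delay_minutes < 0 or update_interval_minutes <= 0:
        raise ValueError("delay must be >= 0 and update interval must be > 0")
    return min(delay_minutes / update_interval_minutes, 1.0)


if __name__ == "__main__":
    # Delays of 2, 5, and 10 minutes against the two update intervals mentioned above.
    for interval in (2, 10):
        for delay in (2, 5, 10):
            p = chance_update_lands_during_delay(delay, interval)
            print(f"update every {interval} min, delay {delay} min: {p:.0%}")
```

Under these assumptions, a two-minute delay always spans an update for a vendor on a two-minute schedule, but spans one only a fifth of the time for a vendor on a ten-minute schedule, which is why longer shaped deliveries benefit slower-updating filters most.
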