Friday, June 20, 2008

Post #10 in the Why Spam Filters Suck "trickle blog" series

Challenges with Throttling

Slowing traffic from spammers works well to decrease volume and contain infrastructure costs. Because it makes spammers give up, it also deals effectively with the large proportion of senders that are not yet listed on any blacklist. The drawback is that slowing spammers down increases the number of concurrent TCP connections to your email server.

One customer used to handle 100 connections at a time but, after traffic shaping, now sees upward of 1,000 concurrent connections. A ten-fold increase in the number of connections like this would overwhelm most email servers. To illustrate the problem, consider that it takes a couple of seconds at most to deliver an email message under normal circumstances. Slowing down a spam zombie causes the connection to last an average of 40 seconds. If a significant proportion of connections last 20 to 30 times longer than normal, the number of connections open at any one time grows accordingly.
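The arithmetic behind this can be sketched with Little's Law (concurrent connections = arrival rate x average connection duration). This is an illustrative model, not something measured on the customer's system. It assumes a normal delivery takes two seconds and a throttled one takes 40 seconds; the function name and the 50% throttled share are made up for the example.

```python
def concurrent_connections(arrival_rate, throttled_fraction,
                           normal_seconds=2.0, throttled_seconds=40.0):
    """Estimate steady-state concurrency with Little's Law.

    arrival_rate is in connections per second; throttled_fraction is the
    share of connections being slowed down (an assumed input, 0.0 to 1.0).
    """
    mean_duration = (throttled_fraction * throttled_seconds
                     + (1.0 - throttled_fraction) * normal_seconds)
    return arrival_rate * mean_duration


# 100 concurrent connections at 2 seconds each implies 50 arrivals per second.
rate = 100 / 2.0

baseline = concurrent_connections(rate, throttled_fraction=0.0)
shaped = concurrent_connections(rate, throttled_fraction=0.5)

print(f"no throttling:   {baseline:.0f} concurrent connections")
print(f"half throttled:  {shaped:.0f} concurrent connections")
```

Under these assumptions, throttling roughly half of the incoming connections is already enough to produce about a ten-fold increase in concurrency, even though the arrival rate has not changed.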



The graph above shows the number of SMTP connections being handled by a single server at a large university in the New York area. Noteworthy is that the number of concurrent connections hovers around 500. The red line represents the total number of connections. The green line indicates the number of connections that our traffic shaping is choosing to slow down.

Administrators running Sendmail or Postfix will note that 500 concurrent connections is a large number. The amount of memory required to handle 500 concurrent Sendmail processes, plus any associated spam filtering processes, is considerable. If we were passing this number of connections through to Sendmail, the email server would almost certainly become overloaded.

One approach to improving the scalability of email systems is to completely redesign your email server around a new, highly scalable software architecture. But redesigning the email server is difficult, and changing the email system is a large commitment. The alternative is an asymmetric SMTP proxy, which we call real-time SMTP Multiplexing, designed to solve the scalability challenge posed by traffic shaping. The proxy accepts thousands of connections from the Internet and multiplexes them onto a much smaller pool of connections to the existing email server. Unlike an email server, the Multiplexing proxy doesn't save messages to disk, which makes it far less complex and means it consumes few system resources.
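The idea can be sketched in a few lines of Python with asyncio. This is a toy simulation, not the product's implementation: sleeps stand in for real SMTP traffic, and the names BackendPool, handle_client and simulate, along with the timings and pool size, are hypothetical. It shows only the core pattern: many slow front-side connections are held cheaply in memory, and each one occupies a back-end slot only for the short time needed to hand the message to the email server.

```python
import asyncio


class BackendPool:
    """Caps how many deliveries reach the email server at once (hypothetical)."""

    def __init__(self, size):
        self._slots = asyncio.Semaphore(size)
        self.active = 0
        self.peak = 0

    async def deliver(self, message, seconds):
        async with self._slots:          # wait for a free back-end connection
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                await asyncio.sleep(seconds)  # stand-in for the SMTP handoff
            finally:
                self.active -= 1


async def handle_client(pool, client_id, slow_seconds, fast_seconds):
    # Front side: a throttled sender trickles its message in slowly.
    # The message is held in memory only; nothing is written to disk.
    await asyncio.sleep(slow_seconds)
    message = f"message from client {client_id}"
    # Back side: a pool slot is held only for the quick delivery.
    await pool.deliver(message, fast_seconds)


async def simulate(clients=200, pool_size=10):
    pool = BackendPool(pool_size)
    await asyncio.gather(
        *(handle_client(pool, i, 0.02, 0.001) for i in range(clients))
    )
    return pool.peak


if __name__ == "__main__":
    peak = asyncio.run(simulate())
    print(f"200 front-side connections, peak back-end connections: {peak}")
```

However many front-side connections are open, the peak on the back side never exceeds the pool size, which is the property that keeps a process-per-connection email server within its limits.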



This graph shows the number of connections to the email server of the large university mentioned previously. The red line indicates that the average number of connections to the email server hovers around 50, which is well within what a typical email server can handle. By multiplexing the SMTP connections, the system can achieve a 5:1 or 10:1 reduction in the number of connections the email server has to deal with. Moreover, reducing the concurrency the email server has to handle makes it possible to throttle a large proportion of the incoming connections, getting rid of a great deal of spam traffic in the process.

3 comments:

Manoj said...

Hey,
Multiplexing is already in use, brother. It's called load balancing, and all the major email providers use it. (It's what we see when we check the mail exchangers of a particular domain.)

Ken Simpson said...

Manoj,
What we're talking about here is not load balancing of inbound SMTP connections, which is certainly something that larger service providers do to allow more than one mail server to service a single IP address. We're talking about multiplexing between Traffic Control, which uses highly memory- and CPU-efficient asynchronous I/O, and the back-end MTA, which uses heavyweight processes to receive SMTP connections.

David Cawley said...

As an example, to handle 10,000 concurrent connections by load balancing, it could require 20 machines each handling ~500 connections. Sure, it works but this post points out that it's possible to multiplex 10,000 connections on one machine when traffic shaping connections.