I had the pleasure of speaking at the USENIX LISA conference last week in Dallas. My talk was entitled "Using Throttling and Traffic Shaping to Combat Botnet Spam".
USENIX LISA is the annual conference for sysadmins of large systems (i.e., networks with more than 1,000 end users). LISA is a great conference: there's almost no marketing and sales presence, and the technical sessions are truly hands-on, if not entertaining. The BoFs (birds-of-a-feather sessions) are like little nerd parties, and continue well into the night after the main conference is done.
About 100 people showed up to watch my talk (video will be available soon), which started with a brief history of spamming, then took the audience through my theory of spammer economics, and finished with some stats porn showing how well throttling works to get rid of botnet spam. Some of the more interesting statistics I presented were analyses of the make-up and behavior of zombies from the Storm botnet.
One of the unique things we do with Traffic Control is to track the operating system type of email senders, using a technique known as passive OS fingerprinting. Another thing we do is to track the ability of different senders to actually deliver email through to end-user recipients. By correlating the delivery success rate with the operating system type, we can draw some interesting conclusions about email senders.
The chart above summarizes the operating systems of email senders that are able to deliver email through Traffic Control. In other words, this chart summarizes the operating systems that are sending mostly good email -- because good email has a high chance of being delivered to end users. As you can see, Linux hosts do very well at delivering email. They are tolerant of throttling, generally have a good reputation, and rarely send spam. It's fair to say that a large proportion of the world's legitimate email servers are therefore running Linux.
The second chart summarizes the operating systems of email senders that are not successful at delivering through Traffic Control. Either they have a poor reputation that causes them to be blocked or severely throttled, or they send spam that is blocked by downstream filters. In either case, they aren't very good at getting their messages delivered to end users. The bulk of these senders are running Windows. This matches our understanding that the majority of spam originates from Windows machines that are participating in botnets.
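As a rough illustration of the correlation described above, here is a minimal Python sketch that groups senders by their fingerprinted OS and computes a delivery success rate for each group. The record layout (OS guess, messages attempted, messages delivered), the function name, and the sample numbers are all assumptions made up for this example; they are not Traffic Control's actual data model or measurements.

```python
from collections import defaultdict

def delivery_rate_by_os(records):
    """Return {os_guess: delivered / attempted} for a list of sender records.

    Each record is a hypothetical (os_guess, attempted, delivered) tuple.
    """
    totals = defaultdict(lambda: [0, 0])  # os_guess -> [attempted, delivered]
    for os_guess, attempted, delivered in records:
        totals[os_guess][0] += attempted
        totals[os_guess][1] += delivered
    # Skip OS groups with no attempts to avoid dividing by zero.
    return {
        os_name: delivered / attempted
        for os_name, (attempted, delivered) in totals.items()
        if attempted > 0
    }

# Invented sample data, for illustration only (not real measurements).
sample = [
    ("Linux", 100, 90),
    ("Windows", 200, 10),
    ("Linux", 50, 45),
    ("Windows", 100, 20),
]

rates = delivery_rate_by_os(sample)
for os_name, rate in sorted(rates.items()):
    print(f"{os_name}: {rate:.0%} delivered")
```

Aggregating counts before dividing (rather than averaging per-sender rates) keeps a handful of low-volume senders from dominating the result for an OS group.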
I'll post more about USENIX LISA later. For now, please comment if you have questions.
Tuesday, November 20, 2007
USENIX LISA Conference Report
3 comments:
I suggest there's a fundamental error in your approach.
Passive OS fingerprinting takes a pile of IP data, then attempts to guess an O/S from it, the result of which you're proposing to use for spam filtering. This is bad, because the O/S "guess" step discards a lot of IP data. It is better to skip that step completely - just use the raw IP data to feed your spam filter. You will find it's far more accurate, often flagging 100% spam with 0% false-positives (e.g. "owned" embedded linux routers send 100% spam).
Thank you for the feedback. I should point out that we are not suggesting that OS fingerprinting should be used as the sole indicator of a spam message; rather, it can be used as part of a multi-tiered approach. You pointed out that if a source is producing 100% spam it could be blocked by IP. That's true, and it is the reason we use RBLs as a first line of defense. OS fingerprinting is far from 100% accurate. However, based on the guess along with several other inputs, we can make a decision to throttle a connection, which will not hinder a legitimate sender but forces a spammer to give up.
Hi David - I said "IP Data" - not IP "Address" - I'm talking about the Internet-Protocol packet "variables" - you will find, if you analyze all the different incoming *variables* (as in **NOT** the O/S those variables map to), that you *can* (if you want) reliably detect 100% spam, and safely block all this with zero false positives.
I've done this using RedHat, a custom TCPDump script, sendmail with a custom milter, and Brightmail to provide my spam flags. There are a number of different "sending fingerprints" that produce 100% spam (and conversely - there's a number that produce 100% non-spam).
I see you do understand the fundamental error I mentioned - you wrote: "OS fingerprinting is far from 100% accurate" - that is *exactly* why you should not convert the fingerprint data into an O/S before you use it - you are destroying the valuable IP-protocol information that you need to accurately detect spam when you do this.
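The information-loss argument in this thread can be illustrated with a small Python sketch. It computes the spam rate twice over the same observations: once keyed by the raw packet signature and once keyed by the OS label that signature maps to. Everything here is hypothetical: the signature fields (TTL, window size, don't-fragment flag, MSS) are only an assumption about what a passive fingerprint might contain, and the observations are invented for illustration, not taken from either commenter's data.

```python
from collections import defaultdict

def spam_rate(observations, key):
    """Return {group: fraction of spam} where key(obs) picks the group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [spam, total]
    for obs in observations:
        group = key(obs)
        counts[group][0] += 1 if obs["spam"] else 0
        counts[group][1] += 1
    return {group: spam / total for group, (spam, total) in counts.items()}

# Hypothetical signatures: (ttl, window_size, dont_fragment, mss).
SIG_SERVER = (64, 5840, True, 1460)
SIG_ROUTER = (64, 5720, True, 1430)

# Invented observations; both signatures map to the same OS guess.
observations = [
    {"signature": SIG_SERVER, "os": "Linux", "spam": False},
    {"signature": SIG_SERVER, "os": "Linux", "spam": False},
    {"signature": SIG_SERVER, "os": "Linux", "spam": False},
    {"signature": SIG_ROUTER, "os": "Linux", "spam": True},
    {"signature": SIG_ROUTER, "os": "Linux", "spam": True},
]

by_os = spam_rate(observations, key=lambda o: o["os"])
by_signature = spam_rate(observations, key=lambda o: o["signature"])

print("by OS label: ", by_os)
print("by signature:", by_signature)
```

In this toy data the OS label gives a mixed 40% spam rate, while the two raw signatures separate cleanly into 0% and 100%. Whether real traffic separates this cleanly is exactly what the two commenters disagree about.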