06/21/2004

One of my clients got hit with a denial-of-service attack on Saturday, and their web site was terribly sluggish for a few hours as the attacker took most of the bandwidth. I was able to make some adjustments on the server and bring things back to life, and today the client asked me if I could do some forensic work to figure out who was behind it and how they might combat the problem more effectively in the future.

So I sat down this morning to figure out how to do it, and what I had to work with was one huge log file. It contained all the web hits for the week and was half a gigabyte in size. There were millions of hits in the file, and from that haystack I had to find the needles that pointed to the guy (or guys) responsible for the attack.

In about an hour I’d sifted through the data, pinned down about half a dozen addresses that were obviously working together on the attack, and tracked the culprit to China. I doubt anything will ever come of it, but I was able to provide my client with a lot of information that’ll help them determine whether it was one of their resellers (or some disgruntled user) trying to get some sloppy revenge.

Afterward, it occurred to me to consider what I might have done if I didn’t have a Linux system at my disposal. Assuming (for the sake of argument) that I had this same 500 MB log file with millions of entries, how would I have sorted through it using a Windows system? I needed to do things like grab IP addresses off lines, count the number of unique addresses, figure out which ones were making more requests to the server over a short time, and so on. Then I had to do lookups on domain records to track the addresses to an ISP, and so forth.
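To make those first steps concrete, here is a minimal sketch of pulling addresses off log lines, counting hits per address, and spotting bursts within a single minute. It is written in Python rather than the Unix command-line tools used that morning, and it rests on assumptions that are not stated above: that the log is in the common Apache-style format, with the client address as the first field on each line and a bracketed timestamp such as [19/Jun/2004:10:00:00 -0500]. The function names are hypothetical, chosen only for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: the timestamp sits in square brackets, e.g. [19/Jun/2004:10:00:00 -0500]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def client_ip(line):
    """Return the first whitespace-delimited field, assumed to be the client address."""
    fields = line.split(None, 1)
    return fields[0] if fields else None


def count_hits(lines):
    """Count how many log lines each address produced."""
    counts = Counter()
    for line in lines:
        ip = client_ip(line)
        if ip:
            counts[ip] += 1
    return counts


def hits_per_minute(lines):
    """Count hits per (address, minute) pair to expose short bursts."""
    counts = Counter()
    for line in lines:
        ip = client_ip(line)
        stamp = TIMESTAMP.search(line)
        if ip and stamp:
            # The first 17 characters are "dd/Mon/yyyy:hh:mm", i.e. minute resolution.
            counts[(ip, stamp.group(1)[:17])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Iterating over the file object reads one line at a time, so a
    # 500 MB log never has to fit in memory all at once.
    with open(sys.argv[1], errors="replace") as log:
        for ip, hits in count_hits(log).most_common(10):
            print(f"{hits:>8}  {ip}")
```

Saved under a name of your choosing and given the log file's path as its only argument, it prints the ten busiest addresses. The number of unique addresses is just len() of the result from count_hits, and the per-minute counts are what separate a steady crawler from an address hammering the server in a short window. The domain-record lookups that follow are a separate step and are not covered here.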

My conclusion was that I have no idea how a hard-core Windows user would’ve been able to do what I did, at least in the short time I did it. I suppose I could’ve imported the data into SQL Server and run some queries on it, but that would’ve been awkward and time-consuming. I wanted to process millions of lines of text quickly. Unix command-line tools were perfect for the job; what Windows GUI tools would’ve worked?

Contrary to my usual demeanor, I’m not Windows-bashing here. Rather, I’m trying to understand what a Windows web administrator in my shoes would’ve done. It’s an interesting problem. I suppose if I’d been using IIS (again, for argument’s sake) there might be some built-in log analysis tools in the IIS Management Console, but I doubt it. Moreover, the sheer volume of data I had to process presented a whole set of issues; it’s not like I could just import everything into Excel and do some sorting.

So all in all, I remain a happy Linux user and wonder how those poor souls using Microsoft products manage to get things done effectively and efficiently…