09/19/2009

Spam is so irritating.

I run my own mail server– actually, I run mail for hundreds of people– and as a result I have to do everything I can to block spam to them, without compromising legitimate messages. So I have three layers of protection:

First, there’s “greylisting”, which tells the mail server sending the message that the receiving server (mine) is temporarily unavailable. It’s kind of a clumsy trick, but it works because most spam software attempts delivery only once before giving up. A bona fide mail server, on the other hand, will wait a few minutes and re-send the message. The second time, my server lets it through– and remembers the sender, so the next time the mail gets through immediately. Although this seems really simple, it’s terrifically effective and probably blocks at least half of the inbound spam.
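For the curious, here’s a minimal Python sketch of the greylisting idea. It isn’t taken from any particular greylisting package. It assumes the common approach of remembering the (client IP, sender, recipient) triplet, a five-minute retry delay, and SMTP reply codes 451 (“try again later”) and 250 (“OK”); the class and method names are made up for illustration.

```python
import time

# Illustrative greylisting sketch. Assumption: entries are keyed on the
# (client IP, envelope sender, recipient) triplet; the delay is a made-up default.
class Greylist:
    def __init__(self, delay=300):
        self.delay = delay      # seconds a sender must wait before retrying
        self.first_seen = {}    # triplet -> time of the first, rejected attempt
        self.passed = set()     # triplets that have already retried successfully

    def check(self, client_ip, sender, recipient, now=None):
        """Return an SMTP reply code: 451 means "try again later", 250 means accept."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender.lower(), recipient.lower())
        if triplet in self.passed:
            return 250          # remembered sender: accept immediately
        first = self.first_seen.get(triplet)
        if first is None:
            self.first_seen[triplet] = now
            return 451          # first attempt: temporary failure
        if now - first < self.delay:
            return 451          # retried too soon
        del self.first_seen[triplet]
        self.passed.add(triplet)
        return 250              # a patient server retried: let it through


g = Greylist(delay=300)
print(g.check("192.0.2.1", "alice@example.com", "bob@example.org", now=0))    # 451
print(g.check("192.0.2.1", "alice@example.com", "bob@example.org", now=400))  # 250
print(g.check("192.0.2.1", "alice@example.com", "bob@example.org", now=401))  # 250
```

A spam cannon that fires once and moves on never gets past the first 451, while a real mail server simply queues the message and tries again later.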

Then there’s “blacklisting”, which checks the address of the server sending the message (every mail server includes information about itself in the message header) against a list of known spammers. There are hundreds of thousands of known spam servers out there, so if an incoming message matches, it’s stopped dead.
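Many blacklists are published over DNS: you reverse the octets of the sender’s IPv4 address, append the list’s zone name, and look it up. Any answer means “listed”; a “no such name” error means “not listed”. Here’s a minimal sketch of that lookup. The zone `dnsbl.example` is a placeholder, not a real list, and the resolver is passed in so the logic can be tried without a network.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a DNS blacklist.
    The octets are reversed: 192.0.2.99 in zone 'dnsbl.example' becomes
    '99.2.0.192.dnsbl.example'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an address record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True   # any answer (conventionally 127.0.0.x) means "listed"
    except socket.gaierror:
        # No such name: not listed. This also swallows other lookup failures;
        # a real filter would want to tell those apart.
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))  # 99.2.0.192.dnsbl.example
```

Real lists publish their own zone names and return-code meanings, so check a list’s documentation before pointing a mail server at it.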

Finally there’s a heuristic content filter that actually “reads” the incoming message and looks for key words (“Viagra”, “enhancement”, “free mortgage quote”, that sort of thing). It also checks for suspicious headers, lots of images with little text, and other things that spam tends to have. If there’s enough funny stuff going on, the filter deletes the message.
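The “enough funny stuff” idea boils down to adding up points for every rule a message trips and comparing the total to a cut-off. Here’s a toy Python version. The rules, their point values, the message format (a dict with `headers`, `body`, and `images`), and the threshold of 5.0 are all invented for illustration; real content filters ship hundreds of tuned rules.

```python
import re

# Hypothetical rules: (description, test function, points). Weights are made up.
RULES = [
    ("mentions Viagra", lambda m: re.search(r"\bviagra\b", m["body"], re.I) is not None, 2.5),
    ("mentions enhancement", lambda m: re.search(r"\benhancement\b", m["body"], re.I) is not None, 1.5),
    ("free mortgage quote", lambda m: "free mortgage quote" in m["body"].lower(), 2.0),
    ("missing Date header", lambda m: "Date" not in m["headers"], 1.0),
    ("many images, little text", lambda m: m["images"] >= 3 and len(m["body"].split()) < 20, 2.0),
]

THRESHOLD = 5.0  # assumed cut-off

def score_message(msg):
    """Sum the points of every rule that fires; return (score, names of rules hit)."""
    hits = [(name, pts) for name, test, pts in RULES if test(msg)]
    return sum(pts for _, pts in hits), [name for name, _ in hits]

def is_spam(msg):
    score, _ = score_message(msg)
    return score >= THRESHOLD


junk = {"headers": {"Subject": "Hot deal"}, "body": "Cheap Viagra and male enhancement", "images": 4}
print(score_message(junk))  # (7.0, ['mentions Viagra', 'mentions enhancement', 'missing Date header', 'many images, little text'])
```

No single rule is damning on its own; it’s the pile-up that pushes a message over the line.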

So all of that happens before the mail even gets to the recipient’s mailbox. I find that it’s pretty effective, probably blocking 90% of the incoming spam. Of course a little bit still trickles through. The problem is that spam is such a flood– possibly accounting for more than 90% of mail traffic on the entire internet– that even a trickle means my customers are getting a handful of spam messages every day.

Since I’ve had some of my email addresses for a decade, there’s been plenty of time for them to show up on various spam lists. As a result, I probably receive more spam than the average user. So I put another filter in place on my own mailbox, and that filter is the most awesome of all. I “train” it by sending it examples of spam messages that got past, as well as “ham” (legitimate) messages that I want to receive. It remembers words and phrases from each type of message, and over time it “learns” what I consider to be spam versus what I tend to want to receive. Amazing stuff, really, but the downside is that it’s a personalized spam filter: if I used the same rules for a customer’s mailbox, it might fail catastrophically. My customers probably don’t get quite the same mix of web programming, Linux user group, ultimate frisbee, and Facebook invitations that I do.
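Here’s a rough sketch of what “training” means for this kind of filter: count how many spam and ham messages each word shows up in, then turn those counts into a per-word spam probability. This assumes Gary Robinson’s adjusted estimate f(w) = (s·x + n·p) / (s + n), which pulls rarely seen words toward a neutral 0.5. That formula, the crude tokenizer, and the `TokenDB` name are illustrative choices, not necessarily what my filter does internally.

```python
import re
from collections import Counter

def tokenize(text):
    """Crude tokenizer: the set of lowercase words (letters, digits, $, ', -)."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

class TokenDB:
    def __init__(self):
        self.spam = Counter()  # word -> number of spam messages containing it
        self.ham = Counter()   # word -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        """Record one message as spam or ham."""
        if is_spam:
            self.spam.update(tokenize(text))
            self.nspam += 1
        else:
            self.ham.update(tokenize(text))
            self.nham += 1

    def spamicity(self, word, s=1.0, x=0.5):
        """Robinson's f(w): s is the strength of the prior, x the neutral guess."""
        b = self.spam[word] / self.nspam if self.nspam else 0.0
        g = self.ham[word] / self.nham if self.nham else 0.0
        if b + g == 0:
            return x  # never seen: no opinion
        p = b / (b + g)
        n = self.spam[word] + self.ham[word]
        return (s * x + n * p) / (s + n)


db = TokenDB()
db.train("cheap viagra now", is_spam=True)
db.train("free mortgage quote now", is_spam=True)
db.train("ultimate frisbee practice now", is_spam=False)
db.train("linux user group meeting", is_spam=False)
print(db.spamicity("viagra"), db.spamicity("frisbee"))  # 0.75 0.25
```

Because every mailbox trains its own counts, the same word can be damning in one database and harmless in another, which is exactly why one person’s trained filter doesn’t transfer well to someone else.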

So here’s my latest spam vs ham database:

The information shown below is an analysis of your spam database.

Histogram
score   count  pct  histogram
0.00    31884 38.67 ####################################
0.05      133  0.16 #
0.10      182  0.22 #
0.15      265  0.32 #
0.20      331  0.40 #
0.25      321  0.39 #
0.30      316  0.38 #
0.35      524  0.64 #
0.40      501  0.61 #
0.45      256  0.31 #
0.50      540  0.66 #
0.55      288  0.35 #
0.60      770  0.93 #
0.65      360  0.44 #
0.70      232  0.28 #
0.75     1440  1.75 ##
0.80      330  0.40 #
0.85      671  0.81 #
0.90      468  0.57 #
0.95    42629 51.71 ################################################
tot     82441
hapaxes:  ham   20710 (25.12%), spam   34438 (41.77%)
   pure:  ham   31808 (38.58%), spam   42439 (51.48%)

From the messages I’ve fed it, the database has collected over 82,000 distinct words, and of those, more than 42,000 are “100% spam”– meaning they only appear in spam messages (at least as I categorize them). And about 32,000 words are pure ham– meaning they’re terms that my spam just doesn’t contain. Then there’s the fuzzy area in between, where words sometimes appear in spam and sometimes in ham.
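A table like the one above can be built by giving each word a raw spam probability (the share of its appearances that came from spam, normalized by how many spam and ham messages were trained), dropping it into one of twenty 0.05-wide buckets, and counting “pure” words (seen on only one side) and hapaxes (seen exactly once). This is a minimal sketch under those assumed definitions; I haven’t checked that my filter’s report uses exactly these rules, and the `score_histogram` name is made up.

```python
from collections import Counter

def score_histogram(counts, nspam, nham, bins=20):
    """counts maps word -> (spam_messages, ham_messages) it appeared in.
    Returns (histogram, pure, hapaxes). Assumes nspam and nham are both > 0."""
    hist = [0] * bins
    pure = Counter()
    hapaxes = Counter()
    for spam_n, ham_n in counts.values():
        if spam_n + ham_n == 0:
            continue                     # never seen: nothing to score
        b = spam_n / nspam               # fraction of spam messages containing the word
        g = ham_n / nham                 # fraction of ham messages containing the word
        p = b / (b + g)                  # raw spam probability: 0.0 = all ham, 1.0 = all spam
        hist[min(int(p * bins), bins - 1)] += 1  # p == 1.0 goes in the top bucket
        if ham_n == 0:
            pure["spam"] += 1
        elif spam_n == 0:
            pure["ham"] += 1
        if spam_n + ham_n == 1:          # seen exactly once in total
            hapaxes["spam" if spam_n else "ham"] += 1
    return hist, pure, hapaxes


words = {"viagra": (3, 0), "frisbee": (0, 2), "linux": (0, 1), "mortgage": (1, 0), "now": (2, 2)}
hist, pure, hapaxes = score_histogram(words, nspam=4, nham=4)
for i, count in enumerate(hist):
    if count:
        print(f"{i / len(hist):.2f} {count:6d}")
```

With a big enough database, most words land at one extreme or the other, which is why the 0.00 and 0.95 rows dwarf everything in between.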

But the filter is pretty smart, so when a new message comes in, it looks at all of the words, compares them to its dictionary, and assigns a score to the message. If the score is greater than some threshold I define, the message is probably spam and it’s dropped. Good riddance.
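One well-known way to turn a pile of per-word probabilities into a single message score is Robinson’s chi-square (Fisher) combining, as used by several open-source Bayesian filters. It asks two questions, “how unlikely is this set of words if the message were ham?” and the same for spam, and reports where the message falls between them. Here’s a sketch, assuming that combining method and an arbitrary cut-off of 0.9; I’m not claiming this is the exact math my filter uses.

```python
import math

def chi2q(x2, v):
    """P(chi-square >= x2) for an even number of degrees of freedom v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(word_scores):
    """Chi-square combining of per-word spam probabilities.
    Returns a message score in [0, 1]; 0.5 means "no opinion"."""
    if not word_scores:
        return 0.5
    n = len(word_scores)
    eps = 1e-6  # keep probabilities away from 0 and 1 so log() is defined
    ps = [min(max(p, eps), 1 - eps) for p in word_scores]
    # Summing logs instead of multiplying probabilities avoids underflow.
    ln_not_spam = sum(math.log(1.0 - p) for p in ps)  # very negative for spammy words
    ln_spam = sum(math.log(p) for p in ps)            # very negative for hammy words
    s = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)        # near 1 when the words look spammy
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)            # near 1 when the words look hammy
    return (s - h + 1.0) / 2.0

THRESHOLD = 0.9  # assumed cut-off


score = combine([0.99, 0.95, 0.97, 0.6])
print(round(score, 3), "spam" if score >= THRESHOLD else "ham")
```

The nice property is that a message full of neutral or mixed words scores near 0.5 instead of being forced to one side, so the threshold only catches messages the filter is genuinely confident about.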

After training this puppy, I was amazed at how effective it is. I went from probably a few hundred spam messages a day (ugh) to maybe half a dozen. Sweet!

Out of curiosity, I checked the performance of this filter over the past three months. From June 18 to today, I’ve received 50,208 email messages. Remember this number is after the initial three spam filters have been applied– in reality there have probably been close to half a million messages sent to my mailboxes in that period. Yikes.

Of those that were handled by the filter, 26,299 were spam. Doing the math over those 93 days, that’s roughly 280 junk messages per day. And it means 23,909 legitimate messages came in, or about 260 per day.
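A quick check of the per-day rates, counting the window as the 93 days from June 18 to the date of this post (September 19). Whether you count the end day or not shifts the rates by only a few messages.

```python
from datetime import date

days = (date(2009, 9, 19) - date(2009, 6, 18)).days  # 93

total = 50_208          # messages that reached the trained filter
spam = 26_299           # of those, classified as spam
ham = total - spam      # 23,909 legitimate messages

print(days, ham)                            # 93 23909
print(f"spam per day: {spam / days:.0f}")   # spam per day: 283
print(f"ham per day:  {ham / days:.0f}")    # ham per day:  257
```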

Wow, that’s a lot of email. I sure feel loved.

09/18/2009

Kyra’s pet gerbil Pumpkin died tonight.

She was a great little pet, and had been with us for almost two years (since Christmas ’07). Even I’m a little sad, and I didn’t really want to get her at all in the first place. But she kind of grew on me.

09/18/2009

Alex gets a gold star for doing a good deed today.

He found a nice Verizon Storm phone on the way home from school today, so we poked around in the contact book a bit and found a person labeled “Babe” with a picture of a pretty woman in a bridal gown. Assuming it was his wife, I dialed the number. She picked up immediately and said in a chipper voice, “Hey honey, I’m on the way home now. How was your day?”

I chuckled a bit and said I wasn’t her husband. She got a little nervous all of a sudden and asked who the heck I was. I explained the situation and then she laughed and said how relieved she was that we found the phone.

As it turns out, they live in our subdivision a block away, so Alex and I went over to their house and left the phone outside the front door.

It feels good to do something nice for a total stranger.

09/10/2009

The Hubble Space Telescope had another repair session, and the shots it’s returning from the depths of space are as stunning as ever. Behold NGC 6302:

Beautiful stuff. Nature can be so breathtaking.

09/08/2009

From a Slashdot discussion about why America has lost “the edge” in innovation and technology:

Hobbies and passions, such as developing aluminum electrolysis in a backyard in Oberlin, Ohio, or airplanes in a field in Dayton, Ohio by bicycle repair men, are a thing of the past. We don’t have backyards anymore, and the DHS descends on you if you try to do anything in it, such as aluminum or flying. Everything requires a permit. Permit to attempt to fly. Permit to electrolyze aluminum. And the police are holding a straitjacket at the appeals session in court waiting for the verdict from the jury of twelve deliberating the testimony of psychologist witnesses pushing drug company agenda about mental illnesses.

09/08/2009

Wow, according to Microsoft’s Windows 7 literature, Linux sure sucks.

I can’t believe I’ve been using this dang Linux thingamabob for ten years now when it doesn’t let me share digital media in my house, connect to my two digital cameras, let me put music on my iPod, print anything, or even connect to the internet (WAN).

Oh, wait. It does.

09/07/2009

Alex has a programming class in school, and they’re using a program called Chipmunk BASIC. He asked me today if I could install it on his computer, so I poked around the internet and found a version that runs on Linux. To test it, I installed it here on my desktop and gave it a go.

Chipmunk BASIC v3.6.4(b8)
>10 for i = 0 to 10
>20 print i
>30 next
>run
0
1
2
3
4
5
6
7
8
9
10

Sweet, I actually remember how to program in BASIC!

That takes me back to the Good Old Days of 1980 or so, when I wrote my first program (for a second-grade enriched studies class). It presented ten multiple-choice questions about Saturn’s moon Mimas, allowing the user to answer them and then scoring the answers.

Hello world!

09/04/2009

Whee, I just spent 45 minutes being routed through Washington Mutual and Chase phone systems hoping to reset my stupid online banking security questions. After all of that, nothing had changed– no one could help me, everyone told me to call someone else, and the last gal gave me the first number I’d called, so I realized it was just a big cruel joke. I’m never going to get my security questions reset, and they’ll take my money and probably cancel all my credit cards as well.

I was griping to Laralee about it, and said, “No one really likes banks. They’re like…” To which she replied “porta-potties”.

Yes, that’s exactly it. Banks are like porta-potties: no one likes them, but everyone uses them because they have to. Argh.