iRi

Still getting sporadic emails about my Bayes spam predictions. Thanks to those who emailed for staying civil; I expected a bit more flamage. People are still understandably skeptical (can't blame them, I'd be skeptical too). So, here's another thing I'll do: When the SpamBayes project actually releases files officially, I'll take that and implement the attack I mentioned, to see if I can make it work. Exactly what I'll do if I can remains to be seen; I sure as *#&$ won't just release the code, though!

[Google Top Stories] I generally stay away from international jurisdiction stories, considering them relatively uninteresting problems that must be solved through diplomacy, like any other national conflict. This seems wrong; even if the case is heard in Melbourne, it should be judged under U.S. law, since the article was written in America. OK, I admit I just wanted to post an article from the Google feed to see how it looks.

Washington Post: ...under authority it already has or is asserting in court cases, the administration, with approval of the special Foreign Intelligence Surveillance Court, could order a clandestine search of a U.S. citizen's home and, based on the information gathered, secretly declare the citizen an enemy combatant, to be held indefinitely at a U.S. military base. Courts would have very limited authority to second-guess the detention, to the extent that they were aware of it.

Federal prosecutors have arrested three men involved in what officials are calling the largest identity fraud case in American history.... Cummings would then use the ruse of "helping" the customers work through software and hardware problems to obtain the customer code that allowed the company to request credit records. This is a textbook case of why privacy issues are so important. There is no such thing as "a company"

Matt Haughey: "The [SpamAssassin] arms race has officially begun." [Scripting News] I'm reading between the lines here, working from scanty hints (such as the remarkable uniformity of spammers' arguments that they are doing a good thing), because I'm not intimately familiar with the world of spammers, but the biggest spammers seem to talk to each other fairly regularly. If one of them has figured out how to do it, rest assured that it will not be long before they all know how.

Two people have now expressed the opinion that I am underestimating the advantage of personalization, and the power of statistics. It's worth replying to, I suppose, since that's two out of three. You can see why I left this out of an already-long weblog post before. First, I have built Bayesian filters before, so I'm not completely ignorant about how they work. I'm not an expert, but then, right now I don't know that anyone is, since there's a lot of application-specific tuning that must be done.

Note: If you're arriving from one of the many links that say I'm underestimating the power of personalization, please see my rebuttal (now with a working link!). Personalization won't work either, and it's nowhere near as powerful as people seem to think. Recently, a relatively new idea for filtering spam has surfaced: Bayesian classification of e-mail, or at least Bayesian-inspired analysis. This seems to have recently been brought to the Internet community's attention by Paul Graham in his essay A Plan for Spam, though I know he's not the first to think of it: For instance, here's a programming assignment given at the University of California, Irvine's Information and Computer Science department in Dec.

The stupidity of computers has become a bit of a minor running theme on this weblog over the past few weeks (and I've got another post on that topic on tap, waiting for Monday), so I couldn't resist posting this news from Slashdot. The conversational bot A.L.I.C.E., winner of the Loebner Prize in 2000 and 2001 for most human-like conversation bot, was hooked up to itself and this is the result.

Addendum to my previous education posts: In general, the best way to learn anything is to simply jump in, do some wild and crazy stuff, make mistakes, get quick, accurate feedback about how well you are doing, and benefit from the previous experience of others in the environment. This goes for both humans and computers, and is essentially true in all environments. It is in theory possible to learn without direct interaction with the environment, but the learning rate takes a major hit: a slowdown of at least a factor of three to five.

Warning... the following is going to be a very, very "bloggy" entry. Basically, this post has no thesis, because I'm not sure what I'd want to say. First, today (Nov. 13) is my 24th birthday. This doesn't directly relate to the rest of this post, but it might put an interesting spin on it. This Monday, the good Doc posted a link to The Underground History of American Education: An Angry Look at Modern Schooling by John Taylor Gatto.

Bayes spam: More promises

Court makes landmark ruling in web defamation case

Homeland Security from Doc

Cops Bust Massive ID Theft Ring

Arms Race

Spam Filtering's Last Stand, Part Two

Spam Filtering's Last Stand

Computers Still Stupid

More on Learning

Belabored Birthday Brain Baring