iRi

Note: If you're arriving from one of the many links that think I'm underestimating the power of personalization, please see my rebuttal (now with a working link!). Personalization won't work either, and it's nowhere near as powerful as people seem to think.

Recently, a relatively new idea for filtering spam has surfaced: Bayesian classification of e-mail, or at least Bayesian-inspired analysis. This seems to have recently been brought to the Internet community's attention by Paul Graham in his essay A Plan for Spam, though I know he's not the first to think of it: For instance, here's a programming assignment given at the University of California, Irvine's Information and Computer Science department in Dec. 1999. That the idea was not popular until recently is probably a direct consequence of the fact that since 1999, the war on e-mail spam has been a victory for the spammers at every turn. The need for bigger guns is now more acutely felt than in 1999.
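To make the idea concrete, here is a minimal sketch of a textbook naive Bayes word filter. It is not Graham's exact formula or the Irvine assignment. The function names, the whitespace tokenizer, the add-one smoothing, and the four toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Estimate P(spam | words), treating words as independent (the 'naive' part)."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_words = sum(counts[True].values())
    ham_words = sum(counts[False].values())
    # Start from the prior odds of spam versus non-spam in the training set.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        # Add-one smoothing (an assumption) keeps unseen words from zeroing a class out.
        p_spam = (counts[True][word] + 1) / (spam_words + vocab_size)
        p_ham = (counts[False][word] + 1) / (ham_words + vocab_size)
        log_odds += math.log(p_spam / p_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))

# Toy training data, invented for this example.
training = [
    ("buy cheap pills now", True),
    ("cheap pills cheap offer", True),
    ("meeting notes for monday", False),
    ("lunch on monday", False),
]
counts, totals = train(training)
print(spam_probability("cheap pills", counts, totals))
print(spam_probability("monday meeting", counts, totals))
```

With this toy data the first message scores above 0.5 and the second below it. A real filter would need far more training mail and a better tokenizer.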

The stupidity of computers has become a bit of a minor running theme on this weblog over the past few weeks (and I've got another post on that topic on tap, waiting for Monday), so I couldn't resist posting this news from Slashdot. The conversational bot A.L.I.C.E., winner of the Loebner Prize in 2000 and 2001 for most human-like conversation bot, was hooked up to itself and this is the result. Surprise surprise, very stupid conversation results, especially when considered on the semantic level.

Addendum to my previous education posts: In general, the best way to learn anything is to simply jump in, do some wild and crazy stuff, make mistakes, get quick, accurate feedback about how well you are doing, and benefit from the previous experience of others in the environment. This goes for both humans and computers, and is essentially true in all environments.

It is in theory possible to learn without direct interaction with the environment, but the learning rate takes a major hit: a slowdown of at least a factor of three to five. Again, this applies to both humans and computers, across all skills. One of the few interesting unifications of AI and human psychology has been a partial empirical theory of learning. That theory is about the only result the unification has produced, but it's worth knowing.

Warning... the following is going to be a very, very "bloggy" entry. Basically, this post has no thesis, because I'm not sure what I'd want to say.

First, today (Nov. 13) is my 24th birthday. This doesn't directly relate to the rest of this post, but it might put an interesting spin on it.

This Monday, the good Doc posted a link to The Underground History of American Education: An Angry Look at Modern Schooling by John Taylor Gatto. I've only gotten to chapter two in the online book, but I've seen what he has to say before, in shorter form. You may want to read at least the first chapter for the rest of this to make sense.

Nearly all political elections in the United States are plurality votes, in which each voter selects a single candidate, and the candidate with the most votes wins. Yet voting theorists argue that plurality voting is one of the worst of all possible choices.... Unlike these procedures [described in the elided section], the plurality system looks only at a voter's top choice. By ignoring how voters might rank the other candidates, it opens the floodgates to unsettling, paradoxical results.
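A small worked example may help show what "looks only at a voter's top choice" can cost. The sketch below compares a plurality count with a pairwise (Condorcet-style) comparison on nine invented ballots. The ballots, the function names, and the choice of pairwise comparison as the contrast are my assumptions, and pairwise comparison is not necessarily one of the procedures the article describes.

```python
from collections import Counter
from itertools import combinations

# Nine hypothetical ranked ballots, most preferred candidate first.
ballots = (
    [("A", "B", "C")] * 4
    + [("B", "C", "A")] * 3
    + [("C", "B", "A")] * 2
)

def plurality_winner(ballots):
    """Count only each voter's top choice."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0][0]

def condorcet_winner(ballots):
    """Return the candidate who beats every other one head-to-head, or None."""
    candidates = sorted(set(ballots[0]))
    wins = Counter()
    for x, y in combinations(candidates, 2):
        # A voter prefers x to y when x appears earlier on the ballot.
        x_over_y = sum(b.index(x) < b.index(y) for b in ballots)
        y_over_x = len(ballots) - x_over_y
        if x_over_y > y_over_x:
            wins[x] += 1
        elif y_over_x > x_over_y:
            wins[y] += 1
    for candidate in candidates:
        if wins[candidate] == len(candidates) - 1:
            return candidate
    return None

print(plurality_winner(ballots))   # A
print(condorcet_winner(ballots))   # B
```

Here A wins the plurality vote with four first-place votes, even though five of the nine voters rank A last and B would beat either rival in a two-way race.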

Slashdot had a story about this article. One part of the article is worth highlighting, though: the section called "No One's Perfect", which references Arrow's Theorem, a result showing that no voting system can be perfect.

It's pretty pointless for me to go on about how I feel about the Microsoft ruling. But I feel obligated nonetheless to at least register as One More Coder who thinks this is complete bullshit, so that my silence is not interpreted as assent. I'll leave writing the actual opinions to two people who have already done a better job than I could hope to do anytime soon. One, James Grimmelmann on LawMeme, and two, John Robb with both of his points regarding the case.

In my Human Justice for Human Beings essay, I used as an example of automated law enforcement the idea that somebody could today take satellite imagery, and write a program that would attempt to detect when people do things to wetlands that they are not supposed to do, such as fill them in, or dredge them out, or drain them, etc.

Well, I still don't know if that's happening, but something similar enough to it is happening that I feel justified in claiming that the example is now firmly grounded in reality. The Mercury News reports on a project to photograph the coast of California to look for illegal sea walls. It doesn't use computers to process the photos in any sort of automated fashion, but does take advantage of computer networks to allow the problem to be conveniently partitioned among any interested people, which counts as something difficult to do without computers and easy to do with them. I even got the "environmental" aspect right. ;-)

Mark expands on a couple of comments I made with regard to people recently starting to spam the comment sections of websites. Apparently the weblog community recently passed some sort of critical mass that makes it worth spamming.

Mark, if you read this, I think for now the only "Lojack" solution that will be feasible in the short-to-medium term is the one I proposed in my second comment, which is to let the web site owner easily review all recently posted comments and easily delete offending ones, in combination with a generalized rate-constraining scheme to ensure the user never has to filter through 3000 messages at a time. If enough comment tool authors do this, and enough of the comment tool users are proactive in deleting the spam (which is easily imaginable), it may (emphasize may) deter the spammers from working too hard to deface the comment sections, since unlike email spam, the spammers' desired result is that these spams stay there indefinitely, so that people (or search engines!) can see them.
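For what it's worth, the rate-constraining half of that proposal could be as simple as a sliding-window limiter. The sketch below is one possible shape for it, not code from any actual comment tool. The class name, the parameters, and the choice to reject (rather than queue) excess comments are all assumptions.

```python
from collections import deque

class CommentRateLimiter:
    """Accept at most max_comments within any window_seconds span."""

    def __init__(self, max_comments, window_seconds):
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        self._accepted = deque()  # timestamps of accepted comments, oldest first

    def allow(self, now):
        """Return True and record the comment if it fits in the current window."""
        # Forget accepted comments that have aged out of the window.
        while self._accepted and now - self._accepted[0] >= self.window_seconds:
            self._accepted.popleft()
        if len(self._accepted) < self.max_comments:
            self._accepted.append(now)
            return True
        return False

# Hypothetical policy: at most 3 comments per 60 seconds, site-wide.
limiter = CommentRateLimiter(max_comments=3, window_seconds=60)
print([limiter.allow(t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

In a live tool the caller would pass something like `time.time()` as `now`. Taking the timestamp as an argument just keeps the sketch easy to test. Capping the accepted rate bounds how many comments can pile up between the owner's review sessions.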

As per Phil's observation that people are starting to spam weblog comments, I've disabled the comments here. I think maybe a sum total of 10 comments have been posted anyhow, none of them terribly important. It's not worth the exposure.

I am planning on creating the RU Freenet syndication tool sometime this week. One of the things I learned was that Freenet was soon going to .5, so I decided to wait until after that, since just between my last Freenet exploration post and today there have been 4 separate releases of Freenet. I wanted to wait for a bit more stability than that.

I put all the pieces I want to use together, and now it's just a matter of assembling them. It shouldn't take that long; I just need the time to code it.

Spam Filtering's Last Stand

Computers Still Stupid

More on Learning

Belabored Birthday Brain Baring

More voting theory (this time mathematical)

LawMeme on the Microsoft Decree, me on voting theory

Human Justice redux

Mark expands on comment spam

Comments removed

Freenet syndication update