As y'all probably know, I've been interested in Bayesian filtering. I've been using the Mozilla 1.3 implementation because, even though I don't think it's going to work in the long term, I figure you ought to get it while the getting's good.
On the plus side, it's been pretty good, with easily a 97%+ success rate on both true positives and true negatives.
The downside is that the false positives have been pretty bad.
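For anyone curious how this kind of filter scores mail, here's a minimal sketch. It assumes a Paul Graham "A Plan for Spam" style scheme, which is how Mozilla's junk filter is commonly described. It is not Mozilla's code. The class and method names, the tokenizer, the count-each-token-once-per-message simplification, and the constants (ham counts doubled, 0.4 for rare tokens, the 15 most "interesting" tokens) are illustrative assumptions. Doubling the ham counts is the knob aimed squarely at false positives.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters, digits, $, ' and -
    return re.findall(r"[a-z0-9$'-]+", text.lower())

class BayesFilter:
    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing each token
        self.ham_counts = Counter()   # number of legitimate messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # simplification: each token counted once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        b = self.spam_counts[token]
        g = 2 * self.ham_counts[token]  # doubling ham counts biases against false positives
        if b + g < 5:
            return 0.4                  # rare or unseen token: lean slightly toward "ham"
        ps = min(1.0, b / max(self.n_spam, 1))
        pg = min(1.0, g / max(self.n_ham, 1))
        return max(0.01, min(0.99, ps / (ps + pg)))

    def score(self, text, n=15):
        # Keep the n tokens whose probabilities sit furthest from neutral (0.5)
        probs = sorted((self.token_prob(t) for t in set(tokenize(text))),
                       key=lambda p: abs(p - 0.5), reverse=True)[:n]
        spamminess = math.prod(probs)
        hamminess = math.prod(1 - p for p in probs)
        return spamminess / (spamminess + hamminess)
```

A message would then be flagged as junk once its score crosses some threshold, e.g. 0.9; an empty message scores a neutral 0.5.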
One major outcome of this project is the discovery that simple tree matching algorithms don't work.
I tried an algorithm where I took the three best nodes from a tree and compared them to a database of the three best nodes from the collected text works. The expectation was that the numerical value of each sense represents meaning spread out across many words and would to some degree adequately represent the "
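To make the "three best nodes" idea concrete, here's a minimal sketch under stated assumptions: each text has already been reduced to a mapping from node (sense) labels to numeric scores, and similarity is just the number of shared top-three labels. The function names `top_nodes`, `build_database`, and `best_match`, and the overlap measure, are hypothetical illustrations, not the code from the project.

```python
from heapq import nlargest

def top_nodes(scores, k=3):
    """Labels of the k highest-scoring nodes; ties go to the alphabetically earlier label."""
    return frozenset(nlargest(k, sorted(scores), key=scores.get))

def build_database(works, k=3):
    """Precompute the top-k node set for every work in the collection."""
    return {name: top_nodes(scores, k) for name, scores in works.items()}

def best_match(query_scores, database, k=3):
    """Return (work name, overlap) for the work sharing the most top-k nodes with the query."""
    query = top_nodes(query_scores, k)
    name = max(database, key=lambda w: len(query & database[w]))
    return name, len(query & database[name])
```

One property of this sketch worth noticing: with k=3 the overlap can only be 0, 1, 2, or 3, so in a large database many works will tie and the comparison carries very little information.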
An idea I had for my next blog-style thing, since iRights is within a few months of basically wrapping up.
I think it would be great to have a blog-like thing that tracks predictions: who makes them, when they make them, and whether or not they come true. Kinda like Long Bets, but tracking anyone who makes a prediction at all, and no money; just reputation points. Also, rather than waiting for people to enter them, we record predictions of people who may not even be aware of the site at all.
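A minimal sketch of what the core record and the reputation tally might look like. Everything here is hypothetical: the `Prediction` fields, the `source` field (where we found a prediction made by someone who never visited the site), and the +1/-1 scoring rule are just one simple choice for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Prediction:
    predictor: str                    # who made it (need not know the site exists)
    made_on: date                     # when they made it
    claim: str                        # what they predicted
    source: str                       # where we recorded it from, e.g. a quote or link
    came_true: Optional[bool] = None  # None until the prediction is resolved

def reputation(predictions):
    """Hypothetical scoring: +1 per correct prediction, -1 per wrong one.

    Pending predictions are skipped, so someone with only unresolved
    predictions does not appear in the result yet.
    """
    scores = defaultdict(int)
    for p in predictions:
        if p.came_true is None:
            continue
        scores[p.predictor] += 1 if p.came_true else -1
    return dict(scores)
```

A flat +1/-1 treats every prediction alike; weighting by how bold or how far out a prediction was would be the obvious refinement.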
I thought this was really touching (via InstaPundit): A captured Iraqi colonel being held in one of the hangars listened in astonishment as his information minister praised Republican Guard soldiers for recapturing the airport. He looked at his captors and, as he realised that what he had heard was palpably untrue, his eye filled with tears. Turning to a translator, he asked: "How long have they been lying like this?"
The Justice Department lifted a requirement Monday that the FBI ensure the accuracy and timeliness of information about criminals and crime victims before adding it to the country's most comprehensive law enforcement database. The system, run by the FBI's National Crime Information Center, includes data about terrorists, fugitives, warrants, people missing, gang members and stolen vehicles, guns or boats. [Privacy Digest] I submit to you that this is actually a good thing, or at least will be in the long run.
Who can avoid talking about the war? My feeling on the war is now on record, I suppose (support, contingent on a dedicated and sincere attempt to reconstruct Iraq; if it self-destructs it should be despite our efforts, not because of them), but I wanted to comment on a couple of criticisms that I feel are either disingenuous or invalid.
"Bush (and by extension the administration) is stupid." - No. The administration may be wrong, corrupt, or a wide variety of other things, but it is not stupid.
In reaction to my previous post, Rafe posts an update: "My intention was to talk a bit about how Perl and Java differ, not explain how one should construct a program." Yes, and I was one meta-level above that. I should have made it clearer that we were discussing different levels.
He also says that you should avoid Perl's idioms, but then the question I'd ask is whether you should use Perl at all.
Rafe Colburn explains why Algol-like languages are far superior to Perl for working on large scale, multi-programmer, long-term projects. I'd go further. If you use an outliner to edit your source code, his multi-line Java example shrinks down to one line, just like his Perl example. If you don't program in an outliner I'm sure you have no idea what I just said. If you do, you're probably chortling and guffawing and pointing at the screen saying "
Please ignore if you're not into computing theory.
Says Den Beste: My original statement was that a single Turing machine cannot perfectly simulate a system which consists of two Turing machines such that the ratio of the clock rates for those two Turing machines is a transcendental number.... [attempted proof clipped] ... Am I right? ... Can my proposed idea be restated in a way which truly does make it uncomputable?
When do you know that you've got the right answer to a programming conundrum? When the answer means you delete lots of code, and the final product is more efficient, more flexible, and more robust. I love programming.