Bayesian Experiences

As y'all probably know I've been interested in Bayesian filtering. I've been using the Mozilla 1.3 implementation, because even though I don't think it's going to work in the long term I figure you ought to get it while the getting's good.

Plus side, it's been pretty good, with easily a 97%+ success rate on both correct positives and correct negatives.

The downside is that the false positives have been pretty bad. I cleaned out my spam folder today and here's what got thrown into the spam folder:

Failure of Simple Tree Matching

One major outcome of this project is the discovery that simple tree matching algorithms don't work.

I tried an algorithm where I took the three best nodes from a tree and compared them to a database of the three best nodes from the collected text works. The expectation was that the numerical value of each sense represents meaning spread out across many words and would to some degree adequately represent the "high points" of the text snippet.
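For the curious, here is roughly the shape of that comparison as a minimal Python sketch. It is not the project's actual code: the names `top_nodes` and `best_match`, the dictionary of sense scores, and the use of a plain shared-node count as the similarity measure are all assumptions made for illustration.

```python
import heapq


def top_nodes(scores, k=3):
    """Return the k highest-scoring node names as a frozenset.

    `scores` maps a node (sense) name to its numerical value.
    (Hypothetical representation; the real tree structure is not shown here.)
    """
    return frozenset(heapq.nlargest(k, scores, key=scores.get))


def best_match(snippet_scores, database, k=3):
    """Return (work name, shared-node count) for the closest stored work.

    `database` maps a work's name to the frozenset of its own top-k nodes.
    Similarity is simply how many top nodes are shared, which is an assumption.
    """
    query = top_nodes(snippet_scores, k)
    return max(
        ((name, len(query & nodes)) for name, nodes in database.items()),
        key=lambda pair: pair[1],
    )


snippet = {"animal": 0.9, "motion": 0.7, "water": 0.4, "color": 0.1}
works = {
    "work_a": frozenset({"animal", "motion", "sky"}),
    "work_b": frozenset({"money", "law", "water"}),
}
match = best_match(snippet, works)
```

Reducing a whole text to three nodes throws away nearly everything else in the tree, which fits with the finding above that matching this simple doesn't work.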

Formalized Accountability

An idea I had for my next blog-style thing, since iRights is within a few months of basically wrapping up.

I think it would be great to have a blog-like thing that tracks predictions: Who makes them, when they make them, and whether or not they come true. Kinda like Long Bets, but tracking anyone who makes a prediction at all, and no money; just reputation points. Also, rather than waiting for people to enter them, we record predictions of people who may not even be aware of the site at all.

Captured Iraqi Colonel

I thought this was really touching (via InstaPundit):

A captured Iraqi colonel being held in one of the hangars listened in astonishment as his information minister praised Republican Guard soldiers for recapturing the airport.

He looked at his captors and, as he realised that what he had heard was palpably untrue, his eye filled with tears. Turning to a translator, he asked: "How long have they been lying like this?"

U.S. Lifts FBI Criminal Database Checks

The Justice Department lifted a requirement Monday that the FBI ensure the accuracy and timeliness of information about criminals and crime victims before adding it to the country's most comprehensive law enforcement database.

The system, run by the FBI's National Crime Information Center, includes data about terrorists, fugitives, warrants, people missing, gang members and stolen vehicles, guns or boats. [Privacy Digest]

I submit to you that this is actually a good thing, or at least will be in the long run.

Criticisms of criticisms

Who can avoid talking about the war? My feeling on the war is now on record, I suppose (support, contingent on dedicated and sincere attempt to reconstruct Iraq; if it self-destructs it should be despite our efforts, not because of them), but I wanted to comment on a couple of criticisms that I feel are either disingenuous or invalid.

  • "Bush (and by extension the administration) is stupid." - No. The administration may be wrong, corrupt, or a wide variety of other things, but it is not stupid. Note that we do not have access to a lot of information the administration does. Note also that it is almost never a good idea to completely tip your hand during war. The fact is that if all of its actions made sense to everybody that would probably be a very, very bad sign. Lest you think I'm violating my first paragraph and supporting Bush, I am not, because this cuts both ways. Accusations of stupidity absolve the administration of responsibility in a certain sense. The administration is going into this with eyes wide open, and with almost certainly millions of man-hours spent analysing the results to the n-th degree. Don't fool yourself into thinking otherwise to score a cheap shot. This goes for any non-pathological, non-degenerate government. I'd love to define that more precisely but a reasonable approximation is a government that consists of a reasonable number of people sharing power (not one person with full control) with a reasonable distribution of that power. Such governments are often many things but truly stupid is not usually one of them.
  • "Bush is just gung-ho for war.", implied as the sole or majority reason for war - Related to the administration not being stupid, I can't imagine the economic doldrums have escaped the attention of the administration, nor the strong correlation between the economy and re-election chances, regardless of any other effects. Further, it is quite likely that doing nothing, especially after the basic clean-up in Afghanistan, would not have negative consequences until the next President is in power, so the easy thing for the administration to do would be to allow the UN to dick around for the next two years at its toothless leisure. The odds are inaction would bite the next guy, not them. Something other than mere "bloodlust" is driving the President in this direction, strong enough to overcome the patently obvious downsides to war for the administration itself. I submit that it is at least plausible that the motivation of the administration is quite likely to be almost exactly what they say it is; considered honestly, true idealism and the true belief these are necessary actions is the only motivation that makes sense, when simply doing nothing is so much easier and immensely more monetarily profitable for everybody. I even have to admit that I expected worse on the civil liberties front, but the administration seems to be genuinely focused on their stated goals, not using this as an excuse to tighten the reins at home. (Which isn't to say I approve fully or intend to be any less interested and diligent, but I do admit a certain surprise.) Again, nobody can say for certain that the actions they are taking will have the outcome they desire, nor am I commenting directly on the desirability of that outcome at this time. All I'm saying is attributing this to bloodlust or desire for glory, while certainly easy (downright intellectually lazy), is disingenuous.

There's room for a lot of opinions on these issues going each way; the world is now directly in the middle of a major transition to a new way of doing business that started in 1990 with the collapse of the Soviet Union, and the on-going development of powerful weapons technologies cheap enough for any country to acquire. Things probably won't finish shaking down until 2010 or so, roughly, and things won't look the way they did before. (Look for the concept of "sovereignty" to undergo a lot of changes, for instance, as the world keeps shrinking.) There's a lot of new ambiguity and uncertainty. If nothing else, if you oppose the Bush administration's views, you do nothing to convince others by attacking shallow characterizations of it. It's not stupidity, and it's not simple bloodlust.

Reaction to previous post

In reaction to my previous post, Rafe posts an update...

My intention was to talk a bit about how Perl and Java differ, not explain how one should construct a program.

Yes, and I was one meta above that. I should have made it more clear we were discussing different levels.

He also says that you should avoid Perl's idioms, but then the question I ask is whether you should use Perl at all? I know sometimes you have no choice ...

That's what it is in this case. I work on a mod_perl application, and I came on the project quite a few years after that decision was made. My opinion on that issue is quite irrelevant. ;-) Also, considering when the system was started, mod_perl was the best of many evils; having used ASP, it beats the hell out of that. So in context, I actually support the use of Perl in this instance.

Perl vs. Java code

Rafe Colburn explains why Algol-like languages are far superior to Perl for working on large scale, multi-programmer, long-term projects. I'd go further. If you use an outliner to edit your source code, his multi-line Java example shrinks down to one line, just like his Perl example. If you don't program in an outliner I'm sure you have no idea what I just said. If you do, you're probably chortling and guffawing and pointing at the screen saying "See what I said." [Scripting News]

For what it's worth, both the Perl and Java in the linked article are wrong. The correct solution in both cases is to write a function "dirDepth" or something that takes a path and counts the depth, then call that function. The Java one may be more complicated-looking, but that's OK. The original Blosxom solution unnecessarily ties the program to the UNIX platform, which is the only one you can depend on for '/' to be the path delimiter with no exceptions.
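To make the "write a function and call it" point concrete, here's a minimal sketch in Python (neither of the languages in the linked article, just something short enough to show the idea). The name `dir_depth` is my own stand-in for the "dirDepth" suggested above, and leaning on the standard `pathlib` module to do the splitting is an assumption about how you'd want to do it, not anything taken from Blosxom.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def dir_depth(path):
    """Count the named components of a path, ignoring any root or drive anchor.

    Accepts a string (parsed with the current platform's rules) or an
    explicit PurePath flavour, so nothing here hard-codes '/'.
    """
    p = path if isinstance(path, PurePath) else PurePath(path)
    # p.parts includes the anchor (e.g. '/' or 'C:\\') as its first element
    # when one is present; it is not a directory level, so leave it out.
    return len(p.parts) - (1 if p.anchor else 0)


unix_depth = dir_depth(PurePosixPath("/2003/04/07"))
windows_depth = dir_depth(PureWindowsPath(r"C:\2003\04\07"))
```

Callers only ever see `dir_depth(path)`; the question of what the delimiter is lives in one place, and the two example paths come out at the same depth even though one uses '/' and the other uses '\'.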

Computing Theory time

Please ignore if you're not into computing theory.

Says Den Beste:

My original statement was that a single Turing machine cannot perfectly simulate a system which consists of two Turing machines such that the ratio of the clock rates for those two Turing machines is a transcendental number.... [attempted proof clipped] ... Am I right? ... Can my proposed idea be restated in a way which truly does make it uncomputable?