What Is An Outline, part 3

In my previous post, I concentrated on extending our data model to handle multiple data sources cleanly and transparently. However, we now have a conceptual contradiction in our data model. To this point, the data model we've built still assumes it is a tree, which means we can use the tree-optimized algorithms for things like counting children. But if a document transcludes something that transcludes the original, we can have a cycle, which means there is a path starting with a node in the original, going through the transcluded document, and ending up back at the original node. Most tree-based algorithms respond to this situation by infinitely looping, which manifests itself as a hard lock-up, which is generally considered a negative by users.
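To make the cycle problem concrete, here is a minimal Python sketch. It is not the data model from these posts: the `Node` class and both function names are hypothetical, invented for illustration. It puts a tree-style descendant count next to a version that remembers which nodes it has already visited. One caveat: in Python the naive recursive count fails with a `RecursionError` rather than hanging, but an iterative walk with no visited set would spin forever, which is the lock-up described above.

```python
class Node:
    """Hypothetical outline node; children may include transcluded nodes."""

    def __init__(self, text):
        self.text = text
        self.children = []


def count_descendants_tree(node):
    """Tree-style count. Correct on a tree, never finishes on a cycle
    (in Python it eventually raises RecursionError)."""
    return sum(1 + count_descendants_tree(child) for child in node.children)


def count_reachable(node):
    """Cycle-safe count of distinct nodes reachable from `node`,
    not counting `node` itself."""
    seen = {id(node)}          # identity, not equality: same node object
    stack = [node]
    while stack:
        current = stack.pop()
        for child in current.children:
            if id(child) not in seen:
                seen.add(id(child))
                stack.append(child)
    return len(seen) - 1


# Two documents that transclude each other: a -> b -> a
a = Node("original")
b = Node("transcluded")
a.children.append(b)
b.children.append(a)

print(count_reachable(a))  # 1: only b is reachable besides a itself
```

The price of the visited set is that "count the children" now means "count distinct reachable nodes", which is a different question from the one the tree algorithm answered. A node transcluded in two places is counted once, not twice.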

Design Decision 3: How To Represent Structure

What Is An Outline, part 2

In my previous post, I discussed what I believe is the outline data structure in current use by outliners, and why it is inadequate for my purposes. I discussed why transclusion doesn't work very well under that model (an algorithm can sort of hack transclusion on top of the tree, but it's not "real" enough), and the difficulties of replacing some of the tree concepts when you have a full graph. Today's post covers the "transclusion" issue.

Crashing hard drives

It happens to me without fail.

I have three computers in this house. One is a headless machine that runs Radio Userland, providing my news aggregator and this weblog, and nothing else. One is a Linux desktop machine that Windows literally refuses to run on. One is a laptop that runs primarily Linux but has Windows only because it came that way. Thus, only two usable machines.

Whenever I take one of the machines down for heavy-duty maintenance, secure in the knowledge the other one can easily pick up the slack, BAM, one of the hard drives in the other machine comes crashing down, taking down both of them at once.

Joe Job

All this week, I've been Joe-job'ed. It's in "response" to my insulting comments about spammers' intelligence here, here, and here, with regard to the article I wrote on Bayes Filters.

The efficacy of a Joe-Job has decreased significantly over the past couple of years. Ever since the Windows viruses that started forging return addresses willy-nilly, we've all been "joe jobbed" for the last few months. Going from 0 to 100 spam bounce messages a day would be much more annoying than going from 20 to 120 spam bounce messages.

What Is an Outline?, part 1

As I hinted at in my previous post, this is not as easy to answer as you'd think. To give you a hint on how hard it is, I'm currently up to my fourth major version of my fundamental data structure. The best way to answer "What is an outline?" is probably to take you through the same development process I used to get to where I am now.

A word of warning: I'll try to keep this largely comprehensible to anyone who's willing to really try to read this, but (as some of the more technical of you are probably already guessing) we will have to dip a little bit into graph theory and some programming concepts.

Why Are Data Structures Important?

Bowers' Law

I've been writing ahead on my promised developer travelogue posts, because I am trying to make them high quality posts useful for programmers both new and old, not just off-the-cuff reportings of what I happened to be doing last night.

I was concerned about the time lost writing these posts, because as I said, time spent writing is time not spent coding. (Though I'm getting much of the writing done when coding is not possible.) An entirely unexpected benefit has accrued, though: I was factoring out a concept from the next two posts and it has prompted a nearly complete re-conceptualization of how the dominant programming paradigms of our day inter-relate.

Journalism idea

We know reporters are biased. They are human, all humans have bias, therefore the reporters are biased.

Suppose the large journalism institutions tried a new style of reporting. Instead of letting one reporter write a story, assign two reporters to the story. We want them to be clearly biased, one on each side of the issue.

Then, write a three-part news article:

  • A core that both agree to.
  • One part each that are the aspects they could not get the other to agree to.

Add a rule that each personal part may be no longer than the agreed core. Add a rule that the editors may edit as normal, but they may not decide what fact goes into which part; that's for the reporters only. To make this work, the reporters may need assurance that all three parts are always posted together, and the rules for cutting it down to size may need to be agreed upon. (You might be able to get away with publishing only the core but I think this will fail; it makes the core too contentious.)

Nation-Building 101

Some of the best-justified criticism of the Administration's Iraq plans I've seen to date. People who expected us to anticipate and have a plan for every possible contingency annoy me; such a thing is not possible. People who don't understand the value of flexible plans that involve intelligent agents in the field responding dynamically to local conditions annoy me; it may look like a lack of a plan but that "lack of a plan" is hundreds of times better than a bad, globally applied plan.

Bayesian Filters

Slashdot recently had an article on gibberish in spam. I posted a comment about my work on Bayesian, which reminded me I need to update that. This post will have to do.

First, you'll note a cranky tone in the Slashdot postings. Would you believe that Bayesian filtering has fanboys? A lot of people seem congenitally incapable of reading something about Bayesian once they get the faint idea that I may be a little critical of them. (Completely over their head is the distinction that I'm not critical of Bayesian per se, but of the idea that it will solve the spam problem once and for all.) Instead of reading the words, they seem to suffer some sort of strange vision ailment that renders them incapable of seeing anything but the phrase "Bayesian Filters are bad."

Snopes Gets RSS Feed

Tell one, tell all, the invaluable Snopes.com has finally gotten an RSS feed!

Snopes is required reading for people on the Internet. If it sounds too good to be true, if it's a little too conveniently in favor of (or against) your favorite ideological position, or if it's a little too horrifying to be true, check it on Snopes before you get upset, or worse, spread the claims further. Because you'll meet someone who has nearly the entire site indexed in their head, and there's little that's more damaging to your point than to have it conclusively rebutted on Snopes.