XML Namespaces in a Nutshell

I've said several times that XML Namespaces are simple, but I don't have a good proof document. So here it is.

XML Namespaces are an optional standard that you apply as a sort of "plug-in" to XML. All example XML documents in this text are namespaced documents.

  1. Namespaced documents redefine tag names and attribute names to be tuples. The name of every tag and attribute is now a pair of (namespace, name).

    The "namespace" is an opaque string. It is conventionally a URL, like http://www.w3.org/1999/xhtml, the namespace for XHTML, but that's optional.

    In an XHTML document, there is no tag em. There is only (http://www.w3.org/1999/xhtml, em). All tags that have the same tuple are the same. All tags that have a different tuple are not the same tag even if the name component is the same.

  2. To retain human-editability, we need a shortcut for writing these tuples. This is done with the xmlns attribute on tags.

    This is a tag named em in the default "empty string" namespace:

    <em>content</em>

    This is the XHTML tag em for emphasis:

    <em xmlns="http://www.w3.org/1999/xhtml">content</em>

    The xmlns attribute affects everything beneath it.

    <span>
      <span xmlns="http://www.w3.org/1999/xhtml">
        <em>content</em>
      </span>
    </span>

    That is an XHTML em tag, inside of an XHTML span tag, inside of a tag named ("", span) that is not XHTML.

    Many namespaced XML documents will have an xmlns set on the root tag, and that may be the only sign you're in a namespaced document.

    The xmlns attribute can also set a namespace prefix, like so:

    <xhtml:span
       xmlns:xhtml="http://www.w3.org/1999/xhtml">

    That is also an XHTML span tag, exactly the same as:

    <span
       xmlns="http://www.w3.org/1999/xhtml">

  3. Namespaces override each other in the obvious way.

    <outer xmlns="outer.namespace">
      <inner xmlns="inner.namespace">
        <content/>
      </inner>
      <outercontent/>
    </outer>

    The default namespace is set on the outer tag. The default namespace is overridden on the inner tag. The true identity of the content tag is (inner.namespace, content). The true identity of the outercontent tag is (outer.namespace, outercontent). Prefix declarations of the form xmlns:prefix work the same way.

    The prefix is not part of the name. It is an error in namespace-based code to "do" anything with the prefix, except perhaps for serialization. It is an error to require the prefix to be set to a certain value.

    (If I deserialize and reserialize something, I generally take the time in my code to try to preserve the prefixes. If no pre-existing prefix is available, I also generally have a dictionary of "friendly" default prefixes. These are both kindnesses to humans, not requirements.)

  4. No matter what the default namespace is, all attributes are in the empty-string namespace unless set explicitly with a prefix. This means that in

    <span xmlns="http://www.w3.org/1999/xhtml"
       class="sample">

    the true identity of the attribute is ("", class). The XHTML standard is specified such that ("", class) is the class attribute you know from XHTML. It is possible to namespace attributes with prefix:attribute="...", but you must declare a prefix to do so. This seems to be rarely done.
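
If you'd like to check points 1 through 3 against a real parser, here is a minimal sketch using Python's standard xml.etree.ElementTree, which reports every tag as the string {namespace}name. The identity helper is a hypothetical name introduced for this example, not part of any library; it just splits that string back into the (namespace, name) tuple described above.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def identity(tag):
    """Split ElementTree's '{namespace}name' form into a (namespace, name) tuple.

    Hypothetical helper for illustration; tags without a namespace get "".
    """
    if tag.startswith("{"):
        namespace, _, name = tag[1:].partition("}")
        return (namespace, name)
    return ("", tag)


# The xmlns attribute affects the tag it sits on and everything beneath it.
nested = ET.fromstring(
    '<span><span xmlns="http://www.w3.org/1999/xhtml">'
    "<em>content</em></span></span>"
)
nested_names = [identity(el.tag) for el in nested.iter()]
print(nested_names)
# [('', 'span'), ('http://www.w3.org/1999/xhtml', 'span'),
#  ('http://www.w3.org/1999/xhtml', 'em')]

# A prefixed tag and a default-namespaced tag have the same identity.
prefixed = ET.fromstring('<xhtml:span xmlns:xhtml="http://www.w3.org/1999/xhtml"/>')
default = ET.fromstring('<span xmlns="http://www.w3.org/1999/xhtml"/>')
print(identity(prefixed.tag) == identity(default.tag))  # True

# An inner xmlns overrides the outer one, but only inside the inner tag.
doc = ET.fromstring(
    '<outer xmlns="outer.namespace">'
    '<inner xmlns="inner.namespace"><content/></inner>'
    "<outercontent/>"
    "</outer>"
)
override_names = [identity(el.tag) for el in doc.iter()]
print(override_names)
# [('outer.namespace', 'outer'), ('inner.namespace', 'inner'),
#  ('inner.namespace', 'content'), ('outer.namespace', 'outercontent')]
```

Notice that the prefix never shows up in any of these tuples. The parser has already thrown it away, which is a decent hint that your code should not depend on it either.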

That's about it.
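
As a quick check of point 4, here is a minimal sketch using Python's standard xml.etree.ElementTree. The urn:example:extra namespace and the x prefix are made up for the example; nothing in XHTML defines them.

```python
import xml.etree.ElementTree as ET

EXTRA = "urn:example:extra"  # hypothetical namespace, for illustration only

el = ET.fromstring(
    '<span xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:x="urn:example:extra" '
    'class="sample" x:class="other"/>'
)

# The tag picks up the default namespace...
print(el.tag)  # {http://www.w3.org/1999/xhtml}span

# ...but the unprefixed attribute does not: it is ("", class).
print(el.get("class"))  # sample

# The prefixed attribute is a different attribute: (urn:example:extra, class).
print(el.get("{%s}class" % EXTRA))  # other

# There is no XHTML-namespaced class attribute on this tag.
print(el.get("{http://www.w3.org/1999/xhtml}class"))  # None
```

The lookups use the full namespace string, never the x prefix, which is the same rule as for tags: the prefix is not part of the name.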

However, you'll find that most things that use XML Namespaces use them incorrectly. You usually won't be in a position to fix that, so you have to adapt to the local dialect. Even XMPP, probably the second-best namespace-using standard behind XHTML and friends, puts two different namespaces on the opening stream element for things like clients vs. components, and does not directly specify that the stream tag should be treated with the same semantics in both cases. In effect it treats the stream tag as not namespaced and uses the namespace declaration as just another attribute, but only for that one tag.