Interfaces and Nil in Go, or, Don't Lie to Computers

2021-05-11 (Last Modified: 2021-05-12)

It is commonly held up as a wart in Go that interfaces have "two different nils"; one is when the interface value is nil:

var something interface{}
fmt.Println(something == nil) // prints true

and one is when the interface contains a nil:

var i *int // initializes to nil
var something interface{} = i
fmt.Println(something == nil) // prints false

This is not a wart in Go. It is a result of a programmer misconception combining with a software engineering bug which results in an attribution error.

Programmer Misconception

Before I get into the exact misconception, let me demonstrate this issue with another misconception from the same family. Go is very similar to C. It is similar to other languages as well but the C heritage is very clear. I have fielded several questions on the Go reddit to the effect of:

Shouldn't this be illegal?
func x() *int {
    var i int = 25
    return &i
}

There's no reason to ever ask this question if one is learning Go... unless one already knows C/C++, and recognizes the similarity to what is a fatal error in C. In C, having "allocated i on the stack", it is now illegal to take the address of i and return it, because once the function returns the stack entry for this function will be torn down and reused, resulting in that address being re-used for some other function call later. Worse, this can superficially appear to work in C as long as you try to use the pointed-to integer only before some other function call.

But in Go, this is not a problem. This is because Go doesn't have a stack or a heap. The implementation of Go will have a stack and a heap, but the language does not have such a distinction. The compiler takes care of figuring out what it can put on the stack vs. what must go on the heap, and while you may declare every allocation in Go, you don't get to pick where it goes because in the machine defined by the Go language, there aren't even any choices to make. It's just "allocated", with no further details available.
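A minimal sketch of that point, assuming nothing beyond the standard library (the name newCounter is made up for illustration). Each call returns the address of its own local variable, and both pointers stay valid after the function returns:

```go
package main

import "fmt"

// newCounter (hypothetical name) returns the address of a local variable.
// This is legal Go: the variable lives as long as something references it,
// and the compiler decides where to put it.
func newCounter() *int {
    i := 25
    return &i
}

func main() {
    a := newCounter()
    b := newCounter()
    *a++ // modifies only the variable a points to
    fmt.Println(*a, *b, a != b) // prints 26 25 true
}
```

If you are curious where a given implementation actually placed the variable, the standard gc toolchain can report its escape-analysis decisions with go build -gcflags=-m, but that is a detail of the implementation, not of the language.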

This is a misconception brought over from programming experience in C.

That's not a criticism of anyone who may have that misconception. Misconceptions are inevitable. But that doesn't make it any less an inappropriate application of a concept from a different language into Go.

Another misconception imported from C and/or C++ is that a NULL pointer is "invalid", unconditionally. This is not true in Go. Go doesn't actually have a NULL pointer. It has a nil, and I think one reason it is not named NULL is precisely to try to avoid the idea that it is "invalid". In Go, nil is not "invalid". It is a perfectly well-defined value with well-defined behavior. Among those behaviors is that it is perfectly valid to define methods on pointer types and call them on nil pointers:

type Something struct {}

func (s *Something) Example() {
    if s == nil {
        fmt.Println("nil pointer")
        return
    }
    fmt.Println("not nil pointer")
}

func test() {
    var s *Something
    s.Example() // prints "nil pointer"
    s = &Something{}
    s.Example() // prints "not nil pointer"
}

This is not only "valid" in Go, it is perfectly moral, i.e., it isn't a code smell or bad programming or anything else. It is a perfectly acceptable technique with many interesting uses.

Why doesn't this crash? Because in Go, the runtime always knows the type of the values. In C++ if you have a NULL there isn't enough type information on it at runtime to call a method, but Go always has the type. You can imagine it as if every value in the language instead of being just the value is actually the tuple (Type, Value). This is not necessarily what is going on under the hood because this is amenable to a lot of optimization such that the type is not literally carried around by every value in RAM, but it's a useful mental model.

Therefore, when you have a nil pointer and call a method on it, the Go runtime is perfectly capable of resolving the method call with no errors.
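A small sketch of the (Type, Value) mental model, using a hypothetical Describe method that is not part of the original example. The nil pointer still reports its static type, and the method is found from the type alone:

```go
package main

import "fmt"

type Something struct{}

// Describe (hypothetical method) never dereferences s, so it is safe to
// call on a nil receiver.
func (s *Something) Describe() string {
    if s == nil {
        return "nil *Something"
    }
    return "non-nil *Something"
}

func main() {
    var s *Something // nil, but still of type *Something

    fmt.Printf("%T\n", s)     // prints *main.Something
    fmt.Println(s.Describe()) // prints nil *Something

    // The method can also be reached through the type alone.
    fmt.Println((*Something).Describe(nil)) // prints nil *Something

    s = &Something{}
    fmt.Println(s.Describe()) // prints non-nil *Something
}
```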

It is a misconception from other languages that nil pointers are just like NULL in C/C++, and therefore it is illegal to call methods on them. This is objectively false in Go.

In fact it's objectively false in many other languages as well. It's a good idea for a language to always know what type something is. It's useful for a lot of other things in the compiler and runtime as well, so it's a pretty popular choice nowadays to work like Go does and always know what the type of something is. Be it through careful compilation or simply always labeling values with their types (common implementation in dynamic languages), it is perfectly legal to perform operations on the type even for certain special sentinel or "illegal" values. C and C++ are kinda the odd language out here.

Go would have to go out of its way to ban this usage, because there is no compelling language design reason to ban it. Once a language is looking up methods using only the type of a value, with no reference to the value itself, there's simply no problem with calling a method on a nil pointer. In fact I find the way the C heritage line does this to be the flawed way, and observe that it fits in to the rest of C's flawed handling of types in general where its "type" support is surprisingly surface-deep when you really start pushing it like this. It is much better for the language+runtime to always know the type of the values it has than to conflate the two things.

Valid uses of "nil" pointers in Go include:

Class methods: Wait, Go doesn't have class methods, right? Of course it does. Class methods are just methods that don't reference the value itself. While some languages have direct support for calling methods on a class with no specific value in hand, it's really just a convenience. When a language always concretely knows the type of all values, passing around the values just for their types is perfectly fine, and opens the door to all the useful patterns based on class methods. For pointers, nil is not only "acceptable", it's the ideal value, because you don't have to figure out how to "correctly" fake up the rest of the values in the object, even with Go's bias towards zero-values.
Easy testing swapouts: I have a very particular memory pool I needed for a particular project with a particular memory use pattern. If you use the nil pointer, instead it always allocates. This makes for easily testing whether or not it's working in real code, and removing the memory pool as a consideration in the rest of the test code when it's not what is under test. I have monitoring code with a monitoring struct that hands out counter structs that can be incremented; when the monitoring struct is nil, it hands out nil counter structs that don't count, which makes it easy to remove all monitoring for code under test when the monitoring is not what is under test. (The monitoring itself, of course, is tested in other tests.) I have structs that are output drivers that, by design, don't output anything when they have a nil pointer.
There are a lot of use cases for implementations on nil pointers. I think the community at large doesn't understand this precisely because so many people mentally model them as simply "invalid", or in other words, I think the causality runs backwards from what people would assume... it's not that nils are invalid so people never implement methods on them, it's that people never implement methods on them so they continue to think they are invalid. In fact it has to be the way I say, because nils aren't invalid.
It remains perfectly valid to use nils specifically to implement SQL NULL or cgo's NULL or missing or invalid or whatever in general. However, this isn't something forced on you by the language; it is a type-specific choice.
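Here is a rough sketch of the monitoring pattern described above. It is not my actual code: Monitor, Counter, NewMonitor, and handle are names invented for this illustration, and it ignores concurrency entirely.

```go
package main

import "fmt"

// Counter is a hypothetical counter type. A nil *Counter is a valid
// counter that simply does not count.
type Counter struct{ n int }

func (c *Counter) Inc() {
    if c == nil {
        return
    }
    c.n++
}

func (c *Counter) Value() int {
    if c == nil {
        return 0
    }
    return c.n
}

// Monitor is a hypothetical monitoring struct that hands out counters.
type Monitor struct{ counters map[string]*Counter }

func NewMonitor() *Monitor {
    return &Monitor{counters: map[string]*Counter{}}
}

// Counter returns the named counter. A nil *Monitor hands out nil counters.
func (m *Monitor) Counter(name string) *Counter {
    if m == nil {
        return nil
    }
    c, ok := m.counters[name]
    if !ok {
        c = &Counter{}
        m.counters[name] = c
    }
    return c
}

// handle stands in for real code that reports to the monitor.
func handle(m *Monitor) {
    m.Counter("requests").Inc()
}

func main() {
    m := NewMonitor()
    handle(m)
    handle(m)
    fmt.Println(m.Counter("requests").Value()) // prints 2

    var off *Monitor // monitoring switched off, e.g. in a test
    handle(off)
    fmt.Println(off.Counter("requests").Value()) // prints 0
}
```

The point of the design is that handle never needs an "is monitoring enabled?" branch; the nil pointer carries that decision.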

The good news is that in Go, you don't forcibly have an "invalid" value adjoined to every pointer type. The bad news is, you are still forced to have some sort of a nil on all pointer types, so if you have a type for which you have no use for it, too bad, it's still there. And going the other way, if you need two distinguished special values (or three or four...), well, you can't have that, so you'll need to start doing the usual programming things (flags in the value, or start using interfaces with several implementations, etc.). You have one and exactly one of these Options, if you get my drift.

Yes, this is the "billion-dollar mistake", slightly reduced in potency by the ability to have "valid" nils, but only slightly; you still have that nil forcibly adjoined to your type whether you like it or not. I'd like non-nullable pointers in Go myself. But this blog post is about what Go is today, not what it should be, though the latter is a perfectly valid topic.

It is also a misconception that the nil interface and an interface containing a typed nil are the same thing. For one thing, specific nil pointers are always typed in Go, so they are in fact trivially not the same. This can be confusing because the programming language literal nil is not typed:

func tmp() {
    var x *int = nil   // legal; "nil" becomes a pointer to int
    var y *float64 = nil // legal; "nil" becomes a pointer to float64
}

but that is because the identifier nil is a special predeclared value with no type of its own; it takes on whatever pointer (or other nil-able) type the context requires. Numbers work the same way:

func tmp() {
    var x byte = 1     // legal: 1 becomes a byte
    var y int = 1      // legal: 1 becomes an int
    var z uint32 = 1   // legal: 1 becomes a uint32
}

The programming language literal 1 is not typed in the language spec; it becomes whatever it needs to be. This can pass through const statements:

const ONE = 1          // note no type given

func tmp() {
    var x byte = ONE     // legal: ONE becomes a byte
    var y int = ONE      // legal: ONE becomes an int
    var z uint32 = ONE   // legal: ONE becomes a uint32
}

However, "a nil interface" and "an interface containing nil" are not the same nil. One nil is of the interface's type, and the other nil is contained in the interface but has a specific other type that is not the interface. (Go does not nest interface values; an interface is always either nil, or contains a concrete type.) So, they are objectively not the same nil because they are two different types (in the Go language sense) of nil.

Consequently, it is not a well-defined operation to collapse both of these cases, because it is not clear what the type of the resulting value should be. Neither the interface type nor the underlying concrete type is fully correct.
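A short sketch that makes the two nils visible. The describe helper is made up for this illustration; it reports the dynamic type held by the interface value and whether the interface value itself compares equal to nil:

```go
package main

import "fmt"

// describe (hypothetical helper) shows the dynamic type inside v and
// whether the interface value itself is nil.
func describe(v interface{}) string {
    return fmt.Sprintf("type=%T isNil=%t", v, v == nil)
}

func main() {
    var empty interface{} // a nil interface: no type, no value

    var p *int                 // a nil *int
    var holdsP interface{} = p // an interface holding (*int, nil)

    fmt.Println(describe(empty))  // prints type=<nil> isNil=true
    fmt.Println(describe(holdsP)) // prints type=*int isNil=false

    // Comparing against a nil of the matching concrete type does succeed.
    fmt.Println(holdsP == (*int)(nil)) // prints true
}
```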

The Software Engineering Bug

The software engineering bug that is the topic of this perennial discussion is writing an interface:

type DoesAThing interface {
    Thing()
}

then creating some type that implements it with a pointer receiver, where the method can't safely be called on a nil pointer:

type ThingDoer struct {
     thingsDone int
}

func (td *ThingDoer) Thing() {
     td.thingsDone++
}

(It is a very common pattern to assume that td won't be nil and not check for it, so if this method is called on a nil ThingDoer pointer it will result in a panic.)

and then writing code like this:

func WillPanic() {
     var thingDoer *ThingDoer     // a nil *ThingDoer

     // assign that into an interface
     var someThing DoesAThing = thingDoer 
     someThing.Thing()                // panic!
}

Of course these three things will end up separated by some more code, so it isn't so stark; an arbitrary distance in the code can separate the creation of a nil and the assignment of that into the interface.

But even this is really just a plain ol' bug. What makes this interesting and contentious is...

The Attribution Error

The final error here lies in attributing the error to the line

    someThing.Thing()

This is not the error. This line is correct.

The error is on the line:

    var someThing DoesAThing = thingDoer

After that line, you had already lost. The program state was irretrievably scrambled and the only question is when the error is going to manifest.

Why?

What is an interface? It is a promise that the value inside the interface can perform certain methods. It is supposed to allow you to abstract away from what concrete types may be in that interface and deal with the value strictly over the interface.

This line is where the bug is because this line of code is a lie. It is a claim that thingDoer is capable of being operated on strictly through the DoesAThing interface. It can't. If you try, it will panic. In this case, the nil is an invalid implementation, not because it is an invalid Go value, but because it is an invalid implementation of the interface.
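One way to avoid telling the lie is to check at the point where the concrete pointer is converted into the interface. This is only a sketch of one option; asDoesAThing is a helper name invented here, and whether an error, a default implementation, or a nil-safe method is the right answer depends on the type.

```go
package main

import (
    "errors"
    "fmt"
)

type DoesAThing interface {
    Thing()
}

type ThingDoer struct {
    thingsDone int
}

func (td *ThingDoer) Thing() {
    td.thingsDone++
}

// asDoesAThing (hypothetical helper) refuses to wrap a nil *ThingDoer,
// because a nil *ThingDoer cannot keep the promise DoesAThing makes.
func asDoesAThing(td *ThingDoer) (DoesAThing, error) {
    if td == nil {
        return nil, errors.New("nil *ThingDoer cannot implement DoesAThing")
    }
    return td, nil
}

func main() {
    var missing *ThingDoer
    if _, err := asDoesAThing(missing); err != nil {
        fmt.Println("refused:", err)
    }

    td := &ThingDoer{}
    d, err := asDoesAThing(td)
    if err != nil {
        panic(err)
    }
    d.Thing()
    fmt.Println(td.thingsDone) // prints 1
}
```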

In fact, while nil pointers are certainly far and away the most common manifestation of this problem, the following code is equally flawed for the same reason:

type AnotherDoer struct {
    beNaughty bool
}

// Note how this is not even a pointer type!
func (ad AnotherDoer) Thing() {
    if ad.beNaughty {
        panic("did I do that?")
    }
}

func anotherLie() {
    var someThing DoesAThing = AnotherDoer{true}
    someThing.Thing()
}

This crashes, and is invalid for the same reason: I put something in an interface that is not capable of implementing that interface. I emphasize again there are no pointers even involved here. The error here is on the first line of anotherLie, not the second.

I lied.

There will be consequences.

What About `interface{}`?

The empty interface may seem like it doesn't match my description above, because it makes no promises in the interface definition itself. However, if you are experiencing this error with empty interfaces, there is still some sort of lie being told, it's just a violation of some promise not expressible by Go's rather weak type system. A common one is "this type can be serialized via encoding/json". Nevertheless, if you pass something to the JSON encoder that it can't handle, the error isn't in the JSON encoder, it's in the code that wrapped that value it can't handle into the interface{} and shipped it to the encoder.
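A minimal sketch of that unexpressed promise, using encoding/json. The encode wrapper is a made-up helper; the key observation is that a channel satisfies interface{} at compile time but cannot be serialized, so the encoder can only report the problem at runtime:

```go
package main

import (
    "encoding/json"
    "fmt"
)

// encode (hypothetical wrapper) accepts anything, but implicitly promises
// the encoder that v is serializable as JSON.
func encode(v interface{}) (string, error) {
    b, err := json.Marshal(v)
    return string(b), err
}

func main() {
    out, err := encode(map[string]int{"a": 1})
    fmt.Println(out, err) // prints {"a":1} <nil>

    // The type system is satisfied, but the promise is broken.
    _, err = encode(make(chan int))
    fmt.Println(err) // prints json: unsupported type: chan int
}
```

The encoder behaved correctly in both calls; the second call site is where the bug lives.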

Not Just About Go

In fact the root problem here isn't about Go specifically. This is a general problem that you can encounter in any language, and in fact, even beyond. When you lie to your code, there will be consequences.

If you promise that some value will be able to have some method called on it, but it can't, there will be consequences.

If you are in a dynamic language and write code based on the promise that some attribute will be present, and then someday it's not, there will be consequences.

If some bit of code claims to implement an interface, and you have to stub out half the methods with the local equivalent of panic("can't be implemented"), because the interface isn't granular enough, there will be consequences.

This being software engineering, we sometimes will have no choice. We will have some big code base, and there will be some interface of 10 methods we have to conform to, and we'll have some type that can only possibly implement 3 of them, and we have to pass it to some legacy code base we can't modify and hope that whatever we need from that code base will only use those 3 methods. Sometimes we have no choice but to pile three toddlers on top of each other, equip them with a trenchcoat and a fake ID, and send them in to the bar to get some water, because the API is designed such that the bar is the only place to get water.

When I say "there will be consequences", I do not mean that from a moralistic perspective; I mean it from an engineering perspective. It is important, as engineers, to correctly attribute those consequences to the lie, rather than the downstream things that "believed" the lie.

Code that receives data and then tries to determine whether or not it was a "lie" is extremely difficult code to get right; often it is mathematically impossible to get it completely right, because there is at least one input that will be produced by both lies and truth, and at that point the code has no chance of being correct about which is which. (I take the expansive definition of "input" here to include global state and such, the entire "input" to a function, not just its in parameters.)

This particular issue is only a Go-specific manifestation of a generalized problem in programming languages. It's easy to lie, both deliberately and accidentally. It takes something like dependent types to get to where this is not a possible mistake, so while we can quibble about what changes to Go may make it easier or harder to be truthful, every practical programming language has the capacity to tell lies built into it.

Going even beyond programming languages, one of the most important rules of databases is don't lie to your database. A common example to show what I mean: If your pricing database says "Service X" costs $50, do not try to give a discount to some customer for Service X by going in to the database and modifying it to cost $25. This way lies pain, since now all customers will get that discount. You need to add to the database a way of representing discounts, and teach everything to understand that concept.

Of course when I put it so baldly the problem is obvious, but I see this sort of thing going on a lot, where somebody has some requirement and engineers start thinking how to "trick" the database into implementing that requirement by spiking it with the correct incorrect data. There are always further consequences for this sort of thing. I can't say "don't ever do that", I've been in the positions where I had no practical choice too, but certainly give it a good bit of resistance first, rather than reaching for it first as I've seen so many engineers do.

In general, there are consequences to lying to computers, be it programming languages or databases. They just believe the lies. They have no choice. At the very least, use this power sparingly, lest you turn your entire database and codebase into a complicated web of lies, compensating for lies, blocking out other lies, all for the UI layer to try to combine these lies into the right truth for the user. Those code bases are no fun to work in.
