Half Constructed Objects Are Unnecessary

Functional programming languages of the type I’m talking about have immutable objects¹.

For this particular section I’m just going to be talking about “immutability”, so in addition to Haskell, I also include Erlang and any other language with immutable values only. This does not include languages that merely encourage immutability but permit mutation; this discussion is about 100% immutable languages, where mutation is not just held at arm’s length, no matter how “long” that arm may be (Rust), but outright eliminated.

Immutable objects can not be changed once created. New objects can be created, even closely related ones like “the value with dozens of fields I had before, but with this particular field set to 3”, but the old ones can’t be changed. Therefore, before creating any sort of composite value, all the relevant components must be in hand, because they can not be added in later.

A mutable language does not need this discipline. You can half-construct objects, heck, you can construct entirely uninitialized objects (not in the C/C++ “uninitialized memory” sense, but in the sense that they do not contain valid values in accordance with whatever the more-specific local “valid” is). The languages neither particularly encourage this nor discourage this.

But since it doesn’t tend to discourage it, it tends to creep into code when you’re not looking. Even when you think you’re using some sort of constructor or initializer to create an object, it’s still easy to return objects that are valid according to some internal or library definition (e.g., an empty linked list, which is a legal linked list) but not valid according to some other definition (e.g., “this object must be owned by at least one user”). So the object is created, then, through later mutations, changed into a valid value for the more restrictive context.

One of the things that functional programmers complain about in conventional imperative languages is the presence of some sort of nil/null value, that if dereferenced will crash the program. I remember my own journey in learning about Maybe and Option and other such values.

But what’s weird is, when I’m programming Go, I can literally go weeks without encountering nil pointer panics. If you ignore “things I quickly picked up in unit tests”, and my most common instance, “I added a map without initializing it in the initializer, whoops” (also quickly picked up in tests), I can go months.

I wondered for a long time what the difference is, and I still don’t know; I’d really need to sit down with someone who is constantly encountering nil pointer panics and do some pair programming and compare notes.

But my best theory at the moment is that I’ve adopted the functional programming principle of always fully constructing valid objects in one shot. I do not tend to have invalid objects flying around my programs and getting nil pointer exceptions. If it is invalid for some field to be “nil”, then the program never has an instance of the value for which it is nil.

In fact, nil pointer exceptions are merely the problem people notice, because they crash noisily. In this sense, they are a positive boon, because that’s better than the invalid value that doesn’t crash the program and goes unnoticed! And that, of course, I see all the time in code; it is not some rare thing.

The thing that distinguishes null values particularly isn’t their invalidity. There are many other ways that data can be invalid in practice. It is that you can not remove them from the values the programming language considers valid. In C, for instance, you can not declare “a pointer that is not allowed to be NULL”. There is no such type. It is forced into your data model whether you like it or not. That is the distinguishing thing about null values.

It is far more productive to consider it a particularly bothersome special case of the general problem of constructing invalid data, for which nils are not really all that special, than to overfocus on them and neglect other issues. If you remove all “nils” from your program, whether through careful programming or support in the programming language itself, but you’re still routinely passing around invalid or half-constructed data in the rest of your code, you’ve made some progress, yes, but not nearly enough progress.

It sounds too simple to be true: To not have invalid data in your program, do not construct invalid data.

But it works!

Strongly-typed functional languages strongly afford this by creating very rigorous and rigid types that fully prevent the data types of the program from even representing invalid data in memory, giving you tools to enforce this very well. Imperative languages have tools that can be leveraged to this end as well, but they are generally not as strong. But “not as strong” is not the same as “does not exist at all”. Whatever it is your local language offers for these tasks should be used.

And even in fully dynamic scripting languages that lack these tools almost entirely, you can just… not create invalid data.

Easier Said Than Done… But Probably Not Why You Think

Of all my suggestions here, I suspect this is the one that could most surprise you if you try to put it into practice. You would think that refactoring your code to never half-construct objects would be easy. You take anywhere you construct an object, shift all the mutations necessary to generate the initial values above the initial construction, then construct it all in one go. The theory is simple.

The practice is bizarre. An imperative program written without an awareness of this issue will naturally tend to grind in an expectation of half-constructed objects more deeply than a wine stain on a white shirt. If you try to convert one object creation, the odds that you’ll find that the incomplete object was passed somewhere to do something, which will itself require some other value to be created, which will then call a function that itself implicitly depends on half-constructed objects in a way you didn’t expect, rapidly approach one.

On the plus side, the first time you try to do this refactoring and you find this happening to you, you’ll probably come to a deeper understanding of how hazardous this can be to your program’s architecture.

But even having experienced it, I can’t really explain it. All I can say is, find yourself a nice little 500-1000 line program that ends up half-constructing things and passing them around, and try to refactor it to never half-construct objects. It’s wild how deeply imperative code can grind this antipattern into itself.

You may even get to the point that you believe it is impossible to avoid half-constructed objects in imperative code. Which is why I am again going to bang on the fact that functional programs prove that it is in fact possible, because they give their programmers no choice, and there isn’t any program that functional programmers are particularly stymied by as a result of this constraint on them.

There’s nothing stopping you from simply making that choice yourself in an imperative program.

It’s definitely good to start with a greenfield project though; refactoring is possible but quite difficult. It’s even possible this is one of the primary things that makes refactoring hard in general, though I don’t feel I have enough experience to make that claim. Since I’ve been programming in this style for many years now, I don’t have a lot of recent experience in refactoring code that partially constructs objects. But it does make sense to me that the code between the object being constructed and the object actually becoming “valid” forms an invisible, yet quite potent, barrier to many refactoring operations.


  1. Since I’m discussing a very wide slice of languages, “object” here means something very general; think of it as closer to the standard English meaning than the C++ meaning. I need some way to discuss things across languages here. ↩︎