Abuse of Sum Types In OO Languages

Sum types are useful, but they are also an attractive nuisance in object oriented languages. There's a certain type of programmer who gets a taste of functional programming and has a good time, but misinterprets that good time to mean that sum types are always better, because they are FP, and FP is Better.

But sum types are not generally better, they are specifically better. Using sum types, or even forcing sum types into languages that don't really have them like C, is a valid solution for certain problems, but in most cases they are not the best choice.

To understand when sum types are best, you must understand something called...

The Expression Problem

The expression problem is a well-known issue in computer programming. Those links are good for a deeper dive; I'm only going to give a short summary here:

Suppose you have three separate modules in your FP code. Consider:

  1. One module creates a data type, like in Haskell notation data Fruit = Apple | Pear | Banana.
  2. Another module provides some function on Fruit, like pattern-matching on the fruit and printing its type.
  3. Some third module wants to add another type of fruit to the data type, like data MyFruit = Fruit | Orange. But nothing that knows how to deal with Fruit will know how to deal with the new MyFruit without modifications in the other modules.

That's one prong of the expression problem. The other is, consider in an OO language:

  1. One module creates a class, like in Python: class Fruit: pass, and then we can create subclasses: class Apple(Fruit): pass, class Pear(Fruit): pass and class Banana(Fruit): pass.

    I show this class with no methods, but it could be defined to have a .print method that prints out the type of fruit it is.

  2. Another module can easily add another fruit: class Orange(Fruit): {implement print method here}.
  3. Now suppose there is a third module, and it wants to add an operation to the Fruit; perhaps an .is_ripe() method. There isn't a great way for the module to do this without modifying the other modules.

The point is not that this is an unsolvable problem in either case, but that the languages and/or techniques certainly have a preference. Python, being highly dynamic, will let you simply modify the original Fruit class inline, but if you build a code base pervasively on code that modifies its parent classes, you will come to know pain as you scale up; it prefers the class-based approach. Functional languages have techniques for dealing with this, but they will be less preferred than sum types in their wheelhouse.

However, while the expression problem is often used to talk about programming languages, it's more about the specific techniques. Using sum types anywhere tends to land you in the first prong of the expression problem. Using subclasses or interfaces tends to land you in the second prong. Both can be done in the same language, even in the same program, and you'll get different costs/benefits related to the expression problem.
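Here is a minimal sketch of both prongs living in one Go program, reusing the fruit example. The function names nameOf and isRipe, and the choice of which fruit goes on which side, are my own assumptions for illustration only:

```go
package main

import "fmt"

// Prong one: a closed set of fruits handled by type switches.
// Adding an operation is one new function; adding a fruit means
// revisiting every switch.
type Apple struct{}
type Pear struct{}

func nameOf(fruit any) string {
	switch fruit.(type) {
	case Apple:
		return "apple"
	case Pear:
		return "pear"
	default:
		return "unknown"
	}
}

// isRipe is a second operation, added without touching Apple or Pear.
func isRipe(fruit any) bool {
	switch fruit.(type) {
	case Apple:
		return true
	default:
		return false
	}
}

// Prong two: an interface. Adding a fruit is one new type; adding an
// operation means revisiting every implementation.
type Fruit interface {
	Name() string
}

type Banana struct{}

func (Banana) Name() string { return "banana" }

// Orange is a new fruit, added without touching Banana or Fruit.
type Orange struct{}

func (Orange) Name() string { return "orange" }

func main() {
	fmt.Println(nameOf(Apple{}), isRipe(Apple{}), isRipe(Pear{}))
	// Orange was never added to the switches, so it falls through.
	fmt.Println(nameOf(Orange{}))
	for _, f := range []Fruit{Banana{}, Orange{}} {
		fmt.Println(f.Name())
	}
}
```

Nothing stops both styles from coexisting; the question is only which kind of change you expect to make more often.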

How Sum Types Are Abused In OO Languages

Sum types are abused in OO languages when they are used

  1. when the priority is to be able to easily add new instances, or
  2. when neither prong of the expression problem is a compelling problem.

I emphasize that second one, because neither prong of the expression problem being a compelling problem is the majority case!

For instance, many data types never leave their own modules. In that case, the module author has full control over all the types and all the operations at all times, and it's roughly the same amount of work either way because no matter what, you're going to have to fill out the matrix of operations you desire. In this case, it is almost never useful to force the foreign paradigm in. One should prefer the language's preferred mechanism to avoid "Writing X in Y"-type errors, which is an innocuous name for an error that will slowly, but quite surely, strangle any program filled with it.

But even for data types that do leave their package, the most common case again is that they are just going to be used, as they are, in whichever paradigm the local language prefers. For instance, if one downloads an image parsing library, an OO language may offer a unified interface between many types, and an FP language may offer a sum type of various types of images, but the most likely way this library will be used is to be composed into something else, and the user will have little desire to add either operations or image types that other modules will then be able to use. Yes, wanting to do so is certainly a use case. But it is not the common one.

Sum type abuse has been coming up again in the Go community lately as people struggle to see if generics somehow make them more feasible. (So far, the answer is nothing is better than what was already possible, but stay tuned.)

I write this example in Go, therefore, but this can appear in any OO language. Watch the do stuff comments, the numbers will be meaningful shortly.

// "Argh, I can't have sum types, so let's hack this in:"
type MySumValue1 struct { ... }
type MySumValue2 struct { ... }
type MSV3 struct { ... }
type MSV4 struct { ... }

func Op1(something any) {
     switch val := something.(type) {
     case MySumValue1:
          // do stuff 1.1
     case MySumValue2:
          // do stuff 1.2
     case MSV3:
          // do stuff 1.3
     case MSV4:
          // do stuff 1.4
     default:
          panic("argh so angry sum types don't work and Go can't "+
                "stop this clause from happening Go sucks so much")
     }
}

func Op2(something any) {
     switch val := something.(type) {
     case MySumValue1:
          // do stuff 2.1
     case MySumValue2:
          // do stuff 2.2
     case MSV3:
          // do stuff 2.3
     case MSV4:
          // do stuff 2.4
     default:
          panic("oh my gosh look at how Go is forcing me to organize "+
                "my code so ugly :(")
     }
}

func Op3(something any) {
     switch val := something.(type) {
     case MySumValue1:
          // do stuff 3.1
     case MySumValue2:
          // do stuff 3.2
     case MSV3:
          // do stuff 3.3
     case MSV4:
          // do stuff 3.4
     default:
          panic("argh argh argh")
     }
}
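To make concrete what those panic clauses are guarding against, here is a small runnable sketch. The Square and Rect types and the area and tryArea helpers are hypothetical stand-ins of mine, not part of the example above. The point is only that a function taking any compiles with every argument, so the mistake surfaces at run time:

```go
package main

import "fmt"

type Square struct{ Side float64 }
type Rect struct{ W, H float64 }

// area uses the type-switch style. Because the parameter is any, the
// compiler accepts every argument, supported or not.
func area(shape any) float64 {
	switch s := shape.(type) {
	case Square:
		return s.Side * s.Side
	case Rect:
		return s.W * s.H
	default:
		panic(fmt.Sprintf("unsupported shape %T", shape))
	}
}

// tryArea converts the panic into an error so the demo can keep going.
func tryArea(shape any) (result float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return area(shape), nil
}

func main() {
	fmt.Println(tryArea(Square{Side: 2}))
	// This line compiles without complaint and fails only when executed.
	fmt.Println(tryArea("not a shape"))
}
```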

If you do not have a specific need for external package users to add additional operations to your supposed "sum type", the correct way to spell this code is:

type MySomething interface {
     Op1()
     Op2()
     Op3()
}

type MySumValue1 struct { ... }
func (msv MySumValue1) Op1() { /* do stuff 1.1 */ }
func (msv MySumValue1) Op2() { /* do stuff 2.1 */ }
func (msv MySumValue1) Op3() { /* do stuff 3.1 */ }

type MySumValue2 struct { ... }
func (msv MySumValue2) Op1() { /* do stuff 1.2 */ }
func (msv MySumValue2) Op2() { /* do stuff 2.2 */ }
func (msv MySumValue2) Op3() { /* do stuff 3.2 */ }

type MSV3 struct { ... }
func (msv MSV3) Op1() { /* do stuff 1.3 */ }
func (msv MSV3) Op2() { /* do stuff 2.3 */ }
func (msv MSV3) Op3() { /* do stuff 3.3 */ }

type MSV4 struct { ... }
func (msv MSV4) Op1() { /* do stuff 1.4 */ }
func (msv MSV4) Op2() { /* do stuff 2.4 */ }
func (msv MSV4) Op3() { /* do stuff 3.4 */ }

func SomethingUsingYourValue(mySumVal MySomething) {
     // now you are statically guaranteed to have all operations.
     // And you don't "switch on type" here, you just call the methods.
}

You can ensure no one can make new values by adding an unexported method they have to implement, though usually in Go we just don't bother unless there is a concrete reason to shut down external implementations, and there usually isn't.
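A minimal sketch of the unexported-method trick, with hypothetical names (Shape, sealed, TotalArea) chosen only for illustration. It is written as a single package main so it runs on its own; the sealing only matters once Shape lives in a library package that others import:

```go
package main

import "fmt"

// Shape is "sealed": sealed() is unexported, so a type declared in
// another package cannot define that method and so cannot satisfy
// Shape by implementing it directly.
type Shape interface {
	Area() float64
	sealed()
}

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }
func (Square) sealed()         {}

type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }
func (Rect) sealed()         {}

// TotalArea just calls the method; no type switch is needed.
func TotalArea(shapes ...Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

func main() {
	fmt.Println(TotalArea(Square{Side: 2}, Rect{W: 3, H: 4}))
}
```

One caveat: an outside package can still embed Shape in its own struct and inherit the method that way, so treat this as a strong hint rather than an airtight guarantee.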

This should be your default approach in Go, because the language is set up to make this work well. Interfaces are built into the language. They can be composed with each other, composed into structs, passed around, statically checked, and used to specify generics; they avoid constantly using type switches, and just generally work better with things. Whereas, if you force sum types in, you're manually implementing them every time, the language doesn't support them, the syntax doesn't support you, the documentation system doesn't support you, you can't feed them into a generic specification, and so on.
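As a rough illustration of a few of those points (composition with each other, composition into structs, and use as a generic constraint), here is a short sketch. Every name in it (Namer, Pricer, Product, Discounted, Names) is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

type Namer interface{ Name() string }
type Pricer interface{ Price() int }

// Interfaces compose with each other by embedding.
type Product interface {
	Namer
	Pricer
}

type Apple struct{}

func (Apple) Name() string { return "apple" }
func (Apple) Price() int   { return 3 }

// Interfaces compose into structs: Discounted wraps any Product and
// overrides only Price, while Name is promoted from the embedded value.
type Discounted struct {
	Product
	Off int
}

func (d Discounted) Price() int { return d.Product.Price() - d.Off }

// Interfaces also serve as generic constraints.
func Names[T Namer](items []T) string {
	names := make([]string, 0, len(items))
	for _, it := range items {
		names = append(names, it.Name())
	}
	return strings.Join(names, ",")
}

func main() {
	sale := Discounted{Product: Apple{}, Off: 1}
	fmt.Println(sale.Name(), sale.Price())
	fmt.Println(Names([]Product{Apple{}, sale}))
}
```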

Especially when you're just using internal data types, there's no particular reason to prefer forcing "sum types" into the language when the language's supported primitives do the same thing, just in the other direction, so to speak. See how I annotated the do stuff's in the examples above? It's much the same code, just transposed, like a matrix transposition.

Is it worth paying all those prices working against the language you're in just so you can have your code fragments laid out row-wise rather than column-wise?

In a functional programming language, the sum type solution is much easier to work with. You could force your code to use typeclasses and look more OO-ish, but it's going to be a much uglier solution that works against the language and fails to harness its strengths. And you will complain the whole time about how bad the language is as it fights you tooth and nail... hmm... sound familiar?

Because it's the same mistake.

When To Use Sum Types In OO

The time to use sum types in OO is when you have one of the minority of problems they are clearly better at. Go's sum types story is bad, yes, but even so the standard library uses them for the ast.Node type, with (if I'm counting correctly) 57 implementations of that interface. ASTs are a classic "sum type" problem, and even using Go's inferior sum types is better than any other solution here. They exist for the sole purpose of external packages adding more operations to them, and you are not particularly welcome to add more instances of them because it isn't going to do anything useful. (The Go authors do not use their technical ability to stop you with the compiler, but they don't need to. There's nothing useful that can come of trying to make new AST node types.)
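For a sense of what "external packages adding more operations" looks like in practice, here is a small sketch that uses the standard go/parser and go/ast packages. The countNodes helper is my own hypothetical operation; go/ast knows nothing about it, which is exactly the point:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// countNodes is an operation defined outside go/ast. It tallies a few
// node kinds with a type switch over ast.Node.
func countNodes(src string) (idents, literals, binaries int, err error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return 0, 0, 0, err
	}
	ast.Inspect(expr, func(n ast.Node) bool {
		switch n.(type) {
		case *ast.Ident:
			idents++
		case *ast.BasicLit:
			literals++
		case *ast.BinaryExpr:
			binaries++
		}
		return true // keep walking into child nodes
	})
	return idents, literals, binaries, nil
}

func main() {
	fmt.Println(countNodes("a + b*2"))
}
```

Here the type switch is the natural tool: the set of node types is fixed by the language, while the set of things people want to do with them is open-ended.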

But you need to account for the costs of using a foreign paradigm, which are fairly substantial. Those costs are not to be incurred casually and pervasively, just because it is a nice solution in some other language.

I've seen the valid observation that saying "use the right tool for the job" is a vacuous observation without giving some idea of what the right tool for a job is. Well, here's a "use the right tool for the job" that does exactly that.

A Closing Thought

I know my fellow programmers, and many of you are stubborn. So here's my plea to you.

You are frustrated trying to use sum types in Go, and other similar OO languages. The code you are writing annoys and bothers you. You are constantly struggling with it, it doesn't refactor well, it doesn't play well with other code you write, it's just so annoying.

You're presumably stuck in this language or you'd use one that makes you happy. It's a bummer you're stuck in a bad language that sucks. I wish you the best of luck in working your way into a language that makes you happier.

However... can I convince you to just try the technique I outline above a few times? Go back to one of your horrible packages with all the bad code where you were just broken by your language's support of sum types, and try converting it to this approach instead. Pick one that least matches my description of when to use sum types. And once done, maybe see if there's any other refactorings you can do now, you know, clean the code up a bit afterwards, and see how it looks. Maybe now you notice you have an interface that can be exposed to other packages as well and they can provide their own useful implementations without your package coordinating them all.

I don't really care if you do or do not stay angry at the language you're forced to work in for your job. What I do want for you is to be able to make the best of the situation you're in and not be unhappy. So, you know, give this a try. See if it makes the code "worse", but also more effectual, more able to work with your local language's design capabilities, and overall less frustrating. It's cheap. Low-commitment. You can always revert the commit if I'm not right.

But most of you are going to find that whatever other opinions of the resulting code you may have, it is going to be much less... frustrating.