Functional Programming Lessons in Imperative Code

A BlogBook is a collection of blog posts intended to be viewed as a sort of book on their own.

This BlogBook is on how to transfer the lessons that can be taught by strict, pure functional programming (think Haskell) into imperative programming languages.

This BlogBook is complete.

Introduction to Applying Functional Programming Lessons in Imperative Code

I recommend to any professional programmer that they spend some time with a very strict functional language, enough to get to the point they can write non-trivial programs in it.

For the purposes of these posts, you can read functional programming as Haskell. There is an earlier definition of functional programming that just involves having closures, but as all modern1 languages have closures now, that is no longer a relevant definition. There are other languages like OCaml, Clojure, and Scala that have very functional flavors, but there are few languages in common use that take it to the extreme of Haskell.

I believe the lessons learned there can be carried back into any conventional imperative language. That’s hardly a controversial claim. What will be controversial is that I believe that many people are carrying very superficial lessons back while missing the deeper ones.

In this series of posts, which I am gathering up into what I call a “BlogBook” (see link in header if you’re reading this through the blog post), I propose to piss off the entire programming community one way or another.

  • I will anger those who believe there is nothing to learn from functional programming, or that a Sufficiently Smart Programmer 2 could just work everything I’m going to say here out from first principles without ever leaving their first language.

    There will be several things that I’m going to cover that we may not strictly “learn” from Functional Programming, but that an FP viewpoint can still put a new spin on. For instance, I would agree “global variables are bad” is something we can (and historically did) figure out just from the imperative world, but an FP-based viewpoint can enrich that view. There are also some things imperative programming will eventually teach you, but FP will ramp them up to 11 and teach you much more quickly, by forcing them on you. So bear in mind as you read this that I am never claiming this is the only way to learn these lessons, merely a good and fast one.

    • I will also anger people who believe that any claim that is not accompanied by a complete history of everything related to the claim going back to Aristotle is implicitly claiming to have invented the idea from whole cloth with no previous inputs. There aren’t all that many such people numerically but they sure do like to make angry posts online about how dare they claim to have ideas.
  • I will anger those who are very attached to strict functional programming by pointing out some places where I think it gets some things wrong. Even, sometimes, “less wrong than conventional imperative programming”, but still wrong.

  • I will most especially anger those people who think that “practicing functional programming in a conventional language” means using lots of maps and filters and making vague references to “monads”, but they really only mean Option. (I have already written about what monads are, and this post includes why I believe they are not that useful in your imperative language). I will anger these people because in an entire series of posts about the lessons we can learn from functional programming I am still going to call nested maps generally an antipattern in imperative languages, and things like pipelines generally a parlor trick rather than the Most Important Lesson of Functional Programming.

    So, you know, brace yourself for that. It’s a ways away, though. We’ve got a lot of ground to cover first.

  • I will anger people who can’t process generalizations that have exceptions. “Imperative languages” is a big set. I don’t know every language any more than anyone else does, but when I generalize, assume I may know the exceptions. For instance, one of my first generalizations is that trying to program imperative languages like Haskell, with lots of small functions everywhere, results in a parenthesis explosion. Ruby may be flexible enough that this is not the case, and to some extent also Perl, assuming normal levels of effort (e.g., no weird source rewriting). Assume my points hold as long as they are true of most imperative languages. Listing every exception I know, which of course will still miss others, is just tedious for everyone, so please take this particular example as indicative of a general understanding of the concept of exceptions. Discuss them, sure, but don’t assume that just because Lua (for example) has some particular feature implementing something I claim imperative languages generally lack, the point I’m making is invalidated.

I will base the code samples in this on Go, for a couple of reasons:

  1. It’s the language I’ve been using the most lately, though I’ve also done a fair amount of Python and Perl recently.
  2. It is probably the least “multiparadigm” modern language in any sort of popular use, and the one most syntactically hostile to what looks like “functional” code, so if a lesson from Functional Programming can be imported into Go, it can be imported into anything. It is the worst case scenario for “functional programming”.

 

Right. Enough throat clearing. Let’s get on with it.

Applying Purity To The Imperative World

One of the distinguishing features of functional programming is an emphasis on pure functions. I will start with an initial sloppy definition of purity, and refine it into a range of definitions later. If you don’t like this first one, bear with me.

For now we’ll go with this: a pure function is a function that consults only its inputs or immutable external values, makes no changes to any other values, and returns the same result given the same parameters every time. I add the bit about consulting external values because in the case of

data Color = RGB Int Int Int deriving Show

red = RGB 255 0 0

maskRed :: Color -> Color
maskRed (RGB rIn gIn bIn) =
    let (RGB rMask gMask bMask) = red
    in RGB (min rIn rMask) (min gIn gMask) (min bIn bMask)

even if red is in some external module, since it is immutable maskRed will still be a pure function. Immutability renders that distinction irrelevant in Haskell, because you can term-rewrite an external term to simply become part of the function transparently. In imperative languages a function may incorporate references to mutable values, so that is not automatically a safe assumption.
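To make that concrete, here is a minimal Go sketch (the Color type, red variable, and MaskRed function are just my illustrative translation of the Haskell above, not code from elsewhere in this book):

type Color struct {
    R, G, B int
}

// A package-level variable. Nothing stops other code in this package
// from reassigning it at any time.
var red = Color{R: 255, G: 0, B: 0}

func minInt(a, b int) int {
    if a < b {
        return a
    }
    return b
}

// MaskRed mirrors the Haskell maskRed, but its result can change between
// calls if something mutates red, so purity here is a discipline rather
// than a guarantee the compiler makes for you.
func MaskRed(c Color) Color {
    return Color{
        R: minInt(c.R, red.R),
        G: minInt(c.G, red.G),
        B: minInt(c.B, red.B),
    }
}

Nothing in Go prevents some other piece of code in the package from reassigning red, so treating MaskRed as pure is a convention rather than something the compiler enforces.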

You can in principle program an imperative language purely, producing something similar to the style you’d produce in Haskell. But it is painful… to put it lightly. Haskell has a lot of affordances for programming with very small-scale pure functions. Your imperative language does not. If you try to introduce them, they will be clunkier. It can be easy to take Haskell’s whitespace-based syntax for granted and miss just how many parentheses it elides compared to almost every other language’s syntax expressing an equivalent thing. Trying to copy Haskell produces an explosion of functions and parentheses everywhere.

And that’s also before we even consider the fact that nearly every library you use in an imperative language is not pure, requiring you to jump through hoops to isolate it into a pure value, if it is even possible.

If you try to carry Haskell’s highly granular function style into an imperative language, you will experience a lot of pain, while not experiencing a lot of the benefits. The details and resulting cost/benefit ratio will vary from language to language, but it is not what those languages are meant to do. In fact I’ve seen very few people even try it online; one imagines they get broken of it fairly quickly.

So why am I talking about this if we can’t program this way in our imperative languages?

Well, in one of the statements that I know from experience will now cheese off some of the purist functional advocates…

Purity Is Relative

Another way of expressing the idea of purity is that pure code has no “effects”, though in this context that just moves the question into “ok, well, what is an ’effect’ then?”

Again I will not try to create a universally-agreed upon definition, but call it “a change in the environment that can be witnessed by other code”. By “other code” here we generally mean “code written in the subset of the language defined as ‘safe’ by the language designers”. We say this because if we declare that any unsafety in the language renders the language 100% unsafe and all 100% unsafe languages are therefore equally unsafe with no distinctions, then there is no debate to be had; all languages are completely unsafe. While there may be a mathematical sense in which this is true, it is not a practically useful definition.

In Haskell, it is trivial to make a “pure function” that will crash the process if evaluated fully. Evaluating an infinite list while retaining its head, resulting in Haskell trying and failing to manifest the entire infinite list in RAM, is “pure”; that is, evaluating an infinite list is not considered an “effect”. It is hard to treat that as an “effect”; knowing in advance whether a function will do it is a Turing-complete problem. Fortunately, we can get a long way on modern computers by just sort of closing our eyes and pretending that’s not an “effect” a computation can have on an OS process, even though “killing every other currently-running computation in the program” is almost the epitome of a witnessable “effect”; other computations will be forced to witness this effect, unless you want to get pedantic about computations that no longer exist being unable to witness anything.

But I am not criticizing Haskell for not considering that an effect. It is a choice so practical that it is all but forced on them. Few languages have a useful capacity to deal with out-of-memory issues at all, especially if we ignore “run the GC and hope for the best” as a solution; even fewer can do it well.

Furthermore, real processors are mutable at their core. CPU registers are fundamentally mutable. Memory is fundamentally mutable. Any pure computation you like will inevitably have effects viewable by a debugger, even if there is no way to view them from within the safe subset of the language.

Again, this is not a criticism of the concept. But it does mean that even in Haskell the concept already is relative to your definition of what exactly constitutes an “effect”. We already can’t call any non-trivial Haskell function “pure” if we consider it as mutating CPU registers and RAM or if we consider it relative to what a debugger can witness.

So, in light of that discussion of purity and “effects”, consider this Go code:

type WebpageFetcher interface {
	FetchPage(url string) (contents string, err error)
}

func Tail10(fetcher WebpageFetcher, url string) string {
	page, err := fetcher.FetchPage(url)
	if err != nil {
		return ""
	}
	if len(page) < 10 {
		return page
	}
	return page[len(page)-10:]
}

Is Tail10 a pure function?

The obvious answer is “no”. Fetching a web page is not only an “effect”, it’s a huge set of effects. Sockets opened, packets sent and received, and at the end of that, any arbitrary content could be returned every time it is called because the target webpage need not be constant itself. (And all of that just to throw it all away except the last ten bytes…) It couldn’t be less pure.

The obvious answer is wrong. The correct answer is “insufficient information”.

Consider an implementation of WebpageFetcher:

type TestWeb map[string]string

func (tw TestWeb) FetchPage(url string) (string, error) {
	return tw[url], nil
}

And a call to Tail10 in the test code as such:

func TestTail10(t *testing.T) {
	url := "https://jerf.org/test"
	web := TestWeb{
		url: "more than ten chars content",
	}
	if Tail10(web, url) != "rs content" {
		t.Fatal("Tail10 is not working correctly")
	}
}

This particular invocation of Tail10 is pure. You can’t label a map as immutable in Go, but by inspection we can see the function will never mutate it. Two values are passed into this function, both of which could without loss be marked as “immutable” in a hypothetical version of Go that supported that, and the code would still run and have no witnessable effects, because it returns an entirely new value.

Of course, if you pass it a more natural implementation of a WebpageFetcher that in fact did reach out to the web, the function would be blatantly impure.

In this particular case, the Tail10 function returned a new value. If the “business logic” was expected to reach out and perform some action that would mutate state, like, say, writing to an S3 bucket, you can write an implementation of the interface that records the action in itself. In that case, the implementation is being mutated as a result of the business logic, but the mutation is confined to that object, which is only being used for testing. While it may technically be “mutation” as far as Haskell is concerned, there is still a clean boundary you can draw around the mutation, so in the test code you know it doesn’t escape out into the rest of your program.
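A minimal Go sketch of that recording trick, using an invented S3Writer interface and RecordingWriter test type (these names are mine, not from any real AWS library):

// S3Writer is the impure dependency the business logic needs.
type S3Writer interface {
    PutObject(bucket, key string, body []byte) error
}

// RecordedPut captures one requested write.
type RecordedPut struct {
    Bucket, Key string
    Body        []byte
}

// RecordingWriter is the test implementation. It performs no I/O at all;
// it just records what the business logic asked it to do.
type RecordingWriter struct {
    Puts []RecordedPut
}

func (r *RecordingWriter) PutObject(bucket, key string, body []byte) error {
    r.Puts = append(r.Puts, RecordedPut{Bucket: bucket, Key: key, Body: body})
    return nil
}

A test runs the business logic against a *RecordingWriter and then asserts on the Puts slice; the mutation is real, but it lives entirely inside a value the test owns.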

FP “Pure Core” -> Imperative “Purifiable Core”

Many Haskell writers have observed that a well-factored Haskell program is strongly drawn to an overall architecture that has a “pure core” and some sort of IO shell around it to do whatever the program is supposed to do in the real world. This happens in Haskell because the language is so good at dealing with that pure core that it strongly affords a style in which you move as much of the program into it as possible.

Imperative languages can not directly apply this concept very well because that fluidity of dealing with pure code at the smallest granularity is missing. But we can create packages (modules, whatever your language calls that level of architecture) that are purifiable.

This is now a routine play for me when I’m writing a program in an imperative language. Rather than intermixing all my “business logic” with all the impure code that gathers all the data for that business logic, I generally end up with these pieces:

  1. The business logic.
  2. An implementation of all the impure code in the module broken out into a class or local equivalent.
  3. A stubbed-out implementation of that interface, used for testing, to turn the business logic into relatively pure code.
  4. A real implementation of the impure code.
  5. An “integration” test of the impure code. This is easier than setting up a full integration test of conventional imperative code that casually intermixes the logic and impure code. For instance, the business logic may read and write a number of S3 buckets with arbitrary logic done on the contents, but the impure code to read/write some S3 value in some particular way can be tested in isolation to do just the thing we expect.

This may seem like a lot more than just writing the code. It really isn’t a lot more. It’s mostly what you need to write to have minimally professional code anyhow, just with more structure than you may be used to. It is more, though, I can’t deny. However, it is in the range where if you save only one multi-hour debugging session brought on by complicated mixtures of business logic and impure code, it has already paid off. It also gets easier as you practice this style.

If you design your system with all this purifiable code from the beginning, it begins to compound. When you write a component that depends on three other purifiable subcomponents, the purified components of those three subcomponents can then become part of the purification system for this component.

This is going to be one of the major recurring themes in this series of posts. While imperative languages resist taking the purity down to the granularity that functional programming languages do at the small scale, you will find them much less resistant at medium scales. You will be going somewhat out of your way to program in a style that would likely not come naturally to someone who never stepped out of imperative languages, and it is still a bit more expensive in the short term than a “natural” imperative approach, but it amortizes much more nicely over a medium-sized chunk of code and in my experience almost immediately pays off when you are able to write better test cases. But even if you skip that, it pays off in the medium term anyhow.

Purifiable Code in Haskell

You can do something similar in Haskell; there is more than one way to do it, but free monads are the most similar to the approach I take in imperative code.

However, this is not commonly done. I think that’s probably because Haskell makes it easier to purify things to a much greater extent, plus the aforementioned “more than one way to do it”. Testing pure code is easy, and what is left is something that can generally be integration tested, plus the fact that the type system itself constitutes more testing than in imperative code.

Still, it is not unheard of in the functional world. Here’s an example of it. Note this uses ekmett’s free package for Control.Monad.Free, not control-monad-free.

{-# LANGUAGE DeriveFunctor #-}

module Tmp where

import Control.Monad.Free (Free(..), liftF)
import qualified Data.Map as Map

-- Build the API we want to define
data Fetcher next =
    FetchPage String (String -> next)
    deriving Functor

-- Some machinery to make it easier to use.
fetchPage :: String -> FetchProgram String
fetchPage url = liftF (FetchPage url id)
type FetchProgram = Free Fetcher

-- A simple program to fetch two web pages and do some
-- simple manipulation to them
program :: FetchProgram String
program = do
    p1 <- fetchPage "http://jerf.org"
    p2 <- fetchPage "http://jerf.org/iri"
    return $ take 12 p1 ++ take 12 p2

-- Define a fake web.
urlContents = Map.fromList [
          ("http://jerf.org", "a very good website"),
          ("http://jerf.org/iri", "a very good blog")]

-- Run 'program' completely purely. Note that for this
-- to type check, it is indeed rigidly required to be String.
testRunPure :: FetchProgram String -> String
testRunPure prog =
    case prog of
        Free (FetchPage url g) -> do
            let contents = Map.findWithDefault "" url urlContents
            testRunPure (g contents)
        Pure r -> r

-- Run 'program' but with an implementation that does
-- real IO. In the interests of not boring the reader,
-- this does not actually do fetching, but we print
-- some things to demonstrate that we are indeed in IO.
testRunIO :: FetchProgram String -> IO String
testRunIO prog =
    case prog of
        Free (FetchPage url g) -> do
            let contents = Map.findWithDefault "" url urlContents
            putStrLn $ "Contents of " ++ url ++ ": " ++ contents
            testRunIO (g contents)
        Pure r -> return r

We can interrogate this module to super-double confirm that the purity of program does indeed depend on which interpreter we use to run the program value:

ghci> :t testRunPure program
testRunPure program :: String
ghci> :t testRunIO program
testRunIO program :: IO String

I suspect a second reason Haskell programmers don’t use this very often is that this is actually a bit of a pain. Imperative code lacks the guarantee of purity, but it is easier to set this structure up in an imperative language. Anyone reading the imperative code will understand immediately what is happening with purifiable modules. They may not quite understand why you wrote it that way, and may try to “simplify” it out of existence (hopefully the great testing you wrote will talk them back out of it), but it won’t confuse them. This Haskell code works, but it’s a lot of machinery to achieve it, even by Haskell standards, so it’s not worth bringing it out if a conventional pure core does the job, which it usually does.

Component Simplicity

There is a common metaphor for hyperspace travel in science fiction, where some scientist type explains how the FTL works by taking a piece of paper, drawing a line on it to demonstrate “normal” travel, then folding the paper to bring the origin and destination together.

Imperative code affords straight-line approaches to problems. I have some code. It does 73 things. I need it to do a 74th thing, in the middle of one of the things it currently does, which is itself a complicated mixture of the other 72 things going on. The most direct approach is just to slap some code in the middle of the thing that does the new thing. I have seen many code bases that are the result of person-centuries of this style of programming. Despite superficially having functions and modules, it is architecturally just a list of things to do, often multiple lists haphazardly mashed together.

Imperative code affords this style of programming because it is often in the moment the locally easiest thing to do, and imperative programming won’t stop you. If you can’t do the thing you want right where you want to do it, it is generally not too difficult to bash on the code until you can. Do you need to add a few more global variables, add a parameter to a dozen functions, and add some more feature flags to interact with the dozen flags already present? No problem. Some imperative programming cultures positively celebrate the number of ways their language offers to bash code into compliance in this way. They are welcome to that celebration, but I part ways with them philosophically on that point.

If you try to program Haskell this way, it will at the very best be an inferior imperative language, and at worst, it’ll beat you with a crowbar as a temporary measure while it finds something to really put you down with.

Instead, you need to think like the scientist with the FTL demo. Sadly, you won’t be able to fold your way straight to the destination in one shot, but one way of looking at a well-structured Haskell program is to see it as a collection of ways of folding the problem space, shaving a bit of dimensionality off here and a bit more there, doing some folds that you really can’t get away with in imperative languages backed by the safety of the strong type system, until eventually your whole program collapses into a handful of lines taking the input and transforming it to the output, each token deeply imbued with exactly the semantics and meaning you need to operate on this domain.

Imperative code often devolves into writing things that exponentially expand the state space of your program and hoping you can find a happy path through your newly-expanded space without wrecking up too many of the other happy paths. Functional programming generally involves slicing away the state space until it is just what you need.

It isn’t that imperative code forbids this approach, and indeed there are many imperative code bases that are written on the “fold the problem space” principle, at least in the imperative languages that provide tools for doing the slicing. (Dynamic scripting languages are hard to use this way; they were born on day one as a rejection of the restriction-based mindset, so they fight you on a profound level if you try to restrict the state space.) But imperative languages will at best permit such an approach, whereas Haskell all but demands it. The meandering, plodding, exhaustive and exhausting list of instructions almost linearly written out and bashed into compliance by years of effort and bugs reported from the field that so many imperative programs naturally devolve into doesn’t work very well in Haskell.

Haskell programs generally achieve this by building small things from the bottom up, and then creating rich ecosystems for combining them. This is, for instance, one of the main virtues of “monads” and why Haskell uses them everywhere; it isn’t because of what any specific monad interface implementation is, it is because once you have conformed a data structure to the monad interface, you get the entire monad ecosystem “for free”, just as implementing an iterator gets you the entire iterator ecosystem for free for that implementation.

Note in my previous section when I showed a “free monad” that the data structure only had to describe the particular operations we wanted; the drivers supplied the implementations. Once we had that, the “program” I wrote could do many things other than just fetch web pages. You get the whole world of Control.Monad, you get monad transformers and all sorts of other ways of combining small pieces into large structures. As small of an example as that may be, it demonstrates how that code does not simply “use IO”, but can be combined with IO, or combined with software transactional memory, or combined with many other things at a very fine level.

Of course imperative code allows combining code. We would not have gotten very far in programming if we were literally forced to write nothing but straight-line scripts that do one thing at a time in exhaustive detail as I impugned imperative programming with above. But again, Haskell permits and/or forces you to weave code together at a much more granular level than imperative languages. One of my favorite type signatures in Haskell is STM (IO a), not because I used it a lot but because of what it implies, “truly” pure code blending together with guaranteed-safe transactionally pure imperative-looking code to finally produce instructions to produce some real-world effect, all able to weave together very intimately, yet safely.

I have to remind myself, after writing that, that in practice it is somewhat less idyllic than I may have made it sound. Sharp corners start emerging if you want to mix in other effects systems or monads. Still, if that is a failure, it is a failure at a task that imperative programming languages can’t even conceive of.

Again, if imperative languages can’t get down to the level of fluidity that functional programming languages can, what is the lesson here? The lesson is that it is indeed possible to program like that, even with complicated use cases, and the practical takeaway is that imperative languages can still achieve this at larger scales of architecture.

One of my favorite bits of code I’ve ever written is code that can run against our corporate Wiki, which is XML under the hood, and extract bits of the page as JSON. Big deal. It sounds like the sort of thing you’d get an intern to write. But it is not written as a big script that grabs the XML, does all sorts of ad-hoc things to extract whatever ad-hoc data I want, then outputs it into an ad-hoc JSON format. It is written as a set of operators you can define as data, then interpret those operators to turn the page into JSON, in a format that mirrors the desired JSON file format.

It uses XPath to navigate to some desired set of XML nodes, then specifies how to extract the desired data from those nodes. So, suppose you have a simple table you want to extract from the Wiki page, containing, say, usernames in the first column and their email addresses in the second. The whole thing can be specified something like:

object:
  users:
    target: //[id="target_table"]/tr
    for_each:
      array:
        - target: //td:1/
          extract: text
        - target: //td:2/
          extract: text

This creates a JSON object with a key users, which goes and gets the table rows of the table identified by the given ID, and then for each of those rows creates an array containing the text of each targeted cell, resulting in something like

{
  "users": [
    ["user1", "user1@company.com"],
    ["user2", "user2@othercompany.com"]
  ]
}

There’s about a dozen ways of extracting and operating on the node sets returned by XPath. Composing these, together with the general power of XPath, makes it quite easy to build even some relatively complicated extractions out of just these simple parts. I knew I was doing well when I was writing the test cases for this, and even in Go I was starting to prefer embedding one of these programs into my code as a string and executing it rather than writing the equivalent code by hand. Subsequently it has been used in enough ways that it has been easier to write these extractions than to write ad-hoc code to do them. Most importantly, it has been enough easier that things have happened that wouldn’t otherwise have happened.
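The real tool isn’t public, but a minimal Go sketch of the shape of the idea, with invented type and field names, is a set of operator types defined as plain data; the YAML above would deserialize into a value shaped roughly like this:

// Extraction is one operator in an extraction “program”. Target selects
// nodes via XPath; the other fields say what to build from them,
// mirroring the structure of the desired JSON.
type Extraction struct {
    Object  map[string]*Extraction
    Target  string
    ForEach *Extraction
    Array   []*Extraction
    Extract string
}

// The users example from above, written directly as a Go literal.
var usersProgram = &Extraction{
    Object: map[string]*Extraction{
        "users": {
            Target: `//[id="target_table"]/tr`,
            ForEach: &Extraction{
                Array: []*Extraction{
                    {Target: "//td:1/", Extract: "text"},
                    {Target: "//td:2/", Extract: "text"},
                },
            },
        },
    },
}

An interpreter function then walks a value like usersProgram against the XML document and builds the JSON; the important part is that the “program” is just a value you can construct, embed in a test as a string, and reuse.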

There’s another code base that I’m currently working on that involves exactly that long meandering list of ad-hoc instructions written over decades, all of which is designed to operate on a particular entity. In the replacement, rather than copying the same approach, it was easy to create a uniform interface (which I will be taking as its own distinct lesson later) and turn it into a large set of discrete tasks that can be understood in isolation, rather than a whole big general mishmash of dozens of things being done in quasi-random orders.

An in-the-wild example is the popular concept of “middleware” in the web server world. Extracting out common concerns in such a way that you can create “stacks” of handlers is a very powerful way to write web servers. I particularly like how in architectures where you can attach middleware to specific components of the paths, it allows you to make solid guarantees like “nobody can access the /admin URL or anything under it unless they are an admin” by structuring the middleware stack correctly, so that even if someone adds a new admin page later they can’t forget to check that the user is an admin.
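As a minimal sketch of what that looks like with Go’s standard net/http (the isAdmin check and the handlers are invented placeholders):

// isAdmin stands in for whatever session or role check the application
// actually uses.
func isAdmin(r *http.Request) bool {
    return r.Header.Get("X-Role") == "admin"
}

// RequireAdmin wraps any handler and rejects non-admin requests before
// they ever reach it.
func RequireAdmin(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !isAdmin(r) {
            http.Error(w, "forbidden", http.StatusForbidden)
            return
        }
        next.ServeHTTP(w, r)
    })
}

// AdminRoutes returns every admin page already wrapped in RequireAdmin,
// so a newly added admin page cannot forget the check.
func AdminRoutes() http.Handler {
    admin := http.NewServeMux()
    admin.HandleFunc("/admin/users", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "user list")
    })
    return RequireAdmin(admin)
}

Mounting this once with mux.Handle("/admin/", AdminRoutes()) on the top-level mux makes the guarantee structural rather than a convention each handler has to remember.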

Having a plugin- or interpreter-based architecture is not news in general. When I wrote in my intro that not everything I covered was going to be “brand new”, this is one of the biggest examples of that I had in mind. Imperative code was operating like this before functional programming was much more than a gleam in a researcher’s eye. Programming languages themselves are one of the most pure instantiations of this concept, because the composition cuts down to the core. It is arguably impossible to create a truly large code base that doesn’t incorporate this idea into it just for the sheer sanity of the programmers, so things like “Linux kernel modules” and “image editor plugins” and a standardized format for sound synthesis modularity have been around since just after programming languages were invented.

But functional programming, by forcing its users to use it much more often, forces you to get comfortable with it. I can see by the code I find in the wild that programmers in general take a long time to work this concept out on their own. Many more imperative code bases should have this concept. It also, by forcing it on you in a very restrictive context, forces you to get good at dealing with the pitfalls of this architecture. It is very easy to cut the modules apart in the wrong manner. In imperative languages you might just cut holes through it as needed; functional programming forces you to fix the architecture. If you have plugins that sometimes need to print something, then you have to modify the architecture to accommodate the need for occasional IO. If you have to thread data through the various modules, you will need to account for that quite explicitly in the design.

Functional programming shows that you can indeed write a lot more code as simpler individual modules composed together, with a richer concept of “composition” than you might naturally obtain from decades of imperative-only coding. It is also good at teaching you how to write interpreters of all shapes and sizes, even something as small as the interpreters I showed for the free monad example, so that when you come back to imperative programming you know better what you need to do to have one.

Half Constructed Objects Are Unnecessary

Functional programming languages of the type I’m talking about have immutable objects3.

For this particular section I’m just going to be talking about “immutability”, so in addition to Haskell, I also include Erlang and any other language with immutable values only. This does not include languages that merely encourage immutability but permit mutability; this discussion is about 100% immutable languages, where mutation is not just held at arm’s length, no matter how “long” that arm may be (Rust), but outright eliminated.

Immutable objects can not be changed once created. New objects can be created, even related ones that are “this value with dozens of fields I had before but with this particular field set to 3”, but the old ones can’t be changed. Therefore, before creating any sort of composite value, all the relevant components must be in hand, because they can not be added in later.

A mutable language does not need this discipline. You can half-construct objects; heck, you can construct entirely uninitialized objects (not in the C/C++ “uninitialized memory” sense, but in the sense that they do not contain valid values in accordance with whatever the more specific local definition of “valid” is). The languages neither particularly encourage nor discourage this.

But since they don’t tend to discourage it, it tends to creep into code when you’re not looking. Even when you think you’re using some sort of constructor or initializer to create an object, it’s still easy to return objects that are valid according to some internal or library definition (e.g., an empty linked list, which is a legal linked list) but not valid according to some other definition (e.g., “this object must be owned by at least one user”), and so the object is created, then, through later mutations, changed into a value that is valid for the more restrictive context.

One of the things that functional programmers complain about in conventional imperative languages is the presence of some sort of nil/null value, that if dereferenced will crash the program. I remember my own journey in learning about Maybe and Option and other such values.

But what’s weird is, when I’m programming Go, I can literally go weeks without encountering nil pointer panics. If you ignore “things I quickly picked up in unit tests”, and my most common instance “I added a map without initializing it in the initializer, whoops” (also quickly picked up in tests), I can go months.

I wondered for a long time what the difference is, and I still don’t know; I’d really need to sit down with someone who is constantly encountering nil pointer panics and do some pair programming and compare notes.

But my best theory at the moment is that I’ve adopted the functional programming principle of always fully constructing valid objects in one shot. I do not tend to have invalid objects flying around my programs and getting nil pointer exceptions. If it is invalid for some field to be “nil”, then the program never has an instance of the value for which it is nil.

In fact, nil pointer exceptions are merely the problem people notice, because they crash noisily. In this sense, they are a positive boon, because that’s better than the invalid value that doesn’t crash the program and goes unnoticed! And that, of course, I see all the time in code; it is not some rare thing.

The thing that distinguishes null values particularly isn’t their invalidity. There are many other ways that data can be invalid in practice. It is that you can not remove them from the values the programming language considers valid. In C, for instance, you can not declare “a pointer that is not allowed to be NULL”. There is no such type. It is forced into your data model whether you like it or not. That is the distinguishing thing about null values.

It is far more productive to consider it a particularly bothersome special case of the general problem of constructing invalid data, for which nils are not really all that special, than to overfocus on them and neglect other issues. If you remove all “nils” from your program, whether through careful programming or support in the programming language itself, but you’re still routinely passing around invalid or half-constructed data in the rest of your code, you’ve made some progress, yes, but not nearly enough progress.

It sounds too simple to be true: To not have invalid data in your program, do not construct invalid data.

But it works!

Strongly-typed functional languages strongly afford this by creating very rigorous and rigid types that fully prevent the data types of the program from even representing invalid data in memory, giving you tools to enforce this very well. Imperative languages have tools that can be leveraged to this end as well, but they are generally not as strong. But “not as strong” is not the same as “does not exist at all”. Whatever it is your local language offers for these tasks should be used.

And even in fully dynamic scripting languages that lack these tools almost entirely, you can just… not create invalid data.
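In Go, the usual tool is an unexported struct plus a constructor that refuses to hand back anything invalid. A minimal sketch, with an invented User type and invented validity rules:

// User can only be built through NewUser, because its fields are
// unexported; code outside the package cannot create a half-constructed
// User{} and fill it in later.
type User struct {
    name    string
    email   string
    ownerID string
}

// NewUser returns a fully valid User or an error; there is no in-between
// state for the rest of the program to trip over.
func NewUser(name, email, ownerID string) (*User, error) {
    if name == "" {
        return nil, errors.New("user requires a name")
    }
    if !strings.Contains(email, "@") {
        return nil, errors.New("user requires a plausible email address")
    }
    if ownerID == "" {
        return nil, errors.New("user must be owned by at least one account")
    }
    return &User{name: name, email: email, ownerID: ownerID}, nil
}

Within the package you still have to hold yourself to using NewUser, since Go cannot outlaw the zero value; that is exactly the “not as strong” caveat above.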

Easier Said Than Done… But Probably Not Why You Think

Of all my suggestions here, I suspect this is the one that could most surprise you if you try to put it into practice. You would think that refactoring your code to never half-construct objects would be easy. You take anywhere you construct an object, shift all the mutations necessary to generate the initial values above the initial construction, then construct it all in one go. The theory is simple.

The practice is bizarre. An imperative program written without an awareness of this issue will naturally tend to grind in an expectation of half-constructed objects more deeply than a wine stain on a white shirt. If you try to convert one object creation, the odds that you’ll find that the incomplete object was passed somewhere to do something, which will itself require some other value to be created, which will then call a function that itself implicitly depends on half-constructed objects in a way you didn’t expect, rapidly approach one.

On the plus side, the first time you try to do this refactoring and you find this happening to you, you’ll probably come to a deeper understanding of how hazardous this can be to your program’s architecture.

But even having experienced it, I can’t really explain it. All I can say is, find yourself a nice little 500-1000 line program that ends up half-constructing things and passing them around, and try to refactor it to never half-construct objects. It’s wild how deeply imperative code can grind this antipattern into itself.

You may even get to the point that you believe it is impossible to avoid half-constructed objects in imperative code. Which is why I am again going to bang on the fact that functional programs prove that it is in fact possible, because they give their programmers no choice, and there isn’t any program that functional programmers are particularly stymied by as a result of this constraint on them.

There’s nothing stopping you from simply making that choice yourself in an imperative program.

It’s definitely good to start with a greenfield project though; refactoring is possible but quite difficult. It’s even possible this is one of the primary things that makes refactoring hard in general, though I don’t feel I have enough experience to make that claim. Since I’ve been programming in this style for many years now, I don’t have a lot of recent experience in refactoring code that partially constructs objects. But it does make sense to me that the code between the object being constructed and the object actually becoming “valid” forms an invisible, yet quite potent, barrier to many refactoring operations.

Interface Normalization

This is another lesson that does not require functional programming to notice, but it is a lesson that functional programming puts front and center whereas independently discovering it from imperative programming may take a long time.

Functional programming derives (pun intended) a lot of benefit from its typeclasses, despite their being, in some sense, second-class citizens to sum types in the expression problem. Much code is written against small interfaces like “functors” or “monads”, which can then be applied to any value that fits into those interfaces. This is one of the important ways that functional programs can have polymorphism.

Obviously, object oriented programs can also have polymorphism. For the purposes of this functional programming language lesson, “object oriented” is going to include “inheritance”. Normally when discussing in a cross-language context I use a very loose definition of “OO” to be basically “some way of bundling data with methods”, which includes Go (no inheritance) and prototype-based Javascript from before the class keyword, and while this may apply somewhat to such languages, the point will be made most clearly by inheritance-focused imperative languages.

Polymorphism is a primary goal of object orientation, obtained through the subclassing relationship. An OO language may offer other routes to polymorphism, such as the way interfaces can be added on to an OO system fairly cleanly in various ways, but it is guaranteed to at least have polymorphic subclass relationships; being able to pass CDerived to a function or method expecting a CBase is the entire point.

Subclasses can only add to their parent class4. As objects become arranged into their various hierarchies, and things get subclassed, it also tends to tie up the parent classes such that changes to them become increasingly difficult due to the scope of the code that the changes would affect, because all changes to a parent class affect all subclasses. This is assuming you can change parent classes at all, since they may be coming from a 3rd party library you don’t want to or can’t fork.

Consequently, OO languages afford a style where details get pushed down the hierarchy. The larger the hierarchy, the more the details end up pushed down as the parent classes tend to become more and more abstract. Thus, while you may nominally have something like a QObject in the Qt library, most code is going to take something more specific, and usually, much more specific.

This is difficult to express, but a result of all of this is that an OO designer will tend to develop a design approach where everything in a class has a very specific purpose, and anything that can be pushed down the hierarchy is pushed down the hierarchy, because maintaining things at higher levels becomes more and more difficult as more subclasses hang off of them. Anything that appears in a higher level must be directly applicable to all classes that may inherit from it, not just today, but someday in the indefinite future.

This is not, on its own, a bad thing. The logic I spell out here is valid on its own terms. But it is not the only approach.

Typeclasses in functional programming derive much of their power from going the other way, though, creating interfaces that you can use to operate on things, such as “monad”, and then expanding the reach of those interfaces by happily implementing degenerate implementations of those interfaces where it makes sense.

That is to say, in OO, a class inheriting from a base class generally had better use all of the base class’s functionality. If it doesn’t, it is probably inheriting from the wrong base class. The “correct” base class may even have to be constructed by tearing the parent apart into multiple base classes, creating those very “abstract” classes that such systems tend to create.

Typeclasses, by contrast, lend themselves well to degenerate implementations of interfaces. “File” interfaces with Close methods that do nothing, for instance, are very useful. Or one of the classic confusing things about monads, “monad” interface implementations that are just degenerate upgrades of a Functor. As noted in my monad “tutorial”, one of the problems people have with understanding “monad” is precisely that the examples people reach for tend to be these degenerate examples, many of which are actually just functors, and using those as your primary examples makes it very difficult to see what the “monad” is actually enabling.

Object oriented programming naturally affords splitting things into hierarchies, making sure that nothing extra appears at any layer. Functional programming derives a great deal of power by flattening things into common interfaces and then writing very generic code against those common interfaces.

Once again, it is difficult to do this at the same level of granularity as a full-on functional programming language allows, as evidenced (if not outright proved) by all the failed attempts to bring ergonomic “monad” implementations into so many imperative languages, but the principle can be carried into OO quite easily at slightly higher levels. I have long since made my peace with interfaces where some of the methods have a defined way of saying “not applicable to this value”, or that may have what are effectively stubs in many, even most of the implementations.
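As a minimal Go sketch of what I mean, here is an invented “document source” interface where some implementations legitimately have nothing to do for parts of it:

// DocumentSource is deliberately a little wider than every implementation
// strictly needs.
type DocumentSource interface {
    Fetch(id string) (string, error)
    // Refresh asks the source to update any cached state. Sources with
    // nothing to refresh implement it as a no-op.
    Refresh() error
    Close() error
}

// StaticSource serves documents from an in-memory map. Refresh and Close
// are degenerate, and that is fine; it means a StaticSource can be handed
// to any code written against DocumentSource.
type StaticSource map[string]string

func (s StaticSource) Fetch(id string) (string, error) {
    doc, ok := s[id]
    if !ok {
        return "", fmt.Errorf("no document %q", id)
    }
    return doc, nil
}

func (s StaticSource) Refresh() error { return nil }

func (s StaticSource) Close() error { return nil }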

There are OO answers to this; for instance, one of my favorite patterns in general (when translated into the local idiom) is the Template pattern, which in OO terms involves creating a superclass that has stub methods for subclasses to fill in, describing how to do certain things, and then code can be written against the superclass that will “do the right thing” for whatever child is passed to it. A common example is simple image manipulation code that lets you, say, crop and rotate an image directly, with templates filled out for various bit depths and alpha depths; depending on how aggressive you are being, an implementation might even do lossless cropping for JPEGs when the requested crop permits it, while everywhere else the code just does the obvious pixel manipulations and saves as normal.

On the one hand, Template is nominally a perfectly normal OO pattern that has been in the Gang of Four book since the beginning of “design patterns” as a concept, which itself means it was already in common use before that… and on the other, it is a pattern I have not seen utilized very often in the wild, and I have personally worked with developers who had to have this concept of leaving in method plugin points that some derived classes may not use explained to them, not even because they couldn’t understand it, but just because using OO languages tends to leave you with a blind spot here. Methods in a superclass that most subclasses do not use cut strongly against the training that OO tends to give you that you should have nothing extraneous anywhere, because that extraneous stuff will bite you later. It takes a certain amount of swimming against the current of OO to figure this out, but a functional programming language will serve it to you cleanly.
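Go has no inheritance, so a minimal sketch of the same Template idea there uses an interface for the stub methods and a plain function for the template itself (all names invented):

// PixelFormat is the “fill in the blanks” part of the template: how to
// read and write pixels for one specific image representation.
type PixelFormat interface {
    Bounds() (width, height int)
    At(x, y int) uint32
    Set(x, y int, value uint32)
}

// FlipHorizontal is the template: generic manipulation code written once
// against the stub methods, working for any PixelFormat.
func FlipHorizontal(img PixelFormat) {
    width, height := img.Bounds()
    for y := 0; y < height; y++ {
        for x := 0; x < width/2; x++ {
            a, b := img.At(x, y), img.At(width-1-x, y)
            img.Set(x, y, b)
            img.Set(width-1-x, y, a)
        }
    }
}

A grayscale image, an RGBA image, or a JPEG wrapper can each implement PixelFormat; an implementation with a smarter native way to do a particular operation can simply skip the template for that one case.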

Once you have it, no matter how you have derived it, it is a powerful tool to add to the toolchest.

Simplicity and Global Mutable Values

At the beginning of this series, I said I was going to annoy every programmer in the world in some way or other. So hang on, here comes one of those super hot takes that I was promising:

Global variables… are bad.

Alright, for those of you who were not so offended that you instantly snapped your browser window shut, obviously I am aware that this is not news. The higher-skill end of the programming world has been well aware that global variables are dangerous since before functional programming5 was even conceived of. Therefore, clearly, it is completely possible to figure out even in a purely-imperative world that they are a bad idea.

Functional programming provides a good way to view why they are bad. It may not be “the” way, but it is a good way. This idea comes to functional programming via its academic heritage, where this idea was developed for the purposes of proof.

Suppose you have this Go code, which trivially translates to most other imperative languages6:

var message = "Hello World!"
var greetCount = 0

func Greet() {
    fmt.Println(message)
    greetCount++
}

What are the arguments of the Greet function?

The function signature is supposed to answer that question. In this case, it has no arguments. It also returns nothing.

But you can also view this function from another point of view. For a variety of reasons ranging from “generally liking math” to “wanting to be able to use the tools of math to prove things about code”, academia would like to turn the Greet programming-language function into a true mathematical function.

A true mathematical function depends only on its inputs, and has only outputs. It does not “change” anything; it is simply a mapping of inputs to outputs. If you want the function to be affected by something, it must be in its inputs, and there are no secret side channels to communicate or change anything else, there is only the outputs of the function.

The particular combination of the way we teach math and the way we learn programming sometimes makes this difficult to conceive of. We tend to learn “mathematical functions” as these cute little things that take one or two real numbers and output another one. Some teachers even get confused and teach their students that what “a function may only return one value for a given input” means is that a function can only return a real number or some other atomic thing like that. But in reality, the input and the output of functions may be arbitrarily large; nothing stops a function from taking a thousand complicated matrices as input and emitting a million new functions as their output. You can make anything into a mathematical function just by enlarging the input and output until they are big enough to handle the function.

So to model the Greet function above in this style, you need something like

func Greet(message string, prevCount int) (newCount int) {
    fmt.Println(message)
    return prevCount + 1
}

But, uh, wait a second here. We’ve still got a change happening in the world with the fmt.Println. That’s not a mathematical function either. So now you need something like

type IOOperation interface {
    // lots of stuff here that is its own lengthy
    // diversion to discuss
}

type IOPrintMessage struct {
    Message string
}

func Greet(message string, prevCount int) (ops []IOOperation, newCount int) {
    return []IOOperation{
        IOPrintMessage{message},
    }, prevCount + 1
}

This is now a mathematical function. You could write a function to examine the count and print only if the count is even or something:

func Greet(message string, prevCount int) (ops []IOOperation, newCount int) {
    if prevCount % 2 == 0 {
        return []IOOperation{
            IOPrintMessage{message},
        }, prevCount + 1
    }
    return []IOOperation{}, prevCount + 1
}

and so on.

Now you need some sort of execution engine to run this function and process the newCount so that it will be placed correctly into the arguments of any future Greet calls, as well as an engine to run the IOOperations7, and an engine to allow a programmer to somehow put all these functions together into a useful program since now these pure functions are just sitting there, ambiently existing, rather than the vivacious and proactive “functions” we’re used to having in imperative languages. That’s beyond the scope of this article, but to see what it looks like, see Haskell8.
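Just to make the shape of that concrete, here is a minimal sketch of such an engine for this toy example (purely illustrative; this is not how Haskell’s runtime actually works):

// RunGreetings is the impure engine. It threads the count through
// repeated calls to the pure Greet and performs the returned
// IOOperations itself.
func RunGreetings(message string, times int) {
    count := 0
    for i := 0; i < times; i++ {
        var ops []IOOperation
        ops, count = Greet(message, count)
        for _, op := range ops {
            switch o := op.(type) {
            case IOPrintMessage:
                fmt.Println(o.Message)
            }
        }
    }
}

All of the impurity now lives in RunGreetings; Greet itself never changes anything.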

And even this is simplified just to sketch the point and move on; if your function also accepts user input this model breaks down… but that’s a separate point.

My point in this particular post is the viewpoint FP can provide on why globals are bad. It is also a well-known fact in the imperative world that functions with dozens of arguments are a bad idea. They are difficult to understand due to the size of their state space.

Functional programming gives us a point of view where all global variables are additional parameters to the function call of any function that can see them. For instance, in the Go examples above, the message variable was not exported, so it would only affect that module. This is technically not “global”, only module-scoped. But you can also export it so the whole program can indeed see it, and then it is basically appended to the parameter list of every function in the program, and also to every set of returned values. In the presence of a global value, a programmer must examine a function, and the transitive closure of all functions it may call, in order to determine whether a given global variable may be used or modified.

That means that in a program with hundreds of global variables, the simplest possible function that nominally takes no arguments and returns no values actually takes hundreds of arguments and returns hundreds of values!

No wonder extensive use of globals can trash an entire program.

This means that every component that uses all these values is that much less simple, and that much harder to put together into the component-based architecture I was pitching in the previous post. It is true that it is at least the same set of global variables, so it is not as bad as trying to assemble an architecture out of components that take hundreds of different parameters to each function call, but it still means the code is much more complicated than a casual reading of the source code would indicate.

Mathematically, you could model the functions as only taking the variables they use in the function themselves and only outputting the variables they could conceivably change, which is quite a bit less bad than all functions simply unconditionally taking them all. However, for a human being trying to use one of these functions, they don’t know simply from the function signature which of these globals may be used or changed, since it isn’t in the function signature, so as a practical matter, every individual global value a function can see adds to the real-world complexity of the function a little bit, enough that it starts to add up over dozens and hundreds of such values.

Obviously, in practice, a global is not literally exactly equivalent to adding an input and output parameter to every function. If it were so, globals would not be “dangerous”, but rather, “unusable”. But they do corrode away the program as they are added, and while at first they tend to not cost too much on the margin, many of us have witnessed programs where they passed an event horizon and interacted until they became insanely complicated.

Call it perhaps the logarithm of the number of global variables being added as complexity to every function… but that log value is then multiplied by the number of functions you have, so it still hurts pretty badly. The log of the number of global variables may sort of level off as the program increases in size, but the number of times you pay the penalty does not.

So What Does Functional Programming Have To Say About Globals?

In Haskell, if you have a function of Int -> Int, that function receives one value and returns one. The inside of that function may reference other values, but since all external values are immutable, it is still equivalent to a constant function that is term-rewritten to have that value directly incorporated, e.g.,

someValue :: Int
someValue = 6

increment :: Int -> Int
increment x = x + someValue

increment may superficially seem to resemble an imperative function that incorporates an external value, but because there is no way to change someValue, increment is in every way equivalent to the constant function increment x = x + 6. In general, const values don’t count towards the global variable count; only mutable global values do.

As a result, a real Haskell program that is doing real work is doing so from a whole bunch of components that are individually much simpler than most imperative components.

It is tempting to think that imperative code requires the use of these sorts of tools. Granted, the imperative world has largely gotten over the use of globals, but there’s still plenty of languages that provide a variety of ways to make functions “bigger”; for example, as another exercise to the reader you can consider what a method on an instance of some class with a complicated inheritance hierarchy that is changing values on multiple levels looks like from the mathematical perspective. It’s not necessarily hard to work it out, but it is tedious, and going through the process will give you some intuition into why the academic world eschews viewing the world that way. Mutations may still be contained by structured programming’s scopes and the rules of the inheritance hierarchy but can still bloat out of control fairly easily through the transitive closure of all modifiable values accessible through some superficially-simple method call.

The lesson of functional programming is that it provides an existence proof that these are not always necessary. It gives a framework for thinking about how extensive use of these tools can actually get you into trouble. It gives a demonstration of how composing simpler things together, even when you are limited to tools simpler than anything your imperative code provides, can still provide ways to solve problems that you may not have considered if you have not practiced with the simpler tools.

Bear in mind I’m not an absolutist on this question, because the entire thrust of all these essays is to pragmatically adapt our lessons, not to blindly jam functional programming into the imperative world. You will have to use some of the features the languages you use offer you. However, a purely-imperative understanding of these features often captures their benefits accurately but fails to teach you about their costs. The functional viewpoint can show how the use of these features complicates understanding what a given bit of code does. It can also demonstrate that there are in fact ways to get the benefits without necessarily paying the cost, by putting together simpler pieces in certain idiomatic ways that functional programming can teach you.

Adapting To Mutation

Functional programming, being an absolutist approach, simply eliminates mutation entirely. While nifty, this can be impractical in conventional languages, and it comes with some costs.

However, as hinted at earlier, the ability to mutate things is not a ravening monster that instantly destroys your program the moment you introduce a single mutable variable, because, again, if that were the case none of our code would work at all. At first, the costs are gentle, and the benefits can be substantial. It becomes problematic only when too much mutable state gets into one place, mutated in too uncontrolled a fashion, and collapses into a black hole.

So the adaptation to this is to explicitly think about mutation as something that itself flows through your program, and needs to be considered on the architectural level. You make sure no part of the program collapses into a black hole by building fences around the mutation.

This is certainly a view onto a very similar concept as “encapsulation”, but it is not quite the same… it is still possible to build a well-encapsulated program where the encapsulation is not properly leveraged to control mutation. But it is the encapsulation tool set that you will generally turn to to build these fences.

There is also a natural parallel to the Law of Demeter, which I would say is a bit more general in that it worries about any communication, whereas I’m talking here about “things that result in mutation”, which is arguably the worst way to violate the Law of Demeter, but certainly not the only way. It is also bad to simply have a long chain of hard-coded references appear at all, even if it is a read-only operation.

One example of a mutation fence is an “actor”. While actors are generally thought of as providing concurrency services to their callers, they also frequently double as mutation barriers, because the callers don’t get generalized mutation ability over the values being managed by the actor. They can only interact via explicit messages. Hence, an actor forms a barrier against that kind of transitive-closure complexity explosion.
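As a rough sketch of the shape of that barrier (hypothetical names, with a bare thread and queue standing in for whatever actor framework you might actually use), the state is owned by the actor and the only way to touch it is to send a message:

import queue
import threading

class CounterActor:
    """Owns its state; callers can only send messages, never reach in."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # mutated only by the actor's own thread
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg, reply = self._mailbox.get()
            if msg == "increment":
                self._count += 1
            elif msg == "get":
                reply.put(self._count)

    def increment(self):
        self._mailbox.put(("increment", None))

    def get(self):
        reply = queue.Queue()
        self._mailbox.put(("get", reply))
        return reply.get()

Nothing outside the class ever holds a reference to _count, so no amount of distant code can add it to the transitive closure of things a random function might mutate.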

In a recent project I was rewriting some very “imperative”, in all the bad ways9, code to handle email processing. I replaced it with a pluggable module system to do the various tasks we do to the email. Using a pluggable module system is a fairly obvious architectural move, but where I applied the “functional programming” lesson is that the communication between these modules is itself tightly controlled and explicitly logged, as its own moral hazard to the code.

And rather than each job getting direct access to manipulate the email “directly”, there’s a system where the jobs get handles that only allow them to mutate a view of the email. The original email is immutable, so all jobs get equal access to it without the other jobs having interfered, and all changes through these handles can also be directly logged and inspected. As a result, while there have been a couple of issues with job order or two jobs trying to “own” the same header, they’re swiftly identified and dealt with because they were operating through an interface deliberately designed with the idea of constraining mutation on day one.

This is also a good case of what I mean by constraining mutation. The jobs have work to do, headers to change, variables to set for future jobs to save work, etc. It’s not about somehow “preventing” mutation; in general, one way or another the entire point of our programs is to change something in the world. The “constraint” in this case is to push it all through a system that allows for logging and visibility, not to deny it entirely, and to make sure it doesn’t run away unconstrained and untrackable… as it was in the original program.
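A stripped-down sketch of that shape, with hypothetical names that bear no resemblance to the real system: the original email is immutable, each job works through a handle, and every change made through a handle lands in a log that can be inspected later:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Email:
    """The original message; immutable, shared equally by every job."""
    headers: dict
    body: str

@dataclass
class EmailHandle:
    """A job's view of the email: reads come from the original,
    writes go into an overlay and an audit log."""
    original: Email
    job_name: str
    overrides: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def get_header(self, name):
        return self.overrides.get(name, self.original.headers.get(name))

    def set_header(self, name, value):
        # Every mutation is recorded, so ownership conflicts between
        # jobs can be identified by reading the log.
        self.log.append((self.job_name, name, value))
        self.overrides[name] = value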

To my mind, thinking about the flow of value mutation through your program is one of the basics of architecture, from the medium level all the way up to the “how do I design the whole of AWS?” level. Everyone draws diagrams showing “who is talking to whom”, and that’s certainly an important diagram, but “what is being mutated, by whom, when, and why” is equally important and provided much less often.

Which is heavily related to the classic:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious. - Fred Brooks

I would draw a distinction between the micro-scale flow of value mutation and the macro-scale. Programming languages differ substantially at the micro-scale, ranging from “immutability”, meaning that all changes flow through function calls, through the burgeoning field of mutation-controlling languages, which Rust leads but is not alone in, to the arbitrary mutability of conventional languages.

However, I think that by the time you get to the macro-scale the impact of these differences is substantially muted… perhaps not irrelevant, but muted. It doesn’t matter if your 138 microservices are all written in purely immutable languages if the architecture of the microservices as a whole is sloppy and dependent on wild, unconstrained mutation as messages fly everywhere without a plan; a well-designed monolith in a conventional language may be far easier to understand and change. A well-designed Haskell program and a well-designed C# program may have very similar high-level mutation flows, even though on the micro-level they are very different.

I promised in the intro post to even anger some functional programming advocates by claiming that FP gets things wrong. I think total immutability is generally wrong here. What is much more important is constraining mutability. Rust generally has a better approach. I would love to say Erlang generally has a better approach, except that I think it whiffed here; it needed immutability between actors, not immutability within actors. But a hypothetical Erlang that was mutable within actors but never, ever let any mutability leak across actors would be another better approach10. Mutability isn’t scary in small doses, and therefore there are more solutions around than entirely eliminating it. There is a certain simplicity to it, yes, and as I also mentioned in the intro, if you want to say it’s a better approach than the standard imperative approach I won’t argue, but I do not believe it is the globally optimal approach.

Types as Assertions

Functional programming languages see types not just as buckets that can contain values, but as assertions; e.g., this is not just a string that I vaguely hope is a Username, but a validated, certain-to-exist Username for a user, and any function that receives a Username as an input parameter need not run its own validation code on it, and thus also doesn’t need an error path for invalid Usernames.

Haskell comes with a zoo of assertions you can make in the type system. Imperative programming languages come with fewer such assertions. No question. Lacking sum types can make some basic things like “I am guaranteed to either have an X and a Y or a Z without an X or Y” difficult to express easily. The power is undeniably less.

However, this lesser power that we have in imperative languages is also frequently neglected. I can’t tell you how sad I am to see a program full of functions with signatures like (string, string, int, int, bool) → (bool, int) or (string, string, bool, bool, bool, bool, bool, bool, bool, bool) → (string, string, string). Why even use a statically-typed language if you’re just going to write stringly-typed (and intly-typed and boolly-typed) code?

Sure, it’s more convenient in the moment to slap down a string somewhere rather than having a typed value, but in the long term it tends to bring a lot of pain. You can easily get the “every bit of data is randomly validated in random places depending on what happens to be a bug today” pattern, and naturally, no two bits of code that validate a username will validate it in exactly the same way. And a function taking (string, string, bool, bool, bool), paired with the traditional “no documentation whatsoever”, leaves precious little to go on when trying to understand what it does.

And if you just want to pound out a test case it may be inconvenient to have to go get some sort of “real” Username for a test, rather than just slapping a string down.

But it does not take long at all for a codebase to descend into total madness when all you do is pass around primitives. I implore you for your own sanity’s sake to spend some time putting actual types in your imperative code. I know it’s inconvenient to have to convert things to strings to run a simple “replace” operation and convert back to the custom type, or whatever other hoops you may have to jump through. I know this pain because I feel it frequently. But the middle-term payoff is well worth it.

That first time you write a few thousand lines of code, only to discover that you need to casefold all usernames in the system, and you either A: write it into the creation function for Usernames and know it is already handled everywhere in the program, or B: have the compiler guide you completely correctly to every single place in the code that needs to be adapted, is one of those life-changing programming moments; if you haven’t had it yet, you really should.
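A minimal sketch of what option A looks like in Python (Username and its particular rules here are hypothetical): because the constructor is the only way to get a Username, adding casefolding there changes it everywhere in the program at once:

class Username:
    """A validated username; construct one or get an exception, no third option."""
    def __init__(self, raw: str):
        candidate = raw.strip().casefold()  # the new rule lives in exactly one place
        if not (3 <= len(candidate) <= 32) or not candidate.isalnum():
            raise ValueError(f"invalid username: {raw!r}")
        self._value = candidate

    def __str__(self):
        return self._value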

After that you’ll start noticing that you can take a lot of the operations that you used to do with string bashing or various tricks with ints or whatever, and turn them into methods on these strong types. Which nicely organizes them in whatever automatic documentation system your language has. Which regardless of its quirks, you really should be using to its fullest rather than just leaving code lying around undocumented.

Dynamically-typed languages can make this experience more difficult (which is one of the primary reasons that, despite working in them for the first ~15 years of my career, I eschew them today), but used skillfully they can still do quite a lot, the more so with all the various optional types they’ve been growing lately.

Your homework assignment is to write some regular expression or something to extract all the types you take in your various methods and functions, and do a chart showing how many of them are primitive types like string versus types like Username and Domain and InvoiceLineItemRevocation.

The Basics of Types As Assertions

As I was re-reading this prior to first publication I realized I probably ought to include an outline of how to use types as assertions. There is a certain art to it, and it differs somewhat from language to language, but these are the broad strokes of how to do it.

Here are the keys:

  1. Every type represents some concept of validity within your program. There should be one type per such concept. That’s not a bidirectional one-to-one; you may have many other types that are not necessarily about “valid values” (e.g., “an iterator type” and other such structural types), but there should be one per concept of “valid”.
  2. On the way in to the type, the data is checked to be valid. If it is not, the value should not be created and the code that tried to create it should be forced to handle the resulting failure immediately.
  3. Methods that manipulate the data never mutate it into being invalid.
  4. Data within the type should be the “real”, internal values.
  5. Rules for encoding and external validity are checked on the way out.

Expanding on the keys:

Every Type Represents Some Concept Of “Valid”

I will again link my post on what constitutes a ‘valid’ value as I am using the specific definition I established in that post.

A type should correspond to a particular definition of “valid value”. Expanding on the discussion in that post, you may have a type Address that represents a “street address” that has been validated to exist. If you want to add a concept of “an address I can provide service at”, you probably want a type ServicableAddress. The reason for the second type is that ServicableAddress adds additional promises on top of Address, so the two are not the same.

If you have a type per “valid thing”, then the rest of your program can assert the validity of the incoming values by simply declaring it in the incoming type signature.
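As a sketch of that two-type idea (names and checks are hypothetical), ServicableAddress can only be obtained from an Address that has passed the extra check, so any function that takes a ServicableAddress has asserted serviceability simply by its signature:

class Address:
    """A street address that validation has confirmed exists."""
    def __init__(self, street: str, city: str):
        if not street or not city:
            raise ValueError("address failed validation")
        self.street, self.city = street, city

class ServicableAddress:
    """An Address we have additionally confirmed we can provide service at."""
    def __init__(self, address: Address, service_region: set):
        if address.city not in service_region:
            raise ValueError("address is not serviceable")
        self.address = address

def schedule_install(addr: ServicableAddress):
    # No re-checking here; the type already carries the promise.
    print(f"scheduling install at {addr.address.street}, {addr.address.city}")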

Languages vary on how much of this the language itself supports. For instance, dynamically-typed languages, used classically with no additional type annotation support, do not permit any type-level enforcement at all. Even so, this approach is so powerful that I recommend programmers simply adopt the discipline anyhow. That means if a type’s documentation declares the only valid way to get an instance of it is through some particular mechanism, that should be honored. If a type has any sort of constructor or init function, that should be assumed to be the official way to access the value unless otherwise indicated.

All Data Checked On The Way In

Every way a value of the given type can be created in the program must validate the data in accordance with whatever concept of validity the type represents. If the data cannot be fit into the type, the construction attempt must fail and the calling code must deal with the consequences of that failure.

The advantage is that this forcibly moves all validation as early as possible in the program. It is often the case that by the time the rest of the program gets invalid data, it doesn’t really have anything it can possibly do with it; by which I mean, it’s already too late even to throw reasonable errors. You want this stuff checked as quickly as possible.

Again, languages vary in how strongly they allow you to enforce that all data of a given type passes through this validation step, ranging from very strong to effectively none. Regardless of how strong your language is on this point, you should program in a way that never bypasses this protection layer. Any place you find you have to bypass it in such a way that you must create nominally invalid data suggests that you have a second type here.

It may feel painful to have two types instead of one, but I find it is almost always worth it in the medium term. What I find happens is that just as the lack of clarity would have polluted the future trajectory of the code base, forcing more and more of the code to share in that lack of clarity, giving in and creating the second type causes clarity to pervade the future trajectory of the code base instead, a vastly preferable alternative. You will often learn things about the problem space by letting it tell you what the different types are.

I’m not even calling for some sort of long-term “it’ll pay back in a few years”, I’m talking more about it’ll pay back in a few hours, maybe a couple of days. Being a super-short-term focused programmer is very expensive over the course of weeks and months and years.

Never Invalidate The Data

Once the previous is established, this should be pretty obvious.

Again, if you find there’s an operation that you feel “must” invalidate the data for some reason, there’s probably a second type trying to emerge there that represents some new concept of validity.

To a large degree, the solution to “these ideas don’t seem to work” is, yes, “go ahead and make more types”. It’s OK to have methods that just convert types, as long as those methods also maintain the validity.

Data Within Should Be The “Real” Data

That is to say, in general, once a value is in a typed value, it should be fully decoded into the most “real” representation of the data. For example, a QuerystringValue should contain Hello World and not Hello+World. Unicode values should be in whatever the local environment considers the “default” encoding. JSON values should be their decoded values, not their raw string values.

If for some reason you need the original values, which is exceptional but can happen, these should be retained but only available through methods that clearly indicate that they are the exceptional case. For example, the QuerystringValue might have a basic .Get that returns the value as a string, but if you needed exactly what was in the original value for some reason, the method should be something like .GetOriginalProtocolLevelValue, yes, clunky name and all. It should be clunky, and the docs for the method should explain what exactly it is and why you should generally not use it.

This fixes up the mishmash of things getting haphazardly decoded here and there and everywhere, sometimes multiple times, sometimes not enough times, etc. You use the type itself to enforce that it is decoded precisely once. This is also very powerful for types whose values may come from multiple places with different decoding rules.
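A sketch of the QuerystringValue example (the method names are adapted from the prose above; the rest is hypothetical): decoding happens exactly once, on the way in, the real value is what you get by default, and the wire-level form is deliberately clunky to reach:

from urllib.parse import unquote_plus

class QuerystringValue:
    """Holds the decoded, 'real' value; decoding happens exactly once, here."""
    def __init__(self, raw: str):
        self._raw = raw
        self._value = unquote_plus(raw)  # "Hello+World" -> "Hello World"

    def get(self):
        return self._value

    def get_original_protocol_level_value(self):
        """Deliberately clunky: only for the rare case that needs the wire form."""
        return self._raw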

Any Outbound Encoding Rules Enforced Upon Leaving The Type

For example, if you are writing a raw JSON output, you can have a method on some type that returns a JSON-encoded string, but it should not change the internal values. At most it may cache the result or something, but it does not change the internal value for the type.

The combination of the value being the “real” value internally and this rule for only encoding into the final output once the final output is known dodges all the many, many issues with the concept of “sanitization” that make sanitization an actively wrong idea. In this setup, no part of the system is responsible for knowing things it can’t know, in particular, the input layer is not required to guess what output may occur someday. The input layer just handles whatever proper decoding is, which it knows, because it is in the input context. The output layer is only responsible for correctly outputting into whatever the output may be, which it knows, because it is in the output context.
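A sketch of the outbound half, using a hypothetical Comment type: the stored text is the real text, and each output context does its own encoding on the way out, without ever touching the stored value:

import html
import json

class Comment:
    """Stores the real, decoded text; encoding happens only on the way out."""
    def __init__(self, text: str):
        self._text = text  # e.g. 'I <3 this & that'

    def as_html(self) -> str:
        # Escaped for exactly one output context, computed from the real value.
        return f"<p>{html.escape(self._text)}</p>"

    def as_json(self) -> str:
        # A different output context gets its own, independent encoding.
        return json.dumps({"text": self._text})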

What Does A Bare Type Mean?

I program like this fairly routinely. You will find that I still do have some string and int and such in my code. However, in my code that takes on a particular meaning, which is that the value is truly unconstrained. When you get something from a network, you don’t need a NetworkBytes type. A standard byte array will be fine, because a standard byte array with no further type annotation already means the correct thing, which is, this is an array of bytes which have no further guarantees. A bare string means that the value is not constrained, or depending on your language, that it is constrained only by encoding or whatever local considerations your “string” type may have, as long as the definition matches the value.

The principles in this post are absolutely something that you can work out from first principles from years of imperative programming with strong types. If Functional Programming teaches anything here it is mostly that by having immutable values it becomes impossible to create invalid values that will be “fixed up” later, so it really leads you in this direction by virtue of removing that crutch from you, and really drives you harder in the direction of realizing that you often need a second type rather than having a single type with “modes”, or a single type that straddles two concepts of validity at the same time. Strict functional programming is excellent practice and a great way to be taught by fire by being all but forced to learn these practices, but you can learn it from imperative code alone for sure. However, I observe that in practice most do not, and I see the same characteristic mistakes over and over again.

The Imperative Programming Monad

Up to this point I have not followed through on my promise to offend everyone in general, and in particular, functional-programming-in-imperative-languages advocates. Everything’s been pretty positive about functional programming.

This is where the tide is going to start to turn, but I’m going to lead into it a bit before getting really offensive.

“Programmable Semicolons”

Let’s talk about monads, and in particular, monads in their capacity as “programmable semicolons”.

If you understand why monads are “programmable semicolons” it’s a reasonably good sign you have a good understanding of them. I don’t think it’s the best way to understand them from the beginning, but it’s a good development of an initial understanding. I am not jamming a full monad tutorial in here when I’ve already got one (see prior link), so I’m just going to assume that you’ve read that and conclude with why this makes sense.

result :: [(Integer, Integer)]
result = do
  x <- [1, 2, 3]
  y <- [4, 5, 6]
  return (x, y)

This operates “in the list monad”, or as I prefer to phrase it, “with the list’s implementation of the monad interface”. To imperative eyes it looks like this assigns a list to x and y and returns the two lists together, but this actually “rewrites the semicolon” to say something like “when taking values from a list through <-, instead of taking the whole list, bind its values one at a time to the variable and gather up a list of the results of running the rest of the function.” So first x is bound to 1 and the rest of the function is run and gathered into a list, then x is bound to 2, then 3. This also applies recursively to the y line, so the end result is that we end up with 9 things:

[
  (1,4), (1,5), (1,6),
  (2,4), (2,5), (2,6),
  (3,4), (3,5), (3,6)
]
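For imperative eyes, a rough Python rendering of what the list’s “semicolon” is doing here is just nested loops gathering results:

result = []
for x in [1, 2, 3]:            # x <- [1, 2, 3]
    for y in [4, 5, 6]:        # y <- [4, 5, 6]
        result.append((x, y))  # return (x, y)
# result == [(1, 4), (1, 5), ..., (3, 6)]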

As I mentioned in my post about monads, this exact functionality is a bit of a parlor trick because it gets out of control exponentially quickly, but the general principle has a lot of utility and leads to all those wonderful monad operations you’ve heard of, especially the ones that go beyond some variant of “value container”. Software transactional memory, for instance, “rewrites the semicolon” into resumable transactions instead of statements that either succeed (do what the line says they do) or fail (throw some variation of exception and start walking up the stack for a handler), creating a “semicolon” that defines a transaction that can be restarted safely.

“Programmable semicolon” is a bit weird because we don’t expect our semicolons to affect the semantics of the lines of code this deeply, and I would certainly not push it as what monads “really are”, though. It’s more of a convenient way of thinking about how some particular types that implement the monad interface are used.

Now, imperative languages can be said to “lack monads”, or, you could say, imperative languages are locked into the one “semicolon” that they ship with. However… consider the following realignment of your mental map.

An Example: Stream Processing

Let’s suppose you’ve got yourself a network application. It’s going to read in some newline-delimited JSON statements, run a couple of transforms over them, and wait until a statement matching a certain predicate occurs. Then it will return that statement and stop.

There’s many ways to do this, but let’s take a favorite and do this as a pipeline. You can easily construct this as:

  1. A pipeline to take a newline-delimited value off the stream.
  2. A component to parse the JSON into a specific type.
  3. The transforms or filters you want to run as some set of transforms.
  4. A component to select out the target value.

And your overall pipeline library will probably have a “driver” of sorts to allow something like this selection operation at the end to return a value in some manner.

You can easily specify this pipeline in a monadic syntax style in Haskell. While you may lack the syntax support in your imperative language for such a thing, it is not hard to either grab a library to do it, or even to bash something out that allows operation on a pipeline like this.

I’m not about to tell you this is automatically “bad”, either. In some cases it may well be an optimal solution; what comes particularly to my mind is the case where the filters and transforms in step #3 are user-specified, meaning that at the time you are writing the code you don’t know what they are or how many there will be. At that point you’re going to have to represent these as some sort of defunctionalized, code-as-data structures that can be executed by an interpreter (another instance of the power of that general approach), so that you can construct the pipeline at runtime.
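A minimal sketch of that defunctionalized shape (stage names and configuration are hypothetical): the pipeline is just data built at runtime, and a small interpreter walks it:

import json

# Each stage is plain data: a name plus its configuration.
user_pipeline = [
    ("parse_json", {}),
    ("keep_if", {"field": "level", "equals": "error"}),
    ("pick", {"field": "message"}),
]

def run_stage(stage, value):
    name, config = stage
    if name == "parse_json":
        return json.loads(value)
    if name == "keep_if":
        return value if value.get(config["field"]) == config["equals"] else None
    if name == "pick":
        return value[config["field"]]
    raise ValueError(f"unknown stage: {name}")

def run_pipeline(stages, line):
    value = line
    for stage in stages:
        value = run_stage(stage, value)
        if value is None:   # a filter stage dropped this record
            return None
    return value

print(run_pipeline(user_pipeline, '{"level": "error", "message": "disk full"}'))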

It would be nice if imperative languages offered the ability to specify such a pipeline in a monad.

Well… allow me to introduce…

The Imperative Programming Monad

Consider the pseudocode:

def pipeline(inFile, output):
    while not inFile.empty():
        nextMessageBytes = inFile.nextLine()
        message = parseMessage(nextMessageBytes)
        transformed = transformMessage(message)
        summary = summarize(transformed)
        if summary.Important:
            output.write(summary)

It isn’t that hard to squint and see this as something like:

pipeline inFile output = 
    while (not empty inFile) $ do
        nextMessageBytes <- nextLine inFile
        message <- parseMessage nextMessageBytes
        transformed <- transformMessage message
        summary <- summarize transformed
        if summary.Important then
            write output summary
        else
            ()
            

That is not Haskell. For one thing a Haskeller would probably have a nextLine >=> parseMessage >=> transformMessage >=> summarize in there. But that’s just a succinctness argument rather than a code structure one, and I’ve talked already about the general futility of trying to emulate Haskell in imperative languages at such a small scale, so I won’t rehash that.

You could say that functional programming critiques imperative programming for having only the “imperative programming monad”, or if you prefer, the “sequencing monad”. This is a valid critique.

However, when what you want is the sequencing monad… there’s nothing wrong with using what is already present in your language.

And wanting sequencing is a rather common thing. Less common perhaps than an imperative programmer who has never tried out functional programming might recognize, but still fairly common. One might be able to mathematically visualize a transformation pipeline in such an abstract manner that the order of operations ceases to be of interest to the mathematical model, but here in the real world, where “causality” is a thing that we have to deal with no matter what hash it makes of our real mathematical models, it’s still pretty common.

When it is in fact what you need, there is no crime in using the built-in monad that imperative languages come with. In fact, if you reach for a “pipeline” library without using it for any other purpose, such as concurrency control, distribution across multiple nodes, or some external event bus, when all you’re doing is sequencing several operations… what you’ve got there isn’t “better than imperative code”, what you’ve got there is an inner platform. That’s the opposite of “better than imperative”.

Now that’s going to annoy some people.

What Does Actual Functional Programming Look Like?

I am reliably informed by many people that the core essence of functional programming is the extensive use of map, filter, and reduce; that these functions are superior to for loops in every way, to a degree that they should be jammed into every possible language with no need to analyze the cost/benefits of such an approach, because the benefits are just so incomparably amazing that the costs couldn’t possibly be relevant, and even wondering about the costs just proves I Don’t Get It.

Most of this blogbook (for those coming in first to this post) has been spent observing the benefits of functional programming that are not directly related to map, filter, and reduce. But I have not yet examined the question… is this in fact the core essence of functional programming in the first place?

To heck with theory; let’s go to practice. Let’s take a look at a real Haskell project. I know of two major Haskell projects that have transcended the Haskell ecosystem to become useful programs people download for their own utility, xmonad and pandoc. But xmonad is a strange program, doing extensive binding with C libraries and interacting with all sorts of system services… “not that there’s anything wrong with that”, as the saying goes, but as a result it’s not very typical. By contrast, pandoc is core Haskell functionality: parsing, manipulating ASTs, re-emitting those ASTs, basically a big compiler suite for documents. It couldn’t be more core Haskell. Let’s take a look at that.

Pandoc Source

I’m going to get the latest released version as I write this, the source for 3.6.2.

First, let’s take a quick look at the lay of the land. The source code is conveniently in a src directory, nicely isolated from the test code, so I cd src. Running cloc from within that directory:

$ cloc .
     221 text files.
     221 unique files.                                          
       0 files ignored.

github.com/AlDanial/cloc v 1.98  T=0.20 s (1107.4 files/s, 448342.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Haskell                        221           7655           8157          73661
-------------------------------------------------------------------------------
SUM:                           221           7655           8157          73661
-------------------------------------------------------------------------------

So we’re dealing with 73,661 lines of Haskell source.

map/filter/reduce

Obviously, it should be full of map/filter/reduce, right? Let’s take a quick survey.

Note that for all of these greps and counts I’m about to do, I manually examined them to make sure they are at least approximately correct. I’m sure they’re off here and there, but changing a 47 to a 49 in these discussions isn’t going to change my conclusions much.

I’m going to exclude import statements because those aren’t actual uses. Haskell uses a few different kinds of fold functions for what is referred to elsewhere as reduce, so:

$ egrep -r fold[lr]\'?\\b * | grep -v import | wc -l
154

This regular expression specifically catches foldl, foldr, and foldl', and excludes other functions named with fold in the name. If that is bothering you, bear with me; I will be justifying this choice later.

That’s 0.21% of the lines of code in pandoc.

For comparison, function composition . appears on 1767 lines, 2.4% of the lines of code, and many of those are more than one use per line.

But that’s obviously the weakest of the bunch. How about filter?

$ egrep -r '\bfilter\b' * | wc -l
196

Better, but still only 0.27% of the lines of code in pandoc.

But obviously, I’m sandbagging, what about the big one? What about the sign of good code, the thing that separates functional programming from filthy imperative programming, the thing that obsoletes all for loops forever and ever?

$ egrep -r '\bmap\b' * | wc -l
847

1.1%.

For comparison, mapM:

$ egrep -r '\bmapM\b' * | wc -l
581

Which is 0.8%, fairly similar.

“A-HA! I got you, jerf! mapM is just another type of map!”

No, it really isn’t. The code I see people claiming is the way to code specifically use map, and only map. Endless, endless invocations of map, specifically operating on arrays. If there was any further abstraction on this I would be less perturbed, but it’s map map map, over arrays or the closest equivalent, specifically.

In fact what these people strike me as advocating isn’t functional programming at all, but some sort of degenerate array-based programming, with a very restrictive definition of “array”. Now, that is an interesting paradigm in its own right, but truly array-based languages tend to use more than just map. Jamming degenerate array-based programming into foreign languages is even worse than trying to take functional programming lessons into those languages.

mapM is another recursion scheme entirely, about which more in a moment, but first there is another point that people who know Haskell need to have addressed, which is, what about <$>?

The answer is that 2,311 lines, or 3.1% of the lines in pandoc, involve that. However, there are two problems with that:

  1. <$> is useful for any Applicative, and even a cursory scan through the results for that grep will show many, many uses clearly not related to lists. The vast majority, from my eyeball scans. If there’s an easy way to extract which of those are specifically instantiated on lists, I don’t know what it is.
  2. However, I also don’t really care much, because I would assert from personal experience that even when used with lists, <$> takes on its own internal meaning, and after a while you aren’t really thinking “map on this list” but “apply this function to this data structure (whatever it may be) through the applicative instance”. In an experienced Haskell programmer’s hand, there is a real difference of intent between <$> on a list and map, even if they do the same thing.

Still, you have to admit, that’s a non-trivial usage of all those functions.

Yes. However, the question is, when people are claiming they’re “writing functional code” in Javascript, what does it look like? Well, from what I’ve seen, it’s not so much that map is 1.1% of the lines, which is to say, used on average zero to one times per function of reasonable length; these people are outputting “correct” code that looks like 25% map calls, or more.

Other Abstractions

So I agree that pandoc still shows non-trivial uses of those functions, but you should also remember that I’m writing these words, and unless you know Haskell fairly well, downloaded and unpacked pandoc as I’m going, and have been running your own greps to cross-check me, I’m controlling the spotlight of your attention. So let’s widen the focus to some other operators in other recursion schemes, because what pandoc uses even more than lists is applicative functors. Let’s take a look at <$ and other specifically Applicative operators:

$ egrep -r '<\$[^>]|[^<]\$>|<\*>|<\|>|liftA|[^>]\*>|<\*[^>]' *  | wc -l
2201

That’s 3% of the lines. The <\$[^>] and its friend in the opposite direction are there to avoid picking up the very common <$> in this grep.

What about monad-based abstractions?

$ egrep -r '<-|>>=|mapM|return|mzero|mplus|mapM|forM|sequence|>=>|<=<|msum|mfilter|filterM|foldM|replicateM' * | grep -v -- -- | wc -l
10579

The grep -v -- --, which has the effect of removing all lines with --, is because sequence picks up a lot of comments that contain the word “sequence” in their text, so I just discarded any lines with comments.

Monad abstractions are operating on 14.4% of the lines in the codebase, again, quite often more than once per line. I’ve also not counted the usages of the mtl library, also based on monads.

Ah, yes, jerf, of course, monads. Why, everyone advocating highly functional programming in imperative languages knows how important Either is.

Whoa, there, strawman cowboy. I said monad, not Either. (Or Option as it is known in other languages.) Pandoc uses monads, not just Either.

In fact, let’s recalibrate a bit here, and look at the number of function signatures in Pandoc:

$ grep -r :: *  | wc -l
6477

(Using double-colon for type declarations in Haskell makes some people grumpy since most peer languages use a single colon, but it really helps me out here. Also, by visual inspection, Pandoc has few, if any, uses of :: to annotate anything other than a function.)

How many of them involve Either or Maybe in the tail position, such as one would expect when using its monad interface?

$ grep -r :: *  | grep -P '(Maybe|Either)(?!.*?->)' | wc -l
490

(The (?!.*?->) says “not followed by a -> on the same line”.)

Not all that many. How many simply mention Either or Maybe at all?

$ grep -r :: *  | egrep '(Maybe|Either)' | wc -l
540

Again, for the supposed sine qua non of functional programming and the distinctive reason to have “monads” in your language, that is not really all that much.

Especially when we compare to…

$ grep -r :: *  | grep PandocMonad | wc -l
2491

Oh ho. What have we here?

The PandocMonad typeclass contains all the potentially IO-related functions used in pandoc’s readers and writers. Instances of this typeclass may implement these functions in IO (as in PandocIO) or using an internal state that represents a file system, time, and so on (as in PandocPure).

Well, doesn’t that just sound like a familiar concept if you’ve been reading this whole series. Definitely not a Maybe or an Option.

Many other types used for their monad implementation are used in Pandoc. In this case my point is not about line counts, but about the diversity of the monad interface. These include, but are not limited to:

  • ParsecT, which also involves the entire concept of monad transformers. If your language can’t even properly express the monad interface you should probably forget about monad transformers.
  • MonadIO, as mentioned before, to abstract out the IO monad implementation and allow swapping it out.
  • A variety of utility functions used by things operating on the PandocMonad value, but which, since they don’t need IO, are implemented simply on Monad.
  • Other motley monads like Reader, Writer, and so on.

“monad” usage in functional languages extends well beyond “Maybe” and “Option”, which are also basically Babby’s First Monads. Again, nothing wrong with them, programming is all about using those foundations over and over again, but if you say “monad” when you mean “Option”, you should know you are using the term incorrectly, e.g.,

I wish Language X would add support for monads.

is emphatically not a correct way to say

I wish Language X would add an official Either/Option type.

The latter is a fairly small subset of the former, and can be done in many useful ways that often don’t entail adding full support for “monads”. Adding support for monads themselves is actually a fairly heavy ask for most languages, because it is generally not just a matter of “adding support” but of significantly rearchitecting some core elements of the language for it to work well.

What’s The Point Of All Of This?

The point of all of this is to demonstrate with a concrete example of one of the definitive functional code bases that functional programming is not about map, filter, and reduce.

The point is that anyone who is claiming that it is the point of functional programming is loudly proclaiming that they did not learn functional programming and do not understand it.

Which is, on its own terms, fine. There’s no obligation, no licensing authority requiring passing some curriculum about functional programming. But I would appreciate it if you’d tone down the demands that we all switch to map/filter/reduce “because functional”.

Functional programming is about taking small building blocks and composing them into larger blocks based on using various recursion schemes to tie them together in a way that retains their original properties, and building larger and larger schemes out of that. In this formulation, map/filter/reduce are some of the basic bricks, but just that: bricks. You won’t get far if you try to architect things on a brick-by-brick basis. Good functional programming builds up its own useful abstractions, like the PandocMonad, and the things built on that, and the things built on those things, to build the tools to solve problems while never losing the fundamentals of functional programming, like immutable data.

Moreover, both in theory and in the practice I observe, map in imperative programs actively fights the sorts of abstractions we need to deploy in our programs, which is the core reason I dislike it so much in non-functional languages. Pervasive use of map qua map, where it is generally not being used to create higher-level abstractions, is a bad sign. A single use of map is to my mind neutral, but when it starts getting chained is when it starts creating blobs of code that are more difficult to refactor than non-map-based code, and it’s hard to overstate the size of that disadvantage.

Imperative code can not do exactly what functional programming does, for all sorts of reasons, but you can use a similar mindset of building up from primitives ever more useful abstractions that still retain the fundamentally useful properties of the lower abstractions. map/filter/reduce enters that conversation only on a very low level, whereas as this blogbook has repeatedly hammered on, you’re much better thinking about the medium levels and up with imperative programming.

A program in an imperative language is far more “functional” if it uses many for loops but all of its components are easily “purifiable” than if all of the for loops have been replaced by map/filter/reduce but the architecture is fundamentally based on impure mutations from top to bottom, because the latter is just a bog-standard, imperative-in-all-the-bad-ways program that had its for loops replaced with map/filter/reduce. That’s because map/filter/reduce in an imperative language hardly improves the ability to compose code together in those languages, even at a small scale, and, I would contend, frequently does great damage to it.

But if you have a big pile of purifiable components, you have a big pile of components that can be easily composed together into larger systems, easily pulled apart into a microservice when and if it becomes necessary, and all the architectural benefits we need to build large programs. That medium-scale-and-up composition is vastly more important for quality programming than how we spell loops.

Functional Programming Lessons Conclusion

As many others observe as well, one of the major reasons to write is to firm up ideas in one’s own head. Serializing an idea out into English words is still no guarantee one understands it deeply, but it is great progress over a very fuzzy idea that has never been fleshed out at all.

I’ve known for a long time I wanted to sit down and write this out. But I had no idea how many principles I’ve either drawn out from my times with functional languages, or had sharpened by them forcing me on to certain paths that perhaps I would have walked eventually, in bits and pieces, but functional programming forced upon me in toto.

And it wasn’t until I’d written several of these posts that I noticed the recurring theme of scale, that functional programming principles are best brought in to imperative languages at the medium scale rather than the micro scale. I find this insight to have been almost worth the writing on its own.

I arranged these points roughly in order from least offensive to most offensive, so it’s possible that you arrive here thinking I do not like functional programming. But consider how much I have drawn from it. This is a good thing. I like it.

What I do not like is what I consider the surface level application of the lessons of functional programming. I do not like people obsessing about the micro-scale of functional programming, often to the point they actually harm their own code bases, while at the same time writing higher-level structures indistinguishable from the rest of the imperative world.

If all you’ve done is walk through your program and change all your for loops into maps and reduces, but your high-level architecture is still a confused mess with no discernible boundaries or structure… at best you’ve got a marginal gain, and most likely, you’ve taken a step backwards!

If you are lucky enough to work in a domain where you can use high-end functional programming languages, by all means, adopt the relevant idioms wholesale.

But if you are not, I encourage you to worry less about whether you should replace a for loop with a map, and which library offers the best map call, and more about things like:

  • How does mutation flow through this program? Can I use mutation flow fences in my architecture to isolate parts effectively?
  • How complicated are the parts in my architecture? How can I simplify their requirements?
  • Can I harmonize what seem to be disparate types of things into single interfaces that I can then build on?
  • Can I prevent strings and ints from flowing around my program naked? How can I derive synergistic benefits from stronger types?
  • Even if I’m not using Haskell or a proof language, how can I use the local type system to prevent bugs from existing in the first place? Even in fully dynamic languages there are things that can be done.
  • How can I prevent invalid objects from being created in the first place, thus solving all problems created by invalid objects? I know it can be done because functional languages do it.

And the other sorts of things I have described in this essay series. These are far more important than questions like whether these ten lines of code are formatted this way or that way, and much more consequential in the long term.

The Pragmatist’s Secret

So often in engineering we have the “80/20” option; 80% of the benefit for 20% of the effort. The temptation of a purist is to sneer at an 80/20 for not trying hard enough when clearly the correct answer is 100/100, and anything less is failing to get all the benefits that are possible.

Most of the time, as one puts more effort in, the benefit obtained rather smoothly increases, usually quickly at the beginning and then asymptotically approaching 100% towards the end. It is unusual for there to be a sudden spike in utility at 100%, which makes the cost/benefits analysis of 100/100 often quite unappealing.

However, I fully agree that functional programming is indeed one of those exceptions! The difference between a programming language in which values are always immutable and one in which values are almost always immutable is quite substantial. In one, you can build on that guarantee of total immutability; in the second, you can’t, because it just isn’t quite reliable, and as you scale up, the odds of that “not quite reliable” biting you approach 1. I consider this a characteristic of type systems in general; a type system that you can rely on is vastly more useful than a type system you can almost rely on, and it doesn’t take much “almost” to greatly diminish the utility of a given type system. Even unsafe functionality is generally fenced in by the local type system, and where that is not possible, by language convention.

So I agree there is value in the 100/100 solution for functional programming.

If that is the case, then the question looms over programming as a whole, why don’t we all, all the time, just use functional programming languages then?

The answer to that is an underappreciated cost of the 100/100 approach, which is that it tends to forestall other possible benefits. You spent all your effort getting to that 100%.

But the pragmatist’s secret, unappreciated by the purist, is that the pragmatist does not have just one 80/20 solution. Having spent only 20%, the pragmatist also has another 80/20 solution, and another, and another, and another, altogether adding up to 400% worth of solutions.

… ok, sure, that’s taking the “80/20” numbers too literally, but the general point holds. For the same “cost” you can often buy a lot more with a bunch of 80/20 solutions than one 100/100 solution, or you can buy 2 of them and a 95/60 for the thing you really need. The diverse problems we face in the real world are often better addressed with such a diverse portfolio.

Not always. There are times and places to go all-in with a certain tool to the exclusion of all else. But those are broadly the exceptions, not the rules. Exceptions a wise engineer does well to be aware of!… but exceptions nonetheless.

And so, when considered against the at-times bewildering constellations of requirements being made upon our programs, it is often the case that a 100% commitment to a pure functional programming language is a bad choice, for reasons ranging the full spectrum of “technical” to “human”.

In this it is not unique; 100% pure commitments to any choice can turn out poorly. Those around during the late 1990s to early 2000s probably still carry scars from the XML all the things! of the era. There have been plenty of other examples, and senior engineers who probably shouldn’t be senior engineers are remarkably creative in their ability to come up with dogmatic “100% solutions” to some problem that you’ve never even heard of that will destroy the local code base in time.

But you don’t have to give up everything you’ve learned. You are not doomed to the outer pit of Imperative Hell just because local engineering considerations preclude a pure functional language. You still have the knowledge and skills you picked up from your foray into the Functional Programming world. You can still apply them. You can still benefit from them.

But those benefits don’t come from forcing micro-styles on to languages that don’t benefit from them. They come from these medium-scale-and-larger lessons.

Closing

If I have infuriated you, I hope I have at least given you something to think about, and I hope that the process of chewing on these comments benefits you and your programming skills. It isn’t even necessary to agree with me to find your own ideas and principles sharpened, and if that happens, even if you disagree, I will still consider this a success.


  1. I will arbitrarily define “modern language” as one born after 1990 or so, so this includes Python and what we today call the dynamic scripting languages, and several modern static languages, but excludes C, and to a large degree, C++. Although C++ is such a mess that even figuring out if it has a given feature can itself devolve into a language lawyer cage match. ↩︎

  2. In deliberate parallel to the mythical Sufficiently Smart Compiler↩︎

  3. Since I’m discussing a very wide slice of language, “object” here means something very general; think of it as closer to the standard English meaning rather than the C++ meaning. I need some way to discuss things across languages here. ↩︎

  4. I mean, you can hack something where you subclass a method and implement it by unconditionally throwing an exception or something, but generally that’s a bad idea and only to be used in desperation. If you program in an OO language, you should learn the Liskov Substitution Principle. And then recognize that in practice it is a principle broken as often as it is followed… but your OO programming can still improve for at least being aware of it. ↩︎

  5. As a reminder or for those new to the series, I am using this term in its modern sense. Lisp predates the general recognition of globals being bad, but Lisp is not a functional programming language by the modern definition, even though it was called “functional programming” back then. ↩︎

  6. The global-ist of globals, such as C had, where there is only one global namespace, are fairly rare nowadays, but the key aspect of the variables in this example is that there is only one instance of them, shared by all users. Namespacing may allow for multiple instances of the same base name, but if they’re trivially available by all code just for the asking they’re global enough for today’s discussion. This may be called a “static” variable, or “defined on the module”, or a “package variable”, or any number of other things. The key is that any other bit of code has some way of reaching it and potentially mutating it. ↩︎

  7. This all gets horrifyingly worse if you introduce concurrency. Imagining what sort of inputs and outputs are necessary to fully model that is left as an exercise for the reader. And, indeed, we collectively know from experience global variables become even more dangerous in highly concurrent contexts. ↩︎

  8. The much-ballyhooed IO type is less about “monads” and more about being the implementation of this approach for Haskell, meaning that technically, all functions in Haskell really are completely pure. It is the combination of these functions with the aforementioned engines that makes a Haskell program actually do something. The purpose of the IO type is to abstract away the RealWorld (as Haskell calls it) and the various actions you can take, and also to wrap away concurrency (see previous footnote), giving an interface to interact with other computations without having to directly witness their input/output as explicitly declared inputs and outputs of every function. ↩︎

  9. One thing I hope you understand by the time this series is done is that this is not an apologia for “business as usual” in the imperative world! Every criticism a functional programmer has about imperative code in the general sense, I’ve lived. Something does need to be done about those problems, I just do not necessarily agree that “throw everything away and switch languages” is the only valid choice. A valid choice sometimes, yes, but not the only. ↩︎

  10. As I understand it, Elixir sits on top of BEAM and provides an implementation of a language on top of it that allows for mutability, which in this case amounts to, “allows you to rebind the same name to multiple values”. Which just goes to show once again that purity is relative. ↩︎