Functional Programming Lessons in Imperative Code
A BlogBook is a collection of blog posts intended to be viewed as a sort of book on their own.
This BlogBook is on how to transfer the lessons that can be taught by strict, pure functional programming (think Haskell) into imperative programming languages.
This blog book is completely written but not yet completely posted.
Introduction to Applying Functional Programming Lessons in Imperative Code
I recommend to any professional programmer that they spend some time with a very strict functional language, enough to get to the point they can write non-trivial programs in it.
For the purposes of these posts, you can read functional programming as Haskell. There is an earlier definition of functional programming that just involves having closures, but as all modern1 languages have closures now that is no longer a relevant definition. There are other languages like O’Caml, Clojure, and Scala that have very functional flavors, but there are few languages in common use that take it to the extreme of Haskell.
I believe the lessons learned there can be carried back into any conventional imperative language. That’s hardly a controversial claim. What will be controversial is that I believe that many people are carrying very superficial lessons back while missing the deeper ones.
In this series of posts, which I am gathering up into what I call a “BlogBook” (see link in header if you’re reading this through the blog post), I propose to piss off the entire programming community one way or another.
- I will anger those who believe there is nothing to learn from functional programming, or that a Sufficiently Smart Programmer 2 could just work everything I’m going to say here out from first principles without ever leaving their first language.
There are several things I’m going to cover that we may not strictly “learn” from Functional Programming, but that it can still put a new spin on. For instance, I would agree “global variables are bad” is something we can (and historically did) figure out just from the imperative world, but an FP-based viewpoint can enrich that view. There are also some things imperative programming will eventually teach you, but FP will ramp it up to 11 and teach you much more quickly, by forcing it on you. So bear in mind as you read this that I am never claiming this is the only way to learn these lessons, merely a good and fast one.
- I will also anger people who believe that any claim that is not accompanied by a complete history of everything related to the claim going back to Aristotle is implicitly claiming to have invented the idea from whole cloth with no previous inputs. There aren’t all that many such people numerically but they sure do like to make angry posts online about how dare they claim to have ideas.
- I will anger those who are very attached to strict functional programming by pointing out some places where I think it gets some things wrong. Even, sometimes, “less wrong than conventional imperative programming”, but still wrong.
- I will most especially anger those people who think that “practicing functional programming in a conventional language” means using lots of maps and filters and making vague references to “monads”, by which they really only mean Option. (I have already written about what monads are, and this post includes why I believe they are not that useful in your imperative language.) I will anger these people because in an entire series of posts about the lessons we can learn from functional programming I am still going to call nested maps generally an antipattern in imperative languages, and things like pipelines generally a parlor trick rather than the Most Important Lesson of Functional Programming. So, you know, brace yourself for that. It’s a ways away, though. We’ve got a lot of ground to cover first.
- I will anger people who can’t process generalizations that have exceptions. “Imperative languages” is a big set. I don’t know every language any more than anyone else does, but when I generalize, assume I may know the exceptions. For instance, one of my first generalizations is that trying to program imperative languages like Haskell, with lots of small functions everywhere, results in a parenthesis explosion. Ruby may be flexible enough that this is not the case, and to some extent also Perl, assuming normal levels of effort (e.g., no weird source rewriting). Assume my points hold as long as they are true of most imperative languages. Listing every exception I know, which of course will still miss others, is just tedious for everyone, so please take this particular example as indicative of a general understanding of the concept of exceptions. Discuss them, sure, but don’t assume that just because Lua (for example) has some particular feature implementing something I claim imperative languages generally lack, the point I’m making is invalidated.
I will base the code samples in this on Go, for a couple of reasons:
- It’s the language I’m using the most lately, though I’ve done Python and Perl recently too.
- It is probably the least “multiparadigm” modern language in any sort of popular use, and the one most syntactically hostile to what looks like “functional” code, so if a lesson from Functional Programming can be imported into Go, it can be imported into anything. It is the worst case scenario for “functional programming”.
Right. Enough throat clearing. Let’s get on with it.
Applying Purity To The Imperative World
One of the distinguishing features of functional programming is an emphasis on pure functions. I will start with an initial sloppy definition of purity to get us started, and will refine it into a range of definitions later. If you don’t like this first one, bear with me.
For now we’ll go with: a pure function is a function that considers only its inputs or immutable external values, makes no changes to any other values, and returns the same result given the same parameters every time. I add the clause about consulting immutable external values because in the case of
data Color = RGB Int Int Int deriving Show

red = RGB 255 0 0

maskRed :: Color -> Color
maskRed (RGB rIn gIn bIn) =
  let (RGB rMask gMask bMask) = red
  in RGB (min rIn rMask) (min gIn gMask) (min bIn bMask)
even if red is in some external module, since it is immutable, maskRed will still be a pure function. Immutability renders that distinction irrelevant in Haskell, because you can term-rewrite an external term to simply become part of the function transparently. In imperative languages a function may incorporate references to mutable values, so that is not automatically a safe assumption.
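For contrast, here is a minimal Go sketch of the same masking function (my own illustration; none of these names come from any library). Because red here is an ordinary package-level variable that other code in the package could reassign, MaskRed does not automatically get the guarantee the Haskell version does:

type Color struct{ R, G, B int }

// red is an ordinary package-level variable. Unlike the Haskell value,
// nothing stops other code in this package from reassigning it, so
// functions that read it are not automatically pure.
var red = Color{R: 255, G: 0, B: 0}

// MaskRed clamps each channel of c against the current value of red.
// Its result can change between calls if red is mutated elsewhere.
func MaskRed(c Color) Color {
    return Color{
        R: min(c.R, red.R),
        G: min(c.G, red.G),
        B: min(c.B, red.B),
    }
}

// min helper for older Go versions (Go 1.21+ has a builtin min).
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

If red is never actually mutated anywhere in the program, MaskRed behaves purely; the point is only that the language no longer guarantees that for you.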
You can in principle program an imperative language purely, producing something similar to the style you’d produce in Haskell. But it is painful… to put it lightly. Haskell has a lot of affordances for programming with very small-scale pure functions. Your imperative language does not. If you try to introduce them, they will be clunkier. It can be easy to take Haskell’s space-based syntax for granted and miss just how many parentheses it elides compared to almost every other language’s syntax expressing an equivalent thing. Trying to copy Haskell produces an explosion of functions and parentheses everywhere.
And that’s also before we even consider the fact that nearly every library you use in an imperative language is not pure, requiring you to jump through hoops to isolate it into a pure value, if it is even possible.
If you try to carry Haskell’s highly granular function style into an imperative language, you will experience a lot of pain, while not experiencing a lot of the benefits. The details and resulting cost/benefit ratio will vary from language to language, but it is not what those languages are meant to do. In fact I’ve seen very few people try it in code online; one imagines they get broken of it fairly quickly.
So why am I talking about this if we can’t program this way in our imperative languages?
Well, in one of the statements that I know from experience will now cheese off some of the more purist functional advocates…
Purity Is Relative
Another way of expressing the idea of purity is that pure code has no “effects”, though in this context that just moves the question into “ok, well, what is an ’effect’ then?”
Again I will not try to create a universally-agreed upon definition, but call it “a change in the environment that can be witnessed by other code”. By “other code” here we generally mean “code written in the subset of the language defined as ‘safe’ by the language designers”. We say this because if we declare that any unsafety in the language renders the language 100% unsafe and all 100% unsafe languages are therefore equally unsafe with no distinctions, then there is no debate to be had; all languages are completely unsafe. While there may be a mathematical sense in which this is true, it is not a practically useful definition.
In Haskell, it is trivial to make a “pure function” that will crash the process if evaluated fully. Evaluating an infinite list while retaining its head, resulting in Haskell trying and failing to manifest the entire infinite list in RAM, is “pure”; that is, evaluating an infinite list is not considered an “effect”. It would be hard to treat it as an “effect”; knowing in advance whether an arbitrary function will do it is undecidable in general. Fortunately, we can get a long way on modern computers by just sort of closing our eyes and pretending that’s not an “effect” a computation can have on an OS process, even though “killing every other currently-running computation in the program” is almost the epitome of a witnessable “effect”; other computations will be forced to witness this effect, unless you want to get pedantic about computations that no longer exist being unable to witness anything.
But I am not criticizing Haskell for not considering that an effect. It is a choice so practical that it is all but forced on them. Few languages have a useful capacity to deal with out-of-memory issues at all, especially if we ignore “run the GC and hope for the best” as a solution; even fewer can do it well.
Furthermore, real processors are mutable at their core. CPU registers are fundamentally mutable. Memory is fundamentally mutable. Any pure computation you like will inevitably have effects viewable by a debugger, even if there is no way to view them from within the safe subset of the language.
Again, this is not a criticism of the concept. But it does mean that even in Haskell the concept already is relative to your definition of what exactly constitutes an “effect”. We already can’t call any non-trivial Haskell function “pure” if we consider it as mutating CPU registers and RAM or if we consider it relative to what a debugger can witness.
So, in light of that discussion of purity and “effects”, consider this Go code:
type WebpageFetcher interface {
    FetchPage(url string) (contents string, err error)
}

func Tail10(fetcher WebpageFetcher, url string) string {
    page, err := fetcher.FetchPage(url)
    if err != nil {
        return ""
    }
    if len(page) < 10 {
        return page
    }
    return page[len(page)-10:]
}
Is Tail10 a pure function?
The obvious answer is “no”. Fetching a web page is not only an “effect”, it’s a huge set of effects. Sockets opened, packets sent and received, and at the end of that, any arbitrary content could be returned every time it is called because the target webpage need not be constant itself. (And all of that just to throw it all away except the last ten bytes…) It couldn’t be less pure.
The obvious answer is wrong. The correct answer is “insufficient information”.
Consider an implementation of WebpageFetcher:
type TestWeb map[string]string

func (tw TestWeb) FetchPage(url string) (string, error) {
    return tw[url], nil
}
And a call to Tail10 in the test code like so:
func TestTail10(t *testing.T) {
    url := "https://jerf.org/test"
    web := TestWeb{
        url: "more than ten chars content",
    }
    if Tail10(web, url) != "rs content" {
        t.Fatal("Tail10 is not working correctly")
    }
}
This particular invocation of Tail10 is pure. You can’t label a map as immutable in Go, but by inspection we can see that Tail10 will never mutate it. Two values are passed in to this function, which could without loss be marked as “immutable” in a hypothetical version of Go which supported that, and the code would still run, and have no witnessable effects, because it returns an entirely new value.
Of course, if you pass it a more natural implementation of a WebpageFetcher that in fact did reach out to the web, the function would be blatantly impure.
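For concreteness, a “natural” implementation might look something like this sketch of mine, using the standard net/http package (error handling kept minimal; it assumes the io and net/http imports):

// HTTPFetcher really goes out to the network, so any call through it
// involves sockets, packets, and results that can differ on every call.
type HTTPFetcher struct{}

func (HTTPFetcher) FetchPage(url string) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    return string(body), nil
}

Tail10(HTTPFetcher{}, url) and Tail10(web, url) run exactly the same business logic; only the purity of the invocation differs.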
In this particular case, the Tail10 function returned a new value. If the “business logic” was expected to reach out and perform some action that would mutate state, like, say, writing to an S3 bucket, you can write an implementation of the interface that records the action in itself. In this case, that implementation is being mutated as a result of the business logic, but this mutation is confined to the implementation of that object, which is only being used for testing. While it may technically be “mutation” as far as Haskell is concerned, there is still a clean boundary you can draw around the mutation to know it doesn’t escape out into the rest of your code during testing.
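Here is a minimal sketch of what such a recording implementation might look like; the ObjectStore interface and all the names here are hypothetical, invented for illustration rather than taken from any real S3 client:

// ObjectStore is a hypothetical interface the business logic depends on.
type ObjectStore interface {
    Put(bucket, key string, contents []byte) error
}

// PutCall records a single requested write.
type PutCall struct {
    Bucket, Key string
    Contents    []byte
}

// RecordingStore is a test implementation that simply remembers every
// write it is asked to perform. The mutation is real, but it is confined
// to this value, which exists only inside the test.
type RecordingStore struct {
    Puts []PutCall
}

func (r *RecordingStore) Put(bucket, key string, contents []byte) error {
    r.Puts = append(r.Puts, PutCall{bucket, key, contents})
    return nil
}

A test can run the business logic against a *RecordingStore and then assert on the recorded calls, rather than inspecting a real bucket afterwards.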
FP “Pure Core” -> Imperative “Purifiable Core”
Many Haskell writers have observed that a well-factored Haskell program is strongly drawn to an overall architecture that has a “pure core” and some sort of IO shell around it to do whatever the program is supposed to do in the real world. This happens in Haskell because the language is so good at dealing with that pure core that it strongly affords a style in which you move as much of the program into it as possible.
Imperative languages can not directly apply this concept very well because that fluidity of dealing with pure code at the smallest granularity is missing. But we can create packages (modules, whatever your language calls that level of architecture) that are purifiable.
This is now a routine play for me when I’m writing a program in an imperative language. Rather than intermixing all my “business logic” with all the impure code that gathers all the data for that business logic, I generally end up with these pieces:
- The business logic.
- An implementation of all the impure code in the module broken out into a class or local equivalent.
- A stubbed-out implementation of that interface, used for testing, to turn the business logic into relatively pure code.
- A real implementation of the impure code.
- An “integration” test of the impure code. This is easier than setting up a full integration test of conventional imperative code that casually intermixes the logic and impure code. For instance, the business logic may read and write a number of S3 buckets with arbitrary logic done on the contents, but the impure code to read/write some S3 value in some particular way can be tested in isolation to do just the thing we expect.
This may seem like a lot more than just writing the code. It really isn’t a lot more. It’s mostly what you need to write to have minimally professional code anyhow, just with more structure than you may be used to. It is more, though, I can’t deny. However, it is in the range where if you save only one multi-hour debugging session brought on by complicated mixtures of business logic and impure code, it has already paid off. It also gets easier as you practice this style.
If you design your system with all this purifiable code from the beginning, it begins to compound. When you write a component that depends on three other purifiable subcomponents, the purified components of those three subcomponents can then become part of the purification system for this component.
This is going to be one of the major recurring themes in this series of posts. While imperative languages resist taking the purity down to the granularity that functional programming languages do at the small scale, you will find them much less resistant at medium scales. You will be going somewhat out of your way to program in a style that would likely not come naturally to someone who never stepped out of imperative languages, and it is still a bit more expensive in the short term than a “natural” imperative approach, but it amortizes much more nicely over a medium-sized chunk of code and in my experience almost immediately pays off when you are able to write better test cases. But even if you skip that, it pays off in the medium term anyhow.
Purifiable Code in Haskell
You can do something similar in Haskell; there is more than one way to do it, but free monads are the most similar to the approach I take in imperative code.
However, this is not commonly done. I think that’s probably because Haskell makes it easier to purify things to a much greater extent, plus the aforementioned “more than one way to do it”. Testing pure code is easy, and what is left is something that can generally be integration tested, plus the fact that the type system itself constitutes more testing than in imperative code.
Still, it is not unheard of in the functional world. Here’s an example of it. Note this uses ekmett’s free package for Control.Monad.Free, not control-monad-free.
{-# LANGUAGE DeriveFunctor #-}
module Tmp where

import Control.Monad.Free (Free(..), liftF)
import qualified Data.Map as Map

-- Build the API we want to define
data Fetcher next =
    FetchPage String (String -> next)
    deriving Functor

-- Some machinery to make it easier to use.
fetchPage :: String -> FetchProgram String
fetchPage url = liftF (FetchPage url id)

type FetchProgram = Free Fetcher

-- A simple program to fetch two web pages and do some
-- simple manipulation to them
program :: FetchProgram String
program = do
  p1 <- fetchPage "http://jerf.org"
  p2 <- fetchPage "http://jerf.org/iri"
  return $ take 12 p1 ++ take 12 p2

-- Define a fake web.
urlContents = Map.fromList [
  ("http://jerf.org", "a very good website"),
  ("http://jerf.org/iri", "a very good blog")]

-- Run 'program' completely purely. Note that for this
-- to type check, it is indeed rigidly required to be String.
testRunPure :: FetchProgram String -> String
testRunPure prog =
  case prog of
    Free (FetchPage url g) -> do
      let contents = Map.findWithDefault "" url urlContents
      testRunPure (g contents)
    Pure r -> r

-- Run 'program' but with an implementation that does
-- real IO. In the interests of not boring the reader,
-- this does not actually do fetching, but we print
-- some things to demonstrate that we are indeed in IO.
testRunIO :: FetchProgram String -> IO String
testRunIO prog =
  case prog of
    Free (FetchPage url g) -> do
      let contents = Map.findWithDefault "" url urlContents
      putStrLn $ "Contents of " ++ url ++ ": " ++ contents
      testRunIO (g contents)
    Pure r -> return r
We can interrogate this module to super-double confirm that the purity of program does indeed depend on which interpreter we use to run the program value:
ghci> :t testRunPure program
testRunPure program :: String
ghci> :t testRunIO program
testRunIO program :: IO String
I suspect a second reason Haskell programmers don’t use this very often is that this is actually a bit of a pain. Imperative code lacks the guarantee of purity, but it is easier to set this structure up in an imperative language. Anyone reading the imperative code will understand immediately what is happening with purifiable modules. They may not quite understand why you wrote it that way, and may try to “simplify” it out of existence (hopefully the great testing you wrote will talk them back out of it), but it won’t confuse them. This Haskell code works, but it’s a lot of machinery to achieve it, even by Haskell standards, so it’s not worth bringing it out if a conventional pure core does the job, which it usually does.
Component Simplicity
There is a common metaphor for hyperspace travel in science fiction, where some scientist type explains how the FTL works by taking a piece of paper, drawing a line on it to demonstrate “normal” travel, then folding the paper to bring the origin and destination together.
Imperative code affords straight-line approaches to problems. I have some code. It does 73 things. I need it to do a 74th thing, in the middle of one of the things it currently does, which is itself a complicated mixture of the other 72 things going on. The most direct approach is just to slap some code in the middle of the thing that does the new thing. I have seen many code bases that are the result of person-centuries of this style of programming. Despite superficially having functions and modules, it is architecturally just a list of things to do, often multiple lists haphazardly mashed together.
Imperative code affords this style of programming because it is often in the moment the locally easiest thing to do, and imperative programming won’t stop you. If you can’t do the thing you want right where you want to do it, it is generally not too difficult to bash on the code until you can. Do you need to add a few more global variables, add a parameter to a dozen functions, and add some more feature flags to interact with the dozen flags already present? No problem. Some imperative programming cultures positively celebrate the number of ways their language offers to bash code into compliance in this way. They are welcome to that celebration, but I part ways with them philosophically on that point.
If you try to program Haskell this way, it will at the very best be an inferior imperative language, and at worst, it’ll beat you with a crowbar as a temporary measure while it finds something to really put you down with.
Instead, you need to think like the scientist with the FTL demo. Sadly, you won’t be able to fold your way straight to the destination in one shot, but one way of looking at a well-structured Haskell program is to see it as a collection of ways of folding the problem space, shaving a bit of dimensionality off here and a bit more there, doing some folds that you really can’t get away with in imperative languages backed by the safety of the strong type system, until eventually your whole program collapses into a handful of lines taking the input and transforming it to the output, each token deeply imbued with exactly the semantics and meaning you need to operate on this domain.
Imperative code often devolves into writing things that exponentially expand the state space of your program and hoping you can find a happy path through your newly-expanded space without wrecking up too many of the other happy paths. Functional programming generally involves slicing away the state space until it is just what you need.
It isn’t that imperative code forbids this approach, and indeed there are many imperative code bases that are written on the “fold the problem space” principle, at least in the imperative languages that provide tools for doing the slicing. (Dynamic scripting languages are hard to use this way; they were born on day one as a rejection of the restriction-based mindset, so they fight you on a profound level if you try to restrict the state space.) But imperative languages will at best permit such an approach, whereas Haskell all but demands it. The meandering, plodding, exhaustive and exhausting list of instructions almost linearly written out and bashed into compliance by years of effort and bugs reported from the field that so many imperative programs naturally devolve into doesn’t work very well in Haskell.
Haskell programs generally achieve this by building small things from the bottom up, and then creating rich ecosystems for combining them. This is, for instance, one of the main virtues of “monads” and why Haskell uses them everywhere; it isn’t because of what any specific monad interface implementation is, it is because once you have conformed a data structure to the monad interface, you get the entire monad ecosystem “for free”, just as implementing an iterator gets you the entire iterator ecosystem for free for that implementation.
Note in my previous section when I showed a “free monad” that the data structure only had to implement the particular aspects of the specific driver that we wanted. Once we had that, the “program” I wrote with that driver did many things other than just fetch web pages. You get the whole world of Control.Monad, you get monad transformers and all sorts of other ways of combining small pieces into large structures. As small of an example as that may be, it demonstrates how that code does not simply “use IO”, but can be combined with IO, or combined with software transactional memory, or combined with many other things at a very fine level.
Of course imperative code allows combining code. We would not have gotten very far in programming if we were literally forced to write nothing but straight-line scripts that do one thing at a time in exhaustive detail as I impugned imperative programming with above. But again, Haskell permits and/or forces you to weave code together at a much more granular level than imperative languages. One of my favorite type signatures in Haskell is STM (IO a), not because I used it a lot but because of what it implies: “truly” pure code blending together with guaranteed-safe, transactionally pure, imperative-looking code to finally produce instructions to produce some real-world effect, all able to weave together very intimately, yet safely.
I have to remind myself, after writing that, that in practice it is somewhat less idyllic than I may have made it sound. Sharp corners start emerging if you want to mix in other effects systems or monads. Still, if that is a failure, it is a failure at a task that imperative programming languages can’t even conceive of.
Again, if imperative languages can’t get down to the level of fluidity that functional programming languages can, what is the lesson here? The lesson is that it is indeed possible to program like that, even with complicated use cases, and the practical takeaway is that imperative languages can still achieve this at larger scales of architecture.
One of my favorite bits of code I’ve ever written is code that can run against our corporate Wiki, which is XML under the hood, and extract bits of the page as JSON. Big deal. It sounds like the sort of thing you’d get an intern to write. But it is not written as a big script that grabs the XML, does all sorts of ad-hoc things to extract whatever ad-hoc data I want, then outputs it into an ad-hoc JSON format. It is written as a set of operators you can define as data, then interpret those operators to turn it into JSON, in a format that mirrors the desired JSON file format.
It uses XPath to navigate to some desired set of XML nodes, then specifies how to extract the desired data from those nodes. So, suppose you have a simple table you want to extract from the Wiki page, containing, say, username in the first column and their email address on the right. The whole thing can be specified something like:
object:
  users:
    target: //[id="target_table"]/tr
    for_each:
      array:
        - target: //td:1/
          extract: text
        - target: //td:2/
          extract: text
This creates a JSON object with a key users, which goes and gets the table rows of the table identified by the given ID, and then for each of those rows creates an array containing the text of each cell, resulting in something like
{
  "users": [
    ["user1", "user1@company.com"],
    ["user2", "user2@othercompany.com"]
  ]
}
There are about a dozen ways of extracting and operating on the node sets returned by XPath. Composing these, together with the general power of XPath, makes it quite easy to build even some relatively complicated extractions out of just these simple parts. I knew I was doing well when I was writing the test cases for this, and even in Go I was starting to prefer embedding one of these programs into my code as a string and executing it rather than writing the equivalent code by hand. Subsequently it has been used in enough ways that it has been easier to write these extractions than to write ad-hoc code to do one. Most importantly, it has been enough easier that things have happened that wouldn’t otherwise have happened.
There’s another code base that I’m currently working on that involves exactly that long meandering list of ad-hoc instructions written over decades, all of which is designed to operate on a particular entity. In the replacement, rather than copying the same approach, it was easy to create a uniform interface (which I will be taking as its own distinct lesson later) and turn it into a large set of discrete tasks that can be understood in isolation rather than a whole big general mishmash of dozens of things being done in quasi-random orders.
An in-the-wild example is the popular concept of “middleware” in the web server world. Extracting out common concerns in such a way that you can create “stacks” of handlers is a very powerful way to write web servers. I particularly like how in architectures where you can attach middleware to specific components of the paths, it allows you to make solid guarantees like “nobody can access the /admin URL or anything under it unless they are an admin” by structuring the middleware stack correctly, so that even if someone adds a new admin page later they can’t forget to check that the user is an admin.
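Here is a rough sketch of that guarantee using Go’s standard net/http; the isAdmin check is a stand-in for whatever authentication your application actually uses:

// RequireAdmin wraps another handler and refuses to call it unless the
// request passes the admin check.
func RequireAdmin(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !isAdmin(r) {
            http.Error(w, "forbidden", http.StatusForbidden)
            return
        }
        next.ServeHTTP(w, r)
    })
}

// isAdmin is a placeholder for real authentication.
func isAdmin(r *http.Request) bool {
    return r.Header.Get("X-Admin") == "true"
}

// Because the middleware is mounted on the /admin/ subtree, every handler
// registered on adminMux is protected, including ones added later.
func newRouter(adminMux *http.ServeMux) *http.ServeMux {
    mux := http.NewServeMux()
    mux.Handle("/admin/", RequireAdmin(adminMux))
    return mux
}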
Having a plugin- or interpreter-based architecture is not news in general. When I wrote in my intro that not everything I covered was going to be “brand new”, this is one of the biggest examples of that I had in mind. Imperative code was operating like this before functional programming was much more than a gleam in a researcher’s eye. Programming languages themselves are one of the most pure instantiations of this concept, because the composition cuts down to the core. It is arguably impossible to create a truly large code base that doesn’t incorporate this idea into it just for the sheer sanity of the programmers, so things like “Linux kernel modules” and “image editor plugins” and a standardized format for sound synthesis modularity have been around since just after programming languages were invented.
But functional programming, by forcing its users to use it much more often, forces you to get comfortable with it. I can see by the code I find in the wild that programmers in general take a long time to work this concept out on their own. Many more imperative code bases should have this concept. It also, by forcing it on you in a very restrictive context, forces you to get good at dealing with the pitfalls of this architecture. It is very easy to cut the modules apart in the wrong manner. In imperative languages you might just cut holes through the architecture as needed; functional programming forces you to fix the architecture. If you have plugins that sometimes need to print something, then you have to modify the architecture to accommodate the need for occasional IO. If you have to thread data through the various modules, you will need to account for that quite explicitly in the design.
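In Go terms, that accommodation might be as simple as making the “occasional IO” an explicit part of the plugin contract; this interface is my own hypothetical example, not from any particular system:

// Plugin is a hypothetical plugin interface. Rather than letting plugins
// print wherever they like, the architecture hands each one an explicit
// io.Writer, so the occasional output is visible in the design and can be
// redirected: to a buffer in tests, to a log in production.
type Plugin interface {
    Process(input string, out io.Writer) (output string, err error)
}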
Functional programming shows that you can indeed write a lot more code as simpler individual modules composed together, with a richer concept of “composition” than you might naturally obtain from decades of imperative-only coding. It is also good at teaching you how to write interpreters of all shapes and sizes, even something as small as the interpreters I showed for the free monad example, so that when you come back to imperative programming you know better what you need to do to have one.
Half Constructed Objects Are Unnecessary
Functional programming languages of the type I’m talking about have immutable objects3.
For this particular section I’m just going to be talking about “immutability”, so in addition to Haskell, I also include Erlang and any other language with immutable values only. This does not include languages that merely encourage them, but permit mutability; this discussion is about 100% immutable languages, where mutation is not just held at arm’s length, no matter how “long” that arm may be (Rust), but outright eliminated.
Immutable objects can not be changed once created. New objects can be created, even related ones that are “this value with dozens of fields I had before but with this particular field set to 3”, but the old ones can’t be changed. Therefore, before creating any sort of composite value, all the relevant components must be in hand, because they can not be added in later.
A mutable language does not need this discipline. You can half-construct objects, heck, you can construct entirely uninitialized objects (not in the C/C++ “uninitialized memory” sense, but in the sense that they do not contain valid values in accordance with whatever the more-specific local “valid” is). The languages neither particularly encourage this nor discourage this.
But since it doesn’t tend to discourage it, it tends to creep into code when you’re not looking. Even when you think you’re using some sort of constructor or initializer to create an object, it’s still easy to return objects that are valid according to some internal or library definition (e.g., an empty linked list, which is a legal linked list) that are not valid according to some other definition (e.g., “this object must be owned by at least one user”), and so the object is created, then through later mutations, changed to be the valid value for the more restrictive context.
One of the things that functional programmers complain about in conventional imperative languages is the presence of some sort of nil/null value that, if dereferenced, will crash the program. I remember my own journey in learning about Maybe and Option and other such values.
But what’s weird is, when I’m programming Go, I can literally go weeks without encountering nil pointer panics. If you ignore “things I quickly picked up in unit tests”, and my most common instance “I added a map without initializing it in the initializer, whoops” (also quickly picked up in tests), I can go months.
I wondered for a long time what the difference is, and I still don’t know; I’d really need to sit down with someone who is constantly encountering nil pointer panics and do some pair programming and compare notes.
But my best theory at the moment is that I’ve adopted the functional programming principle of always fully constructing valid objects in one shot. I do not tend to have invalid objects flying around my programs and getting nil pointer exceptions. If it is invalid for some field to be “nil”, then the program never has an instance of the value for which it is nil.
In fact, nil pointer exceptions are merely the problem people notice, because they crash noisily. In this sense, they are a positive boon, because that’s better than the invalid value that doesn’t crash the program and goes unnoticed! And that, of course, I see all the time in code; it is not some rare thing.
The thing that distinguishes null values particularly isn’t their invalidity. There are many other ways that data can be invalid in practice. It is that you can not remove them from the values the programming language considers valid. In C, for instance, you can not declare “a pointer that is not allowed to be NULL”. There is no such type. It is forced into your data model whether you like it or not. That is the distinguishing thing about null values.
It is far more productive to consider it a particularly bothersome special case of the general problem of having constructed invalid data in general, for which nils are not really all that special, than to overfocus on them and neglect other issues. If you remove all “nils” from your program, whether through careful programming or support in the programming language itself, but you’re still routinely passing around invalid or half-constructed data in the rest of your code, you’ve made some progress, yes, but not nearly enough progress.
It sounds too simple to be true: To not have invalid data in your program, do not construct invalid data.
But it works!
Strongly-typed functional languages strongly afford this by creating very rigorous and rigid types that fully prevent the data types of the program from even representing invalid data in memory, giving you tools to enforce this very well. Imperative languages have tools that can be leveraged to this end as well, but they are generally not as strong. But “not as strong” is not the same as “does not exist at all”. Whatever it is your local language offers for these tasks should be used.
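A small Go sketch of the idea, using the ordinary tools available (unexported fields plus a constructor that refuses to hand back invalid data); the domain rule here is invented purely for illustration, and it assumes the errors import:

// User's fields are unexported, so code outside this package cannot build
// one field-by-field; the only way to get a User is NewUser, which
// enforces the (hypothetical) domain rule that every user has a name and
// at least one owner.
type User struct {
    name   string
    owners []string
}

func NewUser(name string, owners []string) (User, error) {
    if name == "" || len(owners) == 0 {
        return User{}, errors.New("a user must have a name and at least one owner")
    }
    return User{name: name, owners: owners}, nil
}

This is nowhere near as strong as a type that simply cannot represent the invalid state, but it moves the check to the one place values are born, which is most of the benefit.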
And even in fully dynamic scripting languages that lack these tools almost entirely, you can just… not create invalid data.
Easier Said Than Done… But Probably Not Why You Think
Of all my suggestions here, I suspect this is the one that could most surprise you if you try to put it into practice. You would think that refactoring your code to never half-construct objects would be easy. You take anywhere you construct an object, shift all the mutations necessary to generate the initial values above the initial construction, then construct it all in one go. The theory is simple.
The practice is bizarre. An imperative program written without an awareness of this issue will naturally tend to grind in an expectation of half-constructed objects more deeply than a wine stain on a white shirt. If you try to convert one object creation, the odds that you’ll find that the incomplete object was passed somewhere to do something, which will itself require some other value to be created, which will then call a function that itself implicitly depends on half-constructed objects in a way you didn’t expect, rapidly approach one.
On the plus side, the first time you try to do this refactoring and you find this happening to you, you’ll probably come to a deeper understanding of how hazardous this can be to your program’s architecture.
But even having experienced it, I can’t really explain it. All I can say is, find yourself a nice little 500-1000 line program that ends up half-constructing things and passing them around, and try to refactor it to never half-construct objects. It’s wild how deeply imperative code can grind this antipattern into itself.
You may even get to the point that you believe it is impossible to avoid half-constructed objects in imperative code. Which is why I am again going to bang on the fact that functional programs prove that it is in fact possible, because they give their programmers no choice, and there isn’t any program that functional programmers are particularly stymied by as a result of this constraint on them.
There’s nothing stopping you from simply making that choice yourself in an imperative program.
It’s definitely good to start with a greenfield project, though; refactoring is possible but quite difficult. It’s even possible this is one of the primary things that makes refactoring hard in general, though I don’t feel I have enough experience to make that claim. Since I’ve been programming in this style for many years now, I don’t have a lot of recent experience in refactoring code that partially-constructs objects. But it does make sense to me that the code between the object being constructed and the object actually being “valid” forms an invisible, yet quite potent, barrier to many refactoring operations.
Interface Normalization
This is another lesson that does not require functional programming to notice, but it is a lesson that functional programming puts front and center whereas independently discovering it from imperative programming may take a long time.
Functional programming derives (pun intended) a lot of benefit from its typeclasses, despite them in some sense being second-class citizens in the expression problem to sum types. Much code is written against small interfaces like “functors” or “monads”, which can then be applied to any value that fits into those interfaces. This is one of the important ways that functional programs can have polymorphism.
Obviously, object oriented programs can also have polymorphism. For the purposes of this functional programming language lesson, “object oriented” is going to include “inheritance”. Normally when discussing OO in a cross-language context I use a very loose definition of “OO” to be basically “some way of bundling data with methods”, which includes Go (no inheritance) and prototype-based Javascript from before the class keyword, and while this may apply somewhat to such languages, the point will be made most clearly by inheritance-focused imperative languages.
Polymorphism is a primary goal of object orientation, obtained through the subclassing relationship. It may offer others, such as the way interfaces can be added on to an OO system fairly cleanly in various ways, but it is guaranteed to at least have polymorphic subclass relationships; being able to pass CDerived to a function or method expecting a CBase is the entire point.
Subclasses can only add to their parent’s class4. As objects become arranged into their various hierarchies, and things get subclassed, it also tends to tie up the parent classes such that changes to them become increasingly difficult due to the scope of the code that the changes would affect, because all changes to a parent class affect all subclasses. This is assuming you can change parent classes at all, since they may be coming from a 3rd party library you don’t want to or can’t fork.
Consequently, OO languages afford a style where details get pushed down the hierarchy. The larger the hierarchy, the more the details end up pushed down as the parent classes tend to become more and more abstract. Thus, while you may nominally have something like a QObject in the QT library, most code is going to take something more specific, and usually, much more specific.
This is difficult to express, but a result of all of this is that an OO designer will tend to develop a design approach where everything in a class has a very specific purpose, and anything that can be pushed down the hierarchy is pushed down the hierarchy, because maintaining things at higher levels becomes more and more difficult as more subclasses hang off of them. Anything that appears in a higher level must be directly applicable to all classes that may inherit from it, not just today, but someday in the indefinite future.
This is not, on its own, a bad thing. The logic I spell out here is valid on its own terms. But it is not the only approach.
Typeclasses in functional programming derive much of their power from going the other way, though, creating interfaces that you can use to operate on things, such as “monad”, and then expanding the reach of those interfaces by happily implementing degenerate implementations of those interfaces where it makes sense.
That is to say, in OO, a class inheriting from its base class generally had better use all of the base class’s functionality. If it doesn’t, it is probably inheriting from the wrong base class. The “correct” base class may even have to be constructed by tearing the parent apart into multiple base classes, creating those very “abstract” classes that such systems tend to create.
Typeclasses, by contrast, lend themselves well to degenerate implementations of interfaces. “File” interfaces with Close methods that do nothing, for instance, are very useful. Or one of the classic confusing things about monads, “monad” interface implementations that are just degenerate upgrades of a Functor. As noted in my monad “tutorial”, one of the problems people have with understanding “monad” is precisely that the examples people reach for tend to be these degenerate examples, many of which are actually just functors, and using those as your primary examples makes it very difficult to see what the “monad” is actually enabling.
Object oriented programming naturally affords splitting things into hierarchies, making sure that nothing extra appears at any layer. Functional programming derives a great deal of power by flattening things into common interfaces and then writing very generic code against those common interfaces.
Once again, it is difficult to do this at the same level of granularity as a full-on functional programming language allows, as evidenced (if not outright proved) by all the failed attempts to bring ergonomic “monad” implementations into so many imperative languages, but the principle can be carried into OO quite easily at slightly higher levels. I have long since made my peace with interfaces where some of the methods have a defined way of saying “not applicable to this value”, or that may have what are effectively stubs in many, even most of the implementations.
There are OO answers to this; for instance one of my favorite patterns in general (when translated into the local idiom) is the Template pattern, which in OO terms involves creating a superclass that has stub methods for subclasses to fill in describing how to do certain things, and then code can be written against the superclass that will “do the right thing” for whatever child is passed to it. A common example is simple image manipulation code that may allow you to, say, crop and rotate an image directly, where templates can be filled out for various bit depths, alpha depths, or, depending on how aggressive you are being, perhaps even something like lossless cropping for JPEGs if the requested crop permits it, while everywhere else it just does the obvious pixel manipulations and saves the result as normal.
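A rough Go translation of the Template idea, using an interface in place of a superclass (the image interface here is invented for illustration, not Go’s image package):

// Image is the "template" contract: the hooks each concrete image type
// fills in. Rotate180 below is the shared algorithm written once against
// those hooks, regardless of bit depth or storage layout.
type Image interface {
    Bounds() (w, h int)
    Pixel(x, y int) uint32
    SetPixel(x, y int, v uint32)
}

// Rotate180 swaps each pixel with its mirror across the center.
func Rotate180(img Image) {
    w, h := img.Bounds()
    total := w * h
    for i := 0; i < total/2; i++ {
        j := total - 1 - i
        x1, y1 := i%w, i/w
        x2, y2 := j%w, j/w
        a, b := img.Pixel(x1, y1), img.Pixel(x2, y2)
        img.SetPixel(x1, y1, b)
        img.SetPixel(x2, y2, a)
    }
}

An implementation that has no sensible answer for one of the hooks can implement it degenerately (a no-op SetPixel for a read-only format, say); that is exactly the part OO training tends to resist.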
On the one hand, Template is nominally a perfectly normal OO pattern that has been in the Gang of Four book since the beginning of “design patterns” as a concept, which itself means it was already in common use before that… and on the other, it is a pattern I have not seen utilized very often in the wild, and I have personally worked with developers who had to have this concept of leaving in method plugin points that some derived classes may not use explained to them, not even because they can’t understand it, but just because using OO languages tends to leave you with a blind spot here. Methods in a superclass that most classes do not use cuts strongly against the training that OO tends to give you that you should have nothing extraneous anywhere because that extraneous stuff will bite you later. It takes a certain amount of swimming against the current of OO to figure this out, but a functional programming language will serve it to you cleanly.
Once you have it, no matter how you have derived it, it is a powerful tool to add to the toolchest.
Simplicity and Global Mutable Values
At the beginning of this series, I said I was going to annoy every programmer in the world in some way or other. So hang on, here comes one of those super hot takes that I was promising:
Global variables… are bad.
Alright, for those of you who were not so offended that you instantly snapped your browser window shut, obviously I am aware that this is not news. The higher-skill end of the programming world has been well aware that global variables are dangerous since before functional programming5 was even conceived of. Therefore, clearly, it is completely possible to figure out even in a purely-imperative world that they are a bad idea.
Functional programming provides a good way to view why they are bad. It may not be “the” way, but it is a good way. This idea comes to functional programming via its academic heritage, where this idea was developed for the purposes of proof.
Suppose you have this Go code, which trivially translates to most other imperative languages6:
var message = "Hello World!"
var greetCount = 0

func Greet() {
    fmt.Println(message)
    greetCount++
}
What are the arguments of the Greet function?
The function signature is supposed to answer that question. In this case, it has no arguments. It also returns nothing.
But you can also view this function from another point of view. For a variety of reasons ranging from “generally liking math” to “wanting to be able to use the tools of math to prove things about code”, academia would like to turn the Greet programming-language function into a true mathematical function.
A true mathematical function depends only on its inputs, and has only outputs. It does not “change” anything; it is simply a mapping of inputs to outputs. If you want the function to be affected by something, it must be in its inputs, and there are no secret side channels to communicate or change anything else, there is only the outputs of the function.
The particular combination of the way we teach math and the way we learn programming sometimes makes this difficult to conceive of. We tend to learn “mathematical functions” as these cute little things that take one or two real numbers and output another one. Some teachers even get confused and teach their students that what “a function may only return one value for a given input” means is that a function can only return a real number or some other atomic thing like that. But in reality, the input and the output of functions may be arbitrarily large; nothing stops a function from taking a thousand complicated matrices as input and emitting a million new functions as their output. You can make anything into a mathematical function just by enlarging the input and output until they are big enough to handle the function.
So to model Greet up above in this style, you need something like
func Greet(message string, prevCount int) (newCount int) {
    fmt.Println(message)
    return prevCount + 1
}
But, uh, wait a second here. We’ve still got a change happening in the world with the fmt.Println. That’s not a mathematical function either. So now you need something like
type IOOperation interface {
    // lots of stuff here that is its own lengthy
    // diversion to discuss
}

type IOPrintMessage struct {
    Message string
}

func Greet(message string, prevCount int) (ops []IOOperation, newCount int) {
    return []IOOperation{
        IOPrintMessage{message},
    }, prevCount + 1
}
This is now a mathematical function. You could write a function to examine the count and print only if the count is even or something:
func Greet(message string, prevCount int) (ops []IOOperation, newCount int) {
    if prevCount%2 == 0 {
        return []IOOperation{
            IOPrintMessage{message},
        }, prevCount + 1
    }
    return []IOOperation{}, prevCount + 1
}
and so on.
Now you need some sort of execution engine to run this function and process the newCount so that it will be placed correctly into the arguments of any future Greet calls, as well as an engine to run the IOOperations7, and an engine to allow a programmer to somehow put all these functions together into a useful program, since now these pure functions are just sitting there, ambiently existing, rather than the vivacious and proactive “functions” we’re used to having in imperative languages. That’s beyond the scope of this article, but to see what it looks like, see Haskell8.
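Just to make “execution engine” concrete, here is about the smallest one possible for the toy Greet above; this is my own sketch, and it assumes the fmt import along with the IOOperation types already defined:

// Run threads the count through repeated calls to the pure Greet, and is
// the only place the described IOOperations are actually performed.
func Run(times int) {
    count := 0
    for i := 0; i < times; i++ {
        var ops []IOOperation
        ops, count = Greet("Hello World!", count)
        for _, op := range ops {
            // The pure code only described the effect; the engine interprets it.
            if p, ok := op.(IOPrintMessage); ok {
                fmt.Println(p.Message)
            }
        }
    }
}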
And even this is simplified just to sketch the point and move on; if your function also accepts user input this model breaks down… but that’s a separate point.
My point in this particular post is the view that FP can provide on why globals are bad. It is also a well-known fact in the imperative world that functions with dozens of arguments are a bad idea. They are difficult to understand due to the size of their state space.
Functional programming gives us a point of view where all global variables are additional parameters to the function call of any function that can see them. For instance, in the Go examples above, the message was not exported, so it would only affect that module. This is technically not “global”, only module-scoped. But you can also export it so the whole program can indeed see it, and then it is basically joining that global variable on to every function parameter in the program, and also every set of returned variables. In the presence of a global value, a programmer must examine a function, and the transitive closure of all functions it may call, in order to determine whether a given global variable may be used or modified.
That means that in a program with hundreds of global variables, the simplest possible function that nominally takes no arguments and returns no values actually takes hundreds of arguments and returns hundreds of values!
No wonder extensive use of globals can trash an entire program.
This means that every component that uses all these values is that much less simple, and that much harder to put together into the component-based architecture I was pitching in the previous post. It is true that it is at least the same set of global variables, so it is not as bad as trying to assemble an architecture out of components that take hundreds of different parameters to each function call, but it still means the code is much more complicated than a casual reading of the source code would indicate.
Mathematically, you could model the functions as only taking the variables they use in the function themselves and only outputting the variables they could conceivably change, which is quite a bit less bad than all functions simply unconditionally taking them all. However, for a human being trying to use one of these functions, they don’t know simply from the function signature which of these globals may be used or changed, since it isn’t in the function signature, so as a practical matter, every individual global value a function can see adds to the real-world complexity of the function a little bit, enough that it starts to add up over dozens and hundreds of such values.
Obviously, in practice, a global is not literally exactly equivalent to adding an input and output parameter to every function. If it were so, globals would not be “dangerous”, but rather, “unusable”. But they do corrode away the program as they are added, and while at first they tend to not cost too much on the margin, many of us have witnessed programs where they passed an event horizon and interacted until they became insanely complicated.
Call it perhaps the logarithm of the number of global variables being added as complexity to every function… but then, that log value being multiplied by the number of functions you have, so it still hurts pretty badly. The log of the number of global variables may sort of level off as the program increases in size but the number of times you pay the penalty does not.
So What Does Functional Programming Have To Say About Globals?
In Haskell, if you have a function of Int -> Int, that function receives one value and returns one. The inside of that function may reference other values, but since all external values are immutable, it is still equivalent to a constant function that is term-rewritten to have that value directly incorporated, e.g.,
someValue :: Int
someValue = 6
increment :: Int -> Int
increment x = x + someValue
increment may superficially seem to resemble an imperative function that incorporates an external value, but because there is no way to change someValue, increment is in every way equivalent to the constant function increment x = x + 6. In general const values don’t count towards the global variable count, only mutable global values.
As a result, a real Haskell program that is doing real work is doing so from a whole bunch of components that are individually much simpler than most imperative components.
It is tempting to think that imperative code requires the use of these sorts of tools. Granted, the imperative world has largely gotten over the use of globals, but there’s still plenty of languages that provide a variety of ways to make functions “bigger”; for example, as another exercise to the reader you can consider what a method on an instance of some class with a complicated inheritance hierarchy that is changing values on multiple levels looks like from the mathematical perspective. It’s not necessarily hard to work it out, but it is tedious, and going through the process will give you some intuition into why the academic world eschews viewing the world that way. Mutations may still be contained by structured programming’s scopes and the rules of the inheritance hierarchy but can still bloat out of control fairly easily through the transitive closure of all modifiable values accessible through some superficially-simple method call.
The lesson of functional programming is that it provides an existence proof that these are not always necessary. It gives a framework for thinking about how extensive use of these tools can actually get you into trouble. It gives a demonstration of how composing simpler things together, even when you are limited to tools simpler than anything your imperative code provides, can still provide ways to solve problems that you may not have considered if you have not practiced with the simpler tools.
Bear in mind I’m not an absolutist on this question, because the entire thrust of all these essays is to pragmatically adapt our lessons, not try to blindly jam functional programming into the imperative world. You will have to use some of these features the languages you use offer you. However, a purely-imperative understanding of these features often accurately captures the benefits of these features but fails to teach you about their costs. This viewpoint can show the costs of using these features that complicate understanding what a given bit of code does. It can also demonstrate that there are in fact ways to get the benefits without necessarily paying the cost, by putting together simpler pieces in certain idiomatic ways that functional programming can teach you.
Adapting To Mutation
Functional programming, being an absolutist approach, simply eliminates mutation entirely. While nifty, this can be impractical in conventional languages, and it comes with some costs of its own.
However, as hinted at earlier, the ability to mutate things is not a ravening monster that instantly destroys your program the moment you introduce a single mutable variable; again, if that were the case none of our code would work at all. At first the costs are gentle, and the benefits can be substantial. It becomes problematic only when too much mutable state gets into one place, mutated in too uncontrolled a fashion, and it collapses into a black hole.
So the adaptation to this is to explicitly think about mutation as something that itself flows through your program, and that needs to be considered at the architectural level. You make sure no part of the program collapses into a black hole by building fences around the mutation.
This is certainly a view onto a concept very similar to “encapsulation”, but it is not quite the same… it is still possible to build a well-encapsulated program in which the encapsulation is not actually leveraged to control mutation. But it is the encapsulation toolset that you will generally turn to to build these fences.
There is also a natural parallel to the Law of Demeter, which I would say is a bit more general in that it worries about any communication, whereas I’m talking here about “things that result in mutation”, which is arguably the worst way to violate the Law of Demeter, but certainly not the only way. It is also bad to simply have a long chain of hard-coded references appear at all, even if it is a read-only operation.
One example of a mutation fence is an “actor”. While actors are generally thought of as providing concurrency services to their callers, they also frequently double as mutation barriers, because the callers don’t get generalized mutation ability over the values managed by the actor. They can only interact with it via explicit messages. Hence, an actor forms a barrier against exactly that kind of transitive-closure complexity explosion.
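As a rough sketch of that idea in Go (the names are hypothetical, not from any particular system), a goroutine that owns its state and only accepts explicit messages on channels is exactly this sort of fence; the mutation still happens, but it can’t leak transitively to callers.

package main

import "fmt"

// Messages the actor understands; callers never touch its state directly.
type setReq struct{ key, value string }

type getReq struct {
    key   string
    reply chan string
}

// kvActor owns the only reference to its map. Mutation happens, but only
// in response to explicit messages, so it cannot spread to callers.
func kvActor(sets chan setReq, gets chan getReq) {
    state := map[string]string{} // mutable, but fenced inside this goroutine
    for {
        select {
        case s := <-sets:
            state[s.key] = s.value
        case g := <-gets:
            g.reply <- state[g.key]
        }
    }
}

func main() {
    sets := make(chan setReq)
    gets := make(chan getReq)
    go kvActor(sets, gets)

    sets <- setReq{"greeting", "hello"}
    reply := make(chan string)
    gets <- getReq{key: "greeting", reply: reply}
    fmt.Println(<-reply) // callers only ever see values via messages
}

The important property is not the goroutine itself but that the map has exactly one owner; every mutation has to pass through a message you can see, log, or reject.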
In a recent project I was rewriting some very “imperative”, in all the bad ways9, code to handle email processing. I replaced it with a pluggable module system to do the various tasks we do to the email. Using a pluggable module system is a fairly obvious architectural move, but where I applied the “functional programming” lesson is that the communication between these modules is itself tightly controlled and explicitly logged, as its own moral hazard to the code.
And rather than each job getting direct access to manipulate the email, there’s a system where the jobs get handles that only allow them to mutate a view of the email. The original email is immutable, so every job gets equal access to it without the other jobs having interfered, and all changes made through these handles can be directly logged and inspected. As a result, while there have been a couple of issues with job ordering or with two jobs trying to “own” the same header, they were swiftly identified and dealt with, because the jobs were operating through an interface deliberately designed from day one around constraining mutation.
This is also a good case of what I mean by constraining mutation. The jobs have work to do, headers to change, variables to set for future jobs to save work, etc. It’s not about somehow “preventing” mutation; in general, one way or another the entire point of our programs is to change something in the world. The “constraint” in this case is to push it all through a system that allows for logging and visibility, not to deny it entirely, and to make sure it doesn’t run away unconstrained and untrackable… as it was in the original program.
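Here is a minimal sketch of that shape in Go, with entirely hypothetical types rather than anything from the real system: the original email is never handed out mutably, each job records its intended changes through a handle, and those changes are trivially loggable and inspectable before being applied.

package main

import "fmt"

// Email is the immutable original; jobs never receive a mutable reference to it.
type Email struct {
    Headers map[string]string
    Body    string
}

// A Change records one intended mutation; the handle accumulates them.
type Change struct {
    Job, Header, Value string
}

// Handle is the only way a job can "modify" the email.
type Handle struct {
    job     string
    changes *[]Change
}

func (h Handle) SetHeader(name, value string) {
    *h.changes = append(*h.changes, Change{h.job, name, value})
}

// Apply builds the final email from the untouched original plus the change log.
func Apply(original Email, changes []Change) Email {
    out := Email{Headers: map[string]string{}, Body: original.Body}
    for k, v := range original.Headers {
        out.Headers[k] = v
    }
    for _, c := range changes {
        fmt.Printf("job %q set header %q\n", c.Job, c.Header) // visibility for free
        out.Headers[c.Header] = c.Value
    }
    return out
}

func main() {
    original := Email{Headers: map[string]string{"Subject": "hi"}, Body: "hello"}
    var changes []Change

    // Each job sees the same untouched original and writes through its own handle.
    spamJob := Handle{job: "spam-check", changes: &changes}
    spamJob.SetHeader("X-Spam-Score", "0.1")

    final := Apply(original, changes)
    fmt.Println(final.Headers["X-Spam-Score"])
}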
To my mind, thinking about the flow of value mutation through your program is one of the basics of architecture, from the medium scale all the way up to “how do I design the whole of AWS?” Everyone draws diagrams showing “who is talking to whom”, and that’s certainly an important diagram, but “what is being mutated, by whom, when, and why” is equally important and provided much less often.
Which is heavily related to the classic:
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious. - Fred Brooks
I would draw a distinction between the micro-scale flow of value mutation and the macro-scale. Programming languages differ substantially at the micro-scale, ranging from “immutability”, meaning that all changes flow through function calls, through the burgeoning field of mutation-controlling languages, which Rust leads but is not alone in, to the arbitrary mutability of conventional languages.
However, I think that by the time you get to the macro-scale the impact of these differences is substantially muted… perhaps not irrelevant, but muted. It doesn’t matter if your 138 microservices are all written in purely immutable languages if the architecture of the microservices as a whole is sloppy and dependent on wild, unconstrained mutation as messages fly everywhere without a plan; a well-designed monolith in a conventional language may be far easier to understand and change. A well-designed Haskell program and a well-designed C# program may have very similar high-level mutation flows, even though on the micro-level they are very different.
I promised in the intro post to even anger some functional programming advocates by claiming that FP gets things wrong. I think total immutability is generally wrong here. What is much more important is constraining mutability. Rust generally has a better approach. I would love to say Erlang generally has a better approach, except that I think it whiffed here; it needed immutability between actors, not immutability within actors. But a hypothetical Erlang that was mutable within actors but never, ever let any mutability leak across actors would be another better approach10. Mutability isn’t scary in small doses, and therefore there are more solutions around than entirely eliminating it. There is a certain simplicity to it, yes, and as I also mentioned in the intro, if you want to say it’s a better approach than the standard imperative approach I won’t argue, but I do not believe it is the globally optimal approach.
-
I will arbitrarily define “modern language” as one born after 1990 or so, so this includes Python and what we today call the dynamic scripting languages, and several modern static languages, but excludes C, and to a large degree, C++. Although C++ is such a mess that even figuring out if it has a given feature can itself devolve into a language lawyer cage match. ↩︎
-
In deliberate parallel to the mythical Sufficiently Smart Compiler. ↩︎
-
Since I’m discussing a very wide slice of language, “object” here means something very general; think of it as closer to the standard English meaning rather than the C++ meaning. I need some way to discuss things across languages here. ↩︎
-
I mean, you can hack something where you subclass a method and implement it by unconditionally throwing an exception or something, but generally that’s a bad idea and only to be used in desperation. If you program in an OO language, you should learn the Liskov Substitution Principle. And then recognize that in practice it is a principle broken as often as it is followed… but your OO programming can still improve for at least being aware of it. ↩︎
-
As a reminder or for those new to the series, I am using this term in its modern sense. Lisp predates the general recognition of globals being bad, but Lisp is not a functional programming language by the modern definition, even though it was called “functional programming” back then. ↩︎
-
The global-ist of globals, such as C had, where there is only one global namespace, are fairly rare nowadays, but the key aspect of the variables in this example is that there is only one instance of them, shared by all users. Namespacing may allow for multiple instances of the same base name, but if they’re trivially available by all code just for the asking they’re global enough for today’s discussion. This may be called a “static” variable, or “defined on the module”, or a “package variable”, or any number of other things. The key is that any other bit of code has some way of reaching it and potentially mutating it. ↩︎
-
This all gets horrifyingly worse if you introduce concurrency. Imagining what sort of inputs and outputs are necessary to fully model that is left as an exercise for the reader. And, indeed, we collectively know from experience that global variables become even more dangerous in highly concurrent contexts. ↩︎
-
The much-ballyhooed IO type is less about “monads” and more about being the implementation of this approach for Haskell, meaning that, technically, all functions in Haskell really are completely pure. It is the combination of these functions with the aforementioned engines that makes a Haskell program actually do something. The purpose of the IO type is to abstract away the RealWorld (as Haskell calls it) and the various actions you can take, and also to wrap away concurrency (see previous footnote), giving an interface for interacting with other computations without having to directly witness their input/output as explicitly declared inputs and outputs of every function. ↩︎
-
One thing I hope you understand by the time this series is done is that this is not an apologia for “business as usual” in the imperative world! Every criticism a functional programmer has about imperative code in the general sense, I’ve lived. Something does need to be done about those problems, I just do not necessarily agree that “throw everything away and switch languages” is the only valid choice. A valid choice sometimes, yes, but not the only. ↩︎
-
As I understand it, Elixir sits on top of BEAM and provides a language that allows for mutability, which in this case amounts to “allows you to rebind the same name to multiple values”. Which just goes to show, once again, that purity is relative. ↩︎