What Does Actual Functional Programming Look Like?
I am reliably informed by many people that the core essence of functional programming is the extensive use of map, filter, and reduce, that these functions are superior to for loops in every way, to a degree that they should be jammed into every possible language with no need to analyze the cost/benefits of such an approach because the benefits are just so incomparably amazing that the costs couldn’t possibly be relevant, and even wondering about the costs just proves I Don’t Get It.
Most of this blogbook (for those coming in first to this post) has been spent observing the benefits of functional programming that are not directly related to map, filter, and reduce. But I have not yet examined the question… is this in fact the core essence of functional programming in the first place?
To heck with theory; let’s go to practice. Let’s take a look at a real Haskell project. I know of two major Haskell projects that have transcended the Haskell ecosystem to become useful programs people download for their own utility, xmonad and pandoc. But xmonad is a strange program, doing extensive binding with C libraries and interacting with all sorts of system services… “not that there’s anything wrong with that”, as the saying goes, but as a result it’s not very typical. By contrast, pandoc is core Haskell functionality; parsing, manipulating ASTs, re-emitting those ASTs, basically a big compiler suite for documents. It couldn’t be more core Haskell. Let’s take a look at that.
Pandoc Source
I’m going to get the latest released version as I write this, the source for 3.6.2.
First, let’s take a quick look at the lay of the land. The source code is conveniently in a src directory, nicely isolated from the test code, so I cd src. Running cloc from within that directory:
$ cloc .
221 text files.
221 unique files.
0 files ignored.
github.com/AlDanial/cloc v 1.98 T=0.20 s (1107.4 files/s, 448342.4 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Haskell 221 7655 8157 73661
-------------------------------------------------------------------------------
SUM: 221 7655 8157 73661
-------------------------------------------------------------------------------
So we’re dealing with 73,661 lines of Haskell source.
map/filter/reduce
Obviously, it should be full of map/filter/reduce, right? Let’s take a quick survey.
Note that for all of these greps and counts I’m about to do, I manually examined them to make sure they are at least approximately correct. I’m sure they’re off here and there, but changing a 47 to a 49 in these discussions isn’t going to change my conclusions much.
I’m going to exclude import statements because those aren’t actual uses. Haskell uses a few different kinds of fold functions for what is referred to elsewhere as reduce, so:
$ egrep -r fold[lr]\'?\\b * | grep -v import | wc -l
154
This regular expression specifically catches foldl, foldr, and foldl', and excludes other functions named with fold in the name. If that is bothering you, bear with me; I will be justifying this choice later.
That’s 0.21% of the lines of code in pandoc.
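(For readers who don’t know Haskell: these folds are Haskell’s “reduce”. Here is a minimal sketch of what they look like; the functions are made up for illustration and have nothing to do with pandoc.)

import Data.List (foldl')

-- foldl' is the strict left fold, the usual "reduce" workhorse in Haskell.
sumLengths :: [String] -> Int
sumLengths = foldl' (\acc s -> acc + length s) 0

-- foldr consumes from the right; map and filter can themselves be defined with it.
squares :: [Int] -> [Int]
squares = foldr (\x acc -> x * x : acc) []

main :: IO ()
main = do
  print (sumLengths ["foo", "quux"])  -- 7
  print (squares [1, 2, 3])           -- [1,4,9]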
For comparison, function composition (.) appears on 1767 lines, 2.4% of the lines of code, and many of those are more than one use per line.
But that’s obviously the weakest of the bunch. How about filter?
$ egrep -r '\bfilter\b' * | wc -l
196
Better, but still only 0.27% of the lines of code in pandoc.
But obviously, I’m sandbagging, what about the big one? What about the sign of good code, the thing that separates functional programming from filthy imperative programming, the thing that obsoletes all for loops forever and ever?
$ egrep -r '\bmap\b' * | wc -l
847
1.1%.
For comparison, mapM:
$ egrep -r '\bmapM\b' * | wc -l
581
Which is 0.8%, fairly similar.
“A-HA! I got you, jerf! mapM is just another type of map!”
No, it really isn’t. The code I see people claiming is the way to code specifically uses map, and only map. Endless, endless invocations of map, specifically operating on arrays. If there were any further abstraction on this I would be less perturbed, but it’s map map map, over arrays or the closest equivalent, specifically.
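To make the difference concrete, here is a small sketch of my own (not pandoc code): mapM threads an effect through the traversal, and its type says so, which is why it is a different recursion scheme and not just a differently-spelled map.

-- Specialised to lists:
--   map  ::            (a ->   b) -> [a] ->   [b]
--   mapM :: Monad m => (a -> m b) -> [a] -> m [b]

parsePort :: String -> Maybe Int
parsePort s = case reads s of
  [(n, "")] | n > 0 && n < 65536 -> Just n
  _                              -> Nothing

main :: IO ()
main = do
  print (map length ["a", "bc"])        -- [1,2]: a pure transformation
  print (mapM parsePort ["80", "443"])  -- Just [80,443]: effects sequenced
  print (mapM parsePort ["80", "nope"]) -- Nothing: one bad element fails the traversal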
In fact what these people strike me as advocating isn’t functional programming at all, but some sort of degenerate array-based programming, with a very restrictive definition of “array”. Now, that is an interesting paradigm in its own right, but truly array-based languages tend to use more than just map. Jamming degenerate array-based programming into foreign languages is even worse than trying to take functional programming lessons into those languages.
mapM is another recursion scheme entirely, about which more in a moment, but first there is another point that people who know Haskell need to have addressed, which is, what about <$>?
The answer is that 2,311 lines, or 3.1% of the lines in pandoc, involve it. However, there are two problems with that count:
- <$> is useful for any Applicative, and even a cursory scan through the results for that grep will show many, many uses clearly not related to lists; the vast majority, from my eyeball scans. If there’s an easy way to extract which of those are specifically instantiated on lists, I don’t know what it is.
- However, I also don’t really care much, because I would assert from personal experience that even when used with lists, <$> takes on its own internal meaning, and after a while you aren’t really thinking “map on this list” but “apply this function to this data structure (whatever it may be) through the applicative instance”. In an experienced Haskell programmer’s hand, there is a real difference of intent between <$> on a list and map, even if they do the same thing.
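To illustrate the first point with a toy example of my own (nothing here is from pandoc), <$> is the same operator whatever the Functor happens to be:

main :: IO ()
main = do
  print (negate <$> Just 3)                        -- Just (-3): Maybe
  print (negate <$> [1, 2, 3])                     -- [-1,-2,-3]: list
  print (negate <$> Right 4 :: Either String Int)  -- Right (-4): Either
  line <- reverse <$> getLine                      -- IO: transform an action's result
  putStrLn line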
“Still, you have to admit, that’s a non-trivial usage of all those functions.”
Yes. However, the question is, when people are claiming they’re “writing functional code” in Javascript, what does it look like? Well, from what I’ve seen, it’s not so much that map is 1.1% of the lines, which is to say, if you keep your functions to reasonable lengths you use it on average zero to one times in a function; these people are outputting “correct” code that looks like 25% map calls, or more.
Other Abstractions
So I agree that pandoc still shows non-trivial uses of those functions, but you should also remember that I’m writing these words, and unless you know Haskell fairly well, downloaded and unpacked pandoc as I’m going, and have been running your own greps to cross-check me, I’m controlling the spotlight of your attention. So let’s widen the focus to some other operators in other recursion schemes, because what pandoc uses even more than lists is applicative functors. Let’s take a look at <$ and other specifically Applicative operators:
$ egrep -r '<\$[^>]|[^<]\$>|<\*>|<\|>|liftA|[^>]\*>|<\*[^>]' * | wc -l
2201
That’s 3% of the lines. <\$[^>] and its friend in the opposite direction are there to avoid picking up the very common <$> in this grep.
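For a feel for what that style looks like in a parsing context, here is a deliberately tiny sketch in the same spirit; it uses ReadP from base rather than pandoc’s actual parsec-based code, and my own made-up Inline type rather than pandoc’s real one:

import Control.Applicative ((<|>))
import Text.ParserCombinators.ReadP

-- <$> maps over a parser's result, *> and <* run both parsers but keep
-- only one side's result, and <|> tries alternatives.
data Inline = Emph String | Plain String
  deriving Show

emph :: ReadP Inline
emph = Emph <$> (char '*' *> munch1 (/= '*') <* char '*')

plain :: ReadP Inline
plain = Plain <$> munch1 (/= '*')

inline :: ReadP Inline
inline = emph <|> plain

main :: IO ()
main = print (readP_to_S (many inline <* eof) "a *b* c")
-- [([Plain "a ",Emph "b",Plain " c"],"")]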
What about monad-based abstractions?
$ egrep -r '<-|>>=|mapM|return|mzero|mplus|mapM|forM|sequence|>=>|<=<|msum|mfilter|filterM|foldM|replicateM' * | grep -v -- -- | wc -l
10579
The grep -v -- --, which has the effect of removing all lines with --, is because sequence picks up a lot of comments that contain the word “sequence” in their text, so I just discarded any lines with comments.
Monad abstractions are operating on 14.4% of the lines in the codebase, again, quite often more than once per line. I’ve also not counted the usages of the mtl library, also based on monads.
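For the non-Haskellers, here is a quick sketch of my own (not pandoc code) of what a few of those counted forms look like in use; the monad here happens to be Maybe, but the identical code shape shows up with IO, parsers, state, and so on:

import Control.Monad (foldM, forM)

average :: [String] -> Maybe Double
average raw = do
  ns    <- forM raw parse          -- <- binds inside the monad; any bad element fails the lot
  total <- foldM addPositive 0 ns  -- foldM is the effectful fold
  if null ns then Nothing else return (total / fromIntegral (length ns))
  where
    parse s = case reads s of
      [(d, "")] -> Just d
      _         -> Nothing
    addPositive acc d
      | d < 0     = Nothing
      | otherwise = Just (acc + d)

main :: IO ()
main = do
  print (average ["1", "2", "3"])  -- Just 2.0
  print (average ["1", "x"])       -- Nothing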
“Ah, yes, jerf, of course, monads. Why, everyone advocating highly functional programming in imperative languages knows how important Either is.”
Whoa, there, strawman cowboy. I said monad, not Either. (Or Option as it is known in other languages.) Pandoc uses monads, not just Either.
In fact, let’s recalibrate a bit here, and look at the number of function signatures in Pandoc:
$ grep -r :: * | wc -l
6477
(Using double-colon for type declarations in Haskell makes some people grumpy since most peer languages use a single colon, but it really helps me out here. Also, by visual inspection, Pandoc has few, if any, uses of :: to annotate anything other than a function.)
How many of them involve Either or Maybe in the tail position, such as one would expect when using its monad interface?
$ grep -r :: * | grep -P '(Maybe|Either)(?!.*?->)' | wc -l
490
(The (?!.*?->) says “not followed by a -> on the same line”.)
Not all that many. How many simply mention Either or Maybe at all?
$ grep -r :: * | grep -P '(Maybe|Either)' | wc -l
540
Again, for the supposed sine qua non of functional programming and the distinctive reason to have “monads” in your language, that is not really all that much.
Especially when we compare to…
$ grep -r :: * | grep PandocMonad | wc -l
2491
Oh ho. What have we here?
The PandocMonad typeclass contains all the potentially IO-related functions used in pandoc’s readers and writers. Instances of this typeclass may implement these functions in IO (as in PandocIO) or using an internal state that represents a file system, time, and so on (as in PandocPure).
Well, doesn’t that just sound like a familiar concept if you’ve been reading this whole series. Definitely not a Maybe or an Option.
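To give a flavor of what that kind of abstraction looks like, here is a drastically simplified sketch of the idea, of my own devising; it is not pandoc’s real PandocMonad class (which has many more methods), and it assumes GHC with the containers and mtl packages:

{-# LANGUAGE FlexibleInstances #-}
import Control.Monad.State (State, evalState, gets)
import qualified Data.Map as M

-- Readers and writers are written against an interface...
class Monad m => FileSystemMonad m where
  readFileContents :: FilePath -> m String

-- ...the real program runs them in IO...
instance FileSystemMonad IO where
  readFileContents = readFile

-- ...and tests run them against an in-memory fake file system, no IO involved.
instance FileSystemMonad (State (M.Map FilePath String)) where
  readFileContents path = gets (M.findWithDefault "" path)

-- Code written against the class works with either instance.
includeLength :: FileSystemMonad m => FilePath -> m Int
includeLength path = length <$> readFileContents path

main :: IO ()
main = print (evalState (includeLength "notes.txt")
                        (M.fromList [("notes.txt", "hello")]))  -- 5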
Pandoc uses many other types for their monad implementations. In this case my point is not about line counts, but about the diversity of the monad interface. These include, but are not limited to:
- ParsecT, which also involves the entire concept of monad transformers. If your language can’t even properly express the monad interface you should probably forget about monad transformers.
- MonadIO, as mentioned before, to abstract out the IO monad implementation and allow swapping it out.
- A variety of utility functions used by things operating on the PandocMonad value, but since they don’t need the IO, are implemented simply on Monad.
- Other motley monads like Reader, Writer, and so on.
“monad” usage in functional languages extends well beyond “Maybe” and “Option”, which are also basically Babby’s First Monads. Again, nothing wrong with them, programming is all about using those foundations over and over again, but if you say “monad” when you mean “Option”, you should know you are using the term incorrectly, e.g.,
“I wish Language X would add support for monads.”
is emphatically not a correct way to say
“I wish Language X would add an official Either/Option type.”
The latter is a fairly small subset of the former, and can be done in many useful ways that often don’t entail adding full support for “monads”. Adding support for monads themselves is actually a fairly heavy ask for most languages due to the fact it is generally not just a matter of “adding support” but significantly rearchitecting some core elements of the language for it to work well.
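What “support for monads” buys you, roughly, is the ability to write code once against the Monad interface and reuse it across wildly different types; here is a small sketch of my own, not tied to pandoc:

import Control.Monad (foldM)

-- Written once against Monad; reused with Maybe, Either, and IO below.
checkedSum :: Monad m => (Int -> m Int) -> [Int] -> m Int
checkedSum check = foldM (\acc x -> (acc +) <$> check x) 0

main :: IO ()
main = do
  print (checkedSum (\x -> if x >= 0 then Just x else Nothing) [1, 2, 3])  -- Just 6
  print (checkedSum (\x -> if even x then Right x else Left x) [2, 4]
           :: Either Int Int)                                              -- Right 6
  total <- checkedSum (\x -> do { putStrLn ("checking " ++ show x); return x }) [1, 2]
  print total                                                              -- 3

An Either/Option type on its own gives you the values, but not the ability to abstract over the interface like this.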
What’s The Point Of All Of This?
The point of all of this is to demonstrate with a concrete example of one of the definitive functional code bases that functional programming is not about map, filter, and reduce.
Anyone claiming that map/filter/reduce is the point of functional programming is loudly proclaiming that they did not learn functional programming and do not understand it.
Which is, on its own terms, fine. There’s no obligation, no licensing authority requiring passing some curriculum about functional programming. But I would appreciate it if you’d tone down the demands that we all switch to map/filter/reduce “because functional”.
Functional programming is about taking small building blocks and composing them into larger blocks based on using various recursion schemes to tie them together in a way that retains their original properties, and building larger and larger schemes out of that. In this formulation, map/filter/reduce are some of the basic bricks, but just that: bricks. You won’t get far if you try to architect things on a brick-by-brick basis. Good functional programming builds up its own useful abstractions, like the PandocMonad, and the things built on that, and the things built on those things, to build the tools to solve problems while never losing the fundamentals of functional programming, like immutable data.
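A trivially small illustration of that layering, again my own example and nothing to do with pandoc (it assumes the containers package): the bricks get composed into named abstractions, and the rest of the program talks to the abstractions, not the bricks.

import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as M

-- Brick-level: map, filter, and a fold, composed into named pieces.
normalize :: String -> [String]
normalize = words . map toLower . filter (\c -> isAlpha c || c == ' ')

frequencies :: Ord a => [a] -> M.Map a Int
frequencies = foldr (\x -> M.insertWith (+) x 1) M.empty

-- The abstraction the rest of the program actually uses.
wordFrequencies :: String -> M.Map String Int
wordFrequencies = frequencies . normalize

main :: IO ()
main = print (wordFrequencies "The cat saw the cat")
-- fromList [("cat",2),("saw",1),("the",2)]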
Moreover, both in theory and in the practice I observe, map in imperative programs actively fights the sorts of abstractions we need to deploy in our programs, which is the core reason I dislike it so much in non-functional languages. Pervasive use of map qua map, where it is generally not being used to create higher-level abstractions, is a bad sign. A single use of map is to my mind neutral, but when it starts getting chained is when it starts creating blobs of code that are more difficult to refactor than non-map-based code, and it’s hard to overstate the size of that disadvantage.
Imperative code cannot do exactly what functional programming does, for all sorts of reasons, but you can use a similar mindset of building up from primitives ever more useful abstractions that still retain the fundamentally useful properties of the lower abstractions. map/filter/reduce enters that conversation only on a very low level, whereas, as this blogbook has repeatedly hammered on, you’re much better off thinking about the medium levels and up with imperative programming.
A program in an imperative language is far more “functional” if it uses many for loops but all of its components are easily “purifiable” than if all of the for loops have been replaced by map/filter/reduce but the architecture is fundamentally based on impure mutations from top to bottom, because it’s just a bog-standard, imperative-in-all-the-bad-ways program that had its for loops replaced with map/filter/reduce. That’s because map/filter/reduce in an imperative language hardly improves the ability to compose code together in those languages, even at a small scale, and, I would contend, frequently does great damage to it.
But if you have a big pile of purifiable components, you have a big pile of components that can be easily composed together into larger systems, easily pulled apart into a microservice when and if it becomes necessary, and that provide all the architectural benefits we need to build large programs. That medium-scale-and-up composition is vastly more important for quality programming than how we spell loops.