Found a motivating question on r/haskell a couple of days ago. OP alleges having a «decent amount of experience writing numerical code», mentions a few non-functional languages, says they are considering using Haskell, and «as a test» they are trying to solve a trivial differential equation using iterative methods with literally this
```haskell
dt = 0.01

x :: Integer -> Double
x 0 = 1
x n = (x (n-1)) + 5 * (x (n-1)) * dt
```
and wonders why time and space keep doubling as `n` grows, asks if there are better ways of doing it, and (of course) blasts how he thinks Haskell might just be too slow.
The example’s naïveté, using `Integer`, superfluous parentheses, entry-level floating-point operation grouping, question formulation, and all-around gall, spell «homework» to the trained eye. I normally don’t engage with this kind of question at all, but being in a good mood I felt compelled to give the hints one would give an industrious student:
Your function has two non-tail recursive calls. It’s like a naïve Fibonacci implementation. There’s your exponential time and space complexity.
Alternatives:

- Rewrite it into tail-recursive form with accumulators to push down intermediate values,
- Use Dynamic Programming techniques to save computed values into an immutable array, or
- Generate values as an `unfoldr`.
Of course OP did not answer or ask for clarification. If you are familiar with numerical computation, you understand that it is impossible to give performance hints that will work for all cases, but the above techniques should be familiar to someone who has a decent amount of experience writing numerical code, amirite?
Anyway, I decided to write these notes for people who might be interested in understanding the techniques, for numerical and non-numerical computation cases. As a bonus, I will show how to effectively compare them using proper tooling available in the Haskell ecosystem.
There’s no Computer Science without Fibonacci
The Fibonacci Sequence is everywhere, it is naturally defined as a recursive function, and can be expressed as succinctly as naïvely in Haskell like so
```haskell
fib0 :: Int -> Int
fib0 0 = 1
fib0 1 = 1
fib0 n = fib0 (n-1) + fib0 (n-2)
```
First of all, I’m using `Int` (64-bit machine integers) instead of `Integer` (arbitrary-precision simulated integers) because I feel the need for speed.
The above function is not tail-recursive because both recursive calls have to return before their values can be added up and returned. Basic algorithm analysis and an understanding of how programming languages work will lead you to exponential complexity: each call generates two new calls that must complete, hence it will need O(2ⁿ) calls total, with the necessary stack space to boot. It is extremely expensive in time and space, so it will slow down quickly as you try larger `n`.
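To make the blow-up concrete, here is a quick sketch that counts how many calls `fib0` performs (the helper name `calls` is mine, not part of the original code):

```haskell
-- counts the calls fib0 makes, following the same recursion shape
calls :: Int -> Integer
calls 0 = 1
calls 1 = 1
calls n = 1 + calls (n-1) + calls (n-2)
```

A little induction shows `calls n == 2 * fib0 n - 1`, so computing `fib0 42` takes 866,988,873 calls to produce a single number.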
```
λ> fib0 42
433494437
it :: Int
(148.13 secs, 159,526,065,008 bytes)
```
Rewriting in tail-recursive form
Contrary to popular belief, not every function in Haskell must be written in tail-recursive form to be efficient. But when a function is total and produces a single result, it’s usually a good idea to write it in tail-recursive form. Most number-crunching functions are, so this is a skill that must be learned and practiced.
In general, the trick to rewrite a function in tail-recursive form is to create an auxiliary function with additional arguments meant to carry intermediate values. Each recursive call computes on the arguments before actually recursing, until the base case simply returns an expression based on said arguments. GHC will notice this pattern and turn your function into a closed, efficient assembler loop. Better than the `for` you were thinking about.
The tail-recursive version would be

```haskell
fib1 :: Int -> Int
fib1 n = go 0 1 0
  where
    go i f1 f2 = if i == n then f1 else go (i+1) (f1+f2) f1
```
The auxiliary function `go` has type signature `go :: Int -> Int -> Int -> Int`.
Its first argument, `i`, represents the index of the Fibonacci number we intend to compute. Its second argument, `f1`, will always be the actual `i`-th Fibonacci number. The third argument, `f2`, will always be the `(i-1)`-th Fibonacci number.
Now `fib1 n` becomes `go 0 1 0`, which reads «the 0-th Fibonacci number is 1, and the (-1)-th Fibonacci number is 0».
Looking at how `go` works, you’ll see that:

- If the current index `i` is precisely the `n` we are looking for, then we can immediately return `f1`; otherwise,
- `go` is called recursively to compute the `(i+1)`-th Fibonacci number, by having the second argument be the sum of the current two (precisely what the next Fibonacci is supposed to be), and the third argument be the current `i`-th Fibonacci. This maintains the invariant for `go`’s computation (see the trace below).
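A short trace makes the invariant visible (values hand-computed):

```haskell
-- fib1 5 unfolds as:
-- go 0 1 0   -- f1 = fib 0 = 1, f2 = «fib (-1)» = 0
-- go 1 1 1   -- f1 = fib 1 = 1
-- go 2 2 1   -- f1 = fib 2 = 2
-- go 3 3 2   -- f1 = fib 3 = 3
-- go 4 5 3   -- f1 = fib 4 = 5
-- go 5 8 5   -- i == 5, so return f1 = 8
```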
Now, all operations on arguments can be done before actually making the call, due to a GHC optimization technique called Strictness Analysis. This means that even though Haskell is lazy, the compiler can identify opportunities to be more strict, particularly over arithmetic expressions. This turns out to be way more efficient
```
λ> fib1 42
433494437
it :: Int
(0.00 secs, 123,832 bytes)
```
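If you’d rather not rely on the optimizer noticing, you can demand the strictness yourself with bang patterns; a minimal sketch (`fib1'` is my name for this variant, not from the original):

```haskell
{-# LANGUAGE BangPatterns #-}

-- same as fib1, but with explicitly strict accumulators instead of
-- relying on GHC's strictness analysis
fib1' :: Int -> Int
fib1' n = go 0 1 0
  where
    go !i !f1 !f2 = if i == n then f1 else go (i+1) (f1+f2) f1
```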
This is probably the fastest way to implement this. It requires constant space (one stack frame), and time proportional to the number of additions. But we don’t have access to the intermediate values, just the final one. What if we need to compute Fibonacci many times on account of «reasons»?
Dynamic Programming technique
There are many numerical and non-numerical problems having solutions that can be written as recurrences, because the problem can be expressed as a combination of two or more smaller versions of itself. If we are interested in computing solutions for different values of the input over and over again, a standard technique known as Memoization is used. Saving intermediate results in an array, and re-using them instead of computing them again, saves time at the expense of space.
If we need to repeatedly compute Fibonacci numbers up to, but never over, an upper limit N, we can create an immutable array as we go. Haskell’s arrays become immutable, extremely compact, and have constant-time access to any position once they are built. And they can be built in self-referential fashion, leveraging Haskell’s laziness to compute every single cell exactly once.
```haskell
import Data.Array ( (!), array )

fib2 :: Int -> Int
fib2 n = a ! n
  where
    a = array (0,n) $ [(0,1),(1,1)] ++
                      [(i, a ! (i-2) + a ! (i-1)) | i <- [2..n]]
```
In this implementation we compute the `n`-th Fibonacci number by simply looking at the `n`-th position of the array `a`. Of course we can always have the array as a top-level definition and use it as many times as we want (more on that below). But we’re here for the construction trick.
Focusing on how the array is constructed, there are a few key elements:
- The tuple `(0,n)` defines the range of the array. This will be a one-dimensional array with indexes ranging from `0` to the value of `n`. The array is created dynamically and stored in a compact, efficient way; you cannot change the size of the array once built.
- The «body» of the array is created using a list of pairs: the first element being the index, the second being the value.
- The first two elements of the array are explicitly set as `(0,1)`, i.e. «the 0-th Fibonacci is 1», and `(1,1)`, i.e. «the 1-st Fibonacci is 1».
- The `array` function takes an arbitrarily long list of these pairs, in the order they are placed in the list. This allows later elements in the list to replace earlier ones. But, more importantly, elements can refer to elements that are already part of the array. And that’s precisely the trick in this list comprehension:

```haskell
[(i, a ! (i-2) + a ! (i-1)) | i <- [2..n]]
```

When setting up the `i`-th element of the array, its value should be whatever is already in positions `i-2` and `i-1` of the array. So, each position is computed using pre-computed values, from `2` to `n`, filling the table.
Once `array` has had a chance to process the whole list, the array is compacted, and can be queried very fast.
```
λ> fib2 42
433494437
it :: Int
(0.00 secs, 137,032 bytes)
```
So fast that it is apparently indistinguishable from our tail-recursive solution. It is indeed linear in the number of computations, and also linear in the space used to compute.
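As promised above, for the «repeatedly compute up to a bound» scenario, here is a sketch of the top-level variant (the bound `maxN` and the names `fibTable` and `fib2'` are my inventions):

```haskell
import Data.Array ( Array, (!), array )

-- hypothetical fixed upper bound; pick whatever your problem needs
maxN :: Int
maxN = 100000

-- built at most once per program run; laziness fills each cell on demand
fibTable :: Array Int Int
fibTable = array (0, maxN) $
  [(0,1),(1,1)] ++ [(i, fibTable ! (i-2) + fibTable ! (i-1)) | i <- [2..maxN]]

fib2' :: Int -> Int
fib2' n = fibTable ! n
```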
There’s no Functional Programming without morphisms
The impulse to think in `while` or `for` constructs for data processing is a very hard vice to overcome. We functional programmers (most of the time) do not care or need to know how many elements there are to process. We think in morphisms over data structures. A morphism that takes a given data structure and processes all its elements to produce a result is called a Catamorphism.
Catamorphisms are embodied in the `fold` family in Haskell; more formally, the `Foldable` typeclass. We don’t think in `while` or `for`, we think in `fold`s. Yet, we don’t have a data structure with things we can combine to produce a Fibonacci number, so `fold`s don’t seem to fit the bill.
The dual of a Catamorphism is called an Anamorphism. It produces a data structure out of a seed value, each step using the current seed to generate the next value and a new seed. This is exactly what we need
```haskell
import Data.List ( genericDrop, unfoldr )

fib3 :: Int -> Int
fib3 n = head $ genericDrop n $ unfoldr next (0,1)
  where
    next (f1,f2) = Just (f2,(f2,f1+f2))
```
Haskell lists have a unique anamorphism, called `unfoldr`. It needs a function that, given a seed, will produce the next element and seed, or signal to stop collecting. Given the initial seed, `unfoldr` will start collecting all elements, by repeatedly applying the «next step» function, forever, or until no new value is produced.
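For reference, this is `unfoldr`’s signature together with a morally equivalent definition (GHC’s actual one is written to enable fusion, which will matter in a moment):

```haskell
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr f seed = case f seed of
  Just (x, seed') -> x : unfoldr f seed'   -- emit an element, keep going
  Nothing         -> []                    -- the seed said stop
```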
For Fibonacci numbers, it is convenient to generate them all: see how `next` takes the current seed, produces the next Fibonacci to collect (`f2`), and then does the math to build the next seed? It’s an abstracted implementation of our tail-recursive version.
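If you want to convince yourself of what `next` produces, peek at the stream (`fibs` is a throwaway name of mine):

```haskell
import Data.List ( unfoldr )

-- the infinite stream fib3 consumes
fibs :: [Int]
fibs = unfoldr next (0,1)
  where next (f1,f2) = Just (f2,(f2,f1+f2))

-- take 8 fibs == [1,1,2,3,5,8,13,21]
```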
Now, `unfoldr` is going to produce a list: an infinite one, given how `next` is written. That’s why we apply `genericDrop` to ignore the first `n` elements, and then just take the first one of the remaining ones. Thanks to Haskell’s laziness, evaluation will stop there, and we’ll get our result
```
λ> fib3 42
433494437
it :: Int
(0.00 secs, 128,976 bytes)
```
Read. That. Again.
It’s as fast as our carefully hand-written tail-recursive version, almost as cheap in terms of space, and definitely cheaper than the array-based one. What kind of sorcery is this?
You see, `unfoldr` on its own is an anamorphism, and if used alone will create a full Haskell list. It so happens that `genericDrop` is a form of catamorphism. When a catamorphism is composed with an anamorphism, there are certain algebraic rules that allow extremely clever optimizations, possible only in pure languages. The result of these optimizations is that there will be no list: at every stage of the computation, the element produced by `unfoldr` will be immediately fed to `genericDrop` (to be dropped), but `unfoldr` will continue with the next seed until the desired value is produced, kept by `genericDrop` and returned by `head`.
There will be no list constructor, no garbage-collection cost, no intermediate list decomposition, but simple shuffling of values between machine registers. And we wrote it in a very high-level composable and abstract way. Beat that.
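Conceptually, after fusion the composition behaves like this hand-written loop (my sketch of what the optimizer arrives at, not GHC’s actual output):

```haskell
-- what head . genericDrop n . unfoldr next morally boils down to
fib3Fused :: Int -> Int
fib3Fused n = loop n (0,1)
  where
    loop 0 (_,f2)  = f2                     -- dropped n elements; keep this one
    loop k (f1,f2) = loop (k-1) (f2,f1+f2)  -- drop and advance the seed
```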
Which one is better, then?
We can’t really compare in terms of flexibility. I hope you realize that for different problems you might want just the final result, some, or all of the intermediate results. I intentionally wrote the last one to illustrate that for really advanced languages, abstractions can result in performant code, without having to think much about implementation details.
But if we restrict our comparison to «what is the fastest way to compute the `n`-th Fibonacci», we can use one of the Haskell ecosystem’s best tools: Criterion.
In a nutshell, the Criterion library provides an extremely accurate microbenchmarking toolkit that allows comparing the performance of pure functions. Analysis is extremely high-resolution, and uses statistical models to ensure garbage collection and other optimization factors do not influence the results. You end up comparing raw performance, and being informed of how much «noise» your testing environment caused.
In our scenario, it suffices to write this
```haskell
import Criterion.Main

main = defaultMain [ bgroup "fib0" [ bench "42" $ whnf fib0 42 ]
                   , bgroup "fib1" [ bench "42" $ whnf fib1 42 ]
                   , bgroup "fib2" [ bench "42" $ whnf fib2 42 ]
                   , bgroup "fib3" [ bench "42" $ whnf fib3 42 ]
                   , bgroup "fib4" [ bench "42" $ whnf fib4 42 ]
                   , bgroup "fib5" [ bench "42" $ whnf fib5 42 ]
                   ]
```
to compare how fast the different implementations are. After building and running the tests
```
$ stack ghc -- -O2 fib.hs -o fibbench
$ ./fibbench --output fib-comparison.html
...
```
we end up with this report. You want to look at the estimated mean execution time for each implementation, and then continue reading…
Wait… Where is `fib4`?
It’s not the fastest, but it’s a cute Ouroboros once you figure it out.
```haskell
fib4 :: Int -> Int
fib4 n = wat !! n
  where wat = 1 : 1 : zipWith (+) wat (tail wat)
```
I’m afraid to ask about `fib5`…
Well
```
λ> fib5 42
433494437
it :: Int
(0.00 secs, 121,528 bytes)
```
it is the fastest in execution time, and is the cheapest in terms of space
```haskell
import Data.List ( foldl' )

fib5 :: Int -> Int
fib5 n = snd $ foldl' wtf (0,1) (replicate n undefined)
  where
    wtf (a,b) _ = (b,a+b)
```
I apologize for misleading you, dear reader: it is possible to write an extremely performant, space-efficient Fibonacci function using the simplest of Catamorphisms. Just collapse a nonexistent list of nothingness into the result you need.
Just my two nanoseconds.