Emulators are fun! Even when they are unidimensional like in this challenge. The CPU has just one register, a cycle counter in lieu of a clock, and just two instructions. No jumps, no RAM, no I/O. Enough to power a pixel display, that also happens to be unidimensional.
But we still need to process the «assembly language» instructions and keep track of a changing state and side-effects. If pure functional programming is about forbidding side-effects, how can we manage this? Monads, Monoids, and polymorphism…
The Problem
A simplistic display device driven by an even simpler CPU is described. The CPU is extremely simple:
- It has a constant ticking rate, each step being a cycle.
- It has a single integer register X, preloaded with a 1.
- It has a noop instruction that takes one cycle to complete, doing nothing.
- It has an addx V instruction that takes two cycles to complete, increasing the contents of register X by V at the end of the second cycle.
The input looks like
noop\n
addx 3\n
addx -5\n
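For intuition, here is how that sample plays out under the rules above (my own trace, not part of the puzzle text):

cycle 1: noop             X = 1 at the end of the cycle
cycle 2: addx 3 begins    X = 1
cycle 3: addx 3 ends      X = 4 at the end of the cycle
cycle 4: addx -5 begins   X = 4
cycle 5: addx -5 ends     X = -1 at the end of the cycle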
The general problem is to simulate the behavior of the CPU by «running» the instructions and keeping track of the cycles and changes to the X register.
The particular problem for Part A is to compute the product of the cycle number and the contents of register X on cycles 20, 60, 100, 140, 180 and 220, and add those products up.
Main Program
The main program is pretty much the same as Day 1’s. However, Part B turned out differently, because it returns a multi-line String that needs to be presented using putStrLn instead of print.
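For reference, here is a minimal sketch of what that main could look like, assuming an input.txt file next to the program and the partA and partB functions defined below:

main :: IO ()
main = do
  input <- readFile "input.txt"
  print    (partA input)   -- Part A: a single Int
  putStrLn (partB input)   -- Part B: a multi-line String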
Loading instructions
Simple lazy list parsing and a data type are all we need.
Let’s start with an INST data type to model CPU instructions and a function to parse the corresponding String into an INST value, i.e.
data INST = NOOP | ADDX Int
  deriving (Show,Eq)

toINST :: String -> INST
toINST "noop" = NOOP
toINST inst   = ADDX $ read $ snd $ tail <$> break (==' ') inst
As long as the instruction is on a line of its own without the newline, we can use toINST. The ADDX case is mildly interesting: using break on the lone space will produce a tuple with both parts, and we fmap on the second one to remove the leading space, before extracting it and reading it as an Int.
Now, in order to get our arbitrarily long input into a [INST]
we use lines
to split the input getting rid of the newlines,
and apply toINST
to every element of the list.
loadProgram :: String -> [INST]
loadProgram = map toINST . lines
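A quick GHCi check with the sample input from above (the output shown is what I’d expect from the derived Show instance):

ghci> loadProgram "noop\naddx 3\naddx -5\n"
[NOOP,ADDX 3,ADDX (-5)]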
Modeling the CPU
The CPU can be modeled with a simple data type
data CPU = CPU { x :: Int
               , c :: Int
               } deriving (Show)
and the initial state as a constant value
initialCPU :: CPU
initialCPU = CPU { x = 1
                 , c = 0
                 }
Modeling the machine
There are many ways to model an abstract virtual machine. The one I chose is aligned with what the problem statement requests, my intuition regarding what could’ve been asked for Part B, and my experience writing translators, interpreters, and compilers.
The way I saw it, the machine needed to:
- Have a set of read-only «rules» — a data-driven way to trigger the «probe during this cycle» to compute the required products. Emphasis on read-only.
- Have a way to collect results — each probe is going to produce a result, and I want to collect them all while interpreting. I did not want this collection to be «the result» of the emulation because it’s not. Emphasis on collecting as we go.
- Have a way to manage the changing CPU state — interpreting each instruction will require updating the cycle counter, maybe changing the contents of register X, and inspecting them when probing at the required intervals. Emphasis on mutable state.
Enter the Monad
There are multiple definitions for what a Monad
is, and that’s
not as important as how to use them. For this particular problem,
we want a Monad
that is able to implicitly carry read-only reference
data (an «environment») that we can query, an append-only storage
(a «log») where we can add information as we go along, and a read-write
piece of data (a «state») that we can manipulate. By implicitly I mean
the shuffling around happens without needing to express it in code.
That is, if we write
do step1
   step2
   step3
   ...
each step could access environment, log, or state. Regardless
of what is accessed, the environment, log, and state, are
passed along from step1
to step2
, and so on and so forth.
That way, we can write our code in «imperative fashion» thinking
there is some «program state» being threaded as we go, when in
reality it is just pure values being passed around. This is the
«programmable semicolon» interpretation of a Monad
: when passing
from step to step, the implicit semicolon has this «passing around»
meaning. You can think of the triad (environment,log,state) being
conveniently passed «on the side», hidden from view, but available
to be used at any or all steps.
But we will not need I/O. Monads can be used for I/O but aren’t only for that, and in fact are way more flexible.
The RWS Monad
Haskell’s standard library already provides a Monad
with the
behavior we’re looking for. As a matter of fact, I modeled the machine this way intentionally, to show how this Monad can be useful.
import Control.Monad.RWS
The RWS Monad
is a polymorphic type for a monadic computation
producing results of any desired type. The computation implicitly
carries along a read-only environment, a logging container,
and a type for modeling state. Without going into details outside
the scope of this article, it looks like this
type RWS r w s = ...
We just need to fill in the proper types for r(ead-only), w(rite-log), and s(tate). It is customary to simplify complex monadic types using type aliases, so I wrote
type Machine w a = RWS Check w CPU a
where:
- Check is the type for the read-only environment. I used this for the actual cycles to probe on — more on it in a bit. When writing monadic code, the function ask can be used to access the value provided on initial evaluation.
- The second type has to be a Monoid. Part A requires adding numbers as we probe. Thinking that Part B would require saving things of a different type, I opted for a polymorphic w so I can choose the proper Monoid as long as my type signatures are correct. When writing monadic code, the function tell can be used to append (as in Monoid’s (<>)) to the running log.
- The third type, CPU, is the one used for the mutable state. When writing monadic code, the functions get and put can be used to read the current state and overwrite it, respectively.
The API for RWS provides more functions to handle each part of the implicit data being carried along. I avoided the fancy stuff, sticking to the basic ones.
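To make the three roles concrete, here is a tiny throwaway action (not part of the solution; demo is a hypothetical name, and [Int] is just a placeholder Monoid for the log) exercising the primitives inside our Machine alias:

demo :: Machine [Int] ()
demo = do _chk <- ask                 -- read the Check environment
          cpu  <- get                 -- read the current CPU state
          tell [x cpu]                -- append to the running log
          put cpu { c = c cpu + 1 }   -- overwrite the CPU state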
Monadic simulation
There are three parts to this simulation:
1. How to keep track of (or tick) the cycle counter. For Part A it also means saving the probes at the required intervals. I expected Part B to want something different.
2. How to perform a single CPU instruction.
3. How to «bootstrap» the simulation — give the initial environment and state, to then «run the assembler».
Part (3) is quite generic: it does not need to provide an initial log (because it is empty) so it can be made polymorphic on its type.
I decided to decouple (1) from (2) in order to make (2) generic in the same sense that (3) is generic, by not doing any logging during the CPU operations and shifting that to the ticker. That way, only tick would need to know about the particulars of logging, with (2) and (3) not caring. That’s how general Monad programming can and should get — but you must write your function signatures!
I started with perform
, to implement the CPU behavior.
perform :: Monoid w => Machine w () -> INST -> Machine w ()
perform tick NOOP     = tick
perform tick (ADDX v) = do tick
                           tick
                           cpu <- get
                           put $ cpu { x = x cpu + v }
As per the definition:
- NOOP does nothing and takes one tick.
- ADDX v takes two ticks and updates register x by adding v to it. Notice the use of get to read the current state as a pure value, and then put to overwrite it with the updated value.
The signature is as interesting as it is important:
perform :: Monoid w => Machine w () -> INST -> Machine w ()
- The Monoid restriction is there so we can use any Monoid for the log.
- The first argument is a monadic Machine carrying a log of some type w we don’t care about in this function, and producing no result — that would be the ticker.
- The second argument is the single INST to perform.
- The result is, in turn, a monadic Machine carrying a log of the same type w we don’t care about, producing no result.
The next generic component is simulate itself. This has to take care of «bootstrapping» the whole computation, providing the initial read-only environment and state, and then process a full assembly program in order.
We already have initialCPU
for our initial state, so we have
to write initialCheck
to provide instructions for the probing
points. According to the problem description, the first probing
event has to happen on cycle 20
, and the rest every 40 cycles after that, therefore
data Check = Check { first   :: Int
                   , modulus :: Int
                   }

initialCheck :: Check
initialCheck = Check 20 40
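As a quick sanity check of those numbers (probedCycles is a hypothetical helper, not part of the solution; the real predicate lives inside the Part A ticker below):

probedCycles :: [Int]
probedCycles = [ cy | cy <- [1 .. 220]
                    , cy == first initialCheck
                      || (cy - first initialCheck) `mod` modulus initialCheck == 0 ]
-- probedCycles == [20,60,100,140,180,220]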
Now, for the bootstrapping, we look at RWS
API and find two
functions that can help
runRWS :: RWS r w s a -> r -> s -> (a, s, w)
evalRWS :: RWS r w s a -> r -> s -> (a, w)
In both cases the first argument is the full monadic computation
(not «a single step»), the second argument is the initial read-only
environment, and the third argument is the initial state. The returned
values are different: runRWS
gives back the computation result, the
final state, and the log; while evalRWS
gives back the computation
result, and the log.
We surely need the log. Our computations do not return anything
useful (notice the ()
on the signatures) so that doesn’t make a
difference. I did not know if Part B was going to ask anything about
the final CPU state or not, so I chose runRWS
and wrote
simulate :: Monoid w => Machine w () -> [INST] -> ((), CPU, w)
simulate tick is = runRWS (mapM_ (perform tick) is)
                          initialCheck initialCPU
Again, the signature is explicit: the first argument is the ticker, the second argument is the [INST] to interpret, and the returned triple will have no result value, the final CPU state, and the log, whatever it holds. And we want to be able to use any Monoid that is convenient. As generic as it can be.
Recall the first argument to runRWS
must be the «whole computation».
We can use perform
to simulate the behavior of one instruction. Since
tick
is abstracted as an argument, we can use (perform tick)
to use
the «appropriate» tick
er when interpreting an instruction. We’d like
to use (perform tick)
over every instruction on our [INST]
, keeping
the order from left to right and, more importantly, keeping the
«implicit monadic flow» so environment, log, and state, are passed
around.
And for that we use mapM_
mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()
First, we notice that [] is a Foldable, so we can specialize t a to [INST] to get
mapM_ :: (Monad m) => (INST -> m b) -> [INST] -> m ()
Next, we notice that according to simulate
’s signature
tick :: Machine w ()
therefore
perform tick :: INST -> Machine w ()
and if we make b ~ ()
and m ~ Machine w
, we get
mapM_ :: (INST -> Machine w ()) -> [INST] -> Machine w ()
which means:
- We can pass (perform tick) as the first argument to mapM_.
- We can pass our list of instructions as the second argument.
- It produces a combined monadic computation Machine w (), suitable for bootstrapping with runRWS.
And mapM_
is useful because it receives a list of individual monadic
computations
[m0,m1,m2,m3,...,mN]
and turns it into
m0 >> m1 >> m2 >> m3 >> .. >> mN
which is equivalent to having written
do m0
   m1
   m2
   ...
   mN
except you didn’t write it, but built it dynamically. Monadic actions are first-class values: they can be created, combined, and passed around, but they don’t «happen» until you bootstrap them. That’s part of what we mean by «programmable semicolon».
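For intuition, here is a plain IO example (not part of the solution; demoMapM and demoDo are hypothetical names) showing that mapM_ builds exactly that chained computation:

demoMapM :: IO ()
demoMapM = mapM_ print [1, 2, 3 :: Int]

demoDo :: IO ()
demoDo = do print (1 :: Int)
            print (2 :: Int)
            print (3 :: Int)

Both actions print 1, 2 and 3 in order.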
At this point, we have a generic simulator that only needs a particular ticker to run the assembly program. Here’s where we use a specific Monoid to build our logs with minimal effort.
Solving Part A
Going back to the requirements for Part A, we are asked to probe the CPU status on cycles 20, 60, 100, 140, 180 and 220, multiply each cycle number by the value of register X, and add the products up. Notice how the first value is «odd», and the rest go in increments of 40 — that’s what initialCheck carries for easy reference.
We need to write a ticker to do the probing and adding as we go.
Turns out there exists a Monoid
named Sum a
that’s able to
automatically carry out additions every time we (<>)
. Therefore,
we can use tell
to provide a Sum
, and the addition will be
computed as we go. We write an explicit signature mentioning it
tickA :: Machine (Sum Int) ()
tickA = do cpu <- get
           let c' = succ (c cpu)
           chk <- ask
           when (check chk c') $ tell $ Sum $ x cpu * c'
           put $ cpu { c = c' }
  where
    check :: Check -> Int -> Bool
    check ck cy = cy == first ck
               || (cy - first ck) `mod` modulus ck == 0
Now tickA explicitly mentions the use of the Sum Int Monoid to automatically add numbers every time the code tells it to log. At the start of the
cycle, we get
the current CPU status and create a local variable to
increment the cycle counter while we compute. We immediately ask
for the read-only environment to have access to probing instructions.
If the check
confirms that we must probe within this cycle, we
simply do the math and tell
the Sum
Monoid
to do its thing.
Finally, we put
the CPU state with the updated cycle counter.
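As a quick illustration of how Sum behaves (sumDemo is a hypothetical name; Sum and getSum live in Data.Monoid, which may already be re-exported by your version of Control.Monad.RWS):

import Data.Monoid (Sum(..))

-- Sum wraps a number so that (<>) adds and mempty is 0
sumDemo :: Int
sumDemo = getSum (Sum 3 <> Sum 4 <> mempty)   -- 7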
We can finally solve partA
by running the simulator with this
custom ticker. It will type-check, because we’re providing a concrete
Monoid
to work with.
partA :: String -> Int
partA input = getSum total
  where (_,_,total) = simulate tickA (loadProgram input)
We ignore the computation result and the final state, grabbing the
final log via pattern matching. The log is a value of type Sum
,
and its API provides a function getSum
to get the actual value.
Solving Part B
The description of Part B turned out to be quite confusing at first. The provided example is helpful, but there’s room for improvement. I ended up understanding it this way:
- The display actually has positions 1 to 240, grouped 40 to a row.
- The cycles go from 1 to 240.
- When working on cycle n, if n mod 40 is in the range [x-1 .. x+1], we need to output a hash, otherwise we output a period.
- To mimic the screen’s two dimensions, output a newline every 40 cycles.
We just need to provide a ticker for this particular problem. Using
[String]
works because [a]
is a Monoid
. The log is going to be short
enough, so I won’t worry about (++)
being expensive. We will create
all the output on the log, and print it after the simulation has
completed. This is different from doing interleaved I/O, which is possible, but I chose not to use it in order to show a pure solution.
tickB :: Machine [String] ()
tickB = do cpu <- get
           if check (c cpu) (x cpu) then tell ["#"] else tell ["."]
           let c' = succ (c cpu)
           when (c' `mod` 40 == 0) $ tell ["\n"]
           put $ cpu { c = c' }
  where
    check :: Int -> Int -> Bool
    check cy rX = (cy `mod` 40) `elem` [pred rX .. succ rX]
Now tickB explicitly mentions the use of the [String] Monoid for the log.
This means we need to tell
using [String]
for them to be appended
as we go. At the start of the cycle we get
the state of the CPU,
and check if we need to log a hash or a period. We then increment
the cycle counter in a local variable, and log a newline if we’ve
reached the right edge of the «screen». Finally, we put
the CPU state
with the updated cycle counter.
We can finally solve partB
by running the simulator with this
custom ticker. It will type-check, because we’re providing a concrete
Monoid
to work with.
partB :: String -> String
partB input = concat ls
  where (_,_,ls) = simulate tickB (loadProgram input)
We ignore the computation result and the final state, grabbing the
final log via pattern matching. The log is a value of type [String]
so we simply concat
into a long String
that should be printed
using putStrLn
as mentioned when describing the main program.
ghci> readFile "input.txt" >>= putStrLn . partB
###..###..###...##..###...##...##..####.
#..#.#..#.#..#.#..#.#..#.#..#.#..#.#....
#..#.###..#..#.#..#.#..#.#..#.#....###..
###..#..#.###..####.###..####.#.##.#....
#.#..#..#.#....#..#.#.#..#..#.#..#.#....
#..#.###..#....#..#.#..#.#..#..###.#....
Conclusion
The Monad abstraction is very powerful. There’s more to Monads than the simple «oh, they are used for I/O». Since Monads model order-dependent computations in a generic and polymorphic way, they can be used and combined to build first-class pure computations that have «side effects» or «implicit effects», besides the actual results they produce.
I decided to use RWS
to illustrate the ability to separate
«the process» from «the specifics». This is a particularly
useful way of structuring monadic computation that favors
reusability and composability, in ways that are either
impossible or prohibitively expensive in other programming
languages.
Each of the RWS
components can be used separately. That is,
there exist separate Reader Monad
, Writer Monad
, and State Monad
that can be used individually when you need only one of the
behaviors. There are many other data types that can act as
Monads, such as Maybe, Either and [a], to model computations that can fail, produce an exception, or be predictably non-deterministic. There are Monads for parallelism, parsing, continuation-passing style, Prolog-style backtracking, and Software Transactional Memory.
Every single flow-control strategy can be modeled as a
Monad
. Haskell gives you the opportunity to choose which
strategy to use, and to combine more than one using the
Monad Transformer technique. On top of that, you can
write generic code with replaceable parts — this example
scratches the surface by keeping the Monoid
generic,
so I can abstract the particular away from the general.
On the topic of Monoid
, knowing about Sum
and Product
is a matter of consulting the API. I used [a]
as a
cheap Monoid
for accumulating. For production-grade
code I’d use Data.Sequence’s Seq instead.
As a final comment, I used «log» in the sense of
in-memory record. If you are thinking about doing actual
application logging, the Writer Monad
is not the way.
Guess what, there’s a LoggingT Monad
specifically
designed for logging to stderr
or stdout
, and for
people doing serious syslog
logging, check out
monad-logger-syslog.
They abstract the logging out of the explicit I/O
operations in a general way.