This is the first problem on AOC 2022 that required thinking a bit about proper data structures.
The input data provides an «initial state» and a set of instructions to modify it. Their respective syntactic structure is noticeably different, so I opted for a mixed approach: a formal parser for the first part, with lazy list processing for the rest.
How to handle the changing state is also interesting. There are several stacks and they need to be accessed randomly, so we need a good data structure here. Within that, each stack can then be efficiently modeled with a Haskell list.
The Problem
We are given a dataset of unknown length. The example describes it like this
[D] \n
[N] [C] \n
[Z] [M] [P] \n
1 2 3 \n
\n
move 1 from 2 to 1\n
move 3 from 1 to 3\n
move 2 from 2 to 1\n
move 1 from 1 to 2\n
The first part is a set of stacks, presented top-to-bottom. We
could look at the data file to see how many are there effectively,
but we learn more if we try to parse then in a generic way. Note
how the spaces are significant not only for separating, but also
for indicating «no data here». This suggests using a custom
Parsec
parser for this part.
Then there's an empty line separating the stack specification, from the list of operations to perform. Moreover, the list of operations has a pretty static structure per line. This suggests using the lazy list processing technique to parse it.
The list of instructions is based on popping a number of elements from a source stack, and pushing them into a destination stack. This suggests using a data structure that makes it easy to search and modify elements using a number as a key.
Main Program
The main program is pretty much the same as Day 1.
Parse and lazy parse
As usual, the first step is to parse the input. The top
level parser has to process the input String
and produce
two results: the initial stack setup, and the list of
movements.
Initial Stack setup
We use Haskell lists to model stacks, because they can
be accessed efficiently on the left end: one pushes element
by cons
-ing with (:)
, and pops elements using pattern
matching (e:es)
and keeping the tail. Both operations are O(1).
Since the problem description shows the stacks storing
letters, [Char]
(equivalently String
) will be our data type.
For this problem we need to store an arbitrary number
of stacks, indexed using Int
numbers. Haskell's base library
provides many data structures for indexed access, and for this
particular problem I chose
import qualified Data.IntMap as DIM
because it's a map having Int
as keys, with efficient search
and modify operations, and clever construction and flattening
utilities.
Instructions for changing the stacks
We are interested in knowing how many elements to move, as well as the source and destination stacks. This suggests
data Move = Move { cnt :: Int
, src :: Int
, dst :: Int
}
deriving Show
Loading the problem specification
Having settled on the IntMap
and Move
data types, it follows
that our top level parser must have signature
loadSpec :: String -> (DIM.IntMap String,[Move])
The first observation we make to structure our parser, is the presence of an empty line after the stacks and before the instructions. We've already seen that
ghci> :type break null . lines
break null . lines :: String -> ([String], [String])
will produce a tuple with all lines before the empty one as first element, and the rest of the lines (including the empty one) on the second element. This means we need to
getInitialStacks :: [String] -> DIM.IntMap String
over the first element, and
map toMove . tail
over the second element, provided we have
toMove :: String -> Move
Thus, our top level parser is written as
loadSpec :: String -> (DIM.IntMap String,[Move])
loadSpec = (getInitialStacks *** (map toMove . tail))
. break null . lines
Lazy string processing to parse Move
s
We must provide an implementation for
toMove :: String -> Move
that allows us to capture the numbers on a String
such as
move 4 from 5 to 6
Taking advantage of the standard functions words
and (!!)
-- note
the latter is 0
-based in its counting.
ghci> words "move 4 from 5 to 6"
["move","4","from","5","to","6"]
ghci> (words "move 4 from 5 to 6") !! 1
"4"
ghci> (read $ (words "move 4 from 5 to 6") !! 1) :: Int
4
We write
toMove :: String -> Move
toMove s = Move { cnt = read $ ws !! 1
, src = read $ ws !! 3
, dst = read $ ws !! 5
}
where ws = words s
Parsing the stacks
I will be using lists to simulate stacks. That requires parsing this
[D] \n
[N] [C] \n
[Z] [M] [P] \n
1 2 3 \n
into
[['N','Z'],['D','C','M'],['P']]
regardless of the number of stacks present in the input.
The first thing to notice is that each line is actually «things» separated by one space each. Sometimes, the thing is three spaces, sometimes is an uppercase letter between square brackets. So to parse a thing
thing :: Parser String
thing = between (char '[') (char ']')
(satisfy isUpper >>= \c -> pure [c])
<|> (count 3 (char ' ') >> pure [])
Taking full advantage of Parsec
combinators, the thing is
either an uppercase letter between
angle brackets, or (<|>)
exactly three spaces. In the first case, we return a String
holding the single character. In the second case, we return
an empty String
.
But each line is actually one or more thing
separated by
exactly one space. So we write
parseRow :: Parser [String]
parseRow = thing `sepBy1` char ' '
and now we have a parser that can go from this characters
[N] [C] \n
to this Haskell list
["N","C",""]
For the final parsing step, we note that the whole stack
specification section is nothing more than several rows
separated and terminated with a '\n'
. That means we can
write something like
parseRow `sepEndBy` newline
to collect all the lines. The astute reader probably figured out that we will end up with with something like
[["","D",""],["N","C",""],["Z","M","P"]]
Very close but not quite right yet. If it were a matrix,
we would need to transpose it. Haskell's standard base
library Data.List
provides a function transpose
that does
exactly that, so
parseStacks :: Parser [String]
parseStacks = do
rows <- parseRow `sepEndBy1` newline
pure $ map concat $ transpose rows
and we end up with the desired
["NZ","DCM","P"]
All stacks in «stack order» (top to the left), and in the same order they appear in the specification.
A map of stacks
The last thing to figure out in order to write
getInitialStacks :: [String] -> DIM.IntMap String
is to get all the stacks into our IntMap
. As mentioned before,
IntMap
uses Int
as key. The stacks will be in order from left
to right thanks to parseStacks
. The API for IntMap
provides
fromList :: [(Key, a)] -> IntMap a
so all we need to do is build the list of key-stack tuples
for the map to be built. We don't know how many stacks there
are, and lazy functional programming allows us to pair keys
with stacks without having to count them thanks to zip
.
One more thing: we don't need the line with numbers after the
stacks. It is going to be the last element of the list of String
we receive as the first component of break null
. We write
getInitialStacks :: [String] -> DIM.IntMap String
getInitialStacks s = case parse parseStacks "" (unlines $ init s) of
Right stacks -> DIM.fromList $ zip [1..] stacks
Left _error -> DIM.empty
thus completing the last piece needed by loadSpec
.
We can now read the String
from the file, and end up with an
IntMap
of stacks in the proper order, and the list of movements
to execute.
Simulating moves
You pop things from a stack, and push them to another. Our stacks are stored in a map, and after whatever move we simulate, we'd like to have the map updated accordingly. This means we can start writing
perform :: DIM.IntMap String
-> Move
-> DIM.IntMap String
The signature might look odd at first, but the order of
arguments is carefully crafted so we can use to efficiently
process a [Move]
as you'll see below.
Notice that when you pop from one stack and push onto another,
the order of things gets reversed, i.e. if we pop three elements
from [1,2,3,4] while pushing on top of [5,6], we end up with
[4] and [3,2,1,5,6]. This allows writing the «pops» as simply
grabbing the needed elements, reverse
-ing, and adding in front
of the destination stack.
And that's exactly what I did when solving Part A.
The «twist» for Part B is that there's no popping-and-pushing, it's just grabbing and stacking, so the order of element is kept as is. That's why I refactored the function and wrote it as
perform :: (String -> String)
-> DIM.IntMap String
-> Move
-> DIM.IntMap String
perform crane stacks move = DIM.adjust (const srcStack') (src move)
$ DIM.adjust (const dstStack') (dst move) stacks
where
srcStack = stacks DIM.! src move
dstStack = stacks DIM.! dst move
(transfer,srcStack') = splitAt (cnt move) srcStack
dstStack' = crane transfer ++ dstStack
The new first argument, crane
, abstracts the way to modify the
elements grabbed on every move: it's going to be reverse
for Part A,
and good old id
for Part B.
The rest of the function is simply using the information in the
Move
value to:
Find the
srcStack
anddstStack
in the mapBreak the
srcStack
usingsplitAt
to grab the elements to transfer, keeping the remaining ones assrcStack'
Transforming (
crane
) and tacking the elements on top ofdstStack
to create the newdstStack'
.
Replace the old stacks with the new ones in the IntMap
using
adjust
, and the Move
has been simulated.
The solutions
Part A asks to find all the letters that are on top of the respective stacks after the movements have been performed. This requires signature
partA :: String -> String
partA s = ...
We can use
(stacks,program) = loadSpec s
to parse our input into the initial stack setup, and the list of movements to apply.
The list of movements is finite, and we need to process is completely from left to right, simulating each movement over the stacks as they were left by the previous movements. This is a poster child for
foldl' :: Foldable t => (b -> a -> b) -- what each step does
-> b -- initial value
-> t a -- collection to process
-> b -- final value
This (very) polymorphic signature fits into our problem nicely.
Lists are Foldable
, so it makes sense for [Move]
if we substitute
t ~ []
and a ~ Move
foldl' :: (b -> Move -> b)
-> b
-> [Move]
-> b
And we have an initial IntMap
full of stacks, and we want to
produce the final IntMap
as a result. If we substitute
b ~ IntMap String
, we get
foldl' :: (IntMap String -> Move -> IntMap String)
-> IntMap String
-> [Move]
-> IntMap String
It's very close, except that our perform
has an extra first
argument, the crane
thing. That's not a problem, because we
can always use it as a «section», i.e. a partially applied
curried function (indentation is mine)
ghci> :type perform
perform :: (String -> String)
-> IntMap String
-> Move
-> IntMap String
ghci> :type perform reverse
perform reverse :: IntMap String
-> Move
-> IntMap String
ghci> :type perform id
perform id :: IntMap String
-> Move
-> IntMap String
Part A
This is the regular pop-push behavior. Every time we grab («crane»)
things, they will be reversed before putting on top of destination.
After all movements are processed using foldl'
we will need to
list all stacks contained in the IntMap
in order. The API for
IntMap
provides function elems
precisely for listing all elements
of the map in key order. Finally, recalling that our stacks are
actually lists, we collect the first element of each list.
partA :: String -> String
partA s = map head
$ DIM.elems
$ foldl' (perform reverse) stacks program
where
(stacks,program) = loadSpec s
Part B
This is the alternate don't reverse grabbed things. Same code, different «crane» behavior.
partB :: String -> String
partB s = map head
$ DIM.elems
$ foldl' (perform id) stacks program
where
(stacks,program) = loadSpec s
Conclusion
Choosing the right data structure not only improves the efficiency of your programs, but will also make their logic clearer. Data structures for functional programming are sometimes similar to those on imperative programming, but this does not mean they can or should be used in the same way.
A Haskell list is not an array unless it extremely small. You
should not go about asking for things' length
, but try and
process them lazily. And there are more than one type of map,
each one better suited for a different type of problem -- and no, that
«universal» map you have in your favorite dynamic language is
not better than picking the appropriate one.
Get yourself familiar with the API's for the standard data
types, so you can immediately identify the one you need for
each particular problem. Using Map
, HashMap
or Array
would've
been terrible.
Finally, make a conscious effort to avoid writing recursion
explicitly, and learn to use the Functor
, Applicative
, and
Foldable
behaviors. They lead to clean, fast, concise code,
because the compiler can perform optimizations that generate
code rivaling your painfully crafted imperative code. But
Haskell's composes better.
The more you practice, the more you'll find yourself writing functions in «curried fold-friendly form», and decomposing your problems as a flow between building structures (anamorphisms) and collapsing structures (catamorphisms).