This challenge felt like «busy work» more than «creative work». It boils down to keeping track of a grid's occupied positions, until a particle goes out of bounds or there's no room for more particles under the rules.
As usual for «grid-centric» problem solving in Haskell, is better
to use a Map
to focus on occupied positions. After defining a
custom Ord
instance to sort things the way I needed, it was
mostly «busy work».
The simulation turned out adequate to illustrate why unfoldr
matters, specially when you fuse into an hylomorphism...
The Problem
We are given the description of a wall grid, in the form of horizontal and vertical segments. The grid is empty, except for the places where rocks form the horizontal and vertical segments. The sample input looks like
498,4 -> 498,6 -> 496,6\n
503,4 -> 502,4 -> 502,9 -> 494,9\n
Each pair of number between ->
describes a segment. The input
guarantees that each pair has a common first element thus representing
an horizontal segment, or a common second element thus representing
a vertical segment.
A «grid unit» of sand is dropped from a particular starting position. It drops until it cannot progress further according to the movement rules (down, down left, down right, or stop). Units are blocked by the walls as defined by the grid, or by previously dropped units. Once a unit stops, a new one is dropped.
Main Program
The main program is pretty much the same as Day 1.
The Grid
When modeling «grid» problems, Haskell programmers tend to focus on
what's being used in the grid, rather than on empty spaces. Only
keeping track of used positions is surely space-efficient when compared
with traditional arrays. That's why we choose a map that's optimized
for the kind of position index we need to handle. For this problem,
positions are two-dimensional, so it makes sense to use the traditional
Data.Map
.
import qualified Data.Map as DM
The Data.Map
API imposes only one restriction on the type used as key
for lookups
DM.lookup :: Ord k => k -> DM.Map k a -> Maybe a
Since most standard data types already have an Ord
instance, and we
can always ask the compiler to derive an automatic Ord
instance, this
is quite convenient: we could use plain tuples (Int,Int)
as keys,
and be done with that.
The default Ord
instance for tuples follows the lexicographic order.
That is, it sorts on the first component and then on the second component.
This is fine for most applications, but not for this problem. Part A
asks to detect when a unit is «falling into the void». Nothing
more than detecting when its Y coordinate is larger than the largest
Y coordinate for a block. If we model coordinates as (x,y)
, finding
the «largest» coordinate becomes contrived. If we model coordinates
as (y,x)
then we are surely to run into confusion.
I opted to create a custom type for Pos
itions, and providing a
custom Ord
instance
data Pos = Pos { x :: Int
, y :: Int
}
deriving (Show,Eq)
instance Ord Pos where
(Pos x0 y0) `compare` (Pos x1 y1) = case y0 `compare` y1 of
EQ -> x0 `compare` x1
c -> c
The compiler-derived Eq
instance is fine. The custom Ord
instance
sorts based on the y
coordinate first, comparing on x
only to break
ties.
As for dropping things into the grid, I separated the actual
movement from the keeping track of grid contents. That means providing
a simple API for Pos
to figure out the three possible movements
down :: Pos -> Pos
down (Pos x0 y0) = Pos x0 (succ y0)
downLeft :: Pos -> Pos
downLeft (Pos x0 y0) = Pos (pred x0) (succ y0)
downRight :: Pos -> Pos
downRight (Pos x0 y0) = Pos (succ x0) (succ y0)
and just give a name for the fixed initial position whence units are dropped.
origin :: Pos
origin = Pos { x = 500, y = 0 }
Using DM.lookup
returns a Maybe
, so looking for empty positions would
result in Nothing
. I thought it would be a good idea to have a separate
type to model what's occupying a grid position,
data Content = Rock | Sand
deriving (Show,Eq)
We need to parse our initial DM.Map Pos Content
and write the functions
to simulate the droppings.
Building the initial grid while parsing
All the numbers in the input are positive, so we start with the usual
parseInt :: Parser Int
parseInt = read <$> many1 digit
and then take advantage of Parser
being an Applicative
to turn a pair
of numbers separated by a comma, into a proper Pos
. Note the use
of (<*)
to take care of the comma, but returning the first parseInt
result.
parsePair :: Parser Pos
parsePair = Pos <$> (parseInt <* char ',') <*> parseInt
Now, each input line is just a collection of Pos
itions separated
by the arrow, therefore
parseLine :: Parser [Pos]
parseLine = parsePair `sepBy1` string " -> "
The problem description asserts that every input line describes
a continuous segment. For every pair of Pos
in the list, there's
one common coordinate, so we can expand the whole segment between
them, thus computing all grid elements that must be occupied with
rocks.
We start by writing
segments :: [Pos] -> [Pos]
segments ps = concat $ zipWith points ps (tail ps)
where
points :: Pos -> Pos -> [Pos]
points (Pos x0 y0) (Pos x1 y1)
| x0 == x1 = [ Pos x0 y' | y' <- [min y0 y1 .. max y0 y1]]
| y0 == y1 = [ Pos x' y0 | x' <- [min x0 x1 .. max x0 x1]]
Consider a list of Pos
such as
[p0,p1,p2,...,pN]
we need to consider them two at a time, figure out if they represent a
vertical or horizontal line, and generate all the points in between.
The auxiliary points
does just that for two Pos
. Since the input
could define these top-bottom, bottom-up, left-to-right, or right-to-left,
we use min
and max
so the list comprehension generates all the points
for the particular segment.
After writing points
we should realize that in order to consider all
pairs of Pos
in the list, we can just zipWith
the list with its own
tail
, e.g.
ghci> let l = [1,2,3,4]
ghci> zip l (tail l)
[(1,2),(2,3),(3,4)]
Using zipWith points
we will process each pair of Pos
on the original [Pos]
,
and produce a new [Pos]
for each pair, resulting in a [[Pos]]
, hence
the concat
to flatten all segments into a single [Pos]
.
But we need to load multiple lines from the actual input. Each line can
be parsed into a [Pos]
thanks to parseLine
, and we can use sepEndBy
to
parse them all into a [[Pos]]
. We need to apply segments
to every
internal list, and then concat
their results:
parseInput :: Parser [Pos]
parseInput = concatMap segments <$> parseLine `sepEndBy1` newline
This will result in a [Pos]
holding all the occupied positions for
the initial grid. We can load them into the initial DM.Map
with a
convenient top-level parsing function
loadGrid :: String -> DM.Map Pos Content
loadGrid input = case parse parseInput "loadGrid" input of
Right ws -> loadWalls ws
Left _e -> undefined
where
loadWalls :: [Pos] -> DM.Map Pos Content
loadWalls = foldl' (\m p -> DM.insert p Rock m) DM.empty
Loading a DM.Map
using foldl'
over the list of initial elements is
a standard idiom: use an DM.empty
map as accumulator, and perform
the appropriate DM.insert
as we go through each element of the list.
In this case, just insert a Rock
in place.
Making things fall
The problem description explains that at every given time only one unit is falling until stopping. This lead me to explicitly decouple the «falling down» with the «keeping track». That is, I try and make a unit fall as much as possible one step at a time is separate from actual grid maintenance.
A unit's fall
depends on the current grid state and the unit's
Pos
ition. It can either fall to a new position, or be stuck
in place («landed»).
fall :: DM.Map Pos Content -> Pos -> Pos
fall grid p0 = tryFalling [ down p0, downLeft p0, downRight p0 ]
where
tryFalling :: [Pos] -> Pos
tryFalling [] = p0
tryFalling (p:ps) = case DM.lookup p grid of
Nothing -> p
Just _ -> tryFalling ps
The unit is in Pos p0
so we tryFalling
according to the rules. Each try
looks into the current map to see if the would be next position is
empty, in which case we can move there. If the would be next position
is not empty, tryFalling
in the next direction as per the rules. If
there are no more alternatives to fall down, return the same position:
the unit has landed. This is just one step where the unit can either
fall down to a new position or stay stuck.
Part A asks to detect when a unit is falling into the void. This means
we need to stop falling when we detect the lowest wall has been found.
Therefore, we need to be able to dropSandA
with a particular maxY
bound
passed as first argument. Also, this being a simulation, we need to
signal whether or not the simulation has ended: my choosing a
Maybe
to wrap the newest resting position and updated state will
become obvious later on.
dropSandA :: Int -> DM.Map Pos Content -> Maybe (Pos, DM.Map Pos Content)
dropSandA maxY grid = dropAnother origin
where
dropAnother p
| y p' > maxY = Nothing
| p' == p = Just (p', DM.insert p' Sand grid)
| otherwise = dropAnother p'
where
p' = fall grid p
The simulation is performed by dropAnother
from its current Pos
ition p
.
It tries to fall
using the current grid
onto a new Pos
ition p'
. If that
new Pos
ition is greater than the upper bound, the simulation is over.
If that new Pos
ition is the same as the original one, the unit has
landed so we return it alongside the updated grid
. Otherwise, we try
falling again. Each unit has to drop from origin
, so the top-level
definition from the function just tries to dropAnother origin
.
Part A solution unfolds
The signature for dropSandA
seems odd at first glance
dropSandA :: Int -> DM.Map Pos Content -> Maybe (Pos, DM.Map Pos Content)
Being a curried function, we can partially apply it to any given Int
that would be the upper bound. That would result on a signature
dropSandA 42 :: DM.Map Pos Content -> Maybe (Pos, DM.Map Pos Content)
The description for dropSandA
establishes that it attempts to simulate one
dropping until it either succeeds or goes over the bound. In the former case
we get a Just
with the final position placed and the updated grid; the latter
just provides a Nothing
.
A simulation such as this is an instance of an «unbound repetition». We don't know how many units can fall (in fact, that's what we've been asked to figure out!). We know each step takes a current grid (a «seed») and can produce a new position and updated grid (a «result» and «new seed»), or signal the simulation has stopped when going out of bounds.
In the same way that «bound repetition» can be abstracted as a fold
,
«unbound repetition» can be abstracted as an unfoldr
.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The second argument is the initial «seed». The first argument is
a function that's able to perform one step, taking the current seed
to produce a result and a new seed, or signal the end of the simulation.
Using unfoldr
will automatically collect the intermediate results,
and return them lazily as they are produced.
We can solve Part A with
partA :: String -> Int
partA input = length $ unfoldr (dropSandA boundary) grid0
where
grid0 = loadGrid input
boundary = y $ fst $ DM.findMax grid0
We loadGrid
our initial seed into grid0
. Thanks to the custom Ord
instance
for the Pos
ition type, we can use DM.findMax
to get the Pos
ition with
the largest Y coordinate in O(log n)
time, thus setting the boundary
.
We let unfoldr
do its thing over dropSandA boundary
working from
the starting grid. It will lazily produce a list of «landed» Pos
itions
until the first one goes out of bounds. The length
of the resulting
list is the number of units that settled down before the first one
went out of bounds.
Part B is similar but hard to factor out
After solving Part A, the description for Part B keeps the same core simulation, but adds two new restrictions:
The ground is assumed to be two levels after the maximum Y wall, spanning as far left and right as needed. This effectively means there's a new bound but for stopping in place, not interrupting the simulation.
The simulation must run until enough units have landed such that it's not possible to drop another one. That is, everything has piled up to
origin
.
I wasn't able to easily refactor dropSandA
into something that could
work for both simulations (hence the name). I ended up writing
dropSandB
which follows the same strategy
dropSandB :: Int -> DM.Map Pos Content -> Maybe (Pos, DM.Map Pos Content)
dropSandB maxY grid = dropAnother origin
where
originFull = isJust $ DM.lookup origin grid
dropAnother p
| p' == origin && originFull = Nothing -- 1
| p' == p = Just (p', DM.insert p' Sand grid) -- 2
| y p' == pred maxY = Just (p', DM.insert p' Sand grid) -- 3
| otherwise = dropAnother p'
where
p' = fall grid p
There are just two tweaks:
Units «appear» at the
origin
and try to fall down from there. There will be a point whereorigin
is not occupied, a new unit appears but can't progress beyondorigin
staying there -- that would be rule (2). The next unit will appear at theorigin
, will not be able to progress beyondorigin
, but can't stay there on account of the previous one. That's rule (1) signaling the end of the simulation.There's always the ground to stop units. They can lay there, and the simulation must continue. The ground is one unit above the bound. That's rule (3).
Rules (2) and (3) can be combined with (||)
in the guard, I know.
It's easier to explain the tweaks if they're separate, that's all.
Other than those tweaks, it's exactly the same approach in order
for dropSandB
to be hoisted by unfoldr
after figuring out the
«ground level» bound
partB :: String -> Int
partB input = length $ unfoldr (dropSandB boundary) grid0
where
grid0 = loadGrid input
boundary = 2 + y (fst $ DM.findMax grid0)
Conclusion
Beginner Haskell programmers tend to write their recursive functions by hand. That's fine. They're learning the ropes. Once you understand the basic forms of recursion, you must stop writing it explicitly and turn to higher-order functions abstracting it.
Haskell programmers use data as control structures. When we want to
go over all elements of a recursive data structure, we fold
. It doesn't
matter if it is list-like or tree-like... Foldable
takes care of moving
around and detecting when we're done -- we just need to say what to do
at each step. These forms of recursion consume one element at a time
and are said to «collapse» the original data structure into a
resulting value. This recursion style is a catamorphism.
Conversely (or «dually»), when we want to build a data structure
incrementally from an initial seed, we unfold
. There is no Unfoldable
type class, because there isn't a generalized API useful enough.
We have Data.List.unfoldr
able to generate a (potentially infinite)
list one element at a time. The standard Haskell type Data.Tree
also
provides an unfoldTree
. These forms of recursion produce one element
a time, in a predictable order, and are said to «construct» a new
data structure incrementally. This recursion style is an anamorphism.
They are sometimes enough for simple simulations such as this one -- that's why I used it. They can also be used to generate locally-finite globally-infinite data structures for localized traversals.
Learning to write all your recursion as composition of
catamorphisms (fold
) and anamorphisms (unfoldr
) over the
particular data types involved is a crucial skill. It allows
you to separate the what to do from the how to move, leading
to better decoupling and code reuse. It also improves
performance:
When there are multiple
fold
s in sequence, they can be combined into a singlefold
. The compiler can do it for you! This means less value passing, less function calling, better cache locality.Placing a
fold
after andunfold
, is similar to having a consumer after a producer. Theunfold
will generate one element at a time, and thefold
can process it immediately. The intermediate structure is never created. This form of recursion is known as an hylomorphism.
Go read
«Functional programming with bananas, lenses, envelopes, and barbed wire» (Meijer, Fokkinga, Paterson)
«A tutorial on the universality and expressiveness of fold» (Hutton)
«Theory and Practice of Fusion» (Hinze, Harper, James)
for the basic mathematical nitty-gritty. These techniques have been known for decades, and GHC applies them when improving and generating machine code. The apparent list processing becomes tight assembly loops building no lists at all...
The simulation is unfold
-based (anamorphism), and
length
is fold
-based (catamorphism). Their composition is
fused as a hylomorphism. The assembler looks written by
hand, except it was not. High-order recursion wins.