Advent of Code 2022 has started. I usually try and solve most of it using Haskell. Some problems are more interesting than others, specially when they make it possible to contrast two or more techniques.

Day 1 is a straightforward parse-and-compute problem. I propose two solutions that exhibit two ways to think in a functional context: «lazy stream processing» and «synthetic parsing».

## The Problem

We are given a dataset of unknown length, with the following structure. The explicit newlines (`\n`) are there to make the explanation easier to follow.

``````    1000\n
2000\n
3000\n
\n
4000\n
\n
5000\n
6000\n
\n
7000\n
8000\n
9000\n
\n
10000\n
\n
``````

The problem amounts to computing the sum for every group of numbers. Part A wants the maximum sum, while Part B wants the sum of the three top sums. For the example above, the answer for Part A would be `24000`, while the answer for Part B would be `45000` (`24000 + 11000 + 10000`).

## Main Program

The main program needs to do three things:

1. Read the data file as a `String`.

2. Transform said `String` in such a way as to get our `Integer` result.

``````    partA :: String -> Integer
partB :: String -> Integer
``````
3. Print the result.

The main program executes on the `IO` monad in order to interact with the «Real World». We can take advantage of

``````    readFile :: FilePath -> IO String
``````

to read the whole file lazily, returning its contents as a `String`. Since `IO` is also a `Functor`, we can lift our solution functions via `<\$>` so they compute the result inside `IO`, and then simply bind to `print`. Hence

``````    main = partA <\$> readFile "input.txt" >>= print
``````

It's a terse and idiomatic equivalent for

``````    main = do
print (partA contents)
``````

## Separating the common part

The solution functions have the signature

``````    String -> Integer
``````

and we immediately recognize that we need to find a maximum or the three maximum values, from a list of the partial sums. This suggests breaking the problem in two parts

``````    totalPerGroup :: String -> [Integer]
findMaximum   :: [Integer] -> Integer
``````

Doing that, we can write our solutions as

``````    partA = findMaximum . totalPerGroup
``````

Let's assume `totalPerGroup` exists. In that case

``````    partA = maximum . totalPerGroup
``````

where `maximum` comes from Haskell's standard `Prelude`, and

``````    partB = sum . take 3 . sortBy (flip compare) . totalPerGroup
``````

That is (from right to left), sort the list values from largest to smallest, take the first three, add them up.

There are two approaches for writing `totalPerGroup`. Regardless of the one we choose, it is clearly separated from `IO` and decoupled from the actual computation.

Reusability and composability, tools of the trade.

## The Lazy Stream Processing approach

I argue that this approach is paramount to lazy functional programming:

Try and think about ways to transform the whole data value into the desired value, without worrying (too much) about its size. Break the transformation into composable «steps» and, more often than not, Haskell's laziness will take care of doing things step-by-step for you.

Let's implement

``````    totalPerGroup :: String -> [Integer]
``````

that way. By the way, I typically «build» this composition sequence using Haskell's REPL (`ghci`), by adding steps and checking the intermediate values, while «taking notes» in the actual program source. That is, I started with

``````    ghci> f <- readFile "data.txt"
ghci> step1 f
...look at the thing here...
ghci> (step2 . step1) f
...look at the thing here...
``````

so on and so forth. I'll just mention the functions and why they make sense.

For starters, we have a whole `String` with all the file contents.

``````    ghci> contents <- readFile "data.txt"
ghci> contents
"1000\n2000\n3000\n\n4000\n\n5000\n6000\n\n..."
``````

Haskell's `Prelude` provides the function

``````    lines :: String -> [String]
``````

that given a `String` breaks it wherever there's a `\n`, producing a list of all the lines. Therefore

``````    ghci> lines contents
["1000","2000","3000","","4000","","5000","6000","",...]
``````

Notice how empty lines become `""`. If we can «group» the `String`s begore every `""` into their own lists, and drop the empty lines to get

``````    [["1000","2000","3000"],["4000"],["5000","6000"],...]
``````

we'd be closer to our solution. Suppose we already wrote the function

``````    groupNumbers :: [String] -> <span class="createlink">String</span>
``````

doing that grouping, so we can now write

``````    ghci> (groupNumbers . lines) contents
[["1000","2000","3000"],["4000"],["5000","6000"],...]
``````

Notice how we have a list of lists. Consider any list at the second level, such as

``````    ["1000","2000","3000"]
``````

we need to convert every «number looking `String`» into an actual `Integer`, and then add them up. If we add type annotations, and then make them into type signatures, we can use Haskell's `read` to parse the `String` into `Integer`, so

``````    ghci> map read ["1000","2000","3000"] :: [Integer]
[1000,2000,3000]
ghci> sum \$ map read ["1000","2000","3000"] :: Integer
6000
``````

So we need to apply

``````    sum . map read :: Integer
``````

to every second level list. This is the same as writing

``````    ghci> (map (sum . map read) . groupNumbers . lines) contents :: [Integer]
[6000,4000,11000,...]
``````

provided `groupNumbers` exist.

We can now write

``````    totalPerGroup :: String -> [Integer]
totalPerGroup = map (sum . map read) . groupNumbers . lines
``````

and we only need to write `groupNumbers` to solve our problem.

We know that `groupNumbers` must have type

``````    groupNumbers :: [String] -> <span class="createlink">String</span>
``````

and that we will be working with a list like

``````    ["1000","2000","3000","","4000","","5000","6000","",...]
``````

Let's take advantage of `break` provided by Haskell's standard Prelude.

``````    ghci> break null contents
(["1000","2000","3000"],["","4000","","5000","6000","",...])
``````

The first element of the resulting tuple is precisely the first group before the empty (`null`) `String`. The second tuple has the rest of the original list, unprocessed. We can write `groupNumbers` recursively by:

1. Saving the group as the head of a list.

2. Ignoring the empty strings at the beginning of the unprocessed rest.

3. Recursively applyting `groupNumbers` to the resultof (2).

4. Producing and empty list when there are no more groups.

Therefore,

``````    groupNumbers :: [String] -> <span class="createlink">String</span>
groupNumbers [] = []
groupNumbers ns = g : (groupNumbers . dropWhile null) bs
where (g,bs) = break null ns
``````

Our solution is complete.

Due to Haskell's lazy I/O (`readFile`) and the way the solution is structured, the program will read as many lines as needed until it gets to an empty line, process that group and synthesize its sum, before performing I/O again.

A reasonable trade-off given how high level the code is.

## The Synthetic Parsing approach

We've written a solution structured in such a way that `totalPerGroup` can be replaced with any alternative way to transform the `String` into `[Integer]` with all partial sums. Let's take advantage of that.

It can be argued that any function with type signature

``````     String -> a
``````

where `a` is any structured type, can be tought as a parser.

Haskell's `parsec` module provides a set of parser combinators: functions that can be composed to build production-grade recursive descent parsers. There are functions providing parsers for «simple things», and functions that combine parsers to build complex parsers.

The library provides a very simple `Parser` monadic type that works over `String` and allows you to produce any synthetic type. Recall the input is a `String`, which is actually a `[Char]`. I'll add some comments to the input, to highlight its structure:

``````    1000\n      <--- One or more group of digits...
2000\n      <--- ...separated by a newline...
3000\n      <--- ...ending in a newline, make a group!
\n          <--+
4000\n         | Groups separated by a newline!
\n          <--+
...
``````

Let's try to write a parser

``````    parseGroup :: Parser Integer
``````

that collects all numbers in a group, adds them, and returns the result.

The construction is incremental («parser combinators») as follows, using combinators already provided by `Text.Parsec`, and the `read` and `<\$>` technique already discussed, taking advantage of `Parser` being a `Functor`.

Each line shows a parser combinator or combination, the type signature showing its result, and a description as a comment in curly braces:

``````    digit :: Parser Char
{ accepts one digit [0-9] }
many1 digit :: Parser [Char]
{ accepts one or more digits }
read <\$> many1 digit :: Parser Integer
{ parses the string of digits into a proper Integer }
(read <\$> many1 digit) `sepEndBy1` newline :: Parser [Integer]
{ collects a group separated and terminated by '\n' }
sum <\$> ((read <\$> many1 digit) `sepEndBy1` newline) :: Parser Integer
{ computes the sum for the group }
``````

Thus, we end up writing

``````    parseGroup :: Parser Integer
parseGroup = sum <\$> ((read <\$> many1 digit) `sepEndBy1` newline)
``````

Since groups are separated by a newline, we collect them all with

``````    parseAllGroups :: Parser [Integer]
parseAllGroups = parseGroup `sepBy1` newline
``````

A parser in two lines of Haskell's `Parsec`. Try and do that in your favorite language. I'll wait.

Production-grade parsers can fail if the input is broken. `Parsec` provides a «driver» function

``````    parse :: Parser a -> SourceName -> String -> Either ParseError a
``````

that we must use like so

``````    parse parseAllGroups "aoc d1" contents
``````

to apply parser `parseAllGroups` to `contents`. The `"aoc d1"` is just the name of the source, so parsing errors can be tagged. The result is going to be provided as an `Either` data value.

The input for Advent Of Code problems is always well-formed. This ensures we will never have parsing errors, with the result always being of the form

``````    Right [Integer]
``````

We can finally write our parser-based solution as

``````    totalPerGroup :: String -> [Integer]
totalPerGroup cs = case (parse parseAllGroups "aoc" cs) of
Right is -> is
Left  _e -> []
``````

Again, this parser construction can be tested step by step on GHCi, using `parse`. That's left as an exercise to the reader.

The parser solution is not only shorter but also slightly more efficient, as it will take each number at a time and carry a running sum per group, skipping the newlines used as punctuation.

## Conclusion

If you've been trained and are a practitioner in the theory and practice of parsing and translation, the second solution is the first one that comes to mind. We tend to look for patterns such as the ones in the comments, know parsing combinators by heart, and are quick to build top-down parsers from the bottom-up (pun intended).

If you've left behind all the vices of imperative programming, and fully embraced the «value transformation» approach of functional programming, the first solution will resonate. You start thinking about pure partial transformations of the whole that take you step by step to the desired result. You already know about fusion and laziness, letting the compiler do its job of simplifying your code.

And regardless of your training, separating the «impure» parts (IO) from the «pure» parts (processing) first, to then decompose the processing into decoupled parts, is The Way to build software you can reason about in a sound logical way.