Advent of Code 2022 -- Day 4

The problem for Day 4 is fiendishly presented in a way that leads you to believe using lists or maps, to then compute intersections, is the «obvious» way to work. And then your program blows (in style and performance), because the input data set would result in large lists and larger maps.

This is were a math background comes in handy.

We revisit the Functor and Arrow tricks to keep things point-free.

The Problem

We are given a dataset of unknown length. The example describes it like this

    2-4,6-8\n
    2-3,4-5\n
    5-7,7-9\n
    2-8,3-7\n
    ...

We are shown that each pair or numbers correspond to a set of values, i.e.

    2-4 <---> 2,3,4
    6-8 <---> 6,7,8

and that we need to figure out if either of the ranges is fully contained within the other. The problem description makes an oh-so-obvious attempt to bend you into using lists (or maps). That is, try and go

From 2-8 to [2,3,4,5,6,7,8],
From 3-7 to [3,4,5,6,7],

and then use set intersection to figure out if one is contained in the other. And I know many people who went that way. When you look at the actual data set you need to process (do it, always!) you'll see lists or maps would work, you'd need to do a fair amount of work for handling them... and would definitely not work if the ranges are in the the thousands of elements.

But your Jedi tricks don't work on me.

Main Program

The main program is pretty much the same as Day 1.

Parse and compute, all over again.

As usual, the first step is to parse the input into a manageable stream of basic parts. By the way, that's called «tokenization» in programming language parlance.

Part A asks to find in how many cases one set is fully contained within the other. Note it could be the first one contained in the second one, or the other way around. So we're looking at this signature again

    partA :: String -> Int

and can break its implementation in four parts:

Parse the input String into a list of lines, using lines.
Break each line into a pair of Ranges. Each Range models the «start» and «end».
```
 parseRanges :: String -> (Range,Range)
```
For each pair of Ranges, figure out if either overlaps the other.
```
 contained :: Range -> Range -> Bool
```
Count how many containments we find.

This leads to a high level implemenation like so

    partA = length
          . filter id
          . map (uncurry contained . parseRanges)
          . lines

There are to idiomatic things to note here:

I could've written contained with signature
```
    contained :: (Range,Range) -> Bool
```
However, this makes the function less flexible as it forces us to provide both ranges every time. It can be argued that parseRanges is providing a list of tuples, so «it fits». I was thinking ahead for Part B, unknown at the time, to see if being more general would pay out.

That's why the uncurry trick makes sense
```
    uncurry :: (a -> b -> c) -> (a, b) -> c
```
to efficiently use a curried function to receive tuples.
The map is going to produce a [Bool]. 'tis the season to show another valid use of length by showing how to count the number of True in a list.

Ranges and containment

I could've written all that follows using tuples and their combinators (fst and snd). That would've make the code a tad harder to understand, and I'd probably made logic mistakes along the way. Using separate proper types for your values leads to less mistakes, signaled by the compiler, not by your prodding while debugging.

Let's start with a simple

    data Range = Range { s :: Int
                       , e :: Int
                       }
               deriving Show

that directly models the start and end of each range.

Now, given two Ranges we can check if either is contained by the other with

    contained :: Range -> Range -> Bool
    contained rA rB = containedIn rA rB || containedIn rB rA
      where
        containedIn :: Range -> Range -> Bool
        this `containedIn` that = s this >= s that && e this <= e that

We need to check if rA is fully contained by rB, as well as the other possibility. Since (||) and (&&) are «short-circuited», this code always performs the minimum number of checks between Ranges to determine containment.

Parsing the input into `Range`s

We will use lazy String manipulation to implement parseRanges, take advantage of tuples being Functor, and convenient Arrow combinators. This is the line of reasoning as tested on the REPL:

How do we go from the whole line to its two parts?

  ghci> let wholeLine = "2-8,3-7"
  ghci> break (==',') wholeLine
  ("2-8",",3-7")
  ghci> fmap tail $ break (==',') wholeLine
  ("2-8","3-7")

How do we go from one part to its corresponding Range?

  ghci> let aPart = "2-8"
  ghci> break (=='-') aPart
  ("2","-8")
  ghci> fmap tail $ break (=='-') aPart
  ("2","8")
  ghci> let (b,f) = fmap tail $ break (=='-') aPart
  ghci> Range (read b) (read f)
  Range { s = 2, e = 8 }

Which leads to

    parseRanges :: String -> (Range,Range)
    parseRanges = (toRange *** toRange) . fmap tail . break (==',')
      where
        toRange :: String -> Range
        toRange cs = Range (read b) (read f)
          where (b,f) = tail <$> break (=='-') cs

Note the idiomatic use of <$> instead of fmap to implement toRange. Also, having to apply toRange to both parts of the tuple, is a perfect use case for the (***) Arrow combinator in the context of functions as discussed on Day 2.

And that completes our solution for Part A.

Part B is now a one liner...

Part B requires computing the number of lines that have overlapping ranges. There's a whole body of maths devoted to the study of ranges, both continuous and discrete. This problem can be solved by applying one of the basic range predicates:

    overlaps :: Range -> Range -> Bool
    this `overlaps` that = max (s this) (s that) <= min (e this) (e that)

That is, if the maximum between the lower bounds is less than or equal than the minimum of the upper bounds, there's an overlap.

The concrete implementation is

    partB = length
          . filter id
          . map (uncurry overlaps . parseRanges)
          . lines

and we're done!

Conclusion

I decided to write Part A's contained in a curried way and apply the uncurry trick, thinking that Part B might change things in such a way that would make contained reusable. It wasn't the case.

Nevertheless, it's generally wise to write multiple-argument functions in curried form. You will be able to easily apply them partially («sections»), and you can always use the uncurry trick if tuples need to be fed. Curried functions are particularly useful when applied to Functor, Applicative or Monad context in a «step by step» fashion. It's much harder to do when your function argument is tupled.

Let go of the imperative style of tupling arguments in favor of curried functions. If some of the arguments need internal structure or grouping, create a separate type to model it.

The more math you know, the better your programming. As your programming improves, the more math you end up learning. High school should start teaching the basics of logic and discrete math -- they make any professional a better thinker.

Keep calm and curry on.