The problem for Day 4 is fiendishly presented in a way that leads you to believe using lists or maps, to then compute intersections, is the «obvious» way to work. And then your program blows (in style and performance), because the input data set would result in large lists and larger maps.
This is were a math background comes in handy.
We revisit the Functor
and Arrow
tricks to keep things point-free.
The Problem
We are given a dataset of unknown length. The example describes it like this
2-4,6-8\n
2-3,4-5\n
5-7,7-9\n
2-8,3-7\n
...
We are shown that each pair or numbers correspond to a set of values, i.e.
2-4 <---> 2,3,4
6-8 <---> 6,7,8
and that we need to figure out if either of the ranges is fully contained within the other. The problem description makes an oh-so-obvious attempt to bend you into using lists (or maps). That is, try and go
- From
2-8
to[2,3,4,5,6,7,8]
, - From
3-7
to[3,4,5,6,7]
,
and then use set intersection to figure out if one is contained in the other. And I know many people who went that way. When you look at the actual data set you need to process (do it, always!) you'll see lists or maps would work, you'd need to do a fair amount of work for handling them... and would definitely not work if the ranges are in the the thousands of elements.
But your Jedi tricks don't work on me.
Main Program
The main program is pretty much the same as Day 1.
Parse and compute, all over again.
As usual, the first step is to parse the input into a manageable stream of basic parts. By the way, that's called «tokenization» in programming language parlance.
Part A asks to find in how many cases one set is fully contained within the other. Note it could be the first one contained in the second one, or the other way around. So we're looking at this signature again
partA :: String -> Int
and can break its implementation in four parts:
Parse the input
String
into a list of lines, usinglines
.Break each line into a pair of
Range
s. EachRange
models the «start» and «end».parseRanges :: String -> (Range,Range)
For each pair of
Range
s, figure out if either overlaps the other.contained :: Range -> Range -> Bool
Count how many containments we find.
This leads to a high level implemenation like so
partA = length
. filter id
. map (uncurry contained . parseRanges)
. lines
There are to idiomatic things to note here:
I could've written
contained
with signaturecontained :: (Range,Range) -> Bool
However, this makes the function less flexible as it forces us to provide both ranges every time. It can be argued that
parseRanges
is providing a list of tuples, so «it fits». I was thinking ahead for Part B, unknown at the time, to see if being more general would pay out.That's why the
uncurry
trick makes senseuncurry :: (a -> b -> c) -> (a, b) -> c
to efficiently use a curried function to receive tuples.
The
map
is going to produce a[Bool]
. 'tis the season to show another valid use oflength
by showing how to count the number ofTrue
in a list.
Ranges and containment
I could've written all that follows using tuples and their
combinators (fst
and snd
). That would've make the code a tad
harder to understand, and I'd probably made logic mistakes
along the way. Using separate proper types for your values
leads to less mistakes, signaled by the compiler, not by
your prodding while debugging.
Let's start with a simple
data Range = Range { s :: Int
, e :: Int
}
deriving Show
that directly models the s
tart and e
nd of each range.
Now, given two Range
s we can check if either is contained
by the other with
contained :: Range -> Range -> Bool
contained rA rB = containedIn rA rB || containedIn rB rA
where
containedIn :: Range -> Range -> Bool
this `containedIn` that = s this >= s that && e this <= e that
We need to check if rA
is fully contained by rB
, as well as the
other possibility. Since (||)
and (&&)
are «short-circuited», this
code always performs the minimum number of checks between Range
s
to determine containment.
Parsing the input into Range
s
We will use lazy String
manipulation to implement parseRanges
,
take advantage of tuples being Functor
, and convenient Arrow
combinators. This is the line of reasoning as tested on the REPL:
How do we go from the whole line to its two parts?
ghci> let wholeLine = "2-8,3-7" ghci> break (==',') wholeLine ("2-8",",3-7") ghci> fmap tail $ break (==',') wholeLine ("2-8","3-7")
How do we go from one part to its corresponding
Range
?ghci> let aPart = "2-8" ghci> break (=='-') aPart ("2","-8") ghci> fmap tail $ break (=='-') aPart ("2","8") ghci> let (b,f) = fmap tail $ break (=='-') aPart ghci> Range (read b) (read f) Range { s = 2, e = 8 }
Which leads to
parseRanges :: String -> (Range,Range)
parseRanges = (toRange *** toRange) . fmap tail . break (==',')
where
toRange :: String -> Range
toRange cs = Range (read b) (read f)
where (b,f) = tail <$> break (=='-') cs
Note the idiomatic use of <$>
instead of fmap
to implement toRange
.
Also, having to apply toRange
to both parts of the tuple, is a
perfect use case for the (***)
Arrow
combinator in the context
of functions as discussed on Day 2.
And that completes our solution for Part A.
Part B is now a one liner...
Part B requires computing the number of lines that have overlapping ranges. There's a whole body of maths devoted to the study of ranges, both continuous and discrete. This problem can be solved by applying one of the basic range predicates:
overlaps :: Range -> Range -> Bool
this `overlaps` that = max (s this) (s that) <= min (e this) (e that)
That is, if the maximum between the lower bounds is less than or equal than the minimum of the upper bounds, there's an overlap.
The concrete implementation is
partB = length
. filter id
. map (uncurry overlaps . parseRanges)
. lines
and we're done!
Conclusion
I decided to write Part A's contained
in a curried way and apply the
uncurry
trick, thinking that Part B might change things in such a
way that would make contained
reusable. It wasn't the case.
Nevertheless, it's generally wise to write multiple-argument functions
in curried form. You will be able to easily apply them partially
(«sections»), and you can always use the uncurry
trick if tuples
need to be fed. Curried functions are particularly useful when
applied to Functor
, Applicative
or Monad
context in a «step by step»
fashion. It's much harder to do when your function argument is tupled.
Let go of the imperative style of tupling arguments in favor of curried functions. If some of the arguments need internal structure or grouping, create a separate type to model it.
The more math you know, the better your programming. As your programming improves, the more math you end up learning. High school should start teaching the basics of logic and discrete math -- they make any professional a better thinker.
Keep calm and curry on.