Haskell's type classes group types based on shared behavior.
That's what we mean when we talk about types being Eq
, Num
,
Functor
, Foldable
, Monad
or whatever. Type classes require
a minimal set of functions a type must support in order to
belong to the class, and then provide derived functions with
default implementations. Parametric polymorphism allows
restrictions to function signatures, so that only types
belonging to particular type classes can be passed as
arguments.
This challenge is a textbook example of how efficient it is to take advantage of a custom implementation for a type class...
The Problem
We are presented with the notion of «packets». They have the form of nested lists of integers. The sample input looks like this
[1,1,3,1,1]\n
[1,1,5,1,1]\n
\n
[[1],[2,3,4]]\n
[[1],4]\n
\n
[9]\n
[[8,7,6]]\n
Each pair of packets must be compared to determine if they are in the correct order. The problem description goes into detail about how to compare two packets in particular to determine their ordering.
Main Program
The main program is pretty much the same as Day 1.
Packets and how to compare them
Haskell's lists are homogeneous, i.e. all elements must be
of the exact same type, hence the signature [a]
. The packets
as described by this problem are heterogeneous lists:
there can be plain integers or lists of integers at the same
level, because nesting is possible. We'll start by defining
a recursive data type to model this heterogeneous container
in an homogeneous way
data Packet = Atom Int
| List [Packet]
deriving (Show,Eq)
This type will let us turn the heterogeneous
[[1],[2,3,4]]
into the homogeneous
List [ List [ Atom 1 ], List [ Atom 2, Atom 3, Atom 4 ] ]
and even allow mixed-mode nesting, to turn the heterogeneous
[[1],4]
into the homogeneous
List [ List [ Atom 1 ], Atom 4 ]
The type definition requested the compiler to derive Show
and Eq
.
The former let's us print (via show
) values on screen, the latter
allows us to compare (via (==)
) values on our code. The compiler
is able to generate the implementation for (==)
automatically.
We are interested in being able to compare two values of type
Packet
to decide which one is «smaller». We'd like to use (<=)
directly, and that's something that type class Ord
provides.
The compiler can generate an Ord
instance automatically, but
the ordering would not match the one required by this problem.
Go read the documentation to find out why.
Haskell allows you to implement your own instance for any
type class you see fit. For the Ord
type class
λ> :info Ord
type Ord :: * -> Constraint
class Eq a => Ord a where
compare :: a -> a -> Ordering
(<) :: a -> a -> Bool
(<=) :: a -> a -> Bool
(>) :: a -> a -> Bool
(>=) :: a -> a -> Bool
max :: a -> a -> a
min :: a -> a -> a
{-# MINIMAL compare | (<=) #-}
we see that we either provide the implementation for (<=)
or
for compare
, and in exchange all the other functions will
have working default implementations. Not only that, any
function having the Ord a
restriction can now be used on
Packet
.
Here's the custom Ord
instance for Packet
providing (<=)
I wrote
instance Ord Packet where
Atom left <= Atom right = left <= right
List lps <= List rps = lps <= rps
Atom left <= List rps = List [Atom left] <= List rps
List lps <= Atom right = List lps <= List [Atom right]
This instance implements the rules laid out by the problem description,
and I'm not going to explain them again. There are a few comments related
to the polymorphic use of (<=)
that are worth mentioning:
All the uses of
(<=)
on the left side of the equals (=
) are the ones we are defining forPacket
. That is, they are comparingPacket
on the left withPacket
on the right.The use of
(<=)
on the right side of the equals for the first rule, is comparing twoInt
s and is already defined. It's comparing the values held by theAtom
constructor, which areInt
.The use of
(<=)
on the right side of the equals for the second rule, is comparing two[a]
at the list level. That is, it's using theOrd
instance for[a]
. This is convenient because it matches exactly what thePacket
comparison for lists needs.The uses of
(<=)
on the right side of the equals for the third and fourth rules, are comparing twoPacket
. This is just recursion.
And that's all it takes for all the comparison operators
to work over our Packet
type.
Parsing packets
A Packet
is a recursive structure. Parsing one is too.
parsePacket :: Parser Packet
parsePacket = Atom . read <$> many1 digit
<|> List <$> between (char '[')
(char ']')
(parsePacket `sepBy` char ',')
For the base case, we parse one or more digits, read
them into an Int
,
and lift it into a Packet
using the Atom
constructor. For the recursive
case, we take advantage of the between
parser combinator provided
by Parsec
: find zero or more packets separated by a comma, between
squared brackets, and lift the resulting [Packet]
into a Packet
using
the List
constructor.
The input file has pairs of packets separated by an empty line. We're asked to compare each pair, so it makes sense to parse two lines into a tuple to keep them together
pairPackets :: Parser (Packet,Packet)
pairPackets = (,) <$> parsePacket <* newline
<*> parsePacket <* newline
This is a nice example of Parsec
's Applicative
API. By writing
parsePacket <* newline
we ask for both parsers to be run in sequence, but to keep
the value produced by parsePacket
. By writing
(,) <$> p0 <*> p1
we are asking to apply the tuple constructor to the results
of p0
and p1
. This effectively parses two consecutive lines
and keeps the resulting Packet
s in a tuple, in the same order
they are in the input file.
Reading the whole file as a list of tuples can be written as
parseInput :: Parser [(Packet,Packet)]
parseInput = pairPackets `sepBy1` newline
and we finish up with a nice top-level parsing function that actually runs the parser, returning the unadorned list of tuples
loadPackets :: String -> [(Packet,Packet)]
loadPackets input = case parse parseInput "loadPackets" input of
Right ps -> ps
Left _e -> undefined
Solving Part A
Each pair of packets position is relevant to the problem. The first
pair of packets is at position 1
. Part A asks to add up the positions
for all packets that are in the correct order.
Thanks to the Ord
instance for Packet
, we can use (<=)
to compare
two of them. But (<=)
is a curried function, as it should
ghci> :type (<=)
(<=) :: Ord a => a -> a -> Bool
and we are parsing our input into [(Packet,Packet)]
. This is
a perfect time to
ghci> :type uncurry
uncurry :: (a -> b -> c) -> (a, b) -> c
which turns any curried function into one that takes a tuple as argument.
Now we can use uncurry (<=)
over each tuple of our list. We reason
ghci> :type loadPackets
loadPacket :: String -> [(Packet,Packet)]
ghci> :type map (uncurry (<=)) . loadPackets
map (uncurry (<=)) . loadPackets :: [Bool]
and the result would be whether or not the particular packets are in order. Now, to add the position index to every result, we reason
ghci> :type zip [1..] . map (uncurry (<=)) . loadPackets
zip [1..] . map (uncurry (<=)) . loadPackets :: (Num a, Enum a) => String -> [(a, Bool)]
where the restrictions state that it's some kind of Enum
erable Num
ber type.
But we're only interested on those pair where the result was True
, so
ghci> :type filter snd . zip [1..] . map (uncurry (<=)) . loadPackets
filter snd . zip [1..] . map (uncurry (<=)) . loadPackets :: (Num a, Enum a) => String -> [(a, Bool)]
Now we need to extract the first component from each tuple, and add them up
partA :: String -> Int
partA = sum
. map fst
. filter snd
. zip [1..]
. map (uncurry (<=))
. loadPackets
Solving Part B
Pairs don't matter any more. We are told to bunch all packets together, add two special marker packets, and put everything in ascending order. Figure out the position where these marker packets end up on the sorted list, and multiply them.
The marker packets are
marker2 :: Packet
marker2 = List [ List [ Atom 2 ] ]
marker6 :: Packet
marker6 = List [ List [ Atom 6 ] ]
We can use loadPackets
to load as [(Packet,Packet)]
, but since the
pairing is irrelevant, we can use
ghci> :type unzip
unzip :: [(a, b)] -> ([a], [b])
to end up with a list holding «all the first» and a list holding «all
the seconds», in a pair. Now, we can use uncurry (++)
over said pair
to combine both lists into one, behind the desired markers
ghci> :type (\p -> [marker2,marker6] ++ uncurry (++) p) . unzip . loadPackets
(\p -> [marker2,marker6] ++ uncurry (++) p) . unzip . loadPackets :: String -> Packet
Thanks to parametric polymorphism we can use
ghci> :type Data.List.sort
Data.List.sort :: Ord a => [a] -> [a]
to sort our [Packet]
. Using the same techniques as seen on partA
we write
partB :: String -> Int
partB = product
. map fst
. filter (\(_,s) -> s `elem` [marker2,marker6])
. zip [1..]
. sort
. (\p -> [marker2,marker6] ++ uncurry (++) p)
. unzip
. loadPackets
to complete the solution.
Conclusion
Familiarity with Haskell's type classes is essential to approaching many programming problems in a clean straightforward way. GHC's ability to automatically derive type class instances for user defined types is rooted in Algebraic Type Theory: when you use recursive sum and product types, the compiler can deduce a reasonable instance and generate code for you. This applies to any type-class, be it Haskell standard, provided by a library, or defined on your own.
The more
type class behaviors
you can recognize and use, the easier will be to design types and write
programs to take advantage of them. Knowing when to write your own Show
, Ord
,
Semigroup
, Monoid
, and possibly Num
can lead to general solutions that
compose better.
Identifying and using Monoid
, Functor
, Applicative
and Monad
is a must.
There is a set of standard Haskell type classes described in the
Typeclassopedia, with
reasonably good descriptions of their usage.
Many libraries provide their own type classes, usually accompanied with
implementations for Haskell's standard types. For instance aeson
comes
with typeclasses FromJSON
and ToJSON
providing functionality for
exactly what you're thinking.
Type classes and parametric polymorphism restrictions are superior
to interfaces and even roles, as found in OO Programming Languages,
because you don't have to write additional code to fulfill the
interface, the compiler can generate default implementations for
your types, the code is not encumbered with runtime dispatch costs,
and you can have more than one instance for the same base type
thanks to newtype
wrappers at zero runtime cost.
Classier than your run of the mill classes.