Category Theory for Programmers

You don't need to learn Category Theory to be proficient at functional programming.

A little Category Theory may help you understand some concepts quicker, and it can be argued that it will make you realize how many concepts you «sorta kinda get» have deeper meanings. It is not needed by a practitioner, but it will make you an outstanding one faster, if that's your goal.

I started studying Category Theory over twenty years ago, suffering through Saunder's canonical «Categories for the Working Mathematician». I do not recommend it, even if you are a working mathematician. I felt thick as brick for a while...

I started my shift from Standard ML and Miranda towards Haskell in the early 2000's. I was advised to read Pierce's «Basic Category Theory for Computer Scientists». It made a lot more sense. I still think is too theoretical for anyone who hasn't had a strong discrete math background.

I was finally able to grasp the material after my (second) reading of Walters' «Categories and Computer Science». Not because I think it's particularly easy to follow, but because I'd read the aforementioned two: it's like reading the «Compiler's» (the «Dragon») after having studied «The Theory of Parsing, Translation, and Compiling»...

The Orange Combinator's Meetup agreed to go through Bartosz Milewski's book. I found it very entertaining: quite pragmatic, a bit «hand-wavy» at times (your Jedi tricks don't work on me), and not heavy on theory nor formalisms (not that there's anything wrong with that). Definitely not a textbook, yet able to introduce complex concepts in a somewhat familiar setting.

Each chapter has several challenges that range from writing things in Haskell, trying the same in lesser programming languages, and some equational reasoning bits.

The Orange Combinator's Meetup tries to go over the challenges in a collaborative fashion. I'm going to be adding my answers to most challenges here, after they've been discussed by the group.

Chapter 1

Implement, as best as you can, the identity function in your favorite language (or the second favorite, if your favorite language happens to be Haskell).

Perl functions receive a list of arguments and are all consumed at once (no currying). Perl functions also have a calling context, sometimes requiring a scalar value, sometimes requiring a list.

Therefore, to be as polymorphic as possible, one needs to figure out said context. That way id over a scalar will return that scalar, but id over a list will return the list if a list is wanted.
```
#!/usr/bin/perl
sub id {
  @_ if wantarray;
  shift
}
print id(42), "\n";                # Scalar context
print length id(69,"hello"),"\n";  # List context (length wants a list)
```
Implement the composition function in your favorite language. It takes two functions as arguments and returns a function that is their composition.

In order for this to work, your favorite language must have functions as first-class objects: that is, you must be able to pass and return functions through other functions. This has been possible in Perl, way before other «modern and popular» languages started providing it.

Let's assume we are going to compose single-argument functions. In that case, it's straightforward: compose receives two function references, and returns a new function reference such that it feeds its argument to the functions in sequence.
```
#!/usr/bin/perl
sub compose {
  my ($f,$g) = @_;
  sub {
    my $x = shift;
    $f->( $g->($x) )
  }
}
sub add2   { my $n = shift; $n + 2 }
sub times2 { my $n = shift; $n * 2 }

my $h = compose( \&add2, \&times2 );
print $h->( 20 ), "\n";  # add2(times2(20))  
```
When is a directed graph a category?

When every node has a loop to itself (identities), and when all possible paths are represented explicitly.

Chapter 2

Define a higher-order function memoize in your favorite language. This function takes a pure function f as an argument and returns a function that behaves almost exactly like f, except that it only calls the original function once for every argument, stores the result internally, and subsequently returns this stored result every time it's called with the same argument.

I don't need to write it in Perl, because the standard library already includes the Memoize module that does this in a general way. However, this is one of the standard examples of «can your language do this?» I used to show in class. Here it is

How many different functions are there from Bool to Bool? Can you implement them all?

The cardinality of a type is the number of possible values, so

    | Bool | = | { False, True } | = 2

Now, the cardinality for a function type a -> b is

    | a -> b | = |b|^|a|

in our case |Bool|^|Bool| = 2^2 = 4, so there are exactly four functions. They are

-- f0 = id
f0 :: Bool -> Bool
f0 True  = True
f0 False = False
-- f1 = not
f1 :: Bool -> Bool
f1 True  = False
f1 False = True
-- f2 = const False
f2 :: Bool -> Bool
f2 True  = False
f2 False = False
-- f3 = const True
f3 :: Bool -> Bool
f3 True  = True
f3 False = True

Draw a picture of a category whose only objects are the types Void, () (unit), and Bool; with arrows corresponding to all possible functions between these types. Label the arrows with the names of the functions.

Cardinalities are helpful to figure out functions
- Void is the type with no elements (empty set) having cardinality 0. Therefore: there is only one function taking Void and producing a (anything). Reason
```
    |Void -> a| = |a|^|Void| = |a|^0 = 1
```
  and the function is named absurd. There arn't any functions producing Void as there's no value to produce.
- () (unit) is the type with exactly one element having cardinality 1. Therefore, for any given type a (anything), we reason
```
    |() -> a| = |a|^|()| = |a|^1 = |a|
```
  so there are as many functions from () to a as elements are in a. Pick any element x in a, and use const x. We can also reason, for any given type a (anything),
```
    |a -> ()| = |()|^|a| = 1^|a| = 1
```
  there's only one function taking a and producinv (). It ignores its argument so, const ().
- We already know there are four functions from Bool to Bool (those would be loops), so we only need to figure functions relating to Void and ().
These are the morphisms and the associated functions
```
    Void -> Void --> id
    Void -> ()   --> absurd
    Void -> Bool --> absurd
    ()   -> ()   --> id
    ()   -> Bool --> const True
    ()   -> Bool --> const False
    Bool -> ()   --> const ()
    Bool -> Bool --> id
    Bool -> Bool --> not
    Bool -> Bool --> const True
    Bool -> Bool --> const False
```
And we can draw it as

Chapter 3

Considering that Bool is a set of two values True and False, show that it forms two (set-theoretical) monoids with respect to, respectively, operator && (AND) and || (OR).

For the binary operator AND.
- && takes two Bool and produces a Bool as result (closure).
- (a && b) && c = a && (b && c) because AND is associative.
- True && x = x && True = x makes True the neutral element.
For the binary operator OR.
- || takes two Bool and produces a Bool as result (closure).
- (a || b) || c = a || (b || c) because OR is associative.
- False && x = x && False = x makes False the neutral element.
Each type in Haskell can have exactly one instance for any given typeclass. That's why we need to use newtype to be able to have both instances simultaneously. Also, Haskell's Monoid is preceded by Semigroup for practical reasons.
```
    newtype All = All { getAll :: Bool }

    instance Semigroup All where
      (All False) <> _ = All False
      (All True)  <> x = x

    instance Monoid All where
      mempty = All True

    newtype Any = Any { getAny :: Bool }

    instance Semigroup Any where
      (Any False) <> x = x
      (Any True)  <> _ = Any True

    instance Monoid Any where
      mempty = Any False
```
Represent the Bool monoid with the AND operator as a category. List the morphisms and their rules of composition.

The category has a single object. The name or structure of the object doesn't matter: it represents «the Bool monoid over AND».

The category has three morphisms, all of them loops:
1. The identity morphism -- True.
2. The «AND False» morphism -- (&&False) section.
3. The «AND True» morphism -- (&&True) section.
The rules of composition are as follows:
1. Start with the identity morphism, making True the current result.
2. Use either «AND False» or «AND True» computing the logical AND between the previous result and the current morphism, thus becoming the current result.
3. Repeat (2) for as many compositions as needed.

Chapter 4

I refuse to code in C++

Chapter 5

Show that the terminal object is unique up to isomorphism.

Let's suppose we have two terminal objects t1 and t2. Since t1 is terminal, there is a unique morphism f from t2 to t1. By the same token, since t2 is terminal, there is a unique morphism g from t1 to t2. We have
```
    f :: t2 -> t1
    g :: t1 -> t2
```
The composition f . g must be a morphism from t1 to t1. But t1 is terminal so there can only be one morphism from t1 to t1 and it must be the identity. Therefore f . g = id.

The composition g . f must be a morphism from t2 to t2. But t2 is terminal so there can only be one morphism from t2 to t2 and it must be the identity. Therefore g . f = id.

But f . g = id = g . f proves f and g are inverses of each other, thus making t1 and t2 isomorphic.
What is the product of two objects in a poset?

A partially ordered set (poset) consists of a set and a binary relation indicating that one element of the set «precedes» another one. It's partial because not every pair of elements needs be comparable. So, for any two elements x and y in the set, it can be x < y, x = y, x > y, or they are incomparable.

We can turn the Hasse diagram for any Poset into a category by explicitly adding the identity loops, and the transitive relations (a < b && b < c ==> a < c).

Consider two objects a and b within the poset. The product p of a and b must be an object in the poset. There must be a projection («a way to find») from p to b and from p to c, such that for any other object p' in the poset, there's a morphism from p' to p (that is, p is «the best» in some sense).

Morphisms in the category are just the relation <. The projections we are looking for are also morphisms.

Suppose a and b are comparable, with a < b. Take a as their product. The identity morphism is the projection from a the product to a the smaller element, while the a < b morphism is the projection from a the product to b the larger element. Since b cannot project to a, it follows that a is the best between a and b. Any other a' < a would be factored by its path to a. Hence, a is the product.

Suppose a and b are incomparable, find the largest object p such that p < a and p < b. These morphisms are projections from p to a and b, respectively. Also, for any other object p' < p, even though there are morphisms p' < a and p' < b, they are made up of a composition that passes through p, making p a factor. Hence, p is the product.

Therefore, the product p for two objects a and b in the poset is the largest element that precedes both a and b.
What is a coproduct of two objects in a poset?

Using duality, take p the smallest element that is preceded by a and b. The morphisms a < p and b < p are the required injections. Any other p' such that p < p', would be a worse coproduct factored by that morphism.

Chapter 6

Show the isomorphism between Maybe a and Either () a

To show T1 and T2 are isomorphic we need:

A function t1ToT2 :: T1 -> T2.
A function t2ToT1 :: T2 -> T1.
Show t1ToT2 . t2ToT1 = id = t2ToT1 . t1ToT2.

These are the functions

    mToE :: Maybe a -> Either () a
    mToE Nothing  = Left ()          -- (R1)
    mToE (Just x) = Right x          -- (R2)

    eToM :: Either () a -> Maybe a
    eToM (Left ()) = Nothing         -- (R3)
    eToM (Right x) = Just x          -- (R4)

Now we prove using equational reasoning. Recall

    (f . g) x = f (g x)      -- R0

(A) We need to prove

    eToM . mToE = id
      { Eta-Abstraction: f = g => f x = g x }
    (eToM . mToE) x = id x

so we start from the left hand side

Case A.1:

(eToM . mToE) Nothing
  { replace LHS R0 }
eToM (mToE Nothing)
  { replace LHS R1 }
eToM (Left ())
  { replace LHS R3 }
Nothing
  { insert identity }
id Nothing

Case A.2:

(eToM . mToE) (Just x)
  { replace LHS R0 }
eToM (mToE (Just x))
  { replace LHS R2 }
eToM (Right x)
  { replace LHS R4 }
Just x
  { insert identity }
id (Just x)

Having shown that

    ∀x : (eToM . mToE) x = id x

it follows

    eToM . mToE = id

proving (A).

(B) We need to prove

mToE . eToM = id
  { Eta-Abstraction: f = g => f x = g x }
(mToE . eToM) x = id x

so we start from the left hand side

Case B.1:

(mToE . eToM) (Left ())
  { replace LHS R0 }
mToE (eToM (Left ()))
  { replace LHS R3 }
mToE Nothing
  { replace LHS R1 }
Nothing
  { insert identity }
id Nothing

Case B.2:

(mToE . eToM) (Right x)
  { replace LHS R0 }
mToE (eToM (Right x))
  { replace LHS R4 }
mToE (Just x)
  { replace LHS R2 }
Right x
  { insert identity }
id (Right x)

Having shown that

    ∀x : (mToE . eToM) x = id x

it follows

    mToE . eToM = id

thus proving (B).

Proving (A) and (B) implies mToE and eToM are their respective inverses, thus establishing the isomorphism between Maybe a and Either () a.

Show that a + a = 2 x a holds for types up to isomorphism.

For algebraic types a + a is the disjoint union of type a with itself, so it can be modeled with Either a a, whereas 2 x a is the product of a type with cardinality 2, such as Bool, and arbitrary type a, so it can be modeled with (Bool,a).

We propose the functions

    sToP :: Either a a -> (Bool,a)
    sToP (Left x)  = (False,x)        -- (R1)
    sToP (Right x) = (True,x)         -- (R2)

    pToS :: (Bool,a) -> Either a a
    pToS (False,x) = Left x           -- (R3)
    pToS (True,x)  = Right x          -- (R4)

Now we prove using equational reasoning. Recall

    (f . g) x = f (g x)      -- R0

(A) We need to prove

pToS . sToP = id
  { Eta-Abstraction: f = g => f x = g x }
(pToS . sToP) x = id x

so we start from the left hand side

Case A.1:

(pToS . sToP) (Left x)
  { replace LHS R0 }
pToS (sToP (Left x))
  { replace LHS R1 }
pToS (False,x)
  { replace LHS R3 }
Left x
  { insert identity }
id (Left x)

Case A.2:

(pToS . sToP) (Right x)
  { replace LHS R0 }
pToS (sToP (Right x))
  { replace LHS R2 }
pToS (True,x)
  { replace LHS R4 }
Right x
  { insert identity }
id (Right x)

Having shown that

    ∀x : (pToS . sToP) x = id x

it follows

    pToS . sToP = id

proving (A).

(B) We need to prove

sToP . pToS = id
  { Eta-Abstraction: f = g => f x = g x }
(sToP . pToS) x = id x

so we start from the left hand side

Case B.1:

(sToP . pToS) (False,x)
  { replace LHS R0 }
sToP (pTos (False,x))
  { replace LHS R3 }
sToP (Left x)
  { replace LHS R1 }
(False,x)
  { insert identity }
id (False,x)

Case B.2:

(sToP . pToS) (True,x)
  { replace LHS R0 }
sToP (pToS (True,x))
  { replace LHS R4 }
sToP (Right x)
  { replace LHS R2 }
(True,x)
  { insert identity }
id (True,x)

Having shown that

    ∀x :(sToP . pToS) x = id x

it follows

    sToP . pToS = id

proving (B).

Proving (A) and (B) implies sToP and pToS are their respective inverses, thus establishing the isomorphism between Eiter a a and (Bool,a).

Chapter 7

Prove functor laws for the reader functor.

Recall the Reader functor is nothing but a function, having

    instance Functor ((->) r) where
      fmap f g = f . g

Preservation of Identity:

fmap id h
  { fmap definition for Reader }
id . h
  { Eta-extension }
(id . h) x
  { Composition definition }
id (h x)

Preservation of Composition:

fmap (g . f) h
   { fmap definition for Reader }
(g . f) . h
   { Innermost composition }
g . (f . h)
   { Innermost fmap instance }
g . (fmap f h)
   { Outermost fmap instance }
fmap g (fmap f h)
   { Composition definition }
(fmap g . fmap f) h

Prove the functor laws for the list functor.

Recall the List functor defined by the book

    data List a = Nil | Cons a (List a)

    length :: List a -> Int
    length Nil         = 0
    length (Cons x xs) = 1 + length xs

    instance Functor List where
      fmap f Nil         = Nil
      fmap f (Cons x xs) = Cons (f x) (fmap f xs)

Preservation of Identity:

Base case:

fmap id Nil
  { fmap definition for List }
Nil
  { Insert identity }
id Nil

Inductive Hypothesis:

For all lists xs having length xs <= n, it is true that fmap id xs = id xs.

Inductive case:

Let length xs = n then length (Cons x xs) > n

fmap id (Cons x xs)
  { fmap definition for List }
Cons (id x) (fmap id xs)
  { Apply identity }
Cons x (fmap id xs)
  { Inductive Hypothesis: length xs <= n }
Cons x xs
  { Insert identity }
id (Cons x xs)

Preservation of Composition:

Base case:

fmap (g . f) Nil
  { fmap definition for List }
Nil
  { fmap g definition for List }
fmap g Nil
  { fmap f definition for List }
fmap g (fmap f Nil)
  { Composition definition }
(fmap g . fmap f) Nil

Inductive Hypothesis:

For all lists xs having length xs <= n, it is true that fmap (g . f) xs = (fmap g . fmap f) xs.

Inductive case:

Let length xs = n then length (Cons x xs) > n

fmap (g . f) (Cons x xs)
  { fmap definition for List }
Cons ((g . f) x) (fmap (g . f) xs)
  { Composition definition }
Cons (g (f x)) (fmap (g . f) xs)
  { Inductive Hypothesis: length xs <= n }
Cons (g (f x)) ((fmap g . fmap f) xs)
  { Composition definition }
Cons (g (f x)) (fmap g (fmap f xs))
  { fmap g definition for List }
fmap g (Cons (f x) (fmap f xs)
  { fmap f definition for List }
fmap g (fmap f (Cons x xs))
  { Composition definition }
(fmap g . fmap f) (Cons x xs)

Chapter 8

Show the data type data Pair a b = Pair a b is a Bifunctor. For additional credit implement all three methods of Bifunctor and use equational reasoning to show that these definitions are compatible with the default implementations whenever they can be applied.

    data Pair a b = Pair a b

    instance Bifunctor where
      bimap  f g (Pair x y) = Pair (f x) (g y)
      first  f   (Pair x y) = Pair (f x)    y
      second   g (Pair x y) = Pair    x  (g y)

Now, given bimap prove default first and second work

    first h (Pair p q)
      { Default implementation for first }
    bimap h id (Pair p q)
      { Provided bimap: f ~ h, g ~ id, x ~ p, y ~ q }
    Pair (h p) (id q)
      { Apply identity }
    Pair (h p) q

    second h (Pair p q)
      { Default implementation for second }
    bimap id h (Pair p q)
      { Provided bimap: f ~ id, g ~ h, x ~ p, y ~ q }
    Pair (id p) (h q)
      { Apply identity }
    Pair p (h q)

Finally, given first and second prove default bimap works

    bimap j k (Pair p q)
      { Default implementation for bimap: g ~ j, h ~ k }
    (first j . second k) (Pair p q)
      { Composition definition }
    first j ( second k (Pair p q))
      { Provided second: g ~ k, x ~ p, y ~ q }
    first j (Pair p (k q))
      { Provided first: h ~ j, x ~ p, y ~ (k q) }
    Pair (j p) (k q)

Show the isomorphism between the standard definition of Maybe and this desugaring

type Maybe' a = Either (Const () a) (Identity a)

The following functions establish the isomorphism

    mToP :: Maybe a -> Maybe' a
    mToP Nothing  = Left  (Const ())
    mToP (Just x) = Right (Identity x)

    pToM :: Maybe' a -> Maybe a
    pToM (Left  (Const ()))   = Nothing
    pToM (Right (Identity x)) = Just x

Show that the following data types define bifunctors in types a and b

 data K2 c a b = K2 c

 instance Bifunctor K2 c where
   bimap _ _ (K2 k) = K2 k

 data Fst a b = Fst a

 instance Bifunctor Fst where
   bimap f _ (Fst x) = Fst (f x)

 data Snd a b = Snd b

 instance Bifunctor Snd where
   bimap _ g (Snd y) = Snd (g y)

Chapter 10

Define a natural transformation from the Maybe functor to the list functor. Prove the naturality condition.

A natural transformation is a way to «change the wrapper» without altering the contents.

    mToL :: Maybe a -> [a]
    mToL Nothing  = []
    mToL (Just x) = [x]

The naturality condition states that for every f

    fmap f . mToL = mToL . fmap f

Case 1:

    (fmap f . mToL) Nothing
      { Composition definition }
    fmap f (mToL Nothing)
      { mToL application }
    fmap f []
      { fmap definition for [a] }
    []
      { Introduce mToL }
    mToL Nothing
      { Introduce fmap for Maybe }
    mToL (fmap f Nothing)
      { Abstract composition }
    (mToL . fmap f) Nothing

Case 2:

    (fmap f . mToL) (Just x)
      { Composition definition }
    fmap f (mToL (Just x))
      { mToL application }
    fmap f [x]
      { fmap definition for [a] }
    [f x]
      { Introduce mToL }
    mToL (Just (f x))
      { Introduce fmap for Maybe }
    mToL (fmap f (Just x))
      { Abstract composition }
    (mtol . fmap f) (Just x)

Define at least two different natural transformations between Reader () and the list functor. How many different lists of () are there?

Using the simplified defintion
```
    newtype Reader r b = Reader (r -> b)
```
fixing r ~ () we note Reader stores functions () -> b, therefore there are as many possible natural transformations as |b|. The challenge requests b ~ [a], thus having an infinite cardinality: we just choose any two (their usefulness is a different discussion).

Two simple choices are
```
    alpha2a :: Reader () a -> [a]
    alpha2a (Reader f) = []

    alpha2b :: Reader () a -> [a]
    alpha2b (Reader f) = [f ()]
```
The first one «ignores» the function in the Reader, producing and empty list as there's nothing to wrap. The second one uses the function once over () wrapping its result. Additional alternatives would use this same value more than once to build a list of arbitrary length.

Continue the previous exercise with Reader Bool to Maybe

Using the simplified defintion

    newtype Reader r b = Reader (r -> b)

we can ignore the function in the Reader and have nothing to wrap

    alpha3a :: Reader Bool a -> Maybe a
    alpha3a (Reader _) = Nothing

or use the function in the Reader with each possible boolean value, and wrapt the result

    alpha3b :: Reader Bool a -> Maybe a
    alpha3b (Reader f) = Just (f True)

    alpha3c :: Reader Bool a -> Maybe a
    alpha3c (Reader f) = Just (f False)