Sliding windows are hard. They become annoying when you have to carefully synchronize them with I/O operations, and their buffering. For those, and other reasons relating to cardinality, I suppose Day 6 must've been very annoying for those not using lazy functional programming.

## The Problem

We are given a lone long line of characters. If you cheat and look at the input, is 4096 bytes. I rather not cheat and write fast, elegant code, that doesn't care how long the input is.

The example given is a much shorter input

```
mjqjpqmgbljsphdztnvjfqwrcgsmlb
```

and the objective is to find the first position in the `String`

such that four different characters appear for the first time.
That is, imagine sliding a window of size 4 over the string

```
mjqjpqmgbljsphdztnvjfqwrcgsmlb
\__/ 'j' is repeated
mjqjpqmgbljsphdztnvjfqwrcgsmlb
\__/ 'j' is still repeated
mjqjpqmgbljsphdztnvjfqwrcgsmlb
\__/ 'q' is repeated
mjqjpqmgbljsphdztnvjfqwrcgsmlb
\__/ all characters are different
```

Once the *first* window with four different characters is found,
we must report the *rightmost* position of the window, counting
from 1. For the example

```
mjqjpqmgbljsphdztnvjfqwrcgsmlb
\__/
1234567
```

we should answer 7.

## Main Program

The main program is pretty much the same as Day 1.

## There's no parsing

As usual, we are going to read our input lazily using
`readFile`

. It will take care of buffering, and feeding data
to downstream functions as soon as it is *needed* and not
before.

Since `readFile`

provides a `String`

, which is nothing more
than `[Char]`

, we just need to figure out:

How to check if all

`Char`

within a window are different?How to move the window over the lazy long

`String`

while keeping track of our position, until we get to the first valid window.

## Verifying the window

To check if all elements of a list are different, there's
no other way than to compare each pair. For a window of
four elements, and an imperative mentality, this seems
pretty straightforward. For the sake of argument, let's
say the window is a `[Char]`

(it's going to be), but I want
to treat it «as if it were an array» (because I suddenly
have a relapse of the imperative programming malaise).

Then you'd be tempted to write something

```
fugly :: [Char] -> Bool
fugly [a,b,c,d] = a /= b && a /= c && a /= d
&& b /= c && b /= d
&& c /= d
```

or something nastier up to isomorphism. It only works for four, and you explicitly enumerate all cases. «A» for effort in noticing there's a minimum number of comparisons, though.

Let's try a more general approach: process windows of *any size*,
and *any kind of content* (as long as they are «comparable»). This
suggests a signature such as

```
allDifferent :: Eq a => [a] -> Bool
```

Now, consider an arbitrarily sized list of things that can be compared,

```
[a0,a1,a2,...,aN]
```

Now, for *each* element of the list, let's check if it is
a member of the rest of the list. There's a standard
Haskell function `elem`

that does precisely this, i.e.

```
a0 `elem` [a1,a2,...,aN]
```

will return `True`

or `False`

if `a0`

is contained in `[a1,a2,...,aN]`

.
And `elem`

is «short circuited» so it returns `True`

as soon as
possible, and `False`

if it needs to check all the rest (and there's
no other way around that).

After the first element has been compared to the rest, I save
whatever result `elem`

produced, and now focus on the *second*
element and repeat, i.e.

```
a0 `elem` [a1,a2,...,aN] -> True or False
a1 `elem` [a2,...,aN] -> True or False
a2 `elem` [a3,...,aN] -> True or False
...
aN `elem` [] -> False
```

We are processing each element of the list *once*, from left
to right. At each step we need to compute using the current
rest of the list. And we need to *save* the result of the
partial computations. That's a `foldl'`

with a structured
accumulator: we collect the `Bool`

results in a list, and we
carry the current rest of the list.

So we start with

```
allDifferent as = foldl' go ([],tail as) as
where
go (bs,es) e = ((e `elem` es) : bs, tail es)
```

so we can go over *every* element of `as`

one by one. We start
with an empty `[Bool]`

and the initial `tail`

of the list.
At each step, we save the result of testing if the element
being processed (`e`

) is contained in the current rest of
the list (`es`

), and continue with the `tail`

of the current rest.

The first element of the resulting tuple will be a `[Bool]`

holding
the results of all `elem`

tests, and the last tail which is empty
and we don't care about. So we can improve our code

```
allDifferent as = fst $ foldl' go ([],tail as) as
where
go (bs,es) e = ((e `elem` es) : bs, tail es)
```

to get the `[Bool]`

.

Now, if there's *at least* one `True`

in that list, we know there's
a repeated element. So we need the boolean disjunction («or»)
for all elements in the list. If at least one is `True`

, the
disjunction will quickly short circuit to `True`

, and we need to
negate it to signal they are *not* `allDifferent`

.

If all elements are different, the list will be full of `False`

,
the disjunction will be `False`

and we need to negate it to
signal «all different». This is the most expensive case, but
it only happens precisely when detecting what we are looking
for.

We can finally write

```
allDifferent :: Eq a => [a] -> Bool
allDifferent as = not $ or $ fst $ foldl' go ([],tail as) as
where
go (bs,es) e = ((e `elem` es) : bs, tail es)
```

It doesn't matter how many elements there are. It works for any
data type that supports comparisons. It uses short-circuit
boolean operations and will do the *minimum* number of comparisons
to figure out the answer. You can't get much better than this.

## Moving the window

We can check windows of any size, so it's only natural to try and write code that moves a window of any size over input of any content as long as the contents can be compared.

This suggests a signature

```
findMarker :: Eq a => Int -> [a] -> Int
```

having as first argument the size of the window we want to
move, and as second argument the «we don't know nor care how big»
list of things that can be compared. It's supposed to return
the *rightmost* position of the window, the moment all its
elements are different.

Haskell lists positions start at zero. Starting at zero is the right way to count in Computer Science. It is, it always has been. Shut up. Learn to count.

If we start with a list such as

```
[e0,e1,e2,e3,e4,e5,e6,...]
```

we can use the standard functions `take`

and `drop`

. I'll use 4
to drive the point home

```
ghci> take 4 [e0,e1,e2,e3,e4,e5,e6,...]
[e0,e1,e2,e3]
ghci> drop 4 [e0,e1,e2,e3,e4,e5,e6,...]
[e4,e5,e6,...]
```

This way we can «prime» the initial window, and focus on the
remaining input. If we are working with windows of size 4...
did you notice that *that* would be the index to return
if this window is the one satisfying the conditions?

From now own, we only need to count from 4 onward, while grabbing one character at a time to shift the window.

```
findMarker :: Eq a => Int -> [a] -> Int
findMarker c s = go c (take c s) (drop c s)
where
go count buffer rest = if allDifferent buffer
then count
else go (succ count)
(tail buffer ++ [ head rest ])
(tail rest)
```

In order for the result to be provided immediately, I've
written the logic as the tail recursive function `go`

.

Notice that the moment `allDifferent buffer`

succeeds, the
function *immediately* returns the current `count`

, ignoring
the rest of the input. But if the check fails, the
`count`

is incremented by one, we «shift out» the leftmost
element of the `buffer`

and add the next element in the
input.

Tail-recursive functions are as efficient as your carefully
written `while`

loop, both in terms of stack space, as well
as local variable use, as long as the compiler for the
language does the right thing. GHC does. Good C/C++ compilers
can as long as there's no aliasing in the body. Chances are your
favorite dynamic language doesn't.

The notion of «loop invariants» that many of you learned about if you took a course in algorithm design, can be applied here:

`count`

always points to the rightmost position of the`buffer`

, and`buffer`

is of constant size as it loses (`tail`

) exactly one element while gaining (`(++)`

) exactly one more, and`rest`

is always losing (`tail`

) one element.

This allows me to *reason and prove* that the window is *always*
of arbitrary size `c`

and it will slide from left to right until
all of its elements are different.

Generic, efficient, and formal.

## Solving Part A

Part A wants windows of size 4. So be it

```
partA :: String -> Int
partA = findMarker 4
```

## Solving Part B

Part B wants windows of size 14. You don't say.

```
partB :: String -> Int
partB = findMarker 14
```

I pity the fool that used arrays and had to come up with all the comparisons or the convoluted array index relations...

## Conclusion

The more general your code, the better the chances for
reusing. The less you are bound by the shackles of
the `length`

of things, the `while`

using counters, and
embrace laziness, fusion, `fold`

and tail-recursive
encoded loops, the more your code will compose better.

I must point out that

```
someList ++ [e]
```

is a *terrible code smell* in the general case, because
concatenation `(++)`

is `O(n)`

on the length of it first argument.
Since we are working with short lists, and performing a single
concatenation, is not much of an issue.

If this was production code, with sliding windows of
considerable size, I would've used `ByteString`

for binary
data or `Seq`

for structured data. Both data types are carefully
crafter for functional programming to afford adding to *either*
end in `O(1)`

time.