A Brainfuck interpreter, functional style

Posted on 2025-04-18 by Ernesto Hernández-Novich
Tags: , ,

The practical part of CI3725 covers a minimal set of techniques and tools one needs to know about in order to write adequate parsers and interpreters. Those who are not interested in structured language processing should at least be less ignorant about what’s reasonable and what’s not. Those who are, will know where to find better tools.

Transformation mindset

The key concept we hope students understand and apply is that a language processor is a transformation (abusing Haskell notation because it’s the best notation)

String -> Behavior

where the input is some «program description» and the output is a practical result (for some value of «practical»): it can be another high/low level description, a simulation, the execution of instructions… «things happening as described».

The main technique to embrace is breaking down this transformation into a sequence of many analyze-synthesize phases. Each phase will take the result of the previous one, process it to produce an intermediate representation that is better for the next stage. That is, something along the lines of

phase0 :: String -> R1
phase1 :: R1 -> R2
...
phaseN :: Rn -> Behavior

This naturally breaks down the problem into simpler parts, a notion rooted into good engineering. It also allows each transformation to employ a particular set of tools and algorithms specifically designed for abstract language processing. Tools and algorithms that are extremely hard to come up with by accident.

For run-of-the-mill declarative processing, or the implementation of toy programming language interpreters, we teach students to split the problem in at least three parts

  1. String -> [Tokens] (Lexical Analysis) – group input characters into meaningful «word» elements in the context of the problem.
  2. [Tokens] -> (AST,Context) (Syntax-Directed Translation) – process group of tokens to build a hierarchy of meaningful syntactical elements, creating an Abstract Syntax Tree (AST) reasonable for the problem, while capturing as much context about the description (variables, subroutines).
  3. (AST,Context) -> Behavior (the Really Hard part) – traverse the AST while using the Context to «make things happen»: each node of the AST corresponds to some basic instruction (if, while, expression) that is executed according to what’s in context. This is the actual interpretation stage.

We teach tools that make step (1) extremely easy to write, step (2) straightforward to write as long as students have a good understanding of what’s needed for (3). That is, even though the problem can be broken in these three parts, you must understand the whole processing flow in order to properly connect boundaries. We can’t teach experience, so we try to give them guidelines (both technical and in terms of estimated work), but since undergraduate students already Know Everything, there will be anguish and suffering midway through step (2).

This post is not going to be a fully functional example, i.e. there will be the minimal amount of code for the reader to understand what it takes. That is, this post is about the engineering of a solution and how to approach the back and forth. I’ve also chosen an esoteric language as subject matter because efficiently implementing its behavior is more interesting than the first two phases. It will also allow me to fuse phases (1) and (2) as they are quite… boring. At least for this language.

Brainfuck and its behavior

Brainfuck is a minimalistic esoteric programming language. Yes, it can be used to program anything that can be programmed, but no, you wouldn’t use it to program anything complex. It’s just an illustration of the Church-Turing Thesis, a concept more people in IT should be familiar with. Specially those who think «computers can do anything», even more so with deranged hallucinatory probably probabilistic angsty teenager vector adders. But I digress.

Phases (1) and (2) for Brainfuck are boring for two reasons: programs are linear and each instruction is but a single character. That means we can process the input «program» from left to right, one character at a time, in deterministic fashion, to figure out each instruction. The only challenge is handling nested blocks: finding an opening bracket ([) forces us to wait until we find the closing bracket (]) because the open needs to jump to the instruction after the close, and the close needs to jump to the instruction after the open. Confused already?

The key observation here is that as we process the input string one character at a time, we will transform it into labeled instructions. The labels will be natural numbers, and the instructions will be a Haskell data type, some with values that refer to labels. Thus, we start with a Haskell module HBF.Program providing data types to represent instructions and programs, as well as a parser. Each instruction (Cmd) is represented by

data Cmd = Next
         | Prev
         | Incr
         | Decr
         | PutC
         | GetC
         | JFwd Int
         | JBak Int
         deriving (Show,Eq)

Once we successfully parse the input, the program is immutable. Given that programs have to jump back and forth, we need an efficient data type to store the program, such that access to each position in constant time O(1): Haskell’s immutable arrays (Data.Array)

type Program = Array Int Cmd

We teach students how to use a LALR Parser Generator, and I might explore that in a future post. But for this language, it’s simpler to use a Recursive Descent Parser. Even more so thanks to Haskell’s Text.Parsec parser combinators, or any of the alternative derivatives.

The parser can be 100% deterministic, because each instruction is a single-character, and they are all different. Also, characters that are not part of Brainfuck’s syntax can be ignored, making it for a very easy basic instruction parser.

The parseSingle combinator is a parser having String as input, carrying a single Int as mutable state, and producing a list of instructions.

parseSingle :: Parsec String Int [Cmd]

The parsing process is the same for each one of the simpler instructions: make sure to match the particular character we’re interested in, modify the state to keep the counter going, and produce a singleton list with the matching instruction constructor as per Cmd. For instance, to parse a > and produce the corresponding Next, we write

parseSingle =  (char '>' >> modifyState succ >> pure [Next])
           <|> ...

The reader should be able to come up with the five other rules for the simple instructions.

Now, the open-close bracket blocks are interesting. Syntactically speaking, for every opening one there must be a closing one, and nesting them makes this a recursive problem. In terms of the program structure, we’ll need to use the proper label for the forward and backward jumps. Also, there might me more than one instruction (including nested blocks!) within the block: that’s why I chose to use [Next] to make concatenation easy.

Once we match the opening bracket and update the state we save the current state as begin. It’s the position just after the opening bracket. One or more (many1) commands are parsed using parseSingle recursively. This produces a [[Cmd]] flattened by mconcat into a [Cmd]. Since these are recursive calls to parseSingle, the state has been updated accordingly. We match the closing bracket and update the state, or immediately fail with a syntax error stating there’s a missing closing bracket. If the balancing bracket is there, we save the current state as end, to produce the final value: a jump forward to the end position, all the instructions we found in between, and a jump backward to begin.

           <|> (do char '[' >> modifyState succ
                   begin <- getState
                   cmds  <- mconcat <$> many1 parseSingle
                   (char ']' >> modifyState succ) <?> "missing ]"
                   end   <- getState
                   pure $ [JFwd end] <> cmds <> [JBak begin])

Yes, I’m making absolutely no effort to recover from syntax errors: that’s what the Brainfuck specification sates, and you have no idea how hard is to do syntax error recovery…

The rest of the instruction parsing needs to ignore any character that is not part of the Brainfuck specification. Note ignoring characters does not update the state, because they don’t count as program instructions. Finally, the parser needs an extra syntax error detection for when there are more closing brackets than opening brackets. Think about it.

           <|> (many1 (noneOf "><+-,.[]") >> pure [])
           <?> "unexpected ]"

If the above parser succeeds in processing the whole input string, it will return a full list of instructions using the Cmd type, where the internal jumps target the proper destination address. Given that I want to build a static immutable array, it would be helpful to know how many instructions to store. But that’s exactly what the parser state carries! We can write a top-level parser that uses parseSingle to get all the instructions in a single list cs and the value of the internal state once finished

parseProgram :: Parsec String Int (Int,[Cmd])
parseProgram = do cs <- mconcat <$> many1 parseSingle
                  l  <- getState
                  pure (l,cs)

and then write a program loading function that uses that information to build the immutable array

loadProgram :: String -> Program
loadProgram input = case runParser parseProgram 0 "loadProgram" input of
  Left _e      -> error $ show _e
  Right (l,cs) -> array (0,pred l) $ zip [0..] cs

This wraps up our fused (1) and (2) stages: going from the input String into a Program. It’s nicely encapsulated into a separate Haskell module exporting both Cmd and Program data types, as well as the loadProgram function.

In terms of efficiency, the parser is deterministic as it does not need to backtrack, so it takes O(n) time in the number of instructions. I chose to write the parser this way because I understood the language’s syntax. It requires as much stack space as there are nesting blocks, but that would be necessary regardless of the chosen parser type. It can be argued that using [Cmd] is not the most efficient data structure. But I used it as a Monoid (noticed <> and mconcat?), which means it can be replaced with a better performing data structure as long as it is a Monoid (e.g. Data.Sequence). I probably won’t.

I make it correct first, fast only after measuring.

Focus Lambda-san

I was able to fuse phases (1) and (2) into one, on account of Brainfuck’s syntax being intentionally easy to parse, and the lack of symbols (variables or subroutines). That means we have «the AST» we need to simulate, and we don’t need any other context from the program source. So far, so good.

So we focus on phase (3), the actual behavior. Brainfuck specifies its mutable state as a single tape. There is a pointer to the «current cell», that can move left or right to change focus. The tape has to store at least 30000 elements, but this is Haskell so we will try and make it infinite (bound by the amount of available RAM). The left end of the tape must be easily identifiable: trying to move the pointer to the left of the left end, should result in no movement. And it has to efficiently move left or right, one position at a time.

We are obviously not going to use an Array, because both size and contents are immutable. Using [a] and functions like length, (!!), drop or take would be brutally inefficient, as they would traverse the list from the left every single time. If you have taken a proper course in data structures, even better, taking one in performant declarative data structures (like the ones discussed in CI4251), you should’ve already realized «’tis but a list zipper».

We continue with a Haskell module named HBF.Tape providing an abstract data type Tape, built like this

data Tape = Tape { front :: [Int]
                 , rest  :: NE.NonEmpty Int
                 }
          deriving (Show)

Here’s the trick: the tape

[1,2,3,4,5,6,7]
       ^

with 4 being in focus, will be represented as

[3,2,1] [4,5,6,7,8]
\front/ \  rest  /

Being a properly engineered abstract data type, there are some invariants that hold true at any given time:

  1. ∀ tape : tape = reverse front <> rest. We only observe that representation for debugging, being implicit during execution. It’s the mathsy way to describe the trick.

  2. The rest of the list will never be empty. That’s why it will be represented using Haskell’s NonEmpty type.

  3. The focus will always be on the first element of the rest.

  4. Identifying we are at the left end of the tape should take O(1).

The module exports the type Tape without the internal implementation details, alongside the API needed to use it, while keeping the invariants

module HBF.Tape ( Tape
                , initialTape
                , shiftR
                , shiftL
                , peek
                , poke
                , change
                )

The only way to create a new type is by calling

initialTape :: Tape
initialTape = Tape { front = []
                   , rest  = NE.singleton 0
                   }

which, unsurprisingly, creates a tape with a single 0 in focus.

This data type is extremely efficient because accessing the first element of a list is always O(1). For instance, to read the element currently in focus

peek :: Tape -> Int
peek k = NE.head $ rest t

where NE.head is O(1) and NonEmpty guarantees rest t contains at least one element.

How about moving the focus to the left? If the front of the tape is empty (null is O(1)) there’s nowhere to move so we return the list as is. If there’s at least one element in the front we drop it (drop is O(1)), placing it as the new first element of rest (<| is O(1)).

shiftL :: Tape -> Tape
shiftL t = if null (front t)
             then t
             else Tape { front = drop 1 $ front t
                       , rest  = head (front t) NE.<| rest t
                       }

we are effectively transforming

[3,2,1] [4,5,6]
         ^

into

[2,1] [3,4,5,6]
       ^ 

The rest of the functions are left as an exercise to the reader. The most interesting one would be shiftR: after pushing the previously focused value into the front (using (:) which is O(1)), a check is needed to see if we reached the far end of rest to create a new singleton 0 in focus (another O(1) operation).

Modern GHC compilers are gradually forcing programmers to refactor their code to use NonEmpty a instead of plain [a], because the former allows for even better optimizations over list-like code, since the non-emptiness can be enforced at compilation time. It also forces the programmer to properly check their list manipulation, and not having to deal with head or tail throwing exceptions on empty lists.

Plain [a] has an immense didactic value, but once you understand its shortcomings, you should use better data types, forgo partial functions, or include additional checks that would be also needed when… using better data types. So, use better data types.

Run to the hills

Phase (3) usually brings pain and misery to students that did not pay attention to the plethora of warnings brought up by lecturer and teaching assistants. It’s the part where you combine your AST (Program in our case) and your Context (Tape in our case) and write a recursive function that goes over the AST effecting changes over Context.

In our particular case:

  • Program will always be read-only, as most executable code should be.

  • Tape will naturally be read-write. There’s also the need to know what is the next instruction to execute, as Brainfuck programs go one instruction after another and follow jumps.

  • Brainfuck programs read and write, so there’s I/O involved.

This leads to the obvious solution of using a carefully setup monad transformer stack

data BF = BF { tape  :: Tape
             , pc    :: !Int
             , power :: !Power
             }

type BrainFuck = ReaderT Program (StateT BF IO)

where the ReaderT ensures read-only access to our Program, StateT allows read-write access to the execution state BF, all sitting atop IO in order to interact with the world.

Every Cmd in our Program has a particular effect on the Tape (reading, writing, moving) and will always change the program counter pc (next step or arbitrary jump). We need to write one function for each Cmd value, implementing that particular set of effects. It’s worth separating the effects on the program counter from the effects over the tape.

Consider the abstract effect «jump to the i-th instruction». According to the Brainfuck implementation, jumping to a nonexistent instruction stops the program immediately. That means we need to get the current state, check if the target instruction is within bounds in terms of Program, and act accordingly

goto :: Int -> BrainFuck ()
goto i = do s0 <- get
            r0 <- asks bounds
            if inRange r0 i
               then put s0 { pc = i }
               else put s0 { power = Off }

Function get from StateT reads the current state as a whole. Function asks from ReaderT, it’s used to apply bounds to the Program array. This gets the array’s valid index range used by inRange. A check that ends up either putting a new state jumping to the i-th instruction, or shutting the power off.

This leads to a trivial implementation for step, i.e. continue to the next instruction when the current one is not a goto: get the current state, read the pc, add one, goto there. It’s one line using desugared Monad notation.

Once you have goto and step to manipulate the program counter, we can implement all the actual instructions. For instance, the implementation for ShiftR would be

next :: BrainFuck ()
next = do s0 <- get
          put $ s0 { tape = shiftR $ tape s0 }
          step

Start by getting the current state, apply shiftR to the current tape in s0, put the new tape back as updated state, and step to the next instruction.

Implementing the conditional jump is straightforward too. Consider the implementation for JFwd i

jfwd :: Int -> BrainFuck ()
jfwd to = do v <- gets (peek . tape)
             if v == 0
                then goto to
                else step

Function gets from StateT lets us apply a function to the state, instead of manipulating the state as a whole. In this case, peeking at the current value the tape has in focus. If said value is 0, then goto the destination address, otherwise step to the next instruction.

Finally, functions that need to interact with the real world will take advantage of the underlying IO monad. Brainfuck’s specification states that PutC reads the value currently in focus, and outputs its ASCII equivalent, therefore

putc :: BrainFuck ()
putc = do c <- gets (chr . peek . tape)
          liftIO $ putChar c
          step

Exactly the same flow as for a jump in terms of peeking, but adding chr to get the corresponding ASCII character. Then we produce output by «lifting» the IO monad into the stack, and step to the next instruction.

Finally, after writing all the functions needed to implement each instruction’s behavior, an abstract «Brainfuck microcode» can be implemented as

go :: BrainFuck ()
go = do pc0 <- gets pc
        cmd <- asks (! pc0)
        case cmd of
          Next   -> next
          Prev   -> prev
          Incr   -> incr
          Decr   -> decr
          PutC   -> putc
          GetC   -> getc
          JFwd i -> jfwd i 
          JBak i -> jbak i 
        pwr <- gets power
        when (pwr == On) go

where we read the current program counter pc0, fetch (!) the instruction from the array, perform the particular instruction using the associated implementation monadic effect, check if the machine is still on, and if so tail-recur. Tail-recursion will be compiled to a closed loop, so this Brainfuck interpreter will be very efficient in terms of memory. If the Brainfuck execution tries to jump out of its programs bounds, it will terminate. Or it can hang forever, as any Turing-complete computation device must be able to…

Some assembly required…

Brainfuck source code will come from some text file, named using a command-line argument. We then

readFile :: FilePath -> String

to read it. We would need a function

brainfuck :: String -> IO ()
brainfuck input = {- left as an exercise -}

that perfectly aligns with our notion of an interpreter. This function can be built with the parts we’ve worked out so far. I hope you can figure out how to fill in the blanks around:

  1. Parsing the program from the input

    program :: Program
    program = {- left as an exercise -}
  2. Creating the initial state for the Brainfuck machine

    state0 :: BF
    state0 = BF {- left as an exercise -}
  3. Unravel the monad stack over go to run. I’ll give it in point-free style Because I Can®.

    emulate :: BrainFuck () -> IO ((),BF)
    emulate = flip runStateT  state0
            . flip runReaderT program

    Try and rewrite it explicitly without the flips. Figure out where the ((),BF) comes from and either take advantage of it to have a Debug mode for your interpreter, or change what you need to make it a mere () .

Mine can run things in normal mode by default

$ stack exec hbf-exe -- ~/hello.b
Hello World!

but also in Debug mode with pretty-printing

$ stack exec hbf-exe -- --debug ~/42.b
      ++++         +++
    +[>++++    ++[>+<-][
   <]<  -]>   >++    +++
  +.-   ---   ---    ---
 --.+++++++         +++
        +++       .++
        +++      +.-
        ---    -----.--.

Final State
( ()
, BF
    { tape =
        Tape
          { front = [ 45 , 43 , 32 , 10 , 0 , 0 ]
          , rest = 46 :| [ 60 , 62 , 91 , 93 ]
          }
    , pc = 459
    , power = Off
    }
)

Here’s your hello world and the answer.

All in all, it took me about an hour to write the basic Brainfuck interpreter in about 160 lines of code, imports, type-signatures and all. The command line processing, debugging mode, and pretty-printing added about 20 more lines. A breakpoint instruction required half a dozen: # in source becomes Bkpt :: Cmd; when reaching it, the interpreter stops, pretty-prints the state, and waits for you to press any key to continue (if you have it).

Sometimes you don’t need a fully-fledged lexer, a generated shift-reduce parser, a syntax tree, and a symbol table. But the analyze-synthesize mentality should always be there. And choosing languages with superior capabilities for implementing other languages.