The previous post presented an example covering both the mindset and techniques needed to write an interpreter. Haskell is, by far and large, the best language for language embedding, parsing, and compiling, so it’s a natural choice. I wouldn’t want to write parsers in any other language, and I pity the fool.
Prolog is particulary well suited for prototyping and implementing programming languages. That’s something I definitely asserted while teaching CI3641 and CI3661. It’s unfortunate I never had a chance to show full examples then and there, as there’s so much you can share in four lectures. I’ll try and convince you with this post.
Now, if you’re already a fan of Prolog, having exposed your mind to unification, automatic backtracking, pure solution space exploration, and the clarity of recursive thinking… there will be very little of it. You see, parsing and intrepretation for programming languages are inherently deterministic and full of side effects. We’ll need to use Prolog’s «imperative» features. Unlike the programming projects students had to face where it was prohibitid, my usage of said techniques is absolutely warranted. I have a permit: I do what I want.
Transformation mindset, revisited
Writing a language processor in Prolog is also a transformation. But given Prolog’s processing model, and particular set of data types, we cannot approach it as a streaming transformation, as we did when using Haskell.
The input is still going to be a «program description» and the output a «practical result». Prolog was not designed to handle strings, in the way Haskell, and lesser imperative languages do. Traditional Prolog systems only deal with basic units known as atoms that are not strings, but «labels» that stand on their own; even basic I/O is done in terms of atoms. Modern Prolog systems have included the notion of strings, but only at the edges: read a string and be forced to «atomize it», or use a string as a format for printing.
Therefore, breaking down this transformation into a sequence of many analyze-synthesize phases, will have to consider this: read a string from a file and turn it into a list of atoms. It’s just a case of tokenization as described before.
Then, a deterministic parser needs to process this list of atoms to produce some sort of intermediate data structure. But Prolog does not have functions, it has predicates. You don’t call a function passing arguments to get return values; you write predicates that must be proven and result in variables taking values. So we have to write a set of predicates that behave like a parser, succeeding exactly once if, and only if, said list can be parsed.
Parsers produce an intermediate data structure, usually
more than one (a tree, a symbol table, intermediate code). Prolog
has a
single data structure: a compound term, a.k.a. functor
(no relation with Haskell Functor
). Any given functor
can be data or code, depending on context. There are no
ready-made arrays, maps, trees, or graph data structures in
Prolog: you make your own out of functors as you go. Even Prolog
lists are nothing but syntactic sugar for the dot (.
) functor!
It follows the parsing predicate will need to create some sort of functor-based structure representing the Brainfuck program to execute. For Brainfuck, executing a program means having a tape. A persistent tape. And it can grow arbitrarily as we move to the right. As we move!
Again, this post is not going to be a fully functional example. It’s about the engineering of a solution and, showing that the technique is the same but adapted to a different mindset: this is not going to be «a Haskell program written in Prolog», much less «I’m a good Perl/C/Go programmer so Prolog has to allow me to do this, otherwise it sucks».
Brainfuck and its behavior
I’ll assume you read
Brainfuck and its behavior
from my previous post. All observations apply, except
that Prolog doesn’t have ADTs, arrays, lazy evaluation,
[Char]
, parser combinadors, nor polymorphism. But, if
you’re familiar with
Prolog semantics, you know its funct… PRED-I-CATES can
be recursive, are applied in definition order, and
have implicit backtracking.
This means we could write a top-down predictive parser by hand,
like the competent non-Prolog programmers we are.
The good thing is we don’t need to as any modern Prolog system
provides Definite Clause Grammars.
They look like the Context Free Grammars discussed in CI3725 to base parsers upon. DCGs have non-terminals on the left, an arrow to the right, and then a sequence of terminals, non-terminals, and Prolog «actions». The Prolog interpreter will translate the DCG rules into recursive Prolog predicates that manipulate an implicit list of atoms, processing it as a top-down predictive parser. Indeed, a Prolog parser generator embedded in Prolog and accepting embedded Prolog. You ain’t seen nothing yet. A couple of things to pay attention to:
Only atoms can be used as terminals (tokens), so our input has to become a list of atoms.
Arguments can be added to the non-terminals to carry information up or down the parsing flow.
The Haskell parsing combination solution for simple instruction looked like this
= (char '>' >> modifyState succ >> pure [Next])
parseSingle <|> ...
to match character >
, increment the instruction counter, and
produce the singleton list of instruction based on an ADT constructor.
This can be written using Prolog DCG’s and functors as the rule
I,next)],I,F) --> ['>'], { F is I + 1, ! }. stmt([p(
The functor stmt/3
on the left side is our non-terminal;
the 3
stands for three arguments. Arguments in Prolog are not
like arguments in other languages, that you pass to functions
or procedures as parameterizing input. Prolog arguments
are unified during processing: Prolog will construct and/or
deconstruct arguments out of the available parts. Sometimes arguments
look like they are in (given, so the predicate has something
to work with), sometimes they look like they are out (built
as part of the predicate’s work), and sometimes they are both
(partially given, partially produced). We’ll leverage the
three ways.
If I were to document stmt/3
following Prolog’s lore,
it would read
% stmt(-Stmt,+Begin,-End)
meaning:
- The first argument is out (it’s actually both, you’ll see), unified with the instruction we are parsing.
- The second argument is in. It does not specify a type, but the context will make it an integer: the current counter, as we try to apply the DCG rule.
- The third argument is out. Prolog does not allow mutation, so in order to keep count, an additional argument will be unified to the updated value, if we were able to fully apply the DCG rule.
The DCG rule should make more sense now
I,next)],I,F) --> ['>'], { F is I + 1, ! }. stmt([p(
Predicate stmt
will succeed with its first argument
being a singleton list with the functor p(I,next)
,
as long as all the predicates on the right side succeed
from left to right. next
is an atom, it stands for itself,
representing the abstract Brainfuck instruction (remember
Next
from the Haskell ADT?).
But where does I
come from? Well it’s the second argument
that we are receiving: Prolog will unify variable I
from
the second argument into the first argument and everywhere in
the rule. That’s why I said the first argument is both:
it’s partially provided, and completed via unification,
but only if the predicate succeeds.
The rules on the right side start by a straightforward
match and consume an '>'
from the implicit list. But as
an atom. Then, everything within curly braces is literal Prolog
code, i.e. use Prolog’s is/2
predicate to do math, such
that F
is I
incremented by one. And then, cut (!
).
Our parser has to be deterministic: if we got this far,
it means this was the right rule to choose, and we don’t
need Prolog to backtrack. If the match fails, Prolog will
backtrack to the next stmt/3
rule available. That is,
the rule has to match and count, or try the next rule.
Since unification can be confusing until it becomes magical, this is what would happen when using this rule. This is not exactly Prolog, but I hope it will get the point across. Let’s say that the implicit list looks like this
'>',...] [
before applying the rule, and the rule was «asked» like this
Result,41,Next) stmt(
Prolog will automatically do the following for you. Read the
=
as “equivalent” not as assignment, and _
as undefined)
I,next)] = Result
[p(I = 41
F = _
but I
has to be the same for the whole rule so
41,next)] = Result
[p(I = 41
F = _
The first element of the implicit list matches what we are looking for, so Prolog will remove it from the list, and proceed with
41,next)] = Result
[p(I = 41
F = _
F is I + 1
but I
has to be the same for the whole rule so
41,next)] = Result
[p(I = 41
F = _
F is 41 + 1
math happens, and then the cut prevents backtracking. So, the «question»
Result,41,Next) stmt(
is «proved» with the change to the implicit list and
41,next)] = Result
[p(I = 41
F = 42
That is, a full singleton list with the current counter and instruction
held by the arbitrarily named p/2
functor, as well as
the next instruction counter we need to rinse and repeat. Writing
DCG rules for the other simple instructions is left as an
exercise to the reader.
Now, the open-close bracket blocks are as interesting in Prolog as they were in Haskell. It’s the exact same reasoning, but written using DCGs: for every opening one there must be a closing one, and nesting them makes this a recursive problem.
Block,I,F) --> ['['], { I1 is I + 1, ! } % (1)
stmt(, { Begin = p(I,jfwd(F)) } % (2)
, list(B0,I1,F0) % (3)
, [']'] % (4)
, { End = p(F0,jbak(I1)) % (5)
, append(B0,[End],Body)
, Block = [Begin|Body]
, F is F0 + 1
, !
}.
L,I,F) --> stmt(S,I,F0),
list(Ss,F0,F),
list({ append(S,Ss,L) }.
,I,I) --> []. list([]
A single undefined variable Block
is used as first argument,
expecting the whole list of instructions in the block to be
unified with it. That makes it an out argument: we don’t know
how many arguments in that list, but that’s the beauty of a
well-thought out recursive program. Arguments I
and F
serve
the same purpose as before.
We start (1) by matching the opening bracket and computing I1
as
our new counter, and continue building the first instruction of
the block in (2), unifying it with a variable for later use.
Notice how I
has a value, but F
still doesn’t: unification
will take care of putting the value in place once we compute it.
Now for the funky recursive stuff in (3). DCG list
tries to match
as many single statements as possible from left to right,
collecting them in a list. B0
will hold all the statements,
I1
is the counter to use for the first statement of the block,
and F0
will have the counter after the block is collected.
Looking at the implementation of list
you’ll see that it
uses stmt
recursively: since stmt
keeps count for simple
instructions, each of those will increment the counter
accordingly; and if the list contains another opening brace,
a nested recursive block will be parsed starting with the proper
count.
If the block is empty, the base case for list
simply
unifies the in counter, with the out counter, because there’s
no reason to increment.
After collecting the block, now unified with B0
, we try and
match the closing bracket at (4). The rule ends
building the block’s closing statement at (5): I1
was computed
at (1), and F0
was produced by collecting the block. The
whole Block
is built using Prolog’s builtin append/3
to
concatenate lists, and list construction syntatic sugar.
After using F0
to compute the final counter F
, cut backtracking.
As with the Haskell version, ignoring any character not part of the Brainfuck specification does not update the state. But the implicit list has atoms not characters. This needs to be the last rule, a «catch all»
,I,I) --> [A], { atom(A), ! }. stmt([]
What happens if any of the matching over the implicit list
fails? Well, that’s a syntax error, making the predicates fail.
The attentive reader should have figured out by now it can only
happen when there’s a missing ]
or we run out of implicit list.
Since we’ve added cuts for each rule, there will be no backtracking.
Again, I’m making absolutely no effort to recover from syntax
errors, following Brainfuck’s specification.
Since a Brainfuck program is a list of statements, we will need a single DCG rule as our «top level» rule
Program) --> list(Program,1,_). bf(
such that Program
will be unified with the whole list of
instructions (the collection of p/2
functors), as long as
list/3
can parse it. We pass an explicit 1
to start
the counter, and we don’t care what the final counter is.
But our parser works on a list of atoms and our program will come from a file holding characters. This is the I/O boundary we have to pay attention to. I’m using SWI Prolog for this proof of concept, so I will refer you to its documentation to understand
F,P) :- read_file_to_string(F,S,[]),
source(S,Cs),
string_chars(phrase(bf(P),Cs,[]).
where read_file_to_string(+Filename,-String,+Options)
does
exactly what its name advertises. Still, no use for strings as they
are the poor man’s datatype. But, once S
is unified with
the string coming from file F
, string_chars(+String,-ListOfChars)
does exactly that what its name advertises.
Finally, phrase/3
is the actual DCG entry-point: try to use
DCG parser bf(P)
on input Cs
consuming all input. The empty
list as third argument means «leave nothing in the implicit list»
(yes, partial parsing positively possible).
$ swiplWelcome to SWI-Prolog (threaded, 64 bits, version 9.0.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- consult('pbf.pro').
true.
?- source('hw.b',Instructions).
Instructions = [p(1, incr), p(2, incr), p(3, incr), p(4, incr), p(5, incr), p(6, incr), p(7, incr), p(8, incr), p(..., ...)|...]
We can parse our file into a list of functors representing
the program’s instructions. The parser is deterministic as
it does not need to backtrack, so it takes O(n)
time
in the number of instructions. Written this way, again,
because I understood the language’s syntax. It requires as
much stack space as there are nesting blocks, but that would
be necessary regardless of the chosen parser type.
Notice how Prolog DCG’s help make the input stream implicit
and keep the data processing and grouping explicit. And how
convenient unification is to push or pull values back
and forth. I always make a point of mentioning that the
top parser is list
, that tries to use stmt
for simple
things, until going back to list
for nested blocks –
the Ouroboros
strikes again…
Horn your skolems
Let’s turn to Brainfuck’s mutable state: the tape. I will
use the exact same trick used for the Haskell implementation,
the list zipper.
But Prolog doesn’t have ADT’s to encapsulate functionality,
no NonEmpty
datatype and, no State
monad to hold the list
as we run our interpreter.
If we were to use pure Prolog predicates, I would need to add an extra argument just to carry the tape (which is expensive). And since Prolog’s scope rules state that variables are only alive in the current rule, then we would need another argument to carry the possibly modified tape after each predicate. Possible, but certainly not very Prolog-y.
Prolog has a knowledge database. Every Prolog program is actually a set of predicates stored in said database, usually immutable. However, it’s possibly to define a dynamic predicate such that rules can be added or removed by the running program. We can have a dynamic predicate
?- dynamic(tape/2).
to model our tape. At any point in time there will be
exactly one rule tape/2
in the database: a fact with form
Tnorf,Rest). tape(
modeling the list zipper: Tnorf
is, obviously, the front
of the list, reversed, while Rest
is the rest of the list
with focus on its first element. That is, list [1,2,3,4,5]
with focus on 3
would be represented as a rule
2,1],[3,4,5]). tape([
in the database.
We can create a new tape in its initial state with
:- abolish(tape/2),
tape_reset asserta(tape([],[0])).
by deleting existing tape/2
facts from the database, and
asserting a single one with the desired initial state. Think
about how tape([],[0])
goes from being a data value to
an active rule: it goes from data to code thanks to… code.
This is what Prolog’s homoiconicity is all about.
The rest of the implementation should be straightforward,
but I’ll provide two intereseting examples. To change
the element in focus placing V
instead
V) :- tape(Tnorf,[_|Rest]),
tape_poke(abolish(tape/2),
asserta(tape(Tnorf,[V|Rest]).
First query the database for any tape/2
fact (the single one);
unification will set Tnorf
and Rest
, ignoring the element
currently in focus. Purge tape/2
from the database. Create
a new tape/2
building a list with the parts we saved,
putting the new element in focus, and add it to the database.
This «side-effects» in Prolog is heavily used for self-improving
programs, since the 1980’s.
The other interesting predicate is the one to move to the
right. We don’t have Haskell’s NonEmpty
to ensure that the
rest of the list is always non empty, forcing us to consider
multiple cases:
:- tape(Tnorf,[]), % (1)
tape_right abolish(tape/2),
asserta(tape(Tnorf,[0])), !.
:- tape(Tnorf,[H]), % (2)
tape_right abolish(tape/2),
asserta(tape([H|Tnorf],[0])), !.
:- tape(Tnorf,[H|Rest]),
tape_right abolish(tape/2),
asserta(tape([H|Tnorf],Rest)), !.
The rest of the list is already empty. Extend it with an extra
[0]
.The rest of the list would be empty if we move right. Move right and extend it with an extra
[0]
.If we got here, there would be at least one element in the list if we move right, so just move right.
Predicates tape_left
and tape_peek
are left as an exercise
to the reader, as well as tape_show
affording this behavior,
useful for debugging purposes.
?- tape_reset.
true.
?- tape_right, tape_poke(42), tape_right, tape_right, tape_poke(23), tape_left, tape_left.
true.
?- tape_show.
0,<,42,>,0,23]
[true.
We can assemble without transformers
The Haskell solution took advantage of Monad transformers to carry the immutable program alongside the mutable tape, for the interpreter to use and effect. Prolog doesn’t have monad transformers, so we’ll use the database to hold the program.
Recall the parser predicate source/2
succeeds having its second
argument unified with a list of p/2
describing the program’s
instructions. That’s data. We can transform that data into rules
by storing them into Prolog’s database. Since each one of those
newly stored facts has a different integer index as first argument,
they effectively become O(1)
to search – we don’t need arrays
when we have efficient database access.
We write
P) :- abolish(p/2),
asm(P), !.
load_asm(
I|Is]) :- assertz(I), load_asm(Is), !.
load_asm([:- !. load_asm([])
so that proving asm(+P)
, when P
is a list produced by parsing,
will purge the database from any previously loaded p/2
facts, and then add the new ones as facts in the database.
Using assertz/1
adds a functor as a fact, at the last position
for the particular functor. This translate into all p/2
facts
being added in the same order they have in the list.
?- source('hw.b',Instructions), asm(Instructions).
Instructions = [p(1, incr), p(2, incr), p(3, incr), p(4, incr), p(5, incr), p(6, incr), p(7, incr), p(8, incr), p(..., ...)|...] .
?- listing(p/2).
:- dynamic p/2.
1, incr).
p(2, incr).
p(3, incr).
p(4, incr).
p(5, incr).
p(6, incr).
p(7, incr).
p(8, incr).
p(
...?- p(42,What).
What = jbak(12).
Again, our progam turned data into a program dynamically, and we can quickly query the new rules at will. Neat.
Having both tape and program stored in Prolog’s database, the final task is writing the actual interpreter. An additional deterministic predicate is needed to peform the i-th instruction, effecting whatever state changes to the tape, and moving on to the next instructions. That is, we want
+Current,+Statement,-Next) perform(
covering every possible statement functor.
Implementing for next
is unsurprisingly simple
I,next,N) :- tape_right, N is I+1, !. perform(
just move the tape, continue with the next instruction.
Implementing for decr
is also straightforward
I,decr,N) :- tape_peek(V),
perform(V1 is V - 1,
V1),
tape_poke(N is I + 1, !.
Implementing jumps is slightly more intesting. They are Brainfuck’s conditionals, but since there are no explicit selectors in canonical Prolog, we need two clauses:
_,jfwd(D),D) :- tape_peek(0), !.
perform(I,jfwd(_),N) :- N is I + 1, !. perform(
Prolog considers cluases in order of appearence. The first rule
looks at the tape expecting a 0. Remember that arguments can be
in, out, or both? The implementation for tape_peek/1
is based
on unification: if we call it with a free variable we «get»
(out) the value from the tape; but if we call it with a bound
variable or concrete value it has to match (in) what’s in the tape!
So, the only way the first clause succeeds is when the tape focus
is currently over a 0
, and so the next instruction will be
the destination of the jump. If there isn’t a 0
in focuse,
tape_peek(0)
fails, Prolog will backtrack, and continue
trying the second rule. If that happens, it’s a straight continue
to the next instruction. Both rules have a cut at the end,
to ensure that one, and only one, will succeed. This is how
you implement a selector’s behavior (if-then-else
or case
)
in Prolog.
Less assembly required…
After writing the remaining predicates needed to implement each instruction’s behavior, we can create an abstract «Brainfuck microcode» simulation predicate.
There’s nothing to it
F) :- source(F,P), asm(P), tape_reset, go(1), !.
run(
I) :- p(I,Action), perform(I,Action,Next), go(Next), !.
go(_) :- !. go(
We can now run(F)
. When F
is unified with a Brainfuck source
file name:
Prove
source(F,P)
to read and parse into a list of instructionsP
.Prove
asm(P)
to transform the program instructionsP
(data) into Prolog database facts (program rules).Prove
tape_reset
to set the initial tape state.Prove
go(1)
to perform the first instruction. Predicattego/1
queries the database to find theI
-th instruction, performs it, and continues to the following one, either in sequence or as the result of a jump.
The cut at the end of run/1
is natural: there’s only one way
to parse and run a program. If any step fails, the run fails, and
once the run is done, it’s done.
Note the first clause for go/1
has a cut and is tail-recursive.
The tail-recursive part is to keep going as long as there’s
something to do. The cut at the end means the Prolog interpreter
does not need to keep intermediate stack frames to backtrack.
Is almost like a closed loop or while (true)
. If you write
a non-terminating Brainfuck program, it will hang, as it should.
Having that cut also means that after «jumping» to the next
instruction, failure to continue will not make Prolog backtrack
to retry «failed» instructions: execution has to be deterministic.
So, when the first clause tries to continue to an invalid
instruction, the first clause for the recursive call to go/1
will not find it in the database. It will fail, and try the second
clause. But the second clause accepts any instruction index,
immediately stoping execution. Not unlike a break
out of
a while (true)
.
Now we have a convenient way to run Brainfuck programs
$ swiplWelcome to SWI-Prolog (threaded, 64 bits, version 9.0.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- consult('pbf.pro').
true.
?- run('hw.b').
Hello World!
true.
All in all, it took me about half and hour to write this Brainfuck interpreter in 80 lines of code, pragmas and all. There are two reasonable improvements that can be made, left as exercises:
Note there’s no need to build an intermediate list with the
instructions: the parser can be amended to asserta/1
as it
goes. That enhancement will make a successful parse end with
the program already loaded in the database, while a failed
parse will end up with a partial program that could be used
for syntax error detection. This will also allow processing
larger Brainfuck programs, as the program won’t pay the price
of large intermediate lists and their garbage collection.
Finally, an extra first argument for the name of the program
to both p/2
and tape/2
. That way, it would be possible
to parse and load multiple Brainfuck programs over the same
database, and run them at will, independently.
Same but different
The analyze-synthesize mentality is always there. Understanding and embracing the declarative paradigm, and the features that make Prolog a vehicle for language processing is key to avoid being a competent functional (imperative) programmer writing mediocre Prolog. I lose the strong static-type safeguards, but gain homoiconicity and expedited symbol processing.
Yes, I had to use «impure» Prolog. That’s what we call the subset of Prolog that requires cuts and using the database as mutable store. But that’s the nature of the language we are trying to implement: mutates a tape, running requires a state.
Very few high-level languages afford such facilities for quick prototyping of lexing, parsing, mutable state, and «data is code, code is data». LISP and derivatives were the first ones to allow it, though keeping you closer to a perennial AST. It has been successful for over 70 years now, with stubborn people trying to reimplement LISP and failing miserably (I’m looking at you, JavaScript, Ruby, and Python fans).
Prolog replaced LISP’s ubiquitous lists with a generic formulation for recursive structures having arbitrary arity (the functor). Singletons, tuples, lists, trees, and graphs, end up having the same underlying representation, and can be constructed and deconstructed efficiently. Programming and meta-programming are extremely easy and there’s no other language remotely close. Prolog derivatives have essentially the same mechanics, with extra builtin predicates so you don’t have to write them by hand.
Prolog has been used to parse non-deterministically for natural language processing or expert systems. These would use DCGs with less or without cuts, to explore alternative parsings for ambiguous languages. Prolog has also been used to prototype compiler optimization techniques, as most of them are just symbolic manipulation engines.