6. Compiling Little Languages

Interpret Now, Compile for Later

Functions and a Stack Machine

Chapter two discussed how to write grammars and example 2/12 presented a typical grammar for a comma-separated list of arithmetic expressions:

list: sum [{ ',' sum }];
sum: product [{ '+' product | '-' product }];
product: signed [{ '*' signed | '/' signed }];
signed: [ '-' | '+' ] term;
term: Number | '(' sum ')';

Chapter three and chapter four explained that a grammar, token patterns, and the classes of the EBNF module are all it takes to implement lexical and syntax analysis for sentences conforming to the grammar, in this case to recognize comma-separated lists of arithmetic expressions.

Chapter five introduced Action methods (which we will just call actions from now on) to interact with syntax analysis and produce useful output rather than deeply nested lists.

This chapter looks at actions to evaluate arithmetic expressions or to translate them into different representations more suitable for interpretation, etc. Many examples reuse actions; all classes are available from the module Six which is built into the practice page.

Arithmetic expressions are at the core of programming languages. This chapter also presents a grammar for a little language, actions to compile the language into JavaScript functions, and actions to compile the language for execution on a stack machine which will be simulated in JavaScript.

Immediate Evaluation

It makes sense to add a few more rules to the grammar above because they provide hooks for actions. For example, there is no point in letting syntax analysis distinguish addition and subtraction, only to have to check for + or - a second time when it comes to evaluation. Example 6/01 recognizes the same list of arithmetic expressions:

Press to represent and check the grammar and
press to perform syntax analysis;
compare to the original grammar in example 2/12 to see that the additional rules result in more deeply nested lists.

Here is the modified grammar:

list:     sum [{ ',' sum }];
sum:      product [{ add | subtract }];
add:      '+' product;
subtract: '-' product;
product:  signed [{ multiply | divide }];
multiply: '*' signed;
divide:   '/' signed;
signed:   [ '-' ] term;  
term:     number | '(' sum ')';
number:   Number;

The goal is to immediately evaluate arithmetic expressions, i.e., each action should return the value which the corresponding rule has recognized. This results in obvious actions for the last three rules:

class Eval02 {
  // list:     sum [{ ',' sum }];
  
  /** `sum: product [{ add | subtract }];` */
  sum (product, many) { puts(g.dump(product)); }

  // add:      '+' product;
  // subtract: '-' product;
  // product:  signed [{ multiply | divide }];
  // multiply: '*' signed;
  // divide:   '/' signed;

  /** `signed: [ '-' ] term;` */
  signed (minus, term) { return minus ? - term : term; }

  /** `term: number | '(' sum ')';` */
  term (...val) { return val.length == 1 ? val[0] : val[1]; }

  /** `number: Number;` */
  number (number) { return parseInt(number, 10); }
}

Because of the placeholder for sum (line 5) very simple expressions can already be tested in example 6/02:

Press to represent and check the grammar and
press to perform syntax analysis and invoke the actions.
Toggle and press again to see how the values are computed by the actions. Why is so much output undefined? Why are some numbers followed by commas in the output?

The next three actions are tricky but representative for any evaluation based on this kind of a grammar:

class Eval03 extends Six.Eval02 {
  // list:     sum [{ ',' sum }];
  // sum:      product [{ add | subtract }];
  // add:      '+' product;
  // subtract: '-' product;

  /** `product: signed [{ multiply | divide }];` */
  product (signed, many) {
    return (many ? many[0] : [ ]).
      reduce((product, list) => list[0](product), signed);
  }

  /** `multiply: '*' signed;` */
  multiply (_, right) { return left => left * right; }

  /** `divide: '/' signed;` */
  divide (_, right) { return left => left / right; }

  // signed:   [ '-' ] term;
  // term:     number | '(' sum ')';
  // number:   Number;
}

Multiplication and division are left-associative operations, i.e., they are evaluated from left to right by reduce() in the action for product (line 10 above). Therefore, the multiply and divide actions have to return functions which will take a left argument, combine it with the right argument which the multiply and divide rules have recognized and evaluated, and return the result of the computation (lines 14 and 17).

The product action gets a first signed value which the signed rule has already evaluated, and it gets many of these functions for the remainder of the chain of products (line 9).

The chain has to be processed as discussed in the Idioms in chapter five using reduce(): If there are no functions, the first signed value is the result. Otherwise the callback function (line 10) is applied from left to right and each time applies a multiplication or division to the previous product, starting with signed.

Again, it might be a good idea to try this stage in example 6/03:

Press to represent and check the grammar and
press to perform syntax analysis and invoke the actions.

It seems to work at first, but 2 / (3*4) produces the unexpected outputs 12 and NaN.

Toggle and press again to see how the values are computed. Where do the two lines of output 12 and NaN come from? Which action has to be repaired?

Check out the complete immediate expression evaluation in example 6/04. Addition and subtraction use the same pattern as multiplication and division:

class Eval04 extends Six.Eval03 {
  /** `list: sum [{ ',' sum }];` */
  list (sum, many) {
    puts(sum);
    if (many) many[0].forEach(seq => puts(seq[1]));
  }

  /** `sum: product [{ add | subtract }];` */
  sum (product, many) {
    return this.product(product, many);
  }

  /** `add: '+' product;` */
  add (_, right) { return left => left + right; }

  /** `subtract: '-' product;` */
  subtract (_, right) { return left => left - right; }

  // product:  signed [{ multiply | divide }];
  // multiply: '*' signed;
  // divide:   '/' signed;
  // signed:   [ '-' ] term;
  // term:     number | '(' sum ')';
  // number:   Number;
}

sum is used recursively inside parentheses, i.e., it needs to return a value and it cannot output the final result. Therefore, the action for the start rule has to display each top-level sum (lines 3 to 6 above).

Functional Evaluation

Returning functions for some arithmetic operations suggests that the entire evaluation could be delayed — every action returns a function, the result is stored as an "executable" and can be evaluated (executed) several times over. This is more useful if simple variables are added to the grammar in example 6/05.

There has to be a token for names:

Name: /[a-z]+/

There has to be a rule to recognize a name

name: Name;

and the rule for term has to recognize a name just like a number:

term: number | name | '(' sum ')';

The action for name (lines 27 to 30 below) has to return a value. For now, a local object memory maps names to values (line 29) and it is generated if needed (line 28).

class Functions05 extends Six.Eval04 {
  #parser;                                    // for error messages
  get parser () { return this.#parser; }

  constructor (parser) { super(); this.#parser = parser; }
  
  // list: sum [{ ',' sum }];
  
  /** `sum: 'let' Name '=' sum | product [{ add | subtract }];` */
  // arg          [1]      [3]   [0]     [1]
  sum (... arg) {
    if (arg.length < 4) return this.parser.call(this, super.sum, arg[0], arg[1]);
    if (!this.memory) this.memory = { };
    return this.memory[arg[1]] = arg[3];
  }
  
  // add: '+' product;
  // subtract: '-' product;
  // product:  signed [{ multiply | divide }];
  // multiply: '*' signed;
  // divide: '/' signed;
  // signed: [ '-' ] term;
  // term: number | name | '(' sum ')';
  // number: Number;

  /** `name: Name;` returns value or `0` */
  name (name) {
    if (!this.memory) this.memory = { };
    return name in this.memory ? this.memory[name] : 0;
  }
}

This is only useful if there is a way to assign a value to a name. Therefore, the rule for sum is modified as shown in the comment (line 9 above).

The action for sum defers to the superclass for simple sums (line 12) or it creates memory if needed (line 13) and performs the assignment (line 14).

Check this stage out in example 6/05:

Press to represent and check the grammar and
press to evaluate the expressions in the .

let x = 3,
(x + 1) / (y - 2) * 3,
y + (x + 1) / ((let y = 1) - 2) * 3

The assignment in line 1 returns 3. So far, y is undefined, i.e., zero; therefore, line 2 produces -6. In line 3, y is set to 1 but only once the denominator is evaluated, i.e., the output is -12 and not -11.

At this point, expressions are evaluated as they are recognized and the top-level list action returns undefined to the Parser's parse() method which returns it to be shown in the output area.

If every action returns a function rather than a value we can execute the top-level function more than once. This is interesting if there is input during execution; therefore, example 6/06 adds input as a "reserved" variable name, optionally with a default value:

term: input | number | name | '(' sum ')';
input: 'input' [ Number ];

Here are the first few modified actions which return functions:

class Functions06 extends Six.Functions05 {
  // sum: product [{ add | subtract }];
  // add: '+' product;
  // subtract: '-' product;
  // product: signed [{ multiply | divide }];
  // multiply: '*' signed;
  // divide: '/' signed;
  // signed: [ '-' ] term;
  // term: input | number | name | '(' sum ')';

  /** `input: 'input' [ Number ];` returns fct */
  input (_, number) {
    const dflt = String(number !== null ? number[0] : 0);
    return () => parseInt(prompt('input', dflt), 10);
  }

  /** `number: Number;` returns fct */
  number (number) {
    const result = parseInt(number, 10);
    return () => result;
  }

  /** `name: Name;` returns fct */
  name (name) {
    return memory => name in memory ? memory[name] : 0;
  }
}

The number action returns a function returning a constant value. The action optimizes and converts the string only once (line 19 above). The returned function takes advantage of a closure (line 20).

The name action returns a function which checks memory at run time, i.e., it expects to receive memory when the executable is run (line 25).

Finally, the input action returns a function which invokes prompt() and converts the resulting string to a number if possible (line 14); parseInt() returns NaN if it cannot convert.

The term action can be inherited because now it juggles functions: the action of every descendant of the rule must deliver a function and the term action picks it out of the list produced by recognition.

If sum and product recognize a single descendant they will just pass on whatever the descendant's action returns, i.e., if sum is the start rule the actions can be inherited and one simple expression such as input will be compiled into a JavaScript function.

Again, it might be a good idea to try this stage in example 6/06:

Press to represent and check the grammar and
press to perform syntax analysis and invoke the actions, i.e., to compile simple programs such as 10 or (input).
Press to execute the compiled program.
Note that a program consisting only of a variable name such as x is compiled into a function which expects memory as an argument, i.e., execution with will fail:

> run = g.parser().parse(program, actions)
memory => name in memory ? memory[name] : 0
> run()
memory is not an Object. (evaluating 'name in memory')

Example 6/07 converts the remaining actions so that they return functions. The action for signed is patterned after those for term and name:

class Functions07 extends Six.Functions06 {
  // ...
  signed (minus, term) {
    return minus ? memory => - term(memory) : term;
  }

If there really is a sign change, there has to be a new function, otherwise the old function will do.

All of these functions have to accept memory because term could be the function created by name and, therefore, require that argument.

Actions for add, subtract, multiply, and divide again are tricky, for example:

  multiply (_, right) {
    return left => memory => left(memory) * right(memory);
  }

multiply expects two functions, left and right, each with an argument memory, to return the left and right operand value for the multiplication.

The function right is handed to the multiply action as the result of recognizing the right operand, i.e., it is the result of the signed action and as such expects memory as argument.

The multiply action returns a function which expects a function left as an argument and returns the function which expects memory as an argument and produces the result of the multiplication, i.e., a function which fits the pattern of signed and name.

This complicates the product and sum actions a bit, for example:

  product (signed, many) {
    const c = (a, b) => b(a);  // function composition
    return (many ? many[0] : []).
      reduce((product, list) => c(product, list[0]), signed);
  }

The action for product fits the idiom discussed in chapter five: there is one function signed and there can be many more. The functions have to be composed with reduce() and the callback function (line 2) has to compose two functions:

a is a function which expects memory as an argument, i.e., a function like the one returned by the action for signed.
b is a function which expects something like a as an argument, i.e., a function like the one returned by the action for multiply.
As discussed for multiply above, b(a) is in fact a function.

The result of reduce() is the last function produced with the composition c(), i.e., it is a function which can be applied to memory to carry out the calculations required to evaluate the product. Note that reduce() is executed at compile time, not at run time.

The actions for divide, add, and subtract follow the pattern of multiply. The action for sum either produces a function to implement the assignment or it delegates to the action for product:

  sum (... arg) {
    if (arg.length == 4)
      return memory => memory[arg[1]] = arg[3](memory);
    else
      return this.product(arg[0], arg[1]);
  }

Finally, the action for list has to return the executable, i.e., a function which does not accept an argument and displays the result of each sum in the comma-separated list:

  list (sum, many) {
    const list = [ sum ].
      concat(many ? many[0].map(seq => seq[1]) : [ ]);
    return () => {
      const memory = { };
      puts(... list.map(fct => fct(memory)));
    }
  }
}

The list of functions to be called can be produced during recognition (lines 2 and 3 above) based on the idiom discussed in chapter five. The executable creates memory (line 5), uses map() to produce an array of results, and uses spread syntax to display it on one line, i.e., with a single call to puts() (line 6).

Try the functional expression compiler in example 6/07:

Press to represent and check the grammar and
press to perform syntax analysis and invoke the actions, i.e., to compile a program such as

let x = input 3, (x + 1) / (y - 2) * 3,
y + ((let y = 2) + 1) / (x - 2) + y

Press and execute the program for different inputs.

Stack Evaluation

In this section arithmetic expressions are compiled into code for a stack machine.

Example 6/08 contains new actions for the single expression grammar from example 6/06 which simply output the operation to be performed:

class Postfix08 {
  // sum: product [{ add | subtract }];
  
  /** `add: '+' right;` */
  add (_, r) { puts('add'); }

  /** `subtract: '-' right;` */
  subtract (_, r) { puts('subtract'); } 
  
  // product: signed [{ multiply | divide }];

  /** `multiply: '*' right;` */
  multiply (_, r) { puts('multiply'); }

  /** `divide: '/' signed;` */
  divide (_, r) { puts('divide'); } 

  /** `signed: [ '-' ] term;` */
  signed (minus, t) {  if (minus) puts('minus'); }

  // term: input | number | name | '(' sum ')';

  /** `input: 'input' [ Number ];` */
  input (i, n) { puts('input'); }

  /** `number: Number;` */
  number (number) { puts(number); }

  /** `name: Name;` */
  name (name) { puts(name); } 
}

There are no actions for sum, product, and term. The program

  (x + 1) / (input - -2) * 3

produces the output (one item per line)

x 1 add input 2 minus subtract divide 3 multiply

This is known as Reverse Polish Notation — operators follow their operands and no parentheses are needed as long as it is clear how many operands belong to each operator. This notation can be viewed as machine language for a stack machine:

operands are pushed onto the stack,
operators are applied to the top entries on the stack,
pop the entries off the stack, and
push the result onto the stack.

The example suggests that syntax analysis with a grammar should make it near trivial to produce stack machine code to evaluate an arithmetic expression.

Example 6/09 shows how to generate "code" and simulate a stack machine in JavaScript. The grammar from example 6/07 can be used but the start rule has to be modified to avoid collecting all results of sum on the stack:

list: stmt [{ ';' stmt }];
stmt: sum;
sum: 'let' Name '=' sum | product [{ add | subtract }];

A program is a list of statements, a statement is a sum and can start with an assignment prefixed by let to avoid an ambiguity. In the list above, statements are semicolon-separated. Alternatively

list: { stmt ';' };

would make statements semicolon-terminated.

Here is a small program:

let x = let y = input 4; (x + 1) / (y - 2) * 3;
y + ((let y = 2) + 1) / (x - 2) + y

Note that assignment is right-associative because it involves right recursion.

It turns out to be a good idea to encapsulate the infrastructure for a stack machine as a class to separate it from the compiler:

class Machine09 {
  code = [ ];                             // holds the instructions
  
  /** Represents `code` as text */
  toString () {
    return this.code.map((f, n) => n + ': ' + f).join('\n');
  }

  /** Creates stack machine */
  run (memorySize) {
    return () => {
      const memory = Array(memorySize).fill(0);    // create memory
      this.code.forEach(code => code(memory));           // execute
      return memory;
    };
  }
}

A program is compiled into an array code[] of JavaScript functions (line 2 above) which simulate the instructions of the stack machine. The method toString() can be used to display the instructions (line 5).

The method run() creates and returns the stack machine interpreter (line 10) which must be a parameterless function.

Runtime values are stored in an array memory[] which contains the variables' values followed by the stack. Thankfully, JavaScript imposes no restrictions on array length. memory[] is created when the stack machine starts (line 12) and manipulated by the JavaScript functions which simulate the instructions of the stack machine (line 13). At this point, each instruction in code[] is executed sequentially just once.

The actions need access to the stack machine infrastructure and to a symbol table which maps variable names to memory addresses. The actions class has construction parameters so that future subclasses can extend the machine infrastructure and/or the symbol table:

class Arithmetic09 {
  #parser;                                    // for error messages
  get parser () { return this.#parser; }
  #machine;                                    // handles execution
  get machine () { return this.#machine; }
  #symbols = new Map();    // symbol table, maps names to addresses
  get symbols () { return this.#symbols; }
  
  constructor (parser, machine = new Machine09 ()) {
    this.#parser = parser;
    this.#machine = machine;
  }
  // ...

Every variable has an address that is determined at compile time, i.e., during syntax analysis, and stored in the symbol table. The symbol table is managed by a method _alloc() which assigns new addresses to as yet unknown names:

  /** Returns memory address for name */
  _alloc (name) {
    let addr = this.symbols.get(name);               // known name?
    if (typeof addr == 'undefined')      
      this.symbols.set(name,                            // new name
        addr = this.symbols.size);       // allocate, starting at 0
    return addr;
  }

With all of this in place, the actions for the operands are easy to write:

  /** `input: 'input' [ Number ];` */
  input (_, number) {
    const dflt = String(number !== null ? number[0] : 0);
    this.machine.code.push(
      memory => memory.push(parseInt(prompt('input', dflt), 10))
    );
  }

  /** `number: Number;` */
  number (number) { 
    const result = parseInt(number, 10);
    this.machine.code.push(memory => memory.push(result));
  }

  /** `name: Name;` */
  name (name) {
    const addr = this._alloc(name);
    this.machine.code.push(memory => memory.push(memory[addr]));
  }

The input action creates a machine instruction which uses prompt() to obtain a string, converts it to a number, and pushes the value onto the stack (line 5 above).

The number action "compiles" the numerical value of the string collected by the token (line 11 above) and stores a machine instruction as generated code which will push this numerical value onto the stack (line 12).

The name action consults _alloc() for the address of the variable in memory[] (line 17) and creates a machine instruction to push the value of the variable onto the stack (line 18).

Moving on to the arithmetic operations:

  /** `add: '+' right;` */
  add (_, r) { 
    this.machine.code.push(
      memory => memory.splice(-2, 2, memory.at(-2) + memory.at(-1))
    );
  }

The add action is a typical example. The machine instruction has to add the top two values on the stack and replace them by the single result value — a simple job for the splice() method.

Just as in example 6/08 there are no actions for term and product because their results are already on the stack, but sum has to implement the embedded assignment, if any:

  /** `sum: 'let' Name '=' sum | product [{ add | subtract }];` */
  sum (...val) {
    if (val.length < 4) return;
    const addr = this._alloc(val[1]);
    this.machine.code.push(memory => memory[addr] = memory.at(-1));
  }

_alloc() finds or creates an address (line 4 above) and the machine instruction copies the value from the top of the stack to that address in memory[] (line 5). Note that the value remains on top of the stack.

The action for stmt generates a machine instruction to display the result of a top-level sum (or assignment) as it is popped off the stack (line 13 below).

The action for the start rule list is executed after all statements have been compiled. It displays the results of the compilation (lines 3 to 7) and calls run() to create the stack machine function (line 8):

  /** `list: stmt [{ ';' stmt }];` */
  list (s, many) {
    puts(this.machine.toString());                     // show code
    this.symbols.forEach(                         // show variables
      (value, name) => puts(name, 'at', value));
    const size = this.symbols.size;          // number of variables
    puts('stack starts at', size);
    return this.machine.run(size);                 // stack machine
  }

  /** `stmt: sum;` */
  stmt (s) {                                // print and clear stack
    this.machine.code.push(memory => puts(memory.pop()));
  }
}

The compiler is complete but it consist of two classes. The has to evaluate to a single class or object with the Action methods for the parser. It takes a little trick to accomplish this:

(() => {  // define and immediately use an anonymous function
  class Machine09 { ... }
  class Arithmetic09 { ... }
  return Arithmetic09;
}) ()

The two class definitions are placed into an anonymous function where return delivers the actual value for the actions (line 4) and the anonymous function is called immediately after being defined (line 5).

It is instructive to see the actual code. For example a - - b compiles into

> run = g.parser().parse(program, actions)
0: memory => memory.push(memory[addr])
1: memory => memory.push(memory[addr])
2: memory => memory.splice(-1, 1, -memory.at(-1))
3: memory => memory.splice(-2, 2, memory.at(-2) - memory.at(-1))
4: memory => puts(memory.pop())
a at 0
b at 1
stack starts at 2

The output shows the machine instruction functions but not the variable addresses because the functions use closure to capture the addresses computed by _alloc(). Try out example 6/09:

Press to represent and check the grammar,
press to compile the program in the , and
press to execute the stack machine.

The output shown above is created by converting the values in code[] from JavaScript functions to strings, i.e., the output shows the actual JavaScript functions which simulate what hardware instructions would do to memory[]. As such, the output is precise but a bit hard to read.

Example 6/10 is a slightly modified version of example 6/09 which makes the generated machine instructions easier to read. The following output, again for a - - b, still consists of JavaScript functions converted to strings, but it looks like it is closer to the reverse Polish notation introduced at the beginning of this section:

> run = g.parser().parse(program, actions)
0: memory => this.Load(0)(memory)
1: memory => this.Load(1)(memory)
2: memory => this.Minus(memory)
3: memory => this.Subtract(memory)
4: memory => this.Pop(memory)
a at 0
b at 1
stack starts at 2

The infrastructure for the stack machine is extended. In Machine10 the machine instructions are defined using methods so that they are more meaningful when displayed as strings, for example:

class Machine10 extends Six.Machine09 {
  // ...
  /** `stack: ... a b -> ... a-b` */            Subtract (memory) {
    memory.splice(-2, 2, memory.at(-2) - memory.at(-1));
  }

  /** `stack: ... -> ... memory[addr]` */             Load (addr) {
    return memory => memory.push(memory[addr]);
  }
}

code[] is local to the Machine10 object and contains the functions shown in the output above where this refers to the Machine10 object because the stack machine executable is defined in the run() method as an arrow function, i.e., it inherits this and uses it when interpreting code[].

As a result, this.Subtract(memory) acts just like a simple function with memory as argument (lines 3 to 5 above). this.Load(0)(memory) (line 2) is more costly because it generates the function to manipulate memory during execution (line 8) — the price to pay for the mnemonic display — but the address argument is computed at compile time.

Actions generate code as follows:

class Arithmetic10 extends Six.Arithmetic09 {
  constructor (parser, machine = new Machine10()) {
    super(parser, machine);
  }
  // ...

  /** `subtract: '-' product;` */
  subtract () { this.machine.gen('Subtract'); }

  /** `name: Name;` */
  name (name) {
    this.machine.gen('Load', this._alloc(name));
  }
}

Arithmetic10 uses Machine10 unless a subclass should decide otherwise (line 2 above). Code generation is implemented in Machine10 as a method gen() (line 2 below) which takes the instruction name and an optional list of values, calls the method ins() to produce the machine instruction, and then adds the function to code[]:

  /** returns `code.length` */
  gen (name, ... args) {
    return this.code.push(this.ins(name, ... args));
  }

  /** returns instruction function */
  ins (name, ... args) {
    return args.length ?
      eval(`memory => this.${name}(${args.join(', ')})(memory)`) :
      eval(`memory => this.${name}(memory)`);
  }

The operations on memory are still the same JavaScript functions, but they are wrapped into outer functions with expressive names.

ins() uses eval() (lines 9 and 10 above) to perform closure so that, e.g., the address used in a Load instruction is visible when the function is converted into a string for display purposes. Using the Function constructor would be safer; however, the resulting function then has no display name.

Unfortunately, extending to Arithmetic10 and Machine10 results in a lot of unused code:

All machine instruction functions have been copied from Arithmetic09 into Machine10.
All actions have been overwritten in Arithmetic10, only _alloc() has been inherited.

This could have been avoided if the mnemonic display of the instructions had been introduced immediately.

Control Structures

In this section arithmetic expressions will be extended with typical control structures and implemented for the stack machine. Changes to the grammar can be seen on this page, new stack machine and action method classes can be seen in the method browser:

press show to see the methods, sorted alphabetically,
press by class or mix-in to sort by module, class or mix-in, and finally method name,
select individual classes and/or methods to see less.

Thus far the stack machine used forEach() to process the machine instructions in code[] just once, one after another. Real machines have a program counter which is advanced as an instruction is fetched from memory. Here is an idea for a modification to the stack machine:

run (size, startAddr) {                     // create stack machine
  return () => {
    memory.pc = startAddr;            // initialize program counter
    ...

Once properties are added to memory to simulate machine registers such as a "program counter" there can be simulated branch instructions, for example:

class Machine11 extends Six.Machine10 {
  // ...
  Branch (a) {                           // stack: ... -> ... | pc: a
    return memory => memory.pc = a;
  }
  Bzero (a) {                // stack: ... bool -> ... | pc: !bool? a
    return memory => { if (!memory.pop()) memory.pc = a; }
  }

If this.Branch(a)(memory) is executed the stack machine takes the next instruction from address a.

If this.Bzero(a)(memory) is executed the stack is checked and popped, and if the top value was zero the stack machine takes the next instruction from address a.

It is helpful if the stack machine can display instructions as they are executed, and if tracing can be controlled from within a program, e.g., tracing could be unconditional or depend on the current value of a variable. Machine11 adds a method trace() which accepts true to create an unconditional tracing function (lines 3 to 5 below) or a numerical memory address to create a function which traces if the current value in memory[address] is non-negative (lines 6 to 10):

  /** Returns trace function, if any */
  trace (address) {
    if (address === true)                    // unconditional trace
      return (memory, pc) =>            // traces instruction at pc
        puts(memory.toString(), pc+':', this.code[pc].toString());
    if (typeof address == 'number') // address of control variable?
      return (memory, pc) => {          // traces instruction at pc
        if (memory[address] >= 0) // variable at addr non-negative?
          puts(memory.toString(), pc+':', this.code[pc].toString());
      };
  }

  get Memory () { return this.#Memory ??= class extends Array {
      toString () { return '[ ' + this.join(' ') + ' ]'; }
    };
  }
  #Memory;

Either function displays the instruction stored in code[pc]. Each trace output includes a display of the current memory contents produced by a locally modified toString() method (lines 13 to 17 above). Eventually, memory[] will be more than just a linear list and toString() can be overwritten again.

The stack machine interpreter, i.e., the executable returned by run(), still creates memory[] and initializes the variables to zero but it now has to consider the program counter and tracing:

  run (size, startAddr = 0, traceAddr) {
    let t;                      // [closure] trace function, if any
    const StackMachine = (memory, steps) => {
      if (!memory) {                                 // initialize?
        if (steps) t = this.trace(true);  // steps? permanent trace
        else {                           // no steps: don't suspend
          t = this.trace(traceAddr); steps = Infinity;
        }
        memory = new this.Memory(size).fill(0);    // create memory
        memory.pc = startAddr;        // initialize program counter
        t && puts(memory.toString());             // initial memory
      }
      while (steps -- && memory.pc < this.code.length) {  // steps?
        const pc = memory.pc ++;         // advance program counter
        this.code[pc](memory);            // execute at previous pc
        t && t(memory, pc);           // trace executed instruction
      }
      memory.continue = memory.pc < this.code.length;     // again?
      return memory;
    };
    return (memory, steps) => StackMachine(memory, steps);
  }

run() is called with the size of the initialized part of memory, the start address for the program counter, and an address to control tracing if any (line 1 above).

The resulting StackMachine accepts two optional arguments, the memory[] array and steps, the number of instructions to execute (line 3), and eventually returns the modified memory[] (line 19).

StackMachine() will create memory[] and run without limit.
StackMachine(null, 10) will create memory[] and execute and trace 10 instructions.
StackMachine(memory, 10) will then execute another 10 instructions.

Initialization happens if there is no memory[] (line 4):

Permanent tracing is set up if steps is defined and not zero (line 5), otherwise tracing depends on the control address handed to run() and there is no limit on steps (line 7).
memory[] is created and zero-filled (line 9), the program counter at memory.pc is initialized (line 10), and the initial memory[] is shown if there is tracing (line 11).

Instructions are executed as long as there are steps left and if the program counter does not point beyond code[] (line 13). The program counter is incremented (line 14), the instruction is executed (line 15) and traced if tracing was set up (line 16).

Once there are no more steps, a property is set to indicate if more steps could be executed (line 18) and memory[] is returned.

These modifications do not change the basic behavior of the stack machine, i.e., arithmetic expressions from example 6/09 — but for embedded assignment — can be part of a little language that has control structures.

Example 6/11 uses Euclid's algorithm to compute the greatest common divisor of two numbers with a trace of one of the subtractions:

x = input 36; y = input 54;
while x <> y do
  if x > y then x = x - y
  else y = y - x
  fi
od;
print x

Press to represent and check the grammar,
press to perform syntax analysis and generate code, and
press to execute the program, reply 36 and 54 to the requests for input, and see the output:

> run()
18
[ 18 18 ]

The program requires that the grammar from example 6/09 is extended to support control structures:

prog:     stmts;
stmts:    stmt [{ ';' stmt }];
stmt:     assign | print | loop | select;
assign:   Name '=' sum;
print:    'print' sums;
sums:     sum [{ ',' sum }];
loop:     While cmp Do stmts 'od';
While:    'while';
Do:       'do';
select:   'if' cmp Then stmts [ Else stmts ] 'fi';
Then:     'then';
Else:     'else';
sum: ...

This grammar makes a clear distinction between statements (stmt), expressions (sum), and conditions (cmp). This avoids a possible ambiguity between an expression and an assignment: without let both can start with a variable name.

This grammar for control structures uses terminating literals to avoid ambiguity issues like the dangling else problem presented in example 4/05. Some control structure literals such as 'while' have been turned into simple rules so that actions can be attached — this is discussed below.

Conditions could be about as complex as arithmetic expressions; however, for the little language comparisons will suffice, e.g.,

cmp: eq | ne;
eq: sum '=' sum;
ne: sum '<>' sum;

Unfortunately, these rules are ambiguous because both alternatives start with sum. Here is a better approach:

cmp:      sum rel;
rel:      eq | ne | gt | ge | lt | le;
eq:       '=' sum;
ne:       '<>' sum;
gt:       '>' sum;
ge:       '>=' sum;
lt:       '<' sum;
le:       '<=' sum;

While this is patterned after sum and product, this grammar does not allow cascading comparisons (which, e.g., JavaScript does).

Here is the action for one of the comparisons:

class Control11 extends Six.Arithmetic10 {
  constructor (parser, machine = new Machine11()) {
    super(parser, machine);
  }

  /** `eq: '=' sum;` */   eq () { this.machine.gen('Eq'); }

The extended actions class requires additional machine instructions and, therefore, inserts Machine11 during construction (line 2 above). Here is one of the comparison instructions in Machine11:

Eq (memory) {                       // stack: ... a b -> ... a == b
  memory.splice(-2, 2, memory.at(-2) == memory.at(-1));
}

This instruction will replace the top two values on the stack with the result of the comparison, i.e., true or false. Technically, these are very special values in the little language. The syntax does not permit them to be assigned to variables. Instead, they will be consumed by branch instructions for the control structures.

Just like compiling sums, compiling comparisons also amounts to reverse Polish notation, i.e., the rules rel and cmp require no actions.

However, the assign and print statements consume sums and require actions:

  /** `assign: Name '=' sum;` stores and pops stack */
  assign (name, e, s) {
    this.machine.gen('Store', this._alloc(name));
    this.machine.gen('Pop');
  }

  /** `print: 'print' sums;` */
  print (_, sums) { this.machine.gen('Print', sums); }

  /** `sums: sum [{ ',' sum }];` returns number of values */
  sums (sum, many) { return 1 + (many ? many[0].length : 0); }

The machine instructions Store and Pop were already defined in Machine10, i.e., in example 6/10. The assign action generates them to copy the top value on the stack into the memory location for the variable (line 3 above) and to clear the stack at the end of the statement (line 4).

A print statement displays a list of sum values. It receives all arguments on the stack. The number of arguments is counted at compile time (line 11) and compiled into the Print instruction (line 8). At run time the Print instruction pops the arguments off the stack and hands them to puts() for display:

Print (n) {                              // stack: ... n*val -> ...
  return memory => puts(... memory.splice(- n));
}

This little language supports control structures. The actions for loop, While, and Do have to implement the following branch structure:

label	code
`While:`
	`cmp`
`Do:`	`branch if zero to 'od'`
	`stmts`
	`branch to 'While'`
`od:`

This explains what the actions have to do:

  /** `While: 'while';` returns address for branch to `while` */
  While (w) { return this.machine.code.length; }

  /** `Do: 'do';` returns address of slot for bzero to `od` */
  Do (d) { return this.machine.code.push(null) - 1; }

  /** `loop: While cmp Do stmts 'od';` */
  loop (While, _, Do, s, o) {
    const od = this.machine.gen('Branch', While);
    this.machine.code[Do] = this.machine.ins('Bzero', od);
  }

The While action just returns the next address in the generated code (line 2 above).

The Do action allocates a slot in the generated code and returns the address of the slot (line 5) because push() always returns the new array length.

Finally, loop puts it all together: the action adds an unconditional branch from the end of the loop body back to While (line 9) and populates the slot allocated by Do with a conditional branch to od to terminate the loop (line 10).

Yes, the conditional branch at Do fails each time through the loop until it succeeds only once, at the end of the loop. An obvious optimization is to reverse the order of the loop body and the condition in the generated code and cut the number of branch executions in half; however, moving code when syntax analysis generates it directly is not really an option — absolute branches like the ones used here would break.

select is a little harder because there are two possibilities:

label	code with `else`	code without `else`
	`cmp`	`cmp`
`Then:`	`branch if zero to 'Else'`	`branch if zero to 'fi'`
	`stmts`	`stmts`
	`branch to 'fi'`
`Else:`
	`stmts`
`fi:`

The actions have to create the appropriate branch instructions:

  /** `Then: 'then';` returns address for bzero to `else` `fi` */
  Then (t) { return this.machine.code.push(null) - 1; }

  /** `Else: 'else';` creates slot for branch to `fi`,
      returns address of `else` */
  Else (e) { return this.machine.code.push(null); }

  /** `select: 'if' cmp Then stmts [ Else stmts ] 'fi';` */
  select (i, c, Then, s, Else, f) {
    const fi = this.machine.code.length;         // address of 'fi'
    if (Else) {
      Else = Else[0];               // address after branch to 'fi'
      this.machine.code[Then] = this.machine.ins('Bzero', Else);
      this.machine.code[Else - 1] = this.machine.ins('Branch', fi);     
    } else
      this.machine.code[Then] = this.machine.ins('Bzero', fi);
  }

The Then action allocates a slot in the generated code and returns the address of the slot (line 2 above).

The Else action, if called, also allocates a slot in the generated code (for a branch to bypass the else clause) but it returns the address following the slot (line 6).

Finally, the select action checks if there is an argument created by else (line 11) and creates and inserts the branch instructions as described in the pseudo code above.

Code generation is complete. The top-level action for prog shows the generated code and provides tracing if a variable trace is in the program:

  /** `prog: stmts;` returns executable */
  prog (_) {
    const size = this.symbols.size,          // number of variables
      traceAddr = this.symbols.get('trace'); // if a variable named
    if (typeof traceAddr != 'undefined') {     // ...'trace' exists
      puts(this.machine.toString());                   // show code
      this.symbols.forEach(              // show variable addresses
        (addr, name) => puts(`${name} at ${addr}`)
      );
      puts('stack starts at', size);
    }
    return this.machine.run(size, 0, traceAddr);   // stack machine
  }

If the variable exists (lines 4 and 5 above) the generated code and variable addresses are shown (lines 6 to 10). The run() method of Machine11 discussed earlier is called (line 12) to create the stack machine.

In example 6/11

Press to represent and check the grammar,
press to perform syntax analysis and generate code, and
press , , or to step through the program:

> memory = run(null, 1)
0:[ 0 0 ]
[ 0 0 36 ] 0: memory => this.Input(memory)
> memory = run(memory, 1)
[ 36 0 36 ] 1: memory => this.Store(1)(memory)
  ...

Add statements like trace = -1 or trace = 1 to display the generated code and to turn tracing off and on within the program:

trace = -1; x = input 36; y = input 54;
  ...

Again, press to perform syntax analysis and generate code:

> run = g.parser().parse(program, actions)
0: memory => this.Push(1)(memory)
1: memory => this.Minus(memory)
2: memory => this.Store(1)(memory)
  ...
program counter at 0
trace at 1
x at 2
y at 3
stack starts at 4
(memory, steps) => StackMachine(memory, steps)

Now press to execute the program and see the trace:

> run()
0:[ 0 0 0 ]
[ 0 0 0 1 ] 0: memory => this.Push(1)(memory)
[ 0 0 0 -1 ] 1: memory => this.Minus(memory)
18
[ -1 18 18 ]

The stack is cleared at the end of each statement because the value of each sum is assigned or printed and then popped of the stack or two sums are compared and popped and the result of the comparison is tested and popped, i.e., the stack will grow and shrink during expression evaluation but it cannot grow out of bounds. However, the little language can still be used to crash the practice page — just program an infinite loop and don't step execution...

Functional Programming

Fans of functional programming — especially in JavaScript — should enjoy example 6/12, the implementation of the little language in the style of the functional evaluation of arithmetic expressions as discussed previously. Functional programming can be very elegant, but — unlike the stack machine "instructions" — functions created by composition cannot be decomposed for display. At least, can be toggled to see what functions the actions create.

The grammar is almost the same as in example 6/11 — the extra rules for the keywords in loop and select are not needed:

  loop:     'while' cmp 'do' stmts 'od';
  select:   'if' cmp 'then' stmts [ 'else' stmts ] 'fi';

The actions for sum and all the other rules for arithmetic expressions remain unchanged from example 6/07 discussed previously.

Comparisons are implemented just like add, and they return functions which compute the right operand and still need a function to compute the left operand value (lines 9 to 11 below):

class Functions12 extends Six.Functions07 {
  
  /** `cmp: sum rel;` returns fct */
  cmp (sum, rel) { return memory => rel[0](sum)(memory); }

  // rel:      eq | ne | gt | ge | lt | le;

  /** `eq: '=' expr;` returns fct for composition */
  eq (_, right) {
    return left => memory => left(memory) == right(memory);
  }

The grammar does not allow cascading comparisons. Therefore, the action for cmp is a single function composition and does not require reduce() (line 4 above).

Compiling into functions is particularly elegant for control structures. Instead of returning addresses and inserting branch instructions later, the control structures of the little language are simply turned into JavaScript control structures inside functions. Here is the action for loop:

  /** `loop: 'while' cmp 'do' stmts 'od';` returns fct */
  loop (w, cmp, d, stmts, o) {
    return memory => { while (cmp(memory)) stmts(memory); };
  }

The action for select returns one of two functions depending on the presence of else:

  /** `select: 'if' cmp 'then' stmts [ 'else' stmts ] 'fi';` returns fct */
  select (i, cmp, t, stmts, opt, f) { 
    return opt ?
      (memory => cmp(memory) ? stmts(memory) : opt[1](memory)) :
      (memory => { if (cmp(memory)) stmts(memory); });
  }

If else is present, the returned function uses conditional evaluation (line 4 above), otherwise it uses an if statement in JavaScript (line 5).

loop and select are the only places where the result of the cmp action is used. No matter which type JavaScript uses to represent the result of a comparison in the compiled comparisons, the use of this type is consistent with the intent of the compiled control structures — the joy of translating from one language to another...

Assignment now is a statement; therefore, the action for sum is overwritten (to omit assignment there) and a new action for assign creates the function to perform assignment:

  /** `assign: Name '=' sum;` returns fct */
  assign (name, e, sum) {
    return memory => memory[name] = sum(memory);
  }

The stack machine implementation had to pop the sum off the stack, but there is no stack in the functional implementation.

The action for sums creates a list of functions (line 8 below) and the action for print creates a function which elaborates the list to get the values and display them with puts() (line 3):

  /** `print: 'print' sums;` returns function */
  print (p, sums) {
    return memory => puts(... sums.map(fct => fct(memory)));
  }

  /** `sums: sum [{ ',' sum }];` returns list of functions */
  sums (sum, many) {
    return [ sum ].concat(many ? many[0].map(seq => seq[1]) : []);
  }

The action for stmts uses reduce() and the comma operator to create a function which will execute the statement functions, one after another:

  /** `stmts: stmt [{ ';' stmt }];` returns fct */
  stmts (stmt, many) {
    return (many ? many[0] : []).
      reduce((left, list) => 
        memory => (left(memory), list[1][0](memory)), stmt[0]);
  }

As before, all the functions expect a memory object as an argument which is used to map names to values. The start rule of the grammar has to create this object. stmts is referenced in loop and select; therefore, there is an extra top-level start rule to create memory in a parameter-less executable:

  /** `prog: stmts;` returns executable */
  prog (stmts) { return () => stmts({ }); }

Check out example 6/12:

Press to represent and check the grammar and
press to compile the program.
Toggle and press again to see some of the functions created by the actions.

Quick Summary

Action methods can be used in syntax analysis to interpret — rather than just recognize — a sentence. For example, an arithmetic expression can be translated into reverse Polish notation or evaluated while it is recognized.
Variable values (or their descriptions) can be stored as properties of a JavaScript object or in a Map.
Programming languages contain control structures and interpretation of a program requires an intermediate representation so that loops can be interpreted more than once and selections can be partially skipped.
A stack machine B: The Stack Machine is both, very easy to simulate, and very easy to generate code for. It can use a fixed-size array as program storage and a variable-size array as data stack. A program counter can be a property of either array. Global variables can be persistent at the beginning of the data stack. Machine instructions are JavaScript functions which manipulate the data stack, fetch and store variable values in that array, and change the program counter for branching.
Alternatively, composition of JavaScript functions can also be used as an executable intermediate representation. Unlike the stack machine, however, the internals of this representation cannot be displayed as text.