7. Language Features

Type Checking

Functions, Scopes, and Nesting


Chapter six explained how to use actions to interpret or compile arithmetic expressions. Simple variables, input, output, and control structures were added, resulting in a little language which was compiled into JavaScript functions, or into machine instructions for a stack machine simulated in JavaScript.

This chapter adds specific language features to the little language and shows how to implement them, primarily for the stack machine. All classes are available from the module Seven which is built into the practice page.

Syntactic Type Checking

The result types of arithmetic and other operations depend on both, the operators, and the types of the operand values.

Dynamically typed languages such as JavaScript tend to allow operators to combine fairly arbitrary types and let variables store values of different types at different times. The results can be surprising:

  • Strings and numbers can be compared in JavaScript, but the string is first converted to a number, i.e., 'A' > '1' is true (string comparison) and 'A' > 1 is false (number comparison).

  • Strings and numbers can be added in JavaScript, but the number is converted to a string and strings are concatenated with the + operator, i.e., 1 + '2' is '12' (string sum) rather than 3 (number sum).

  • These effects are more surprising if variables are involved.

Strongly typed languages such as Java determine the types of all values at compile time so that they can select the most efficient machine instructions for implementation. This requires either explicitly declaring or implicitly deducing the data types which variables may store, and flagging every operation which does not use compatible data types.

Example 7/01 shows that strong type checking can be accomplished with a restrictive grammar where type mismatches will be reported as syntax errors.

  • Press to represent and check the grammar, and

  • press to perform syntax analysis:

> run = g.parser().parse(program, actions)
error: at (4) '-': in sequence, Lit.parse(): expects 'fi'

Here is line 4 of the program:

      x = x - y 

A deep dive into the grammar will reveal that a Name such as x is recognized as a stringTerm which cannot appear adjacent to the operator -, i.e., the error message is right on target — but hardly helpful.

Constants in a program tend to exhibit their data type by their looks:

  • integers are represented as digit strings,

  • floating point numbers require a decimal point, an exponent, or both,

  • strings are enclosed in quotes,

  • Boolean constants usually are symbols such as true and false, etc.

Fortran considered a variable to be "integer" if the name started with one of the letters I through N and "floating point" otherwise, but this idea of implicit typing has long since been abandoned. Instead, variable names are usually declared and typed before use, i.e., usually there is only one token for variable names

{
  Number: /0|[1-9][0-9]*/,
  String: /'(?:[^'\\]|\\['\\])+'/,
  Name: /[a-z]+/
}

and the type of a variable is determined by actions which interpret declarations.

Numbers can be converted to strings; therefore, this little language requires all variables to be strings. Example 7/01 more or less duplicates the rules for cmp and sum from example 6/12 to separate string and number operations and fuse comparisons and terms:

cmp: sum rel | stringSum stringRel;

term: number | '(' sum ')' | 'number' stringTerm;

stringRel: stringEq | stringNe | stringGt | stringGe | stringLt | stringLe;
stringEq: '=' stringSum;
  ...
  
stringSum: stringTerm [{ stringTerm }];
stringTerm: string | name | input | 'string' term;
string: String;
name: Name;
input: 'input' String String;

cmp requires that numbers are compared to numbers, and strings to strings (line 1 above) because all comparison rules have been duplicated for strings (lines 5 to 7) and only allow stringSum operands (lines 1 and 6).

sum still recognizes arithmetic with numbers, but name and input have been moved from a (numerical) term (line 3) to a stringTerm (line 10), i.e., name and input only allow strings. input uses two String tokens to specify a prompt text and a default input value (line 13).

A stringTerm can be converted to a number by preceding it with the literal number as a cast operation (line 3) and a (numerical) term similarly can be converted to a string (line 10).

A stringSum allows concatenation using one or more stringTerm (line 9); there is no explicit operator to mark this operation.

Example 7/01 still implements Euclid's Algorithm:

x = input 'x' '36'; y = input 'y' '       54';
while x <> y do
  if x > y then 
    x = x - y 
  else
    y = y - x 
  fi
od;
print '\'gcd\': ' x

It is unchanged but for some extra strings for input and print (lines 1 and 9 above). There are essentially three errors in the program, the last one serious enough to crash the practice page:

  • number Conversions are required before y can be subtracted from x and vice versa (lines 4 and 6).

  • string Conversions are required before the results can be assigned back to x and y (lines 4 and 6).

  • The default strings for x and y are not equal (line 1) but string comparison would consider any positive or negative number in x to be greater than y (line 3) because of the leading blanks in the default input (line 1) and 'x' would be set to ever more negative numbers — the program loops!

The last error demonstrates that strong typing will not necessarily catch serious flaws in a program. It should also be noted that this syntactic approach to type checking requires that there are few types and that operations are well separated.

The actions from Example 6/12 can be extended in example 7/01 to compile this little language into JavaScript functions:

class TCheck01 extends Six.Functions12 {
  // ...
  /** `stringEq: '=' stringSum;` returns fct for composition */
  stringEq (_, right) { return this.parser.call(this, super.eq, _, right); }

  /** `stringSum: stringTerm [{ stringTerm }];` returns fct */
  stringSum (term, many) {
    const c = (a, b) => memory => a(memory) + b(memory);
    return (many ? many[0] : []).
      reduce((sum, list) => c(sum, list[0]), term);
  }

The string comparisons can delegate to the numerical comparisons (line 4 above) because the target language JavaScript is dynamically typed. stringSum implements string concatenation in the callback function for reduce() (line 4).

  // term: number | '(' sum ')' | 'number' stringTerm;
  //       [0]          [1]                [1]
  term (...val) {
    switch (val.length) {
    case 1: return val[0];
    case 3: return val[1];
    case 2: return memory => parseInt(val[1](memory), 10);
    }
  }

  /** `stringTerm: string | name | input | 'string' term;` */
  //               [0]      [0]    [0]              [1]
  stringTerm (...val) {
    return val.length == 1 ? val[0] :
      memory => String(val[1](memory));
  }

term has to be overwritten to return a function if a program calls for an explicit conversion (line 7 above). Similarly, stringTerm has to implement a conversion to a string value if needed (line 15).

  /** Removes quotes and backslash */
  _unq (s) { 
    return  s.slice(1,-1).replace(/\\([\\'])/g, "$1");
  }

  /** `string: String;` returns fct */
  string (s) { return () => this._unq(s); }

  /** `input: 'input' String String;` [replace] returns fct */
  input (i, prmpt, dflt) {
    return () => prompt(this._unq(prmpt), this._unq(dflt));
  }

Finally, the string and input actions have to convert a single-quoted string literal in the program into the corresponding string value. A helper method _unq() removes the outer quotes and interprets backslashes if any (line 3 above). Just like the number action, the string action has to return a function which will return a constant, literal string value (line 7). input is overwritten to use the prompt and default strings and to not perform implicit conversion of the incoming string value into a number (line 11).

Type Checking by Interpretation

In this section the functional implementation of the little language with control structures from chapter six will be extended with strings and floating point and integer numbers. Changes to the grammar can be seen on this page, new action methods can be seen in the method browser.

The best conclusion from the previous section is that syntax analysis, i.e., sentence structure, and semantic analysis, i.e., sentence meaning, require separate mechanisms.

If variables are declared before use, type checking amounts to symbolic interpretation of a program — with types taking the place of values. Chapter six showed how to implement interpretation, i.e., immediate evaluation at compile time — at least for arithmetic expressions without variables.

Example 7/02 shows how to interpret types while translating a program into JavaScript functions. Variables have to be declared before use and there are integer and floating point variables with mixed arithmetic, strings with input, printing, and concatenation, strict comparisons and assignments, implicit conversion only for numerical operations, and explicit type casting operations. Here is the typed version of Euclid's algorithm:

int x, y;
x = int (input 'x' '36');
y = int (input 'y' '54');
while x <> y do
  if x > y then 
    x = x - y 
  else
    y = y - x 
  fi
od;
print '\'gcd\': ' + x
  • Press to represent and check the grammar,

  • press to perform syntax and semantic analysis and generate an executable, and

  • press to execute the compiled program.

  • Toggle and press again to see what JavaScript functions are created.

In this kind of type checking, actions immediately interpret types similar to values while generating and returning JavaScript functions to the parent rules and actions. The new action class is mostly concerned with type interpretation. Almost all function generation is inherited from example 6/12 and example 7/01. TCheck02 first defines some infrastructure:

class TCheck02 extends Seven.TCheck01 {
  /** For error messages */
  get parser () { return this.#parser; }
  #parser;
  /** Symbol table, maps names to types */
  get symbols () { return this.#symbols; }
  #symbols; 
  /** For symbolic computing with types */
  get stack () { return this.#stack; }
  #stack = [ ];
  
  constructor (parser, symbols = new Map()) {
    super();
    this.#parser = parser;
    this.#symbols = symbols;
  }

  /** Returns type of name, message if undefined */
  _type (name) {
    const type = this.symbols.get(name);
    if (type) return type;
    this.parser.error(name + ': undeclared');   // return undefined
  }

Error messages require access to the parser (lines 3, 12, and 14 above). There have to be a map from variable names to their types (lines 6, 12, and 15) and a stack (lines 9 and 10) where actions such as number, string, etc., will push a type and actions such as product, sum, etc., will pop and combine operand types and push result types.

_type() expects to find and return the type declared for a variable name in the symbol table (lines 20 and 21). If not, the method displays an error message (line 22) and returns undefined.

If a Name is recognized, the name action determines the type, pushes it onto the compile time type stack (line 3 below), and delegates to the superclass(es) to generate an appropriate JavaScript function (line 4).

  /** `name: Name;`  returns `fct:_type(name)` */
  name (name) {
    this.stack.push(this._type(name));
    return this.parser.call(this, super.name, name);
  }

This pattern is typical for int — which pushes 'int' and delegates to number — and string and input — which push 'string'. The float action has to generate a new function:

  /** `float: Float;` returns `fct:float` */
  float (float) {
    this.stack.push('float'); return () => parseFloat(float);
  }

The first heavy lifting happens when a term contains an explicit conversion (line 12 below):

  // term: int | float | string | name | input
  //       [0]
  //     | 'int' term | 'float' term | 'string' term
  //       [0]   [1]
  //     | '(' sum ')';
  //           [1]
  term (...val) {
    switch (val.length) {
    case 1: return val[0];
    case 3: return val[1];
    }
    const to = val[0], from = this.stack.pop();
    this.stack.push(to);
    return this._cast(val[1], from, to);
  }

In this case the term action is called with two arguments, namely the target type val[0] which is stored as to and the function val[1] which will produce the value which may have to be converted. The type of this value from is popped off the type stack (line 12 above). The result type of the explicit conversion will be to and it is pushed onto the type stack (line 13). Conversions might be applied elsewhere; therefore, it is the job of a new method _cast() to compose a suitable JavaScript function:

  /** Converts `fct:from` into `fct:to` if needed */
  _cast (fct, from, to) {
    switch (`${to} <- ${from}`) {
    default:
      this.parser.error('impossible cast from', from, 'to', to);

    case 'int <- int': case 'float <- float':
    case 'string <- string': case 'float <- int':
      return fct;

    case 'int <- float':
      return memory => Math.round(fct(memory));

    case 'int <- string':
      return memory => parseInt(fct(memory), 10);

    case 'float <- string':
      return memory => parseFloat(fct(memory));

    case 'string <- int': case 'string <- float':
      return memory => String(fct(memory));
    }
  }

Nothing needs to be done in four of the nine possible cases (lines 7 to 9 above) because three cases are identities, and integers and floating point values, both, are represented as numbers in the implementation language.

The five other cases are handled by composing four new functions which involve Math.round(), parseInt(), parseFloat(), and String(). The latter, again, works for both, integers and floating point values.

Arithmetic operations allow integers and floating point arguments, e.g.:

  /** `subtract: '-' product;` returns `fct:int|float` */
  subtract (_, right) {
    const [ l, r ] = this.stack.splice(-2, 2);
    if (l == 'string' || r == 'string')
      this.parser.error("cannot apply '-' to string");
    this.stack.push(l == 'int' && r == 'int' ? 'int' : 'float');
    return this.parser.call(this, super.subtract, _, right);
  }

The types of the left and right operand are on the stack (line 3 above). Subtraction and most other operations flag strings as errors (lines 4 and 5) and replace the operand types by the result type — 'int' only if all operands are integers (line 6). Division always produces 'float'. Code generation is always delegated to the superclass (line 7).

  /** `add: '+' product;` returns `fct:string|int|float` */
  add (_, right) {
    const [ l, r ] = this.stack.splice(-2, 2);
    this.stack.push(l == 'string' || r == 'string' ? 'string' :
        l == 'int' && r == 'int' ? 'int' : 'float');
    return this.parser.call(this, super.add, _, right);
  }

The operator + specifies concatenation if a string is involved and addition otherwise — this determines the result type (lines 4 and 5 above). This makes no difference for the generated function (line 6) because JavaScript has the same semantics.

  /** `cmp: sum rel;` returns fct */
  cmp (sum, rel) {
    const [ l, r ] = this.stack.splice(-2, 2);
    if ((l == 'string' || r == 'string') && l != r)
      this.parser.error('must compare strings to strings');
    return this.parser.call(this, super.cmp, sum, rel);
  }

cmp flags comparisons between strings and numbers (lines 4 and 5 above). The action discards the operand types because the parent rule is a statement and can only be select or loop which need no additional checking.

Assignment insists that variable and value type are equal (lines 3 to 6 below) and pops the value's type, clearing the stack at the statement level (line 4):

  /** `assign: Name '=' sum;` returns fct */
  assign (name, _, sum) {
    const type = this._type(name),
      r = this.stack.pop();
    if (type != r)
        this.parser.error(`assigning ${r} to ${type} ${name}`);
    return this.parser.call(this, super.assign, name, _, sum);
  }

print accepts only string values and pops all their types:

  /** `print: 'print' sums;` returns fct, string arguments only */
  print (p, sums) { 
    if (! this.stack.splice(- sums.length, sums.length).
            every(type => type == 'string'))
      this.parser.error('can only print strings');
    return this.parser.call(this, super.print, p, sums);
  }

As an alternative, printAny(), is an action for the same rule which implicitly converts all arguments to strings using the _cast() method described above:

  /** `printAny: 'print' sums;` returns fct */
  printAny (p, sums) {     // implicitly casts non-string arguments
    sums.reverse().map((sum, n) => {         // check each argument
      const type = this.stack.pop();      // requires reverse order
      if (type != 'string') {
        sum = this._cast(sum, type, 'string');  // apply conversion
        puts(`print argument ${sums.length - n} was ${type}`);
      }
      return sum;                             // returns fct:string
    }).reverse();
    return this.parser.call(this, super.print, p, sums);
  }

sums, the list of functions computing the values to be printed, is reversed so that the order matches the type stack top-down. For each function a type is popped off the stack (line 4 above). If the type is not 'string' already an implicit conversion is applied (line 6). Finally, the resulting list of functions is reversed again (line 10) just before delegation to print() in the superclass (line 11) to generate code for the actual print operation.

Essential for type checking in example 7/02 is that variables have to be declared before they are used. The rule for prog is revised to include declarations before statements (line 1 below). The action has to be overwritten to send the function generated for stmts to the superclass (line 2):

  /** `prog: [{ decl ';' }] stmts;` returns executable */
  prog (many, stmts) { return this.parser.call(this, super.prog, stmts); }
  
  /** `decl: type Name [{ ',' Name }];` */
  decl (type, name, many) {
    [ name ].concat(many ? many[0].map(list => list[1]) : []).
      forEach(name => {
        if (this.symbols.has(name))
          this.parser.error(`${name}: duplicate`);
        this.symbols.set(name, type[0]);
      });
  }
}

Each name in each declaration is checked to catch duplicate declarations (line 8 above) and it's type is recorded in the symbol table (line 10).

In example 7/02

  • Press to represent and check the grammar.

  • Replace + in the last line in the with a comma so that there are two arguments to print and

  • press to perform syntax and semantic analysis to determine that print only permits string arguments.

  • Replace the name print in two(!) rules by printAny in the to use the alternative action,

  • press to represent and check the changed grammar, and

  • press to perform syntax and semantic analysis again to see that there now is an implicit conversion.

Semantic analysis is done. It should be noted that the little language is more restrictive than the implementation language but benefits from some shortcuts due to dynamic typing and powerful implicit conversions in JavaScript.

All error messages show the typical dilemma: an action method could throw an error and forcibly terminate syntax analysis, but then only one error would be detected. Usually, it is better to continue syntax analysis with incorrect code generation and mark the executable as defective. Error reporting through the parser counts errors and reports them at the end of recognition.

It is also quite apparent that implementing semantic analysis points out design choices that were made for the little language. They may be a matter of taste, but they can surprise or annoy users endlessly...

Example 7/03 uses the same compiler Seven.TCheck02 and contains statements with a number of semantic errors.

  • Press to represent and check the grammar,

  • press to perform syntax and semantic analysis.

  • Repair the semantic errors, e.g., by deleting the offending statements, and recompile with .

  • Finally, press to execute the compiled program. The last two lines of output should be 3 and 4.

Functions

In this section the little language with control structures will be extended with functions which can be called recursively. Changes to the grammar can be seen on this page, new stack machine and action methods can be seen in the method browser.

Just like control structures, functions depend either on language features of the implementation language, i.e., functions in JavaScript to which the source language functions can be mapped, or on branch instructions which capture return information. This section looks at an implementation of functions without parameters for the stack machine.

Example 7/04 contains two mutually recursive functions which together implement Euclid's algorithm without an explicit loop:

var x, y;
function fa;
function fb;

function main begin
  x = input 36; y = input 54;
  fa; print x;
  x = 90; y = 72;
  return fb; print 99
end;

function fa begin
  if x = y then return x fi;
  if x > y then x = x - y else y = y - x fi;
  fa = fb
end;

function fb begin
  if x = y then fb = x else 
    if x > y then x = x - y else y = y - x fi;
    fb = fa
  fi
end;

The program starts in main, inputs decimal values for x and y, calls the algorithm, and prints the result (lines 6 and 7 above). It then sets x and y to other values, calls the algorithm again, and returns the result. 99 will not be printed (line 9).

  • Press to represent and check the grammar,

  • press to create the executable,

  • press to execute, and

  • input two arbitrary positive integers for the first call to the algorithm.

The second output (line 3 below) is the greatest common divisor of 90 and 72. It is returned from main (line 9 above) and printed by the caller. After execution memory contains the values 18 for the variables x and y (line 4).

> run()
0
18
[ 18 18 ]

A program consists of global variable definitions followed by function definitions; one of the functions must be called main. When the executable is run main is called implicitly and the result is printed.

Functions must be declared before they can be used, i.e., mutual recursion requires a declaration of function fb (line 3 in the program above) before it can be used in the definition of function fa (lines 12 to 16). If function fb is declared it must later be defined (lines 18 to 23).

return terminates function execution (lines 9 and 13) and can arrange for a result to be delivered to the point of call. The result value of a function is either specified explicitly for return or it is assigned to the function name, or it is zero. Technically, there is no need for a return statement because a result value can be assigned to the function name within the function body (lines 15, 19 and 21); however, return can be convenient for early termination (e.g., line 13 for fa).

Functions can be called in the context of expressions or as statements. If a function is called in a statement (line 7) the result value is discarded; otherwise the result value can be used in an expression. E.g., return fb in the definition of main (line 9) first makes a call to fb and then terminates the (implicit) call to main and returns the result of the call to fb as result of the call to main.

This step in the evolution of the little language requires a bit of semantic analysis:

  • Variables are global and need not be declared.

  • A variable cannot be called as if it were a function.

  • Functions must be declared or defined before they can be used.

  • A function cannot be declared after it has already been defined.

  • Assignment to a function name is only allowed in the function body.

Grammar Modifications

A program consists of an optional list of global variables followed by one or more function declarations and definitions, each terminated by a semicolon. A function declaration has no body (line 5 below):

prog: [ vars ] funs;
vars: 'var' names ';';
names: Name [{ ',' Name }];
funs: { fun };
fun: head [ 'begin' stmts 'end' ] ';';
head: 'function' Name;

There are only two more extensions to the grammar:

  stmt: assign | print | return | loop | select;
  assign: Name [ '=' sum ];
  return: 'return' [ sum ];

A procedure call has to be recognized as an assignment where the actual assignment operation is omitted (line 2 above). return is added as a new statement. All other changes are semantic in nature, i.e., they concern the fact that a Name can refer to a function or a variable.

Machine Instructions

The stack machine needs three new instructions:

class Machine04 extends Six.Machine11 {
  /** `stack: ... -> ... old-pc | pc: addr` */
  Call (addr) { 
    return memory => (memory.push(memory.pc), memory.pc = addr);
  }
  /** `stack: ... old-pc -> ,,, 0 old-pc` */
  Entry (memory) { 
    memory.splice(-1, 0, 0);
  }
  /** `stack: ... old-pc -> ... | pc: old-pc` */
  Return (memory) { 
    memory.pc = memory.pop(); 
  }
  /** `stack: ... x old-pc result -> ... result old-pc result` */
  ReturnValue (memory) { 
    memory.splice(-3, 1, memory.at(-1)); 
  }    
}

Functions are control structures and require a machine instruction which remembers the address to return to. Recursive calls are possible as long as this address is deposited on the stack. Call(addr) pushes the stack machine's current program counter onto the stack and places addr into the program counter (line 4 above).

Return undoes the effect of Call: It expects the function result value on top of the stack and the return address just below. It removes the return address from the stack and assigns it to the program counter (line 8). This leaves the function result value on top of the stack.

ReturnValue(memory) expects the function result value on top of the stack, just above the return address and the slot for the function result. It copies the result value from the top of the stack into the result slot (line 12). Depending on context the value may then have to be popped off the stack.

Infrastructure

The symbol table now contains descriptions of variables and functions; therefore, a separate counter size is needed to allocate variable addresses (lines 3 and 4 below):

class Functions04 extends Six.Control11 {
  /** Manages next (global) variable address */
  get size () { return this.#size; }
  set size (size) { this.#size = size; }
  #size = 0;
  
  /** Describes current function */
  get funct () { return this.#funct; }
  set funct (sym) { this.#funct = sym; }
  #funct;

funct contains the symbol description of the current function (lines 8 and 9 above). Getters and setters are used because they can be overwritten in subclasses and they cannot be mistaken for action methods.

Symbol descriptions will change as the little language evolves. All of them contain a name and a reference to the actions class which uses them, i.e., they are best represented by inner classes which have this.Symbol as the common base class:

  get Symbol () { return this.#Symbol ??= class {
      owner;                                   // surrounding class         
      name;                               // variable/function name
    
      constructor (owner, name) {
        this.owner = owner; this.name = name; 
      }
    };
  }
  #Symbol;

this.Var is the class to represent variables. This kind of symbol description must be created with a new global address (line 5 below):

  get Var () { return this.#Var ??= class extends this.Symbol {
      addr;                                       // memory address
  
      constructor (owner, name, addr) {
        super(owner, name); this.addr = addr;
      }
      load () {                        // generate load instruction
        this.owner.machine.gen('Load', this.addr);
      }
      storeOk () { return true; }       // always permit assignment
      store () {                      // generate store instruction
        this.owner.machine.gen('Store', this.addr);
      }
      toString () { return `${this.name} at ${this.addr}`; }
    };
  }
  #Var;

The class has a few convenience methods to create instructions to copy the variable value to the stack (line 8 above) and back (line 12) and to format information about the symbol (line 14).

this.Fun is the class to represent functions and has similar convenience methods for code generation:

  get Fun () { return this.#Fun ??= class extends this.Symbol {
      start = false;                  // start address, not yet set
      calls = [];                    // forward references to entry
      returns = [];                   // forward references to exit
    
      entry () { // defines start address, arranges slot for result
        this.start = this.owner.machine.gen('Entry') - 1;
      }  
      undo () {               // ends a declaration, undoes entry()
        this.owner.machine.code.length = this.start;
        this.start = false;
      }  
      call () {    // create Call or save address for slot for Call
        if (typeof this.start == 'number')
          this.owner.machine.gen('Call', this.start);
        else
          this.calls.push(this.owner.machine.code.push(null) - 1);
      }
      return () {           // create slot for Branch, save address
        this.returns.push(this.owner.machine.code.push(null) - 1);
      }
      storeOk () {                     // ok to store result value?
        if (this == this.owner.funct) return true;
        this.owner.parser.error(`${this.name}: ` +
            `assigned to outside function`);
        return false;
      }
      store () {              // store top of stack as result value
        this.owner.machine.gen('ReturnValue');
      }
      end () {          // resolve calls and returns if any, exit()
        const call = this.owner.machine.ins('Call', this.start);
        this.calls.forEach(c => this.owner.machine.code[c] = call);
        this.calls.length = 0;

        const br = this.owner.machine.ins('Branch',
          this.owner.machine.code.length);
        this.returns.forEach(c => this.owner.machine.code[c] = br);
        this.returns.length = 0;
        this.exit();
      }
      exit () {      // generates code to return from function call
        this.owner.machine.gen('Return');
      }    
      toString () {
        return `function ${this.name} start ${this.start}`;
      }
    };
  }
  #Fun;

A function result value is zero by default; therefore, the slot for the function result is created at the beginning of a function by pushing zero onto the stack (line 7 above).

Function declarations and definitions look more or less alike and both will cause a call to entry() which a declaration can undo by calling undo() (line 9).

Because of function declarations a function can be called before it is defined; therefore, a method call() might create an empty slot for a Call instruction (line 17) and a method end() is called when the function definition is complete to fill these slots (lines 32 to 34).

Similarly, return() directs all function returns to the end of the function code (line 20) and end() inserts the necessary forward branch instructions (lines 36 to 39).

storeOk() checks if a function result is assigned within the current function definition (line 23) and displays an error message if not.

store() codes the ReturnValue instruction which sets the function result (line 29).

end() delegates to exit() to code the Return instruction which ends function execution (line 43).

All of these methods are designed to be overwritten if function setup and cleanup have to be changed.

All three symbol classes are defined using getters so that they can be overwritten in action subclasses. The result values of the getters are memoized using the ??= operator so that tests like symbol instanceof this.Fun produce the intended result.

_find() and _dcl() manage the symbol table, _alloc() allocates addresses for new variables:

  /** Returns symbol description for name, if any */
  _find (name, report) {
    const sym = this.symbols.get(name);
    if (report && !sym) this.parser.error(`${name}: undefined`);
    return sym;
  }
  
  /** (Re-)defines and returns `sym`, cannot be undefined */
  _dcl (sym, report) {
    if (report && this.symbols.get(sym.name))
      this.parser.error(`${sym.name}: duplicate`);
    this.symbols.set(sym.name, sym);
    return sym;
  }
  
  /** Returns new `Var` at next global address. */
  _alloc (name) { return new this.Var(this, name, this.size ++); }

_find() returns a symbol description for a name and can optionally report that the symbol is undefined. _dcl() enters a symbol description into the symbol table and can optionally report that this is a duplicate definition.

Actions

Given functions, Name can refer to either a function or a variable; therefore, actions have to be created or revised for all rules involving Name.

   // names: Name [{ ',' Name }];
  names (name, some) {
     const dcl = name => this._dcl(this._alloc(name), true);
     dcl(name);
     if (some) some[0].forEach(list => dcl(list[1]));
     return 1 + (some ? some[0].length : 0);
  }

names creates variable descriptions for a list of names and uses _dcl() to enter them into the symbol table and report duplicates.

  // name: Name;
  name (name) {
    const sym = this._find(name, true);
    if (sym instanceof this.Fun) sym.call();
    else if (sym) sym.load();
  }

In an expression, a Name can trigger a function call (line 4 above) which will eventually leave the result on top of the stack, or it can cause the current value of a variable to be pushed onto the stack (line 5). Code generation for either case is handled by the symbol description as discussed above. If a name is undefined _find() reports an error and no code is generated (line 5).

  // assign: Name [ '=' sum ];
  assign (name, sum) {
    const sym = this._find(name, true);
    if (sym) {
      if (sym instanceof this.Var)
        if (sum && sym.storeOk()) sym.store();   // variable = sum
        else this.parser.error(`${name}: cannot call a variable`);
      else if (!sum) sym.call();                   // function call
      else if (sym.storeOk()) sym.store();        // function = sum
      this.machine.gen('Pop');                       // clear stack
    }
  }

In an assignment statement Name can refer to a function or a variable. Code is only generated if _find() provides a description and does not report an error (line 4 above). If it is a variable (line 5) and if there is something to assign sym.store() is called to generate the Store instruction (line 6). Otherwise there is an error because a variable cannot be called (line 7). If it is a function name and nothing to assign sym.call() takes care of the Call instruction (line 8). If there is something to assign sym.store() can be called if the name refers to the current function, i.e., if the assignment is in the function body (line 9); otherwise sym.storeOK() reports the assignment to the function name as an error. Finally, the stack has to be popped to discard the assigned value or the function result (line 10).

  // return: 'return' [ sum ];
  return (r, sum) {
    if (sum && this.funct.storeOk())
      (this.funct.store(), this.machine.gen('Pop'));
    this.funct.return();
  }

If a return statement includes a value, the current function's store() method will assign it as a function result and then the value has to be popped off the stack (line 4 above). Function termination is arranged by the function description's return() method (line 5).

Functions can be declared so that they can be called before they are defined:

funs: { fun };
fun: head [ 'begin' stmts 'end' ] ';';
head: 'function' Name;

head is recognized before fun and the function body if any, i.e., the action for head happens first:

  // head: 'function' Name;
  head (f, name) {
    let sym = this._find(name);
    if (! (sym instanceof this.Fun)) {
      if (sym instanceof this.Var)
        this.parser.error(`${name}: used as variable and function`);
      sym = this._dcl(new this.Fun(this, name));
    }
    if (typeof sym.start == 'number') {
      this.parser.error(`${name}: duplicate`);
      sym = this._dcl(new this.Fun(this, name));           // patch
    }
    sym.entry();                // generate code for function entry
    return this.funct = sym;                         // in function
  }

First there is some semantic analysis:

  • A variable name cannot be a function name (line 5 above); to allow further processing it is redeclared as a function (line 7).

  • An undefined name is declared as a function (line 7).

  • A function cannot be defined more than once (line 9) — sym.start will contain an address once there is code for the function.

head may be the beginning of a function definition; therefore sym.entry() is called to generate the code to start the function (line 13). Finally, the current function is posted in this.funct and returned (line 14).

Once code generation for the function body is complete — or immediately if there is no body — the fun action performs some cleanup:

  // head [ 'begin' stmts 'end' ] ';';
  fun (head, opt, _) {
    if (opt) head.end();            // function definition: wrap up
    else head.undo();   // function declaration: discard entry code
    this.funct = null;                           // not in function
  }

head provides access to the function description. If there is a function body the end() method described previously inserts Call and Branch instructions and generates the code to return from the function call (line 3 above). Otherwise this was only a function declaration and undo() removes the effect of entry() in the head action (line 4). Finally, this.funct is set to null to signal that code generation has left the body of the function (line 5).

All actions for functions and their consequences are in place. Expressions can be retained from the previous (untyped) version of the little language, i.e., from example 6/11. The prog action performs some final cleanup and generates the stack machine:

  // prog: [ vars ] funs;
  prog (v, f) {
    const main = this._check_defs(this.symbols),  // flag undefined
      startAddr = this.machine.code.length,      // at startup code
      trace = this._find('trace'),  // does variable 'trace' exist?
      traceAddr = trace instanceof this.Var ? trace.addr : false;
    if (main) this._startup(main);         // generate call to main
    else this.parser.error('main: undefined');
    if (traceAddr !== false) {                 // if 'trace' exists
      puts(this.machine.toString());                  // show code,
      this.symbols.forEach(s => puts(s.toString()));    // symbols,
      puts('stack starts at', this.size);   // variable memory size
      if (main) puts('execution starts at', startAddr);
    }
    return this.machine.run(this.size, startAddr, traceAddr);
  }

Just as before tracing depends on the existence of a symbol named trace (line 5 above) but trace has to be a variable (line 6). If so, the generated code, the symbols, and the initial memory size are shown (lines 9 to 14). The stack machine generator creates the executable as before (line 15); however, functions definitions have to be checked (line 2) and the initial call to main has to be generated (line 7) which will determine the start address of the executable.

_check_defs() checks all symbols to make sure that all functions have a defined start address and it returns the description for main if it is a defined function:

  _check_defs (map) {
    let main = undefined;
    map.forEach(sym => {
      if (sym instanceof this.Fun)
        if (typeof sym.start != 'number')
          this.parser.error(`${sym.name}: undefined function`);
        else if (sym.name == 'main')
          main = sym;
    });
    return main;
  }

If main was defined, startup() generates instructions to call the function and print the result and returns the start address of the instructions.

  _startup (main) {
    this.machine.gen('Call', main.start);     // call main function
    this.machine.gen('Print', 1);                  // print and pop
  }
}

Tracing was introduced in chapter six. Check out how a function is called and how it returns a result in example 7/04:

  • Press to represent and check the grammar,

  • press to create the executable, and

  • repeatedly press to execute one step at a time.

  • Alternatively, add statements trace = -1, trace = 1, or trace = input to turn tracing off and on, or

  • add trace to the list of global variables (which are initialized to zero) to trace everything.

Example 7/05 demonstrates 13 semantic errors:

  var a, a;
  function undefined;
  function a;
  function g begin g = 1 end;
  function f begin f = 2; g = 3; v = 4 end;
  function v;
  function f begin
    return;
    v = 5; v; undef; f = 6
  end;
  function g;
  • duplicate global variable names (in line 1 above),

  • a name a which is defined as a variable and forward declared as a function (line 3),

  • a function names g and v which are assigned to outside the function definitions (lines 5 and 9),

  • an undefined variable name 'v' (line 5),

  • a duplicate function definition (line 7),

  • a function undef which is used but not declared (or defined) (line 9),

  • forward declarations for functions a, v, and undefined which are never defined (lines 2, 3, and 6),

  • a forward declaration for g after the function has been defined (line 11),

  • and no definition for a main function.

Local Variables

In this section the little language with functions will be extended with local variables. Changes to the grammar can be seen on this page, new stack machine and action methods can be seen in the method browser.

Frames

Real functions have parameters and local variables. This complicates the code when entering and exiting a function:

  • At entry, argument values are connected to parameter names and memory is allocated for local variables.

  • At exit, cleanup is required so that Return only leaves a result behind — this is why a return inside a function body is coded as a branch to the exit code.

Recursion complicates access to parameters and local variables even more because the values exist once for each level of recursion and the names refer to the latest activation of each function.

Rather than just stacking a return address and a result value slot as in example 7/04, now frames, also called activation records, have to be stacked which additionally contain argument values and local variables:

global function recurse

Memory use evolves:

  • Initially there are global variables.

  • Once function a is called and itself calls function b there are two locals frames.

  • If there is a recursive call from b to a there are three locals records.

The locals are chained on a dynamic link so that the exit code and Return can clean them up in reverse order.

Addressing also gets more complicated:

  • Global variables have absolute addresses, determined at compile time.

  • Local variables have relative addresses, i.e., offsets within their frame, which have to be added to the base address where the frame begins at run time. The offset is determined at compile time but — as the graphic indicates — the base address depends on the call history.

In summary, if parms is the number of parameters of a function, a frame contains the following:

offset content
+0... argument values for the parameters, if any
+parms the return address for the function
+parms +1 a slot for the dynamic link,
i.e., the address of the frame of the caller
+parms +2 a slot for the function result
+parms +3... values for the local variables

The Memory class from example 7/04 is extended with a frame pointer register, represented as a property .fp (line 4 below), and toString() is overwritten to display the stack as a sequence of frames:

class Machine06 extends Seven.Machine04 {
  get Memory () {
    return this.#Memory ??= class extends super.Memory {
      fp = 0;                           // global frame starts at 0
      frames = [ 0 ];   // toString(): list of number of parameters
      
      toString () {
        let fp = this.fp,                   // begin of (top) frame
          to = this.length;                   // end of (top) frame
        return this.frames.map((parms, n) => {
          try {
            return `${fp}:[ ${this.slice(fp, to).
              map(slot => this.mapSlot(slot)).join(' ')} ]`;
          } catch (e) { throw e;                // shouldn't happen
          } finally {
            to = fp;             // end and begin of previous frame
            if (n == this.frames.length-1) fp = 0;       // globals
            else fp = this[fp + parms + 1];       // previous frame
          }
        }).reverse().join(' ');
      }
      
      mapSlot (slot) {                   // hook to interpret slots
        return typeof slot == 'undefined' ? 'undefined' : slot;
      }
    };
  }
  #Memory;

The frames can be displayed by following the dynamic link which at this point depends on the number of parameters in each frame. At this point this is stored in a list represented as a property .frames[] (line 5 above).

The top frame on the stack extends from .fp to the current end of the memory array (lines 8 and 9). In a loop over the frames (line 10) a frame is accessed using slice() (line 12) and each slot is represented as a string using a function mapSlot() which might be overwritten (lines 13 and 24).

Machine Instructions

Argument values are pushed onto the stack before a function is called. The Entry instruction at the beginning of each function has to initialize the frame as described above, given that parms is the number of parameters of the function and size is the memory requirement of the new frame:

  /** `stack: ... arguments old-pc
      -> ... arguments old-pc old-fp result locals` */
  Entry (parms, size) { 
    return memory => {
      const locals = size - parms - 3,           // local variables
        fp = memory.length - parms - 1;         // new frame's base
      memory.push(memory.fp, 0);             // push old-fp, result
      if (locals)                   // push local variables, if any
        memory.push(... Array(locals).fill(0));
      memory.fp = fp;                           // new dynamic link
      memory.frames.unshift(parms);  // push frames stack for trace      
    };
  } 

The frame contains a slot for the function result, initialized to 0 (line 7 above). This slot can be treated just like a local variable but for the fact that it is only assigned to.

The Exit instruction at the end of each function reverts to the previous frame pointer (line 6 below) and removes all but the return address and result value from the stack as the Return instruction expects it (lines 7 and 8):

  /** `stack: ... arguments old-pc old-fp result locals
      -> ... result old-pc` */
  Exit (parms) {  
    return memory => {
      const fp = memory.fp;                        // current frame
      memory.fp = memory[fp + parms + 1];   // restore dynamic link
      memory.splice(fp, Infinity,  // pop frame, push result old-pc
        memory[fp + parms + 2], memory[fp + parms]);
      memory.frames.shift();            // pop frames stack (trace)
    };
  }

return statements within the body of the function are coded as branches to the Exit instruction.

Parameters and local variables have to be addressed relative to the frame pointer using two new instructions:

  /** `stack: ... -> ... frame[addr]` */
  LoadFP (addr) {
    return memory => memory.push(memory[memory.fp + addr]);
  }

  /** `stack: ... val -> ... val | frame[addr]: val` */
  StoreFP (addr) {
    return memory => memory[memory.fp + addr] = memory.at(-1);
  }
}

Grammar Modifications

To enable semantic analysis in the actions of syntax analysis, functions have to be declared before use, and a design decision is necessary to determine the scope of declarations in a way that is compatible with the activation record stack:

  • A function should not have access to the local variables and parameters of a different function because its frame might not exist.

  • Functions may have access to global variables.

  • Local names may shadow global names.

For example, "declare before use" for variables could be done prior to functions and at the beginning of function bodies, and missing declarations could be considered errors:

prog: GLOBAL { fun };
fun: head [ 'begin' [ LOCAL ] stmts 'end' ];
head: 'function' Name '(' [ parms ] ')';
parms: Name [{ ',' Name }];

Alternatively, in the absence of GLOBAL and LOCAL above, unknown names could be resolved in the closest scope, i.e., enclosing function.

In this case there could be GLOBAL import declarations so that functions could share some global variables:

fun: head [ 'begin' [ GLOBAL ] stmts 'end' ];

As a third alternative, unknown names could be resolved in the global scope, i.e., visible to any function unless shadowed by parameter names.

In this case there could be LOCAL definitions as above so that functions can have local variables in addition to parameters.

The design choice among these alternatives depends on perceived convenience vs. the wish to prevent errors due to scope violations.

The little language in example 7/06 requires strict "declare before use" and uses global and local declarations:

prog: [ vars ] funs;
vars: 'var' names ';';
names: Name [{ ',' Name }];
funs: { fun };
fun: head parms [ block ] ';';
head: 'function' Name;
parms: '(' [ names ] ')';
block: 'begin' [ vars ] stmts 'end';

names before all functions define global variables (lines 1 to 3 above), names in a block define local variables (line 8), and names in parms declare parameter names local to the function (line 7).

begin and end delimit a block (line 8) to avoid an ambiguity between a semicolon separating statements and a semicolon terminating a function definition.

On the other hand, the semicolons terminating function and variable declarations are not required to disambiguate — they are punctuation in the spirit of end, fi, and od.

A forward declaration omits the function body (line 5). The number of parameters in a forward declaration has to match the function definition, the parameter names need not.

A Name can be assigned to or called with function arguments in a statement, or it can be referenced and called with function arguments in an expression. The common aspects can be emphasized by changes to the grammar:

assign: symbol action;
action: store | call;
store: '=' sum;
call: args;
args: '(' [ sums ] ')';

name: symbol [ args ];
symbol: Name;

In addition to distinguishing the use of variable and function names, semantic analysis now has to check that the numbers of parameters and argument values agree and that a forward declaration of a function specifies as many parameters — irrespective of the actual names — as the function definition.

Example 7/07 contains all semantic errors:

var f, dup, dup;
function undefined ();
function a ();
function a (dup) begin var dup; dup = 1 end;
function a (x) begin var y; y = 1 end;
function b () begin b = 2 end;
function f () begin f = 3; g = 4 end;
function main () begin
  a();
  a = 5;
  undef();
  dup()
end;
  • a global variable dup defined more than once (line 1),

  • a function a declared forward with no parameters but defined with one (lines 3 and 4),

  • a local variable dup with the same name as a parameter (line 4),

  • a second definition for the function a (line 5),

  • a function f with the same name as a global variable (line 7),

  • an undefined variable g (line 7),

  • a call to a where the number of arguments and parameters differ (line 9),

  • an assignment to the function name a outside the function definition (line 10),

  • an undefined function undef (line 11),

  • a function call to the variable dup (line 12), and

  • a forward declared function undefined (line 2) which is not defined.

Infrastructure

Local names can shadow global names, i.e., attributes such as type or address can differ. Therefore, a function needs a local symbol table (line 5 below):

class Parameters06 extends Seven.Functions04 {
  get Fun () { return this.#Fun ??= class extends super.Fun {
      parms;                                // number of parameters
      addr;              // offset of function result slot in frame
      #locals = new Map();      // maps local names to descriptions
      get locals () { return this.#locals; }
      set locals (locals) { this.#locals = locals; }
      #size = 0;                        // next address, frame size
      get size () { return this.#size; }
      set size (size) { this.#size = size; }

Fun, the class of function descriptions in the symbol table, is extended (line 2 above). It now contains the number of parameters (line 3), the relative address of the function result in the frame (line 4), the symbol table for parameters and local variables (lines 5 to 7), and the size of the frame (lines 8 to 10). Getters and setters are used so that further restrictions on the visibility of names can be added.

Code generation for functions has to be modified and extended:

      entry () {  // defines start address, arranges slot for Entry
        this.start = this.owner.machine.code.push(null) - 1;
      }  
      setParms () {         // frame: parms, old-pc, old-fp, result
        if (typeof this.parms != 'undefined' 
            && this.parms != this.size)
          this.owner.parser.error(`${this.name} parameters: ` +
            `previously ${this.parms}, now ${this.size}`);
        this.parms = this.size;         // set number of parameters
        this.size += 2;         // leave room for old pc and old fp
        this.addr = this.size ++;          // leave slot for result
      }
      undo () {
        this.locals = new Map();             // undefine parameters
        this.size = 0;            // reset next address, frame size
        super.undo();
      }
      store () {                                   // use `StoreFP`
        this.owner.machine.gen('StoreFP', this.addr);
      }
      exit () {           // fills Entry, generates Exit and Return
        this.owner.machine.code[this.start] =
          this.owner.machine.ins('Entry', this.parms, this.size);
        this.owner.machine.gen('Exit', this.parms);
        this.owner.machine.gen('Return');
      }    
      toString () {
        const names = [];
        this.locals.forEach(sym => names.push(sym.name));
        return `function ${this.name} start ${this.start} ` +
          `parms ${this.parms} size ${this.size} ` +
          `[ ${names.join(' ')} ]`;
      }
    };
  }
  #Fun;

entry() leaves room for an Entry instruction (line 2 above). This instruction can only be created by exit() once the number of parameters and the frame size are known (lines 22 and 23).

setParms() will be called once the names in the parameter list, if any, have been declared. The method ensures that declarations match (line 6), records the number of parameters, leaves room for the old frame pointer and function result in the frame, and records the address of the function result (lines 9 to 11).

setParms() is called for both, a forward declaration and a definition of a function. undo() is called for a forward declaration and is extended to remove the symbol table in order to discard the parameter names in the declaration and reset the frame size so that nothing has been allocated in the frame yet (lines 14 and 15).

The ReturnValue instruction is no longer used. store() uses a StoreFP instruction to copy the function result from the stack into the slot at .addr reserved by setParms() (line 19).

As before, exit() generates a Return instruction, but it is also responsible for inserting the Entry and Exit instructions which require information about parameters and frame size (lines 22 to 25). Needless to say, on real hardware there would be boilerplate instruction sequences for function entry and exit which exit() would still generate.

Variables can be local in a frame or global, and different instructions are needed to access their values. Therefore, Var, the class of variable descriptions in the symbol table, has to be extended, too:

  get Var () { return this.#Var ??= class extends super.Var {
      depth;                               // 0: global, else local
      
      constructor (owner, name, addr, depth) {
        super(owner, name, addr); this.depth = depth;
      }
      load () {                        // generate load instruction
        if (this.depth)                                    // local
          this.owner.machine.gen('LoadFP', this.addr);
        else                                              // global
          this.owner.machine.gen('Load', this.addr);
      }
      store () {                      // generate store instruction
        if (this.depth)                                    // local
          this.owner.machine.gen('StoreFP', this.addr);
        else                                              // global
          this.owner.machine.gen('Store', this.addr);
      }
      toString () {
        return `${this.name} at ${this.depth ? '+' : ''}${this.addr}`;
      }
    };
  }
  #Var;

A new property .depth differentiates between local and global variables (line 2 above). It has to be set when the variable is created (lines 4 and 5). load() and store() consult the property to create the appropriate instruction for local and global access.

  /** Replace: returns new `Var` at next local/global address. */
  _alloc (name) {
    if (this.funct)                      // create local variable
      return new this.Var(this, name, this.funct.size ++, 1);
    else                                // create global variable
      return new this.Var(this, name, this.size ++, 0);
  }  

  /** Extend: checks local then global map, returns `sym` */
  _find (name, report) {
    let sym;
    if (this.funct && (sym = this.funct.locals.get(name)))
      return sym;                                          // local
    return super._find(name, report);                     // global
  }

  /** Replace: sets innermost map, returns `sym` */
  _dcl (sym, report) {
    const map = this.funct ? this.funct.locals : this.symbols;
    
    if (report && map.get(sym.name))
      this.parser.error(`${sym.name}: duplicate`);
    map.set(sym.name, sym);
    return sym;
  }

funct contains the description of a function exactly while the function is compiled.

_alloc() uses funct to create a local variable in the frame described by funct (line 4 above) or a global variable (line 6). In each case a new address is created from the corresponding size counter.

_find() uses funct to check for local names before global names (line 12). This implements shadowing.

Similarly, _dcl() uses funct to declare a name in a local or global symbol table (line 19) and report duplicates, if any.

  _startup (main) {
    for (let p = 0; p < main.parms; ++ p)
      this.machine.gen('Push', 0);
    super._startup(main);
  }

Finally, _startup() has to be extended to allocate room for the parameters of the main() function (lines 2 and 3 above) before the function can be called to start execution.

Statements

While the grammar now unifies the use of a Name in an assignment statement and in an expression, the rule changes described earlier create a problem:

assign: symbol action;
action: store | call;
store: '=' sum;
call: args;
args: '(' [ sums ] ')';

name: symbol [ args ];
symbol: Name;

An assign statement operates on a symbol, i.e., a Name (lines 1 and 8 above). However, there are two possible operations (line 2): either a value is assigned to a variable (line 3) or a function is called with an argument list (line 4). Similarly, in the context of an expression, either a variable value is pushed onto the stack or — again — a function is called with an argument list (line 7).

Recognition starts top-down with the start rule of the grammar, but the rules call their actions from leaf nodes such as symbol back to the start rule. It is predictable from the grammar in which exact order recognition will complete:

symbol: Name;
  store: '=' sum;
  args: '(' [ sums ] ')';
    call: args;
      action: store | call;
        assign: symbol action;
        name: symbol [ args ];

Given the order in which the actions are called, it easy to design what each action should do:

rule action
symbol: Name; find description
store: '=' sum; if variable generate Store
args: '(' [ sums ] ')'; generate code to stack each value
finally call function
call: args; nothing to do
action: store | call; nothing to do
assign: symbol action; pop stack
name: symbol [ args ]; if variable generate Load
else nothing to do

However, there is a lack of information flow while assign is recognized, e.g., the action for symbol finds a description but the action for store will not receive it as a parameter. Similarly, the action for args seems to have no access to the description of the function to be called.

Example 5/10 demonstrated that action methods should use the actions class as context to communicate with each other, i.e., symbol needs to put the description somewhere into this so that store and args — one of which is called later — can access it.

this.context has to be a stack because the store and args rules both involve sum which in turn could involve another name and symbol... Therefore, this.context is implemented with get and set:

  get context () {
    if (this.#contexts.length) return this.#contexts.at(-1);
    throw 'no context';                              //can't happen
  }
  set context (context) {            // push a value, pop with null
    if (context) this.#contexts.push(context);
    else this.#contexts.pop();
  }
  #contexts = [];

The action for symbol creates a context entry (line 5 below),

  // symbol: Name;
  symbol (name) {
    let sym = this._find(name, true);
    if (!sym) sym = this._dcl(this._alloc(name));          // patch
    this.context = { symbol: sym };      // push symbol description
    return sym;
  }

The actions for store and args access the context entry:

  // store: '=' sum;
  store (_, sum) {
    if (this.context.symbol.storeOk())
      this.context.symbol.store();
  }

  // args: '(' [ sums ] ')';
  args (lp, sums, rp) {
    const sym = this.context.symbol,            // to apply args to
      nargs = sums ? sums[0] : 0;                 // # of arguments
    if (!(sym instanceof this.Fun))
      this.parser.error(`${sym.name}: not a function`);
    else if (nargs != sym.parms)
      this.parser.error(`${sym.name} arguments: ` +
        `expected ${sym.parms}, specified ${nargs}`);
    else
      sym.call();                                  // call function
  }

Calling storeOk() ensures that assignment is allowed and store() generates the appropriate code (lines 3 and 4 above).

args() ensures that a function is called (line 11) and that the numbers of parameters and argument values match (line 13) before call() is used to generate code (line 17).

Finally, the actions for assign and name delete the context entry (lines 3 and 13 below):

  // assign: symbol action;
  assign (symbol, action) {
    this.machine.gen('Pop'); this.context = null;    // pop context
  }

  // name: symbol [ args ]; 
  name (sym, args) {
    if (!args)
      if (sym instanceof this.Fun)
        this.parser.error(`${sym.name}: no argument list`);
      else
        sym.load();                           // variable reference
    this.context = null;                             // pop context
  }

name() marks a function reference without arguments as an error (line 9 above). The grammar guarantees that the context stack remains balanced at all times.

Just One More

Next up on the way to the start rule of the grammar are function definitions. Semantic analysis and code generation for functions consist of five steps:

  1. declare a function with a local symbol table,
  2. declare parameters,
  3. define local variables,
  4. generate code for the function body, and
  5. wrap up.

An action happens after all input required by the corresponding rule has been recognized. If something has to happen earlier, the rule is split into several rules to get the proper timing for the actions:

fun: 'function' Name '(' [ names ] ')'
       [ 'begin' [ 'var' names ';' ] stmts 'end' ] ';';

This rule would have a single action — plus more for names and stmts — and it could not handle the timing of the five steps as described above. Instead:

head: 'function' Name;
names: Name [{ ',' Name }];
parms: '(' [ names ] ')';
vars: 'var' names ';';
block: 'begin' [ vars ] stmts 'end';
fun: head parms [ block ] ';';

This amounts to the same rule but the steps can now be assigned to different actions, here in the order in which they will happen:

rule action return
head [inherited] create function description, set funct symbol
names [inherited] declare parameters in funct.locals count
parms call setParms() to layout frame
names [inherited] declare variables in funct.locals count
stmts [inherited] generate code
block nothing to do
fun [inherited] call funct.end() or funct.undo(), pop funct

It turns out that there is only one new action: parms() calls funct.setParms() to record the number of parameters and layout the function's frame.

Tracing in example 7/06 confirms the order:

  • Enter only function main(a) begin var b; print 1 end; into the .
  • Press to represent and check the grammar,
  • toggle , and
  • press .

Here is the redacted output:

head('function', 'main') returns
  { function main start 0 parms undefined size 0 [  ] }
names('a', null) returns 1
parms('(', [ 1 ], ')') returns undefined
names('b', null) returns 1
...
fun(...) returns undefined
prog(...) returns (memory, steps) => StackMachine(memory, steps)

Everything is in place, we have reached the top-level rules of the grammar which remain unchanged from the previous version of the little language:

prog: [ vars ] funs;
vars: 'var' names ';';
names: Name [{ ',' Name }];
funs: { fun };

The actions remain unchanged, too. At the top level, funct is null; therefore _dcl() defines the variable names in the global symbol table.

Example 7/08 contains mutually recursive functions just as in example 7/04, but this time one version of Euclid's algorithm takes advantage of function parameters.

Tracing

It is often useful to see how the frames are laid out, and what they contain when the program executes, as well as the machine instructions before and during execution.

In example 7/06:

  • Add var trace; before the definition of euclid in the to define the global variable.
  • As usual, press to represent and check the grammar, and
  • press to compile the executable and show the extra information:
> run = g.parser().parse(program, actions)
0: memory => this.Entry(2, 5)(memory)
1: memory => this.LoadFP(0)(memory)
2: memory => this.LoadFP(1)(memory)
3: memory => this.Gt(memory)
  ...
30: memory => this.Entry(0, 3)(memory)
31: memory => this.Input(36)(memory)
32: memory => this.Input(54)(memory)
  ...
42: memory => this.Call(30)(memory)
43: memory => this.Print(1)(memory)
trace at 0
function euclid start 0 parms 2 size 5 [ x y ]
function main start 30 parms 0 size 3 [  ]
stack starts at 1
execution starts at 42
(memory, steps) => StackMachine(memory, steps)
  • press to execute. Global variables such as trace are initialized to 0, i.e., execution will be traced:
> run()
0:[ 0 ]
0:[ 0 43 ] 42: memory => this.Call(30)(memory)
0:[ 0 ] 1:[ 43 0 0 ] 30: memory => this.Entry(0, 3)(memory)
0:[ 0 ] 1:[ 43 0 0 36 ] 31: memory => this.Input(36)(memory)
0:[ 0 ] 1:[ 43 0 0 36 54 ] 32: memory => this.Input(54)(memory)
0:[ 0 ] 1:[ 43 0 0 36 54 34 ] 33: memory => this.Call(0)(memory)
0:[ 0 ] 1:[ 43 0 0 ] 4:[ 36 54 34 1 0 ] 0: memory => this.Entry(2, 5)(memory)
  ...
0:[ 0 ] 1:[ 43 0 0 38 18 ] 28: memory => this.Exit(2)(memory)
0:[ 0 ] 1:[ 43 0 0 18 ] 29: memory => this.Return(memory)
0:[ 0 ] 1:[ 43 0 18 18 ] 38: memory => this.StoreFP(2)(memory)
0:[ 0 ] 1:[ 43 0 18 ] 39: memory => this.Pop(memory)
0:[ 0 43 18 ] 40: memory => this.Exit(0)(memory)
0:[ 0 18 ] 41: memory => this.Return(memory)
18
0:[ 0 ] 43: memory => this.Print(1)(memory)
0:[ 0 ]

The output shows the machine instructions as they are executed, preceded by the frames. First, there is the global frame (line 2 above), main() is called (line 3), there are two frames, and the input values 36 and 54 are pushed onto the stack (lines 5 and 6). Then the call to euclid() happens (line 7) and there is a third frame which starts at location 4 in memory (line 8). Eventually, euclid() exits (line 10), the result value 18 is left on the stack (line 11), is assigned as result of main() and popped (lines 12 and 13), main() exits and leaves 18 on top of the stack (lines 14 and 15), ready to be printed.

Example 7/08 contains several functions and, therefore, has a much more interesting trace...

Block Scopes

In this section the little language with local variables will be extended with block scopes for the variables. Changes to the grammar can be seen on this page, new stack machine and action methods can be seen in the method browser.

At compile time a local scope is pushed on top of a global scope to implement shadowing. So far, for the little language with local variables, shadowing means that parameters and local variables owned by a function hide global variables and function names. The scope where a name is visible is either the enclosing function or the entire program.

Languages like JavaScript also have block scopes, i.e., a name is visible only within the block which contains the definition. In example 7/09 some variables are declared before use and can be shadowed within blocks:

var trace;
function main (x) begin
      var a;    print x; x = 1; print x;
begin var b;    print x; x = 2; print x end;
begin var c, x; print x; x = 3; print x; main = 99 end;
begin var d, x; print x; x = 4; print x end;
                print x
end;

The output is produced as follows:

line code scope output
1 var trace; global
2 function main (x) main() parameters
begin main() block
3 var a; print x; 0
x = 1; print x; 1
4 begin var b; print x; first inner block 1
x = 2; print x end; 2
5 begin var c, x; print x; second inner block 0
x = 3; print x; main = 99 end; 3
6 begin var d, x; print x; third inner block 3
x = 4; print x end; 4
7 print x main() block 2
8 end; global 99

The first few lines of output are no surprise. trace is a global variable, initialized to 0 (program line 1 above). x is declared as parameter of main() and the startup code sets the corresponding argument to 0. begin starts the scope for local variables of main() (line 2). x is printed, set to 1, and printed again (line 3).

The first inner block does not declare another x (line 4); therefore, the parameter x is not shadowed. It is printed, set to 2, and printed again.

The second, parallel, inner block declares a new x which shadows the parameter x (line 5). The local variable region of the frame is initialized to 0; therefore the inner x prints as 0, is set to 3 and printed again.

The third, again parallel, inner block declares another x which shadows the parameter x (line 6). In a parallel block it shares memory (by location, not by name); therefore this x prints as 3, is set to 4 and printed again.

Back in the scope for local variables of main() the parameter x is still 2 and is printed as such (line 7).

Deeply nested in the second inner block the result of main() was set to 99 (line 5) and that is printed by the startup code.

Example 7/09 demonstrates shadowing: there are exactly two memory slots for various versions of x. The inner two declarations for x are in parallel blocks; therefore, the two x share one memory slot.

  • As usual, press to represent and check the grammar, and
  • press to compile the executable and show the extra information provided because there is a global variable trace:
> run = g.parser().parse(program, actions)
block [ b at +5 ]
block [ c at +5, x at +6 ]
block [ d at +5, x at +6 ]
block [ a at +4 ]
block [ x at +0 ]
0: memory => this.Entry(1, 7)(memory)
  ...
trace at 0
function main start 0 parms 1 frame size 7
stack starts at 1
execution starts at 36
(memory, steps) => StackMachine(memory, steps)

The global variable trace is at memory location 0 (line 9 above). Each block layout is displayed when recognition of the block is complete. The first inner block contains variable b at offset 5 in the frame for main() (line 2). Each of the next two inner blocks contains a variable x at offset 6 in the frame for main() (lines 3 and 4). The main() block contains variable a at offset 4 in the frame for main() (line 5). Finally, the parameter x is at offset 0 in the frame for main() (line 6).

All the blocks belong to the frame for main(). The highest offset in the frame for main() is 6 (lines 3 and 4); therefore, the frame size is 7 (line 10).

Note that the variables b, c, and d are declared first in the parallel inner blocks; therefore they share offset 5 in the frame for main() (lines 2 to 4).

  • What changes if the order of d and x is interchanged in program line 6?

Example 7/10 has more names, more functions, several frames, and more scopes:

var a, b, c;
function f (x);
function d (d) begin print 100, a, d, x end;
function e (a) begin print 200, a, b end;
function f (a) begin
  a = 1; b = 2; c = 3; d(4); e(5);
  begin var a, b; 
    a = 10; b = 20; 
    print 300, a, b, c; d(40); e(50)
  end;
  print 400, a, b, c  
end;
function main () begin c = 5; f(6) end;
$ /^%% grammar/,/^%% actions/ | 07/09.eg
Seven.Blocks09
  • Prepare the grammar and compile.
  • There is one error. Repair it by globally defining the missing variable and a variable trace.
  • Recompile and explain how the listing
> run = g.parser().parse(program, actions)
block [  ]
block [ d at +0 ]
block [  ]
block [ a at +0 ]
block [ a at +4, b at +5 ]
block [  ]
block [ a at +0 ]
block [  ]
block [  ]
  ...
a at 0
b at 1
c at 2
x at 3
trace at 4
function f start 15 parms 1 frame size 6
function d start 0 parms 1 frame size 4
function e start 8 parms 1 frame size 4
function main start 55 parms 0 frame size 3
stack starts at 5

corresponds to the following frame layouts:

frame +0 +1 +2 +3 +4 +5
global a b c x trace
d() d old pc old fp result
e() a old pc old fp result
f() a old pc old fp result
a b
main old pc old fp result
  • Remove trace, compile and run the program, and explain the output:
> run()
100 0 4 0
200 5 2
300 10 20 3
100 0 40 0
200 50 2
400 1 2 3
0
0:[ 0 2 3 0 ]

Example 7/11 contains a recursive version of Euclid's algorithm with a global trace and a rather risky use of nested blocks.

var trace;
function euclid (x,y) begin
  if x > y then 
    var gt; gt = euclid(x-y, y) 
  else 
    var lt;
    if y > x then lt = euclid(x, y-x)
    else return x
    fi
  fi;
  begin var common; print common; euclid = common end
end;

function main (x, y) begin
  x = input 54; y = input 36;
  return euclid(x, y)
end;
$ /^%% grammar/,/^%% actions/ | 07/09.eg
Seven.Blocks09
  • Compile the program and confirm that the frames are laid out as follows:
frame +0 +1 +2 +3 +4 +5
global trace
euclid() x y old pc old fp result
gt
lt
common
main() x y old pc old fp result
  • Run the program and confirm that it stacks a maximum of five frames, right when 18 is returned as the greatest common divisor of 36 and 54:
0:[ 0 ] 
  1:[ 54 36 54 0 0 ]
    6:[ 54 36 46 1 0 0 ] 
      12:[ 18 36 10 6 0 0 ]
        18:[ 18 18 22 12 18 0 18 ] 26: memory => this.StoreFP(4)(memory)
  • Where is the risky use of nested blocks -- why does this program produce correct results?

Grammar Modifications

There are a few changes to the grammar:

prog: [ vars ] funs;
vars: 'var' names ';';
names: Name [{ ',' Name }];
funs: { fun };
fun: head parms [ block ] ';';
head: 'function' Name;
parms: '(' [ names ] ')';
block: begin [ vars ] stmts 'end';
begin: 'begin';

stmt: assign | print | return | block | loop | select;
loop: While cmp Do [ vars ] stmts 'od';
select: 'if' cmp then [ else ] 'fi';
then: Then [ [ vars ] stmts ];
else: Else [ vars ] stmts;

As before, global variables can be defined with a vars phrase prior to functions (lines 1 and 2 above). They can be accessed wherever they are not shadowed.

As before, parameters can be declared with a names phrase enclosed in parentheses following a function name in a function declaration or definition (lines 5 and 7). They can be accessed throughout the function body as long as they are not shadowed. Parameters themselves shadow global variables.

As before, a block is the body of a function (line 5); however, it can also be a statement, i.e., blocks can now be nested (line 11).

A block delimits the scope of the variables defined by its vars phrase (line 8) and entered into a symbol table for the scope by the names() action (line 3). They will shadow all, global variables, parameters of the encompassing function, and variables in encompassing blocks — this is the semantics of JavaScript's let, rather than var.

Similarly, variables can also be defined local to a loop body (line 12) or the then or else branch of a select statement (lines 14 and 15).

Obviously, actions will have to create and discard the symbol tables which represent the scopes and it is tempting to assign these responsibilities to the vars and parms rules to create as few scopes as possible. However, a symbol table must exist exactly while the closest scope within a program is recognized which contains the names phrase, i.e., the actions for the fun, block, loop, then, and else rules are involved in managing these symbol tables.

These actions happen when recognition is complete, i.e., they will discard the symbol tables, if any, but each of the corresponding rules needs a hook where an action happens early enough to create a symbol table before the names() action enters names. The obvious hooks are the literals 'var' and '(' at the beginning of the vars and parms rules; however, vars is also used to define global variables (line 1) and the global symbol table symbols is allocated as a property of the actions class. Therefore, the actions for head, begin, Do, Then, and Else will have to arrange for new scopes.

Scope Management

Previously, a function description contained a Map .locals for parameter and local variable names and a property .size which determined the frame size and which _alloc() uses to assign new addresses. _dcl() stores new entries in funct.locals and _find() searched funct.locals before the global symbol table.

To implement block scopes, .locals has to refer to the top-most Map of a stack of symbol tables, one per nested scope, .size has to refer to the top-most address of a stack of addresses for new variables in a scope, and _find() has to search backwards through the stack of symbol tables before consulting the global symbol table. Together, these stacks describe one frame, i.e., they should be owned by the function description for the frame.

It helps to create a class Block for scope descriptions which contain a Map .locals for names defined in the scope and a property .size to use for new addresses:

  get Block () { return this.#Block ??= class {
      locals = new Map();    // maps names in block to descriptions
      size;                       // next variable address in block

      constructor (size) { this.size = size; }
      toString () // ...
    };
  }
  #Block;     

.size must be initialized when a new scope is created.

Previously, .locals and .size were implemented with getters and setters and they can be overwritten in an extended function description:

class Blocks09 extends Seven.Parameters06 {
  get Fun () { return this.#Fun ??= class extends super.Fun {
      frameSize = 0;        // because this.size is local to blocks            
      blocks;                // block stack, [0] is innermost block
      get locals () { return this.blocks[0].locals; }
      set locals (locals) {
        try { return this.blocks[0].locals = locals; }
        catch (e) { console.trace(e); throw e; } }
      get size () { return this.blocks[0].size; }
      set size (size) { return this.blocks[0].size = size; }
      
      constructor (owner, name) {        // creates outermost block
        super(owner, name);
        this.blocks = [ new this.owner.Block(0) ];
      }

.blocks[] is a list which represents the nested scopes (line 4 above). It is created and initialized for the scope of parameters when a function description is constructed (line 14). Parameter addresses start in the frame at 0.

To simplify searching, the list will contain the scopes in order from innermost to outermost. Therefore, .locals and .size can simply be delegated to block[0] (lines 5 to 10).

.frameSize is a new property because .size will not necessarily be both, the next new address and the total size of the frame (line 3).

      push () {           // add block, start in encompassing block
        this.blocks.unshift(
          new this.owner.Block(this.blocks[0].size)
        );
      }      

      pop () {         // remove block, maintain maximum frame size
        this.frameSize =
          Math.max(this.frameSize, this.blocks[0].size);
        if (this.owner.symbols.get('trace')               // trace?
              instanceof this.owner.Var)
          puts(this.blocks[0].toString());
        this.blocks.shift();
      }

      end () {                    // [extend] pop outermost block
        this.pop(); super.end();
      }
      // ...
    };
  }
  #Fun;     

There is no need to reset .frameSize if parameters are allocated during a forward declaration for a function because the function definition will later allocate the same amount of memory.

push() creates a new innermost scope (line 2 above) which will use new addresses at .size of the current scope (line 3).

pop() maintains .frameSize as the maximum .size reached by all nested scopes (lines 8 and 9), displays the definitions in the scope (line 12) if there is a global variable trace (lines 10 and 11), and discards the scope (line 13). At this point .size reverts to the next outer scope, i.e., the part of the frame used by the discarded scope will be reused.

Finally, end() is extended to discard the scope created by the constructor (line 17).

With the block stack in place, _find() can be extended to search innermost to outermost through the symbol tables (lines 5 to 8 below) before consulting the global symbol table (line 9):

  _find (name, report) {
    let sym;
    try {
      if (this.funct)                  // loop inner to outer block
        this.funct.blocks.forEach(block => {
          sym = block.locals.get(name);
          if (typeof sym != 'undefined') throw sym;
        });
      return sym = this.symbols.get(name);                // global
    } catch (sym) {                             // found in a block
      if (sym instanceof Error) throw sym;      // shouldn't happen
      return sym;
    } finally {
      if (report && !sym)
          this.parser.error(`${name}: undefined`);
    }
  }

Actions

Scopes manage absolute global and relative local memory addresses at compile time, frames manage memory at run time, i.e., instruction generation does not change for block scopes.

There are new actions for begin and block which call funct.push() and funct.pop() at compile time, respectively, to create and discard a scope for a block as a statement or function body.

Similarly, the actions for Do, Then, and Else are extended to call funct.push() and the actions for loop, then, and else are extended to call funct.pop() to create and discard their respective scopes, for example:

  // Do: 'do';
  Do () { this.funct.push(); return super.Do(); }

  //  loop: While cmp Do [ vars ] stmts 'od';
  loop (While, c, Do, v, s, o) {
    this.funct.pop(); super.loop(While, c, Do);
  }

  // ...
}

Nested Functions

In this section the little language with block scopes will be extended to permit nested function definitions. Changes to the grammar can be seen on this page, new stack machine and action methods can be seen in the method browser.

Nested function definitions provide another level of scopes and shadowing, e.g., to hide details and instrumentation of an algorithm from the actual use of the algorithm. Consider example 7/12:

function factorial (n) begin
  var f, calls;
  function factorial (n) begin
    calls = calls + 1;
    if n <= 1 then return 1 fi;
    factorial = n * factorial(n - 1)
  end;
  if n >= 0 then
    f = factorial(n);
    print calls;
    return f
  fi
end;

function main (n) begin
  while n <= 5 do
    print n, factorial(n);
    n = n + 1
  od
end;

The outer factorial() (line 1 above) accepts any value as an argument but only performs the requested computation for non-negative numbers (line 8). The number of calls to the inner factorial() (line 3) is counted (line 4) and reported (line 10), but the counter variable calls is hidden inside the outer factorial() (line 2).

The scope of calls demonstrates that nesting functions requires a modification to the management of frames for local variables as discussed earlier:

  • calls is not global, it is in the frame of the outer factorial().

  • calls is accessed from within each activation of the inner factorial().

  • Because of recursion, there can be many such inner activations stacked up, i.e., each inner activation needs a reference to the common outer frame.

global function recurse

Memory use evolves:

  • Initially there is the globals frame.

  • Once the outer factorial() is called and itself calls the inner factorial() there are two more frames.

  • If there are recursive calls from the inner factorial() to itself there are more and more frames stacked up.

All frames are chained on the dynamic link (at left) so that the code at the function exits can clean them up in reverse order.

Global variables have absolute addresses, determined at compile time.

Local variables have relative addresses, i.e., offsets within their frame, which have to be added to the base address where the frame begins at run time. The offset is determined at compile time but — as the graphic indicates — the base address depends on the call history.

If function definitions are nested, the inner function — i.e., frame — needs to know the base address of the outer frame.

This is known as the static link (at right). It points from an inner frame to the topmost frame for the outer function which the inner function is nested into — the "static" nesting at compile time which determines which outer names are visible.

The base address for a local variable can be found by following the static link for as many levels outward as the variable name is in a scope more outward at compile time, i.e., the deeper functions are nested the more address tracing is required to access more global variables at run time.

The elements of the static link are "somewhere" in each frame, i.e., above the function parameters if any. Therefore, just like there is a frame pointer register there has to be a register pointing to the beginning of the static link, and it, too, has to be saved when a function is called and restored on return.

An alternative implementation, the so called display, requires a single level of address tracing — at the cost of one memory slot per function nesting level in each frame.

The display is the stack of currently visible base addresses for a frame, i.e., it is a copy of all elements of the static link of the frame. A display pointer register is used to locate the bottom of the display. At the bottom of the display is the address of the topmost frame for the outermost function. The top of the display is the address of the display's own frame.

In summary, if parms is the number of parameters of a function and depth is the nesting level of the function, starting with 1 for global functions, the frame contains

register offset content
.fp +0 ... argument values for the parameters, if any
+parms the return address for the function,
i.e., the previous value of .pc
+parms +1 a slot for the dynamic link,
i.e., the address of the frame of the caller,
i.e., the previous value of .fp
+parms +2 a slot for the static link,
i.e., the address of the display of the caller,
i.e., the previous value of .dp
.dp +parms +3 a slot for the function result
+parms +3 +1 the end of the static link,
i.e., the address of the outermost
encompassing function's topmost frame
+parms +3 + ... parts of the static link,
i.e., the addresses of other encompassing
functions' topmost frames
+parms +3 +depth begin of the static link,
i.e., the address of the function call's frame,
i.e., the address which is also in .fp
+parms +3 +depth+ ... values for the local variables and stack

Global variables have absolute addresses, determined at compile time.

Local variables in the current frame can be addressed relative to the base address of the current frame in .fp of the stack machine.

Local variables in outer frames are addressed by using the nesting level of their function relative to the bottom of the display to retrieve the base address of the appropriate frame. Nesting levels for functions start at 1. They can be directly used as indices into the display because .dp is set to point to the function result slot at offset parms +3 which directly precedes the display.

The display itself is copied to the new frame when a function is called:

  • if the new nesting level is less than the current one, the new display is popped until it has a length equal to the new nesting level minus 1 — because only something that is visible at compile time can be called,

  • if the nesting level stays the same, only the top entry is popped — because something at the same nesting level can access the same things which the caller can, and it cannot access the caller itself,

  • otherwise, the nesting level can only increase by 1.

  • In all cases the current base address from .fp — which points to the current frame — is pushed onto the new display.

This section explains how nested functions are implemented in example 7/12 on top of the block scopes implemented in example 7/09.

Grammar Modifications

The grammar has to be modified so that function definitions can be nested:

block: begin body 'end';
begin: 'begin';
body: [ vars ] [ funs ] stmts;

loop: While cmp Do body 'od';
While: 'while';
Do: 'do';
select: 'if' cmp then [ else ] 'fi';
then: Then [ body ];
else: Else body;
Then: 'then';
Else: 'else';

A body contains variable definitions, if any, followed by function declarations and definitions, if any, followed by one or more statements (line 3 above). The names defined within the body are only visible within the body, i.e., within the functions and statements, unless shadowed.

A body is part of a block (line 1), i.e., a function definition or a block statement, and it is also part of a loop (line 5) and the then and else parts of a select statement (lines 9 and 10).

Display Implementation

Argument values are pushed onto the stack before a function is called. The Entry instruction at the beginning of each function has to initialize the frame as described above, given that parms is the number of parameters of the function, depth is the nesting depth, starting with 1 for global functions, and size is the memory requirement of the new frame:

class Machine13 extends Seven.Machine06 {
  Entry (parms, depth, size) {
    return memory => {
      const locals = size - parms - 4 - depth,   // local variables
        fp = memory.length - parms - 1,         // new frame's base
        dp = memory.length + 2;             // new display's bottom
                                     // push old-fp, old-dp, result
      memory.push(memory.fp, memory.dp, 0); 
      if (depth > 1) memory.push(           // push part of display
        ... memory.slice(memory.dp + 1, memory.dp + depth) );
      memory.push(fp);                     // push new frame's base
      if (locals)                    // push local variables if any
        memory.push(... Array(locals).fill(0));
      memory.fp = fp;                           // new dynamic link
      memory.dp = dp;                       // new display's bottom
    };
  }

The frame contains a slot for the function result, initialized to 0 (line 8 above). The initial part of the display is copied from the previous frame (lines 9 and 10) and the address of the new frame is added on top (line 11).

The Exit instruction at the end of each function no longer needs to know the number of parameters:

  Exit (memory) {

    const fp = memory.fp,        // current frame, i.e., @arguments
      dp = memory.dp;             // current display, i.e., @result
    memory.fp = memory[dp - 2];                   // restore old-fp
    memory.dp = memory[dp - 1];                   // restore old-dp
                                    // pop frame, push old-pc result
    memory.splice(fp, Infinity, memory[dp], memory[dp - 3]);
  }

It reverts to the previous frame and display pointers (lines 5 and 6 above) and removes all but the return address and result value from the stack as the Return instruction expects it (lines 8).

Parameters and local variables have to be addressed relative to the display pointer using two new instructions:

  /** `stack: ... -> ... frame[depth][addr]` */
  LoadDP (addr, depth) {
    return memory =>
      memory.push(memory[memory[memory.dp + depth] + addr]);
  }

Both, LoadDP and StoreDP, need to know the static nesting depth at which a variable is defined at compile time and the variable's offset in the frame. depth is used as an index into the display to obtain the appropriate frame's address which is combined with the offset (lines 4 and 10 above).

The display pointer simplifies displaying memory for a trace, i.e., toString() is overwritten in the Memory class:

  get Memory () {
    return this.#Memory ??= class extends super.Memory {
      dp = 0;                      // display (static link) pointer
      // frames[] is no longer used
      
      toString () {
        let fp = this.fp,                   // begin of (top) frame
          to = this.length,                   // end of (top) frame
          dp = this.dp,                              // static link
          output = []; 
        do {
          if (!dp) fp = 0;                          // global frame
          output.unshift(`${fp}:[ ${this.slice(fp, to).
            map(slot => this.mapSlot(slot)).join(' ')} ]`);
          to = fp;               // end and begin of previous frame
          fp = this[dp - 2];              // previous frame pointer
          dp = this[dp - 1];                // previous static link
        } while (to);
        return output.join(' ');
      }
    };
  }
  #Memory;
}

The current frame extends from the frame pointer to the end of memory (lines 7 and 8 above). The previous frame and display pointers can be obtained relative to the display pointer (lines 16 and 17). Output is collated in reverse order using unshift() (line 13).

Infrastructure

Nesting functions is a natural extension of block structure. However, chapter eight will show that the feature can cause a very significant complication for runtime memory management; therefore, it should be an optional part of the inheritance chain of the action classes for the little language versions. Nesting functions is implemented as a function which can be applied to extend an action class such as Blocks09 which supports block structure:

const Nest13 = superclass => class extends superclass {
  constructor (parser, machine = new Machine13()) {
    super(parser, machine);
  }
  // ...
};

The result of the function call Nest13(Blocks09) is a class. The function Nest13() overwrites and adds methods and properties of its argument class. In particular, by default it uses Machine13 to generate the stack machine (line 2 above).

funct references the current function description. If functions are nested, this description has to be the top entry of a stack so that recognition of a nested function can interrupt recognition of the encompassing functions:

  get funct () { return this.functs[0]; }
  set funct (sym) {
    if (sym) this.functs.unshift(sym);             // push function  
    else this.functs.shift();                       // pop function
  }

  get functs () { return this.#functs; }  // current function stack
  #functs = [ null ];

funct is overwritten as a getter where functs[0] describes the current function (line 1 above). The list is initialized with null to indicate that there is no current function (line 8). The stack is pushed with unshift() if a function description is assigned to funct (line 3) and popped with shift() if null is assigned (line 4).

Variable access uses new instructions; therefore, the class of variable descriptions has to be extended:

  get Var () { return this.#Var ??= class extends super.Var {
      load () {                         // [replace] load by depth
        if (!this.depth)                                  // global
          this.owner.machine.gen('Load', this.addr);
        else if (this.depth+1 != this.owner.functs.length)// nested
          this.owner.machine.gen('LoadDP', this.addr, this.depth);
        else                                               // local
          this.owner.machine.gen('LoadFP', this.addr);
      }
      store () {                       // [replace] store by depth
        if (!this.depth)                                  // global
          this.owner.machine.gen('Store', this.addr);
        else if (this.depth+1 != this.owner.functs.length)// nested
          this.owner.machine.gen('StoreDP', this.addr, this.depth);
        else                                               // local
          this.owner.machine.gen('StoreFP', this.addr);
      }
      toString () {
        // ...
      }
    };
  }
  #Var;

load() and store() use direct addressing (lines 4 and 12 above), addressing relative to the display (lines 6 and 14), or relative to the frame pointer (line 8 and 16), depending at what nesting depth the variable is defined by _alloc():

  _alloc (name) {
    if (this.funct)                        // create local variable
      return new this.Var(this, name, this.funct.size ++,
        this.funct.depth);
    else                                  // create global variable
      return new this.Var(this, name, this.size ++, 0);
  }  

For global variables, depth is set to 0 (line 6 above) Otherwise it is taken from funct (line 4).

The class of function descriptions has to be extended to account for the changes in frame layout:

  get Fun () { return this.#Fun ??= class extends super.Fun {
      depth;        // length of static link, 1 for global function

      constructor (owner, name) {   // sets depth from owner.functs
        super(owner, name);
        this.depth = owner.functs.length;        // functs[.. null]
      }

depth records the nesting depth, i.e., the current length of functs[] (lines 2 and 6 above),

setParms() is extended because the display pointer and the display will also be stored in the frame (lines 3 and 4 below):

      setParms () { // frame: parms, old-pc, old-fp, old-dp, result
        super.setParms();
        this.addr = this.size ++;         // insert slot for old-dp
        this.size += this.depth;         // leave slots for display
      }

storeOk() has to check all of functs[] to see if a function result is assigned within the function (line 2 below):

      storeOk () {            // [replace] consider outer functions
        if (this.owner.functs.some(f => f == this)) return true;
        this.owner.parser.error(`${this.name}: ` +
          `assigned to outside function`);
        return false;
      }

      store () {                        // [replace] consider depth
        if (this == this.owner.funct)                      // local
          this.owner.machine.gen('StoreFP', this.addr);
        else                                      // outer function
          this.owner.machine.gen('StoreDP', this.addr, this.depth);
      }

LoadFP can still be used to store a function result in the current function (line 9 above) but LoadDP is necessary to store the result of an encompassing function (line 12).

entry() and exit() have to be overwritten because the Entry and Exit instructions expect different arguments. More importantly, function definitions can appear at the beginning of a block, i.e., within a begin, loop, or select statement. Code which is generated during recognition of a function definition cannot easily be moved aside; therefore, nested function definitions have to be bypassed with branch instructions.

One branch before the Entry of the first in a sequence of function definitions is sufficient if it lands after the Exit and Return of the last function in sequence; i.e., entry() and exit() together have to code the branch and they can share information in the Block where the sequence of functions is defined:

      entry () {                   // [extend] make room for bypass
        if (this.depth > 1) {                    // nested function
                                  // remember where this is defined
          this.scope = this.owner.funct.blocks[0];
          if (typeof this.scope.bypass == 'undefined')
            this.scope.bypass =      // make room for bypass branch
              this.owner.machine.code.push(null) - 1;
        }
        super.entry();
      }

      exit () {               // [replace] uses depth, fixes bypass
        this.owner.machine.code[this.start] =
          this.owner.machine.ins('Entry', this.parms,
            this.depth, this.frameSize);
        this.owner.machine.gen('Exit');      // needs no parms info
        const end = this.owner.machine.gen('Return');
        if (this.scope)                    // need to repair bypass
          this.owner.machine.code[this.scope.bypass] =
            this.owner.machine.ins('Branch', end);
      }    

A bypass branch is only required for nested functions (line 2 above). entry() is called when funct still references the encompassing function; therefore, a reference to the Block which contains the description of the new function can be stored as property .scope of the new function description (line 4). The same Block contains the descriptions of all functions defined in a sequence in the same scope; therefore, a new property .bypass in this Block can be shared by all functions in the sequence:

  • If .bypass is undefined, entry() for the first function in the sequence allocates a slot for a branch and stores the address in .bypass (lines 5 to 7).

  • exit() for each function checks for the shared Block and inserts an appropriate branch instruction in the .bypass slot (lines 18 to 20).

The result is a single branch instruction preceding all function definitions in the same block and landing past the exit of the last of these function definitions.

One last extension: function definitions have to be checked at the end of every block (line 2 below), not only once globally:

      pop () {            // [extend] check for undefined functions
        this.owner._check_defs(this.locals);
        super.pop();
      }
    };
  }

Actions

The actions for block, loop, and else now receive fewer parameters than before, e.g.,

loop: While cmp Do [ vars ] stmts 'od';
  vs.
loop: While cmp Do body 'od';

The parser checks the argument count for actions; therefore, these actions have to be overwritten, for example:

  loop (While, cmp, Do, body, od) {
    super.loop(While, cmp, Do, undefined, undefined, od);
  }

Other than that, only one more action has to be overwritten:

  // head: 'function' Name;
  head (_, name) {
    let sym = this._find(name);
    try {
      if (sym instanceof this.Fun) {
        if (sym.depth >= this.functs.length) {  // same nesting level
          if (typeof sym.start != 'number') throw true;    // forward
          this.parser.error(`${name}: duplicate`);
        }                      // else define at deeper nesting level
      } else if (sym instanceof this.Var &&     // same nesting level
                   sym.depth >= this.functs.length - 1) 
        this.parser.error(`${name}: used as variable and function`);
      sym = this._dcl(new this.Fun(this, name));       // (re-)define
    } catch (e) { throw e;                        // shouldn't happen
    } finally {
      sym.entry();                // generate code for function entry
      return this.funct = sym;                         // in function
    }
  }

head() defines a new Name as a function (line 13 above), generates code at function entry (line 16) and pushes the description as the new value of funct (line 17). If a prior definition for the Name can be found (line 3) it has to be investigated:

  • The description of a function declaration at the same nesting level (line 6) without a start address (line 7) can be used to define the function,
  • otherwise it is a duplicate name (line 8).
  • A more global function description can be shadowed by a new local definition.
  • A variable definition at the same level is an error (lines 10 to 12).

If an error is found a local definition is created to continue recognition (line 13).

_find() searches for a name beginning with the innermost Block outward through all nested scopes (line 6 below) and functions (line 4) and ending with the global symbol table (line 11):

  _find (name, report) {
    let sym;
    try {
      this.functs.forEach(funct => {   // loop inner to outer funct
        if (funct)                     // loop inner to outer block
          funct.blocks.forEach(block => {
            sym = block.locals.get(name);
            if (typeof sym != 'undefined') throw sym;
          });
      });
      return sym = this.symbols.get(name);                // global
    } catch (sym) {                             // found in a block
      if (sym instanceof Error) throw sym;      // shouldn't happen
      return sym;
    } finally {
      if (report && !sym)
          this.parser.error(`${name}: undefined`);
    }
  }

forEach() can be used because scopes and functions are stored innermost first. throw is used to abort the search as soon as a match is found (line 8) and a catch clause converts the throw into a return (line 14). It is only prudent to rethrow any errors caught in this construction (line 13). Finally, an error is reported if the name cannot be found and a report was requested (lines 16 and 17).

Examples

Example 7/13 shows a function which uses an inner helper function to compute the factorial for non-negative numbers. There is significant shadowing:

  • Add a global trace variable to see which names belong to which scope.

  • Trace execution to see which variable is accessed using the display and when.

Example 7/14 is yet another variant of Euclid's algorithm with a helper function which returns zero for invalid arguments. Again, there is shadowing.

Example 7/15 is the variant of Euclid's algorithm with mutually recursive functions studied in example 7/08, modified to take advantage of nested functions.

Example 7/16 is a variant of example 7/10 where some functions are nested.

  • Add a trace variable to see which names belong to which scope.

  • Compare to the previous output (below) and explain the difference:

    100 1 4 0
    200 5 2
    300 10 20 3
    100 1 40 0
    200 50 2
    400 1 2 3
  • Delete the first assignment to c and explain the difference in output.

Example 7/17 is a puzzle with nested functions.

  • Explain the output — and then add a variable definition to obtain the expected output:
    expected actual
    3 -1 -13 -11
    4 -2 4 -2

Example 7/18 demonstrates that inner functions can set the result of outer functions which they are nested into (line 4 below) — function names do act as assign-only variables:

function main () begin
  function a (x) begin
    function b (x) begin
      function c (x) begin c = x + 1; b = x + 2; a = x + 3 end;
      print c(100 + x)
    end;
    print b(10 + x)
  end;
  print a(5)
end;

Finally, example 7/19 is another collection of semantic errors:

function main ();
function a ();
function a (dup, dup) begin var dup; dup = 1 end;
function a (x) begin var y; y = 1 end;
function f () begin var g;
  function undef ();
  function main (x) begin return y end;
  function g () begin print 2 end;
  f = 3; g = 4
end;
  • a duplicate parameter dup (line 3 above)
  • a mismatch in parameter counts (lines 2 and 3)
  • a duplicate function definition a() (line 4)
  • an undefined variable y (line 7)
  • a name g used as a variable and nested function (lines 5 and 8)
  • an assignment to g outside the function g() because g was redefined as a function (line 9)
  • an undefined nested function (line 10)
  • there is a function definition for main() (line 7) but it is not global

Note that a local variable can shadow a parameter (line 3).

Quick Summary

  • A programming language can be strongly typed — to be checked at compile time — or dynamically typed like JavaScript — where all values are typed and operators convert as necessary.

  • Grammar rules can be type-aware; however, this approach to type checking fails unless there are very few mixed type operations in a language.

  • Type checking can be done during syntax analysis and code generation for a stack machine, mostly by checking argument types for machine instructions and returning types from the actions generating the instructions.

  • Functions require branch instructions which capture an address to return to. Recursive functions require stacking return addresses.

  • Argument values and return values for functions can be passed on the stack machine's stack.

  • The values of local variables for a recursive function have to be on the stack. Together with argument values, return value, return address, and other administrative information they are the activation record or frame of a function call.

  • Local variables and parameters are addressed relative to a frame pointer or activation record pointer which is used to locate the frame and which also has to be stacked during a call to another function.

  • The visibility of names and shadowing of global names by local names can be implemented by stacking symbol tables at compile time. Variables within the same frame in non-intersecting scopes can share memory but they are not always initialized.

  • Nesting functions at compile time requires a static link at run time which connects the visible nested scopes for variable access.

  • Addressing variables in encompassing scopes requires only a single level of indirection through the display which is a complete copy of the static link in each frame.

  • If a sequence of actions within a parent rule need to share information they can use a stacked context which the first action creates and the action of the parent rule deletes.

Previous: 6. Compiling Little Languages Next: 8. Functions as Values