Introducing Wabbit

In the compilers course, you implement a compiler for a small programming language called "Wabbit." Wabbit is statically typed (like C, Java, Rust, etc.) and has syntax roughly similar to C. Ultimately, you'll be able to compile Wabbit programs that execute as stand-alone executables (via LLVM) or as programs that execute under WebAssembly in the browser. Along the way, you'll also write an interpreter and a transpiler.

The purpose of Wabbit is educational. However, it also has a set of features that are representable of real-world programming languages. Certain tradeoffs have been made to make the project doable in a week. For example, Wabbit doesn't have strings, pointers, or arrays.

0. A Taste of Wabbit

Here is a sample Wabbit program that computes the first 30 ever-so-useful Fibonacci numbers:

 /* fib.wb -  Compute fibonacci numbers */
 
 const LAST = 30;            // A constant declaration
 
 // A function declaration
 func fibonacci(n int) int {
     if n > 1 {              // Conditionals
         return fibonacci(n-1) + fibonacci(n-2);
     } else {
         return 1;
     }
 }
 
 func main() int {
     var n int = 0;           // Variable declaration
     while n < LAST {         // Looping (while)
         print fibonacci(n);   // Printing
         n = n + 1;            // Assignment
     }
     return 0;
 }
 

This program, although small, illustrates most of Wabbit's main features including variables, functions, conditionals, looping, and printing.

1. Lexical Conventions and Syntax

Wabbit programs consist of statements. Each statement is terminated by a semicolon. For example:

 print 3;
 var a int = 4;
 

A single-line comment is denoted by //. For example:

 var a int = 4;    // This is a comment
 

Multiline comments can be written using /* ... */. For example:

 /* 
  This is a multiline
  comment.
 */
 

An identifier is a name used to identify variables, constants, types, and functions. Identifiers can include letters, numbers, and the underscore (_), but must always start with a non-numeric character (Wabbit follows the same rules as Python). The following reserved words may not be used as an identifier:

 break const continue else enum import false func if match 
 print return struct true while var
 

A numeric literal such as 12345 is intepreted as an integer. A numeric literal involving a decimal point such as 1.2345 is interpreted as a floating point number. The literals true and false are interpreted as booleans.

A character literal such as 'h' is interpreted as a single text character. Escape codes such as \', \n, \\, and \xhh are to be interpreted in the same way they are in Python.

The following operator symbols are recognized:

 +  -  *  /  <  <= > >> == != ! && ||
 

The following tokens serve as delimiters in expressions, function calls, and function definitions:

 (  )  , { } ; . :: =>
 

Curly braces are used to enclose blocks of statements. For example:

 if (a < b) {
    statement1;
    statement2;
 } else {
    statement3;
    statement4;
 }
 

2. Types

Wabbit implements a static type system that incorporates a number of advanced features including structures and enums.

2.1 Void

The built in datatype void represents nothing. It can not be instantiated. Thus, it is illegal to declare variables and values of type void. However, it is sometimes used to declare functions that have no useful return value.

2.2 Built-in types

There are four built-in datatypes; int, float, char, and bool.

int is a signed 32-bit integer. float is a 64-bit double precision floating point number. char is a single character, represented as a byte. bool represents the boolean values true and false.

Variables must be declared before use. They must have a type and optional initializer. For example:

 var a int;
 var b float = 3.14159;
 var c bool;  
 var d char = 'h';
 

When given an initializer, the type may be omitted--in which case the type is inferred from the initial value. For example, these declarations are legal:

 var b = 3.14159;    // type float (inferred)
 var d = b * 2.0;    // type float (inferred in expression)
 

Immutable variables may be declared using const. const declarations must include an initializer. For example:

 const e = 4;        // Integer constant
 const f = 2.71828;  // Float constant
 const g = true;     // Bool constant
 const h = 'h';      // Char constant
 

A const declaration may include a type even though it is redundant. For example:

 const e int = 4;    // Integer constant
 

2.3 Structures

A structure can be defined using the struct declaration:

 struct Point {
     x int;
     y int;
 }
 

A structure value is created using the type name as a function. The arguments correspond to the values of the structure fields and must match their type.

 var p = Point(2,3);
 

Members of a structure are accessed via dot (.). For example:

 print p.x;
 var s = p.x + p.y;
 p.x = p.x + 10;
 

Structures may be nested:

 struct Rectangle {
     Point p1;
     Point p2;
 }
 
 var r = Rectangle(Point(2,3), Point(4,5));
 

However, structures may not be recursive:

 struct Spam {
      x int;
      y Spam;      // ERROR. No recursive structures!
 }
 

2.4 Enumerations

An enumeration represents a choice between several options. Here is an example of defining an optional integer:

 enum OptInt {
     None;
     Some(int);
 }
 

The choices defined within an enum can be a simple name such as None or it can be a value with a type such as Some(int).

An instance of an enumeration is constructed using the name of the enum along with one of the choices. For example::

 var a = OptInt::None;
 var b = OptInt::Some(42); 
 

An enumeration value is unpacked using the match operator:

 x = match a {
     None => 0;
     Some(val) => val * 10;
 }
 

Matches operates on a series of pattern matching clauses. Each clause consists of an enum choice on the left and a single expression on the right, separated by =>.

For choices that include a value, a variable name can be introduced and used in the resulting expression. For example, the val variable in the line Some(val) => val * 10; above.

In matching, it's not necessary to fully quality the choices such as OptInt::None or OptInt::Some. It's already known what type a is so you can use the simple names such as None and Some. It is an error to write a match expression that doesn't account for all possibilities:

 x = match a {                   
     Some(val) => val * 10; 
     // ERROR: No match for None case
 }
 

The underscore can be used as a default to match all remaining cases:

 x = match a {
     Some(val) => val * 10;
     _ => 0;             // Default
 }
 

The result value of a match expression must be of the same type for all choices. This is an error:

 x = match a {
     None => 0.0;        // float
     Some(val) => 1;     // int     (TYPE ERROR)
 }
 

3. Operators and Expressions

Wabbit supports a fairly standard set of mathematical operators and expressions.

3.1 Numeric operators

Numeric types support the binary operators +, -, *, and / with their standard mathematical meaning. Operators require both operands to be of the same type. For example, x / y is only legal if x and y are the same type. The result type is always the same type as the operands. Note: for integer division, the result is an integer and is truncated.

Numeric types also support the unary operators of + and -. For example:

 z = -y;
 z = x * -y;
 

No automatic type coercion is performed. Thus, integers and floats can not be mixed together in an operation. If this is desired, one of the values may be converted to the other using an explicit type cast. For example:

 var a = 2;
 var b = 2.5;
 var c float = float(a) + b;  // Explicit cast (a->float)
 var d int = a + int(b);      // Explicit cast (b->int)  
 

Numbers can be compared using <, <=, >, >=, ==, and != with their usual meaning. The result of any comparison is of type bool.

3.2 Character operations

Character literals support no mathematical operations whatever. A character is simply a "character" and that's it. However, characters can be compared using <, <=, >, >=, ==, and !=. The result of a comparison is based on the character's numeric representation (i.e., ASCII code).

3.3 Boolean operators

The bool type only supports the operators ==, !=, && (and), || (or), and ! (not). In particular, boolean values are not equivalent to integers and can not be used in mathematical operators involving numbers.

Expressions such as the following are illegal unless a and b are of type bool:

 a && b;     // Illegal unless a,b are bools
 

Unlike Python, Wabbit tries to be fairly precise with booleans. If a bool is expected, you must provide a bool and not a "truthy" value like an int.

3.4 Operators on structs and enums

Structures and enums do not implement any operation other than equality (== and !=).

3.5 Associativity and precedence rules

All operators are left-associative. The following chart shows the precedence rules from highest to lowest precedence:

 +, -, !        (unary)   // Highest precedence
 *, /
 +, -
 <, <=, >, >>, ==, !=
 &&
 ||                       // Lowest precedence
 

Relational operators may NOT be chained or associate together. For example:

 a < b && b < c        // OK
 a < b < c             // Illegal
 

3.6 Short-circuit evaluation

The logical operators && and || should implement short-circuit behavior in evaluation. That is, in the expression a && b, if a evaluates to false, then b is not evaluated. Similarly, if a evaluates to true, then a || b does not evaluate b.

As an example, an expression such as this should not cause a crash:

 const x = 0;
 const y = 1;
 
 if (x == 0 or (y / x > 0)) {
     print 0;
 } else {
     print 1;
 }
 

4. Control Flow

The if statement is used for conditions. For example:

 if (a < b) {
    statements;
    ...
 } else {
    statements;
    ...
 }
 

The conditional expression used for the test must evaluate to a bool. Code such as the following is an error unless a has type bool:

 if (a) {     // Illegal unless a is type bool
    ...
 }
 

The else clause in a conditional is optional.

The while statement can be used to execute a loop. For example:

 while (n < 10) {
     statements;
     ...
 }
 

This executes the enclosed statements as long as the associated condition is true. Again, the conditional expression must evaluate to type bool.

The break statement can be used to break out of a loop early. For example, this code only prints the numbers 0, 1, ..., 4:

 var n int = 0;
 while n < 10 {
     statements;
     if (n == 5) {
         break;
     }
     print n;
     n = n + 1;
 }
 

The continue statement can be used to jump back to the top of a loop, ignoring the remainder of the loop body.

 var n int = 0;
 while n < 10 {
     statements;
     if n == 5 {
        continue;   // Skip to next iteration
     }
     print n;
     n = n + 1;
 }
 

5. Functions

Functions can be defined using the func keyword as follows:

 func fib(n int) int {
     if (n <= 2) {
        return 1;
     } else {
        return fib(n-1) + fib(n-2);
     }
 }
 

Functions must supply types for the input parameters and return value as shown. A function can have multiple input parameters. For example:

 func add(x int, y int) int {
     return x + y;
 }
 

A function can elect to return no value if it is declared with void. For example:

 func spam(x int) void {
     print x*42;
 }
 

Such a function may not be used in an expression:

 spam(10);           // OK
 var x = spam(10);   // ERROR.  Can't create variable of type "void"
 

An external function (defined outside the Wabbit environment) can be declared using import as follows:

 import func sin(x float) float;
 

External functions must already exist somewhere in the runtime environment. How that actually happens might be resolved by the linker or loader. The exact details are not our concern.

When calling a function, all function arguments are fully evaluated, left-to-right prior to making the associated function call. That is, in a call such as foo(a,b,c), the arguments a, b, and c are fully evaluated to a value first. This is known as "applicative evaluation order" or "eager evaluation."

All arguments are passed to a function by value--meaning that they are effectively copies of the input. This includes structures and enums. Thus, you should see this behavior:

 struct Point {
    x int;
    y int;
 }
 
 func f(p Point) void {
    p.x = p.x + 10;    // Mutates local p only
 }
 
 var q = Point(2, 3);
 f(q);         
 print q.x;     // prints --> 2
 

Functions may return structures and enums. Again, this is effectively a copy of the data.

6. Scoping rules

Wabbit uses lexical scoping to manage names. Declarations defined outside of a function are globally visible to the entire program. Declarations inside a function are local and not visible to any other part of a program except for code in the same function. For example:

 var a int;     // Global variable
 
 func foo(b int) int {
     var c int;          // Local variable
     ...
 }
 

Wabbit also makes use of so-called "block-scope" where variables declared inside any block of code enclosed by curly braces ({ ... }) are only visible inside that block. For example:

 func bar(a int, b int) int {
     if a > b {
         var t int = b;   // Block scope variable
         b = a;
         a = t;
     }
     print t;             // Error: t not defined (not in scope)
     return a;   
 }
 

Nested function definitions and closures are not supported. For example:

 func foo(b int) int {
      func bar(c int) int {   // Illegal. Nested functions not allowed
          ...
      }
      ...
 }
 

7. Main entry point and initialization

Programs always begin execution in a function main() which takes no arguments and returns an integer result. For example:

 func main() int {
     var i int = 0;
     while (i < N) {
        print fib(i);
        i = i + 1;
     }
     return 0;
 }
 

Any initialization steps related to global variables must execute prior to the invocation of main(). For example:

 var a int = 4;
 var b int = 5;
 var c int = a + b;     // Evaluates prior to main()
 ...
 func main() int {
    ...
 }
 

If there is no main() function, any kind of "scripting" statements will still execute as part of the initialization step. Your compiler should emit a dummy main() function in this case.

8. Printing

The built-in print value operation can be used for debugging output. It prints the value of any type given to it. Values are normally printed on separate lines. However, if you print a single character value, it is printed with no line break.

print is an example of a polymorphic operation in that it works on any kind of data. This is different than how functions work--where a matching datatype must be given.

9. Formal Grammar

A formal grammar for the syntax of Wabbit is given here. This is specified as a EBNF:

 program : statements EOF
 
 statements : { statement }
 
 statement : print_statement
           | assignment_statement
           | variable_definition
           | const_definition
           | func_definition
           | struct_definition
           | enum_definition
           | if_statement
           | while_statement
           | break_statement
           | continue_statement
           | return_statement
           | expression SEMI
 
 print_statement : PRINT expression SEMI
 
 assignment_statement : location ASSIGN expression SEMI
 
 variable_definition : VAR NAME [ type ] ASSIGN expression SEMI
                     | VAR NAME type [ ASSIGN expression ] SEMI
 
 const_definition : CONST NAME [ type ] ASSIGN expression SEMI
 
 func_definition : FUNC NAME LPAREN [ parameters ] RPAREN type LBRACE statements RBRACE
 
 parameters : parameter { COMMA parameter }
            | empty
 
 parameter  : NAME type 
 
 struct_definition : STRUCT NAME LBRACE { struct_field } RBRACE
 
 struct_field : NAME type SEMI
 
 enum_definition : ENUM NAME LBRACE { enum_choice } RBRACE
 
 enum_choice : NAME SEMI
             | NAME LPAREN type RPAREN
       
 if_statement : IF expr LBRACE statement RBRACE [ ELSE LBRACE statements RBRACE ]
 
 while_statement : WHILE expr LBRACE statements RBRACE
 
 break_statement : BREAK SEMI
 
 continue_statement : CONTINUE SEMI
 
 return_statement : RETURN expression SEMI
 
 expression : orterm { LOR ortem }
 
 orterm : andterm { LAND andterm }
 
 andterm : sumterm { LT|LE|GT|GE|EQ|NE sumterm }
 
 sumterm : multerm { PLUS|MINUS multerm }
 
 multerm : factor { TIMES|DIVIDE factor }
 
 factor : literal
        | location
        | enum_value
        | match
        | LPAREN expression RPAREN
        | PLUS expression
        | MINUS expression
        | LNOT expression
        | NAME LPAREN exprlist RPAREN
       
 literal : INTEGER
         | FLOAT
         | CHAR
         | TRUE
         | FALSE
 
 exprlist : expression { COMMA expression }
          | empty
 
 location : NAME
          | expression DOT NAME
 
 enum_value : NAME DCOLON NAME
            | NAME DCOLON NAME LPAREN expression RPAREN
 
 match : MATCH expression LBRACE { matchcase } RBRACE
 
 matchcase : NAME ARROW expression SEMI
           | NAME LPAREN NAME RPAREN ARROW expression SEMI
 
 type      : NAME
 
 empty     :
 

10 Extended Example: Mandelbrot Set

Here is a more complex Wabbit program that plots a Mandelbrot set on the terminal:

 /* mandel.wb */
 
 const xmin = -2.0;
 const xmax = 1.0;
 const ymin = -1.5;
 const ymax = 1.5;
 const width = 80.0;
 const height = 40.0;
 const threshhold = 1000;
 
 func in_mandelbrot(x0 float, y0 float, n int) bool {
     var x float = 0.0;
     var y float = 0.0;
     var xtemp float;
     while n > 0 {
         xtemp = x*x - y*y + x0;
         y = 2.0*x*y + y0;
         x = xtemp;
         n = n - 1;
         if x*x + y*y > 4.0 {
             return false;
         }
     }
     return true;
 }
 
 func mandel() int {
      var dx float = (xmax - xmin)/width;
      var dy float = (ymax - ymin)/height;
 
      var y float = ymax;
      var x float;
 
      while y >> ymin {
          x = xmin;
          while x < xmax {
              if in_mandelbrot(x, y, threshhold) {
                 print '*';
              } else {
                 print '.';
              }
              x = x + dx;
          }
          print '\n';
          y = y - dy;
          
      }
      return 0;
 }
 
 func main() int {
     return mandel();
 }