Write a Compiler (in Python)

Upcoming Course Dates (Chicago):

• July 9-13, 2018.

Instructor: David Beazley

Price: $2500

Includes:

  • Breakfast and lunch
  • Course materials

Overview

Come to Chicago and shatter your brain by writing a compiler for a new programming language! In this workshop, you'll undertake an intense coding project comparable to that given to graduate computer science students in their first-year compilers course.

Compilers is often considered a capstone course for computer science majors. There is a reason for that--a compiler touches almost every topic of computer science ranging from mathematical theory to computer architecture. Writing a compiler is also an exercise in managing software complexity. Compilers have a lot of moving parts, involve unusual tools, are difficult to test, and are challenging to debug. You'll learn a lot by writing a compiler.

Target Audience

This class is for more experienced programmers who'd like to expand their skills by taking on a good challenge. If you're a self-taught programmer and you're curious about what writing a compiler is all about, then this is the course for you.

Instruction Format

This course is almost entirely project focused. A typical day might consist of an hour of discussion followed by 7-8 hours of coding. During coding, there is typically considerable group discussion about coding techniques, design tradeoffs, testing, and other related topics.

Prerequisites

Knowledge of Python programming basics is assumed. No background in compilers is required although awareness of common programming language concepts (e.g., type systems, functions, classes, scoping rules, etc.) is strongly advised. Some knowledge of regular expressions, computer architecture (machine instructions, memory, etc.), and prior use of a compiled language is also recommended.

Syllabus

The course is structured around the goal of creating a small programming language (currently based on Go) and to have it compile to executable programs via LLVM. The code produced by your compiler will have performance comparable to programs written in C. You'll also see how to make a Just-In-Time (JIT) compiler.

There are 8 project milestones:

  1. Lexing and tokenization. We start by writing a tokenizer for a simple expression language.
  2. Parsing and Abstract Syntax Trees (ASTs). Expressions are parsed into an AST data structure. This is done with the assistance of the SLY parser generator tool.
  3. Type checking. We write an static program analyzer that checks the source code for type errors and other semantic problems.
  4. Intermediate code generation. The AST is converted to an intermediate representation based on static single assignment (SSA).
  5. Code generation. The intermediate code is turned to runnable programs using LLVM. With the completion of this stage, students have the basic foundation of the complete compiler. The next project stages start to add new features.
  6. Relations and booleans. Relational operators and boolean tests are added so that control flow structures can be added.
  7. Control flow. Conditionals and loops are added to the language. Control-flow analysis, basic blocks, and other related concepts are introduced.
  8. Functions. User defined functions are added to the language.

Practical Takeaways

Although most programmers are unlikey to write a compiler in their day-to-day work, this course touches on a wide variety of practical topics that are applicable elsewhere. These include:

Are You Nuts?

Writing a compiler in only 5 days? Is it even possible? To be sure, compilers is often regarded as one of the most difficult CS courses that one can take. If you take it at a University, you'll probably get a professor who will take you through the infamous Dragon Book, spend a lot of time doing mathematical proofs (e.g., deriving the LALR(1) parsing algorithm), and make the focus of the course on preparing graduate students for future research in programming languages. I have taught that class. This is NOT that class.

Instead, this is a compilers course aimed at practioners. As such, the main focus is on coding and software development. Yes, you will learn about important core compiler concepts such as regular expressions, context free grammars, type systems, and programming language semantics. However, instead of doing mathematical proofs involving parsing theory, we'll focus on how you would actually go about implementing a parser and doing things such as writing unit tests for it.

To be sure, you will write a lot of code in this course. The course runs for more than 40 hours over five days. The final completed project consists of approximately 3000 lines of Python and is every bit as involved as the project you typically find in a college-level compilers course for computer science majors. However, this course is structured for success. Most participants finish the final project.

About the Instructor

This class is led by David Beazley. Although most known for his work in the Python community, Dave was formerly a tenure-track assistant professor in the Department of Computer Science at the University of Chicago where he taught a Compilers course along with a variety of other topics in systems and programming languages. Dave is also the creator of the PLY and SLY parsing tools for Python.