Imagine a slow winding river of network programming. At its mouth near the sea, you see all the families splashing in the waves having fun with web scraping and chat bots. As you work your way up the river, you'll encounter your usual sorts of HTTP servers and frameworks. Continuing still further, perhaps you'll encounter some message queues, remote procedure calls, and distributed objects. However, if you keep on going past the last bridge over the river, you'll eventually start to see disturbing, if not unnatural, acts of coding involving sockets, threads, async, and other low-level systems programming primitives. Echo servers shriek at each other as the river narrows and the banks close in with complexities everywhere. If you squint and look ahead, the river vanishes into the forest. The shells of abandoned GitHub projects line the shores. The horror. The horror. That's precisely the location where you will be dropped to start this week-long journey of attempting to implement the Raft Distributed Consensus algorithm from scratch. And likely failing.
The problem of Distributed Consensus relates to the challenge of making a group of machines operate as a collective whole that can survive the failure of one or more of its members. This behavior is a critical part of building reliable fault-tolerant systems. Raft is an algorithm that achieves just that. The goal is a modest one--implement Raft from scratch using nothing more than basic system programming libraries and your wits. It will not be an easy task. It may be the hardest small bit of systems code you'll ever have to write, "test", and debug. However, you will learn a lot in the process. Are you up to the challenge?
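To make "survive the failure of one or more of its members" a little more concrete, here is a minimal Python sketch of the majority-quorum arithmetic that Raft relies on. The function names `quorum_size` and `max_failures` are made up for illustration; they are not part of any library or of the course materials.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def max_failures(cluster_size: int) -> int:
    """Number of servers that can fail while a majority still remains."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only for illustration.
    for n in (3, 4, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerated_failures={max_failures(n)}")
```

Under this arithmetic, a 5-server cluster keeps a majority with up to 2 servers down, while growing a cluster from 3 to 4 servers buys no additional failure tolerance.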
This course is for experienced programmers who want to deepen their knowledge of operating systems, concurrency, networks, and distributed systems. There are also strong elements of design, software architecture, and testing of complex systems.
This is a project-based course that involves a significant amount of thinking, discussion, and planning. Each day starts with a short presentation and exercises related to facets of the project. The rest of the day, about 5-6 hours, is spent working on the project.
Implementing Raft is typically a multi-week project found in a graduate computer science course on Distributed Systems. You should be experienced working in your preferred programming language such as Python, C, Go, or Rust. Core concepts and small exercises are presented in Python. However, this is not a Python course--you may implement the project in any language that you wish. Some prior experience with network programming, systems programming, and concurrency is advisable, although all of the concepts needed to complete the project are covered in the course.
Although this is a challenging course, you do not need to possess "encyclopedic knowledge" about network programming, existing libraries, or programming frameworks. It is not a course about memorization or Googling the names of API functions.
Although the stated goal is to produce a working implementation of the Raft algorithm, the ultimate purpose of the course is to cover important topics from concurrency and distributed computing in a practical setting. Topics will include:
A major challenge in completing the project is managing the complexity of testing, monitoring, and debugging in the presence of failures and nondeterministic execution. In a basic 5-machine configuration of Raft, you might have code executing with upwards of 60 threads, spread across multiple processes, interacting with various timers and queues. This will push the limits of your ability to comprehend what is happening. Much of the course is spent on coping strategies.
Yes. Are you?
This course is taught by David Beazley. David is a former university professor who used to enjoy torturing students with courses in operating systems and networks. David is better known in the Python world as the author of the Python Essential Reference, 4th Edition (Addison-Wesley) and Python Cookbook, 3rd Edition (O'Reilly Media). He has also given various talks about concurrency-related topics including the infamous Python GIL Talk and this bit of live coding. More recently, he has been working on the Curio project.