How ChocoPy uses Python and RISC-V to teach compiler creation

ChocoPy uses a modern, well-known language and platform for computer science instruction, avoiding legacy cruft from aging CISC platforms.


While high-level languages like C++, Go, and Python greatly simplify day-to-day programming, they are necessary abstractions. Processors execute only machine code, the binary form of assembly language, so compilers or interpreters are needed to translate high-level source code into instructions the processor can run. Writing assembly is rarely a practical requirement on modern hardware. Even so, the ability to read assembly, along with a practical understanding of compilers, helps programmers understand how processors work internally and use that knowledge to write more efficient programs.

ChocoPy, a restricted subset of Python 3, was designed for use in CS164 (Programming Languages and Compilers) at UC Berkeley. Although ChocoPy is a subset of Python, the designers used a light touch. According to the project whitepaper, the goal was for the language "to be expressive enough to write non-trivial programs in." The whitepaper adds, "In particular, we wanted to support an object-oriented paradigm with sufficient complexity to illustrate important nuances of static type checking and efficient code generation."
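
The following small object-oriented program illustrates that goal. It follows ChocoPy's rules as we understand them from its reference manual: every variable, attribute, and parameter carries a type annotation, each class names its superclass, `self` is annotated with its class, and declarations come before statements. The class and function names are hypothetical. Because ChocoPy is a subset of Python 3, the same source also runs unchanged under a standard Python interpreter.

```python
# Illustrative ChocoPy-style program (hypothetical class and function names).
# Assumed ChocoPy conventions: every variable, attribute, and parameter has a
# static type annotation; classes name their superclass explicitly; `self`
# is annotated with the enclosing class; declarations precede statements.

class Shape(object):
    sides: int = 0

    def area(self: "Shape") -> int:
        return 0

class Rectangle(Shape):
    width: int = 0
    height: int = 0

    def __init__(self: "Rectangle"):
        self.sides = 4

    def area(self: "Rectangle") -> int:
        # Overrides Shape.area; only the type of `self` changes
        return self.width * self.height

class Square(Rectangle):
    def area(self: "Square") -> int:
        return self.width * self.width

def total_area(shapes: [Shape]) -> int:
    total: int = 0
    s: Shape = None
    for s in shapes:
        # Dynamic dispatch: the override that runs depends on the run-time class
        total = total + s.area()
    return total

r: Rectangle = None
q: Square = None

r = Rectangle()
r.width = 3
r.height = 4
q = Square()
q.width = 5
print(total_area([r, q, Shape()]))  # prints 37
```

The call `s.area()` is where efficient code generation becomes interesting. The compiler cannot tell statically which override will run, so compilers for object-oriented languages typically dispatch through a per-class table of method addresses.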


Python provided the basis, and the creators also "wanted to use a language whose syntax, type-checking rules, and operational semantics were formally specified," since "these concepts tie the theory component taught in class to practical aspects of compiler development." Because ChocoPy uses Python syntax, code editors that already highlight Python should also highlight ChocoPy programs, which keeps them readable.
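
Formally specified type-checking rules translate fairly directly into code. The sketch below is a hypothetical checker (the names `typecheck` and `check` are ours). It mirrors a handful of ChocoPy-style expression rules, such as "if both operands of `+` are `int`, the sum is `int`," and uses Python's standard `ast` module for parsing. It is not the course's checker and covers only three types and a few operators.

```python
import ast

# Hypothetical mini type checker. Each branch mirrors one typing rule:
# "if the premises hold for the subexpressions, the expression has type T".
ARITH = (ast.Sub, ast.Mult, ast.FloorDiv, ast.Mod)
ORDER = (ast.Lt, ast.LtE, ast.Gt, ast.GtE)
EQUALITY = (ast.Eq, ast.NotEq)


def typecheck(node: ast.AST, env: dict[str, str]) -> str:
    """Return the static type ("int", "bool", or "str") of an expression node."""
    if isinstance(node, ast.Expression):
        return typecheck(node.body, env)
    if isinstance(node, ast.Constant):
        # Test bool before int, because True/False are ints in Python
        if isinstance(node.value, bool):
            return "bool"
        if isinstance(node.value, int):
            return "int"
        if isinstance(node.value, str):
            return "str"
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise TypeError(f"undeclared identifier {node.id}")
        return env[node.id]
    if isinstance(node, ast.BinOp):
        left, right = typecheck(node.left, env), typecheck(node.right, env)
        if isinstance(node.op, ast.Add) and left == right and left in ("int", "str"):
            return left  # int + int : int, str + str : str
        if isinstance(node.op, ARITH) and left == right == "int":
            return "int"
        raise TypeError(f"cannot apply {type(node.op).__name__} to {left}, {right}")
    if isinstance(node, ast.UnaryOp):
        operand = typecheck(node.operand, env)
        if isinstance(node.op, ast.Not) and operand == "bool":
            return "bool"
        if isinstance(node.op, ast.USub) and operand == "int":
            return "int"
        raise TypeError(f"cannot apply {type(node.op).__name__} to {operand}")
    if isinstance(node, ast.BoolOp):
        if all(typecheck(v, env) == "bool" for v in node.values):
            return "bool"
        raise TypeError("and/or operands must be bool")
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = typecheck(node.left, env)
        right = typecheck(node.comparators[0], env)
        op = node.ops[0]
        if isinstance(op, ORDER) and left == right == "int":
            return "bool"
        if isinstance(op, EQUALITY) and left == right:
            return "bool"
        raise TypeError(f"cannot compare {left} with {right}")
    raise TypeError(f"unsupported expression: {ast.dump(node)}")


def check(source: str, env: dict[str, str]) -> str:
    """Parse a single expression and return its static type."""
    return typecheck(ast.parse(source, mode="eval"), env)


env = {"x": "int", "done": "bool", "name": "str"}
print(check("x * 2 + 1 < 10 and not done", env))  # prints bool
```

An ill-typed expression such as `name + 1` raises `TypeError` before anything runs. That is the practical payoff of static checking that the course pairs with the theory.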

The target assembly language for ChocoPy is RISC-V, specifically RV32IM: the 32-bit base integer instruction set plus the standard "M" extension for integer multiplication and division. A modern assembly language is also far more practical for students outside the classroom. The Z80, 6502, and 68K assembly that earlier generations of programmers cut their teeth on is largely irrelevant to modern computing, and x86 assembly carries decades of changes grafted on to add instructions. (All four are CISC instruction sets, which makes them better suited to learning a particular architecture than to learning programming.)
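
The following hypothetical sketch shows what targeting RV32IM looks like in practice. It is not ChocoPy's code generator. It compiles integer expressions built from `+`, `-`, and `*` into RISC-V assembly, using Python's standard `ast` module for parsing and a simple stack discipline. Each subexpression leaves its value in register `a0`, and left operands are saved on the stack while the right operand is computed. `li` is an assembler pseudo-instruction, and `mul` comes from the M extension. Division is left out deliberately. RISC-V `div` and `rem` round toward zero, whereas Python's `//` and `%` round toward negative infinity, so a faithful compiler would need extra correction code for negative operands.

```python
import ast

# Hypothetical opcode table; `mul` is provided by the RV32 "M" extension
OPCODES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def gen(node: ast.AST, out: list[str]) -> None:
    """Emit RV32IM assembly that leaves the value of `node` in register a0."""
    if isinstance(node, ast.Expression):
        gen(node.body, out)
    elif isinstance(node, ast.Constant) and type(node.value) is int:
        out.append(f"li a0, {node.value}")      # pseudo-instruction: load immediate
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        gen(node.operand, out)
        out.append("sub a0, zero, a0")          # negate: a0 = 0 - a0
    elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        gen(node.left, out)
        out.append("addi sp, sp, -4")           # push the left operand
        out.append("sw a0, 0(sp)")
        gen(node.right, out)                    # right operand ends up in a0
        out.append("lw t0, 0(sp)")              # pop the left operand into t0
        out.append("addi sp, sp, 4")
        out.append(f"{OPCODES[type(node.op)]} a0, t0, a0")
    else:
        raise NotImplementedError(ast.dump(node))


def compile_expr(source: str) -> list[str]:
    """Compile one integer expression to a list of assembly lines."""
    out: list[str] = []
    gen(ast.parse(source, mode="eval"), out)
    return out


for line in compile_expr("2 * (3 + 4)"):
    print(line)
```

For `2 * (3 + 4)` the sketch emits 13 instructions, ending in `mul a0, t0, a0`. A production compiler would allocate registers rather than spill every intermediate value to the stack. That trade-off is exactly what a compiler course explores.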

A web-based RISC-V simulator provides a test environment and a step-through debugger. According to the authors, the simulator is written in Kotlin. It can be compiled both to JavaScript, for the web GUI version, and to the JVM, for use by the course's Java-based auto-grader.
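
The authors' simulator is written in Kotlin, and its internals are not described here. As a rough illustration of step-through execution, the following hypothetical Python sketch interprets a handful of RV32IM instructions in textual form (`li`, `addi`, `add`, `sub`, `mul`, `lw`, `sw`). The class name `MiniRV32` and its methods are our own. Each call to `step()` executes exactly one instruction. After it returns, the register file can be inspected, which is the state a debugger displays between steps. Branches, jumps, and binary instruction encoding are omitted, and `pc` counts instructions rather than bytes.

```python
import re

MASK = 0xFFFFFFFF  # RV32 registers are 32 bits wide


def to_signed(value: int) -> int:
    """Truncate to 32 bits and reinterpret as a two's-complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


class MiniRV32:
    """Hypothetical step-through interpreter for a tiny RV32IM subset."""

    def __init__(self, program: list[str], stack_top: int = 0x1000):
        self.program = program
        self.pc = 0                        # index into `program`, not a byte address
        self.regs = {"zero": 0, "sp": stack_top}
        self.memory: dict[int, int] = {}   # 32-bit words keyed by byte address

    def read(self, reg: str) -> int:
        return self.regs.get(reg, 0)       # unset registers read as 0

    def write(self, reg: str, value: int) -> None:
        if reg != "zero":                  # x0 ("zero") is hard-wired to 0
            self.regs[reg] = to_signed(value)

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def step(self) -> str:
        """Execute exactly one instruction and return its text."""
        text = self.program[self.pc]
        op, *args = re.split(r"[,\s]+", text.strip())
        if op == "li":
            self.write(args[0], int(args[1], 0))
        elif op == "addi":
            self.write(args[0], self.read(args[1]) + int(args[2], 0))
        elif op in ("add", "sub", "mul"):
            a, b = self.read(args[1]), self.read(args[2])
            result = {"add": a + b, "sub": a - b, "mul": a * b}[op]
            self.write(args[0], result)    # keeps the low 32 bits, as RV32 does
        elif op in ("lw", "sw"):
            offset, base = re.fullmatch(r"(-?\d+)\((\w+)\)", args[1]).groups()
            addr = self.read(base) + int(offset)
            if op == "lw":
                self.write(args[0], self.memory.get(addr, 0))
            else:
                self.memory[addr] = self.read(args[0])
        else:
            raise NotImplementedError(text)
        self.pc += 1
        return text


# Step through "7 - 2", printing registers after each instruction
sim = MiniRV32([
    "li a0, 7", "addi sp, sp, -4", "sw a0, 0(sp)",
    "li a0, 2", "lw t0, 0(sp)", "addi sp, sp, 4", "sub a0, t0, a0",
])
while not sim.done():
    executed = sim.step()
    print(f"{executed:<18} a0={sim.read('a0')} t0={sim.read('t0')} sp={sim.read('sp'):#x}")
```

Masking every register write to 32 bits matters. Python integers never overflow, but RV32 arithmetic wraps, and a simulator that skipped this step would disagree with real hardware.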

According to the authors, ChocoPy retains support for integers, booleans, strings, user-defined classes, lists (including nested lists), class inheritance and method overriding, and nested functions that can access non-local variables. The project was heavily influenced by COOL (Classroom Object-Oriented Language), an earlier language for teaching compiler construction.
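
Several of the features in that list fit in a short program. The following illustrative sketch assumes ChocoPy's declaration rules as we understand them: list types are written as `[T]`, declared variables take a literal initializer such as `None`, and declarations come before statements. The function names are hypothetical, and the code also runs under standard Python. `make_grid` builds a nested list. `sum_grid` contains a nested function that updates an enclosing variable through `nonlocal`. Supporting that is non-trivial for a compiler, because the inner function needs some way, such as a static link, to reach variables stored in its parent's stack frame.

```python
# Illustrative ChocoPy-style program (hypothetical function names), assuming
# ChocoPy's rules: list types are written [T], declared variables need a
# literal initializer (None for lists), and declarations precede statements.

def make_grid(rows: int, cols: int) -> [[int]]:
    grid: [[int]] = None
    row: [int] = None
    i: int = 0
    j: int = 0
    grid = []
    while i < rows:
        row = []
        j = 0
        while j < cols:
            row = row + [i * cols + j]   # list concatenation
            j = j + 1
        grid = grid + [row]              # nested list: a list of [int]
        i = i + 1
    return grid

def sum_grid(grid: [[int]]) -> int:
    total: int = 0
    row: [int] = None

    def add_row(r: [int]):
        nonlocal total                   # assign to the enclosing function's variable
        x: int = 0
        for x in r:
            total = total + x

    for row in grid:
        add_row(row)
    return total

g: [[int]] = None
g = make_grid(2, 3)
print(g[1][2])       # prints 5
print(sum_grid(g))   # prints 15
```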

Students have received ChocoPy warmly. "Of the 15 survey respondents, 14 checked the positive sentiment, 'I'm getting insights into how real compilers work,' while nine indicated that they loved ChocoPy. Zero respondents checked the negative statement, 'I don't like ChocoPy,'" according to the authors.


