Our Experience Porting the YJIT Ruby Compiler to Rust

Last year, my team at Shopify implemented YJIT, a new Just-In-Time (JIT) compiler for CRuby, which was recently upstreamed as part of Ruby 3.1. Because the CRuby codebase is implemented in C99, we also decided to implement YJIT in C99 so that integration with the rest of the CRuby codebase would be as simple as possible. However, we found that implementing a JIT compiler in plain C quickly became tedious, and as we kept adding features to YJIT, we found that the complexity of our project became hard to manage.

Many people would tell you that the biggest burden when programming in C is worrying about buffer overflows and accidentally dereferencing null pointers. However, I think that what can make programming in C tedious on a day-to-day basis is that the C language doesn’t provide many tools to manage complexity. There are no modules or namespaces, so you have to prefix identifiers to avoid name collisions. You have to worry about the order of your declarations and adding prototypes in the right places. A lot of information is duplicated in various header files. Constants and macros use the C preprocessor, which can lead to strange bugs. There are no classes or interface types to cleanly encapsulate functionality, and there are no standard container types. We implemented our own dynamic array type, which we had to manipulate through awkward preprocessor macros with no type checking.

Compilers are some of the most complex pieces of software in existence, and I think there’s a fair argument that JIT compilers are even more complex (or at least harder to debug) than ahead-of-time compilers. With YJIT, we started by directly compiling the CRuby bytecode into x86 machine code. Still, we have plans to implement our own Intermediate Representation (IR) so that we can decouple the machine code generation from the frontend and add support for ARM64 platforms in addition to x86. Implementing a custom IR, adding that extra layer of indirection, and implementing it all in terms of C preprocessor macros, while manually managing dynamic memory allocations without something like interface types just seemed like too much. We felt like we were reaching the limits of what we could comfortably do in C.

Alan Wu, senior developer on the YJIT team, started to grow interested in Rust, and he suggested that we could port YJIT to this language. He was interested in Rust because of the strong type safety guarantees it offers. I quickly became interested in his proposal because I was confident that Rust would provide us with better tools to manage the growing complexity of YJIT. We were unsure if the CRuby core developer team would greenlight a move like this since there is no other Rust code in the CRuby codebase. Thankfully, they gave us the thumbs up in January, and our efforts to port YJIT to Rust began.

Besides Rust, we briefly considered other options, such as porting YJIT to Zig. This port would have been possible since YJIT has very few dependencies. However, we chose Rust because of its relative maturity and its large and active community. In this post, I want to give a nuanced perspective on our experience porting YJIT from C to Rust. I'll talk about the positives, but also discuss the things that we found challenging or suboptimal in our experience.

The YJIT team has six developers, with four people taking an active part in the Rust porting effort. I was one of the founding members and have been acting as team lead for 18 months. I’ve been programming for 24 years and got started by learning C++ in 1998. Since then, I’ve had some experience with a variety of systems languages including C, C++, and D. I’ve only been programming in Rust for four months, so a lot of what you’re going to read comes from the perspective of someone who’s relatively new and still learning things about the language.

The fact that YJIT has very few dependencies can be seen as a positive. However, our biggest dependency is that we must link and integrate with CRuby. YJIT is a relatively simple JIT compiler that totals about 11,000 lines of C code. The CRuby codebase is a large C99 codebase, close to 30 years old. The interface between the two is non-trivial. Among other things, YJIT needs to be able to parse CRuby’s bytecode and manipulate every primitive type in the Ruby language. Some of the APIs we use are internal and cannot be guaranteed to be stable over time.

Initial Impressions

YJIT is not a massive project, but compilers are complex enough that it’s easy to introduce subtle bugs that can be hard to track down. The strategy we decided to take for the port has been to try and translate the C code into Rust more or less directly. We did make various small improvements as we went along, but we refrained from making major architectural changes during the port. We commented out our C code and started porting functions and structs one by one, keeping the general structure as similar to the original as we could while doing so. The advantage of the strategy we took is that it made porting the code faster and less error-prone, but the downside is that we are not yet taking full advantage of Rust idioms.

Rust markets itself as a systems programming language, and my first surprise was that programming in Rust felt closer to writing ML code than writing C++. There are enough syntactic and semantic differences that the language feels like it’s not really in the same family as C and C++. I later learned that, interestingly, the first Rust compiler was actually written in OCaml, and so the cultural influence of ML-style languages must have been present from the beginning.

The Rust learning curve is relatively steep, and a lot of googling was required, but most of the YJIT team members started to get more comfortable about two to three weeks into the porting effort. Thankfully, because Rust has a large and active community, the documentation is plentiful and easy to find, making the learning process a lot smoother.

Rust being able to operate without a garbage collector is a major strength for a JIT compiler. Many people complain that the Rust borrow checker is hard to wrap your head around. Having done a PhD in compiler design and being very familiar with dataflow analysis, I didn’t run into any major surprises. Still, even though I think I understand how the borrow checker works quite well, I needed to learn the rules of the system as to how and where you’re allowed to borrow, and we still ran into situations where dynamic borrow checking with mutable RefCells was an issue.

Pattern Matching and Macros

One of the best features of the Rust language is the ML-inspired pattern-matching syntax. It’s simple, powerful, and meshes well with Rust’s enum (tagged union) and struct types.
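To give a flavor of what this looks like, here is a minimal sketch. The Opnd type, its variants, and the describe function are made up for illustration and are not YJIT's actual definitions.

```rust
/// Hypothetical operand type for a toy assembler (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Opnd {
    /// A general-purpose register, identified by number and width.
    Reg { num: u8, num_bits: u8 },
    /// A memory location: base register plus a signed displacement.
    Mem { base_reg: u8, disp: i32 },
    /// An immediate constant.
    Imm(i64),
}

/// Describe an operand. A single `match` destructures every variant,
/// and the compiler rejects the code if a variant is left unhandled.
fn describe(opnd: Opnd) -> String {
    match opnd {
        // Literal patterns can be mixed with bindings.
        Opnd::Reg { num, num_bits: 64 } => format!("64-bit register r{}", num),
        Opnd::Reg { num, num_bits } => format!("{}-bit register r{}", num_bits, num),
        Opnd::Mem { base_reg, disp: 0 } => format!("[r{}]", base_reg),
        Opnd::Mem { base_reg, disp } => format!("[r{} {:+}]", base_reg, disp),
        // A guard refines the pattern with an arbitrary condition.
        Opnd::Imm(value) if value >= i32::MIN as i64 && value <= i32::MAX as i64 => {
            format!("imm32 {}", value)
        }
        Opnd::Imm(value) => format!("imm64 {}", value),
    }
}
```

Calling describe(Opnd::Mem { base_reg: 5, disp: 16 }) takes the fourth arm, while an immediate that does not fit in 32 bits falls through to the last one.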

The Rust macro system is a huge improvement over C preprocessor macros, in terms of both safety and ergonomics. The macros nicely reuse the Rust pattern-matching syntax so that you can define how code is generated in a way that feels fairly intuitive and natural.

We’ve made use of macros in various places. In particular, we wanted to have a way to have machine code generated by YJIT increment some statistics/profiling counters, but only when YJIT is compiled in dev (debug) mode. Among other things, we use these counters to track what causes YJIT to exit to the interpreter when running our benchmarks, and we use the data generated to decide what to optimize next.
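As a rough sketch of the idea, the macro below bumps a counter only in debug builds. It is a simplification: the real thing emits machine code that increments the counter, whereas this version increments it directly, and it uses debug_assertions as a stand-in for a dev-mode switch. The counter name and the incr_counter macro are hypothetical.

```rust
use std::sync::atomic::AtomicU64;

/// Hypothetical counter tracking exits to the interpreter.
static EXIT_COUNT: AtomicU64 = AtomicU64::new(0);

/// Increment a counter, but only in debug builds.
/// Each arm is a pattern, much like the arms of a `match`.
macro_rules! incr_counter {
    // One-argument form: increment by one.
    ($counter:ident) => {
        incr_counter!($counter, 1)
    };
    // Two-argument form: increment by an arbitrary amount.
    ($counter:ident, $amount:expr) => {
        // `cfg!` is a compile-time constant, so in release builds
        // the branch is dead code the optimizer can remove.
        if cfg!(debug_assertions) {
            $counter.fetch_add($amount, ::std::sync::atomic::Ordering::Relaxed);
        }
    };
}
```

A call site then reads incr_counter!(EXIT_COUNT); and costs nothing in a release build.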

The Build System

Cargo is the Rust build system and package manager. Our experience with it has been generally positive. It’s a nice tool. The system of optional features for conditional compilation works really well; it's a lot better than having a large number of preprocessor ifdefs in a C codebase. The ability to embed tests into the source code is also very nice.
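Here is a small sketch of both mechanisms. The feature name stats and the align_up helper are assumptions made for the example; the feature would have to be declared in the [features] section of Cargo.toml.

```rust
/// Compiled only when the hypothetical `stats` feature is enabled.
#[cfg(feature = "stats")]
fn stats_enabled() -> bool {
    true
}

/// Fallback compiled when the feature is disabled.
#[cfg(not(feature = "stats"))]
fn stats_enabled() -> bool {
    false
}

/// Round `addr` up to the next multiple of `align`,
/// which must be a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Tests live next to the code and are only compiled by `cargo test`.
#[cfg(test)]
mod align_tests {
    use super::align_up;

    #[test]
    fn rounds_up_to_next_boundary() {
        assert_eq!(align_up(13, 8), 16);
        assert_eq!(align_up(16, 8), 16);
    }
}
```

Running cargo test --features stats would build the first version of stats_enabled and run the embedded test in the same pass.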

To integrate with the CRuby codebase, we compiled our Rust project into a static library which we referenced in the CRuby makefile. The trickiest part was to get this to work from the C side with the CRuby linking process.

CRuby is distributed as a tarball and built from source. To make it easier to distribute and package CRuby while keeping it self-contained, it was important to us that YJIT could be built offline without accessing the internet and without downloading any crates from crates.io. Unfortunately, it has proven challenging to achieve that with cargo. There’s a cargo build --offline switch that you can specify, but when we did, the cargo tool complained that it couldn’t access the internet, even though we were not using any external crates in our build. We opened a GitHub issue to report this. The response was that we should use cargo vendor and check the result into version control to work around this problem. This solution feels suboptimal. I feel like it should be possible to just build offline when you’re not building any features requiring external crates without needing to use cargo vendor. The solution that we converged on is to use rustc directly when building in release mode.

One of the main dependencies for YJIT is the platform C library. One thing that has been somewhat annoying for us is that you need to rely on crates.io to access the libc crate. It feels like that should be built into the default installation that you can use offline, but it’s not the case. What we ended up doing is defining our own helper functions in the CRuby codebase which we exposed on the Rust side.

Bindgen and the FFI

YJIT needs to integrate with CRuby, which is a large existing codebase. We interface with on the order of 140 C functions, 30 structs/unions, and 500 constants in a codebase that spans hundreds of source files. Bindgen is Rust’s tool for automatically exporting definitions from C. It can parse C header files and export definitions for functions, structs, enums, unions, and global variables. It can even parse some simple preprocessor constants.

Bindgen requires you to add regex-style patterns for the names of functions, structs and constants you want to export to an allowlist. Unfortunately, we ran into challenges with this system. For some patterns, it doesn’t find any definition and the export just silently fails with no error message. We couldn’t find a “verbose” mode to explain the import failures. Did bindgen fail to parse a header file? Did it not find the definition we asked for? Was it something else? The only thing we could do was to guess and try changing various settings. We opened a GitHub issue to report this problem but had not received a response at the time of writing.

Bindgen probably works fairly well for small projects or projects that are primarily in Rust with a minimal amount of C code. In particular, bindgen probably works best for creating Rust crates that interface with the public API of a C library. The definitions for such APIs are often contained in a small, clearly defined set of header files. Unfortunately, YJIT has to interface with a large existing C codebase, and many of the definitions we need are not part of a stable public API. We had to supplement bindgen with a list of manually written C bindings for all the things bindgen silently did not import.

Integer Types and Casting

Because we’re writing a compiler, we deal with a lot of integer math, and we essentially need to use every integer type that Rust provides: both signed and unsigned values, 8, 16, 32, and 64 bits wide. We also frequently need to perform operations involving integers of different types. Unlike C, Rust won’t automatically promote integer types to wider types. It forces you to manually cast any mismatching integer types for every operation. It also forces you to use the usize type (akin to C’s size_t) wherever you need to index into an array or slice.
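The following sketch shows the kind of conversions this leads to. The function names are invented for the example. Lossless widening can be written with From, as here, or with an as cast such as idx as usize.

```rust
/// Look up an entry using a compact 8-bit index.
/// Slices can only be indexed with `usize`, so the index is widened first.
fn lookup(table: &[u32], idx: u8) -> u32 {
    table[usize::from(idx)]
}

/// Add a signed 8-bit displacement to a 64-bit address.
fn apply_disp(addr: u64, disp: i8) -> u64 {
    // i8 -> i64 sign-extends, and i64 -> u64 reinterprets the bits,
    // so a wrapping add yields the two's-complement result.
    addr.wrapping_add(i64::from(disp) as u64)
}

/// Combine values of three different widths into one 64-bit size.
/// Every operand must be converted explicitly before the arithmetic.
fn total_size(header: u8, count: u16, elem_size: u32) -> u64 {
    u64::from(header) + u64::from(count) * u64::from(elem_size)
}
```

In C, the body of total_size would simply be header + count * elem_size, with the promotions applied implicitly.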

It seems to me that the way Rust handles integer casting leaves something to be desired, and I know I’m not the only one who feels this way, as discussions of these issues date back several years. It can be frustrating for programmers, because a priori, there's no reason why you couldn’t safely promote a u8, u16, or u32 into a usize, and asking programmers to manually cast integers everywhere makes the code more noisy and verbose.

In my opinion, Rust’s insistence on manual casting everywhere encourages people to write inefficient code, because writing verbose code feels uncomfortable and adds friction. You can make your code less verbose by reducing the number of integer casts, and you can reduce the number of casts by using the widest integer types possible everywhere. If you do that, your code will superficially look nicer, but it will also be less efficient.

In many cases, you can probably afford to use 64-bit integers instead of 8-bit integers. We’re talking about mere bytes of space savings, right? Well, maybe not. We care because JIT compilers can allocate tens or even hundreds of millions of objects, and the smaller these objects are, the better they fit into the data cache. Compactness matters for performance, and it will still matter for as long as processors have caches and limited memory bandwidth. By reducing the friction around integer casts, Rust could actually help programmers write more efficient code.
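To put rough numbers on this, the sketch below compares two layouts of the same record. The struct names and fields are hypothetical and do not correspond to YJIT's data structures.

```rust
use std::mem::size_of;

/// Hypothetical per-block metadata using the narrowest types that fit.
#[allow(dead_code)]
struct CompactInfo {
    stack_size: u8,
    sp_offset: i8,
    chain_depth: u8,
    num_locals: u8,
}

/// The same fields widened to 64 bits to avoid casts.
#[allow(dead_code)]
struct WideInfo {
    stack_size: u64,
    sp_offset: i64,
    chain_depth: u64,
    num_locals: u64,
}

/// Total bytes needed for `count` objects of type `T`,
/// ignoring any allocator overhead.
fn footprint_bytes<T>(count: u64) -> u64 {
    count * size_of::<T>() as u64
}
```

On typical targets, size_of reports 4 bytes for CompactInfo and 32 bytes for WideInfo, so a hundred million objects take roughly 0.4 GB in one case and 3.2 GB in the other.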

Cyclic Data Structures

As is pretty typical for any kind of optimizing compiler, YJIT needs to manipulate a cyclic data structure that represents a Control Flow Graph (CFG). This data structure needs to be allocated and modified on the fly, which has been a bit tricky because Rust’s ownership and borrowing rules don’t map naturally onto cyclic structures. Rust offers mechanisms such as Rc (the reference-counted type, a kind of smart pointer) and RefCell to allow multiple objects to collectively own and mutate another object. Mutating through a shared reference in this way is what Rust calls “interior mutability”, and the documentation says it’s “something of a last resort”. It’s available but not exactly ergonomic to use. This is a known issue in Rust.
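For readers who haven't used these types, here is one way a small cyclic graph can be put together with Rc and RefCell. This is a sketch under my own assumptions (a graph object that owns the blocks, with weak references for edges), not a description of how YJIT is structured.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type BlockRef = Rc<RefCell<Block>>;

/// Hypothetical basic block in a control flow graph.
struct Block {
    id: usize,
    // Edges are weak references, so a loop in the graph
    // does not become an ownership cycle that is never freed.
    successors: Vec<Weak<RefCell<Block>>>,
}

/// The graph owns every block through a strong reference.
struct Cfg {
    blocks: Vec<BlockRef>,
}

impl Cfg {
    fn new() -> Self {
        Cfg { blocks: Vec::new() }
    }

    /// Allocate a block, register it with the graph, and return a handle.
    fn add_block(&mut self) -> BlockRef {
        let block = Rc::new(RefCell::new(Block {
            id: self.blocks.len(),
            successors: Vec::new(),
        }));
        self.blocks.push(Rc::clone(&block));
        block
    }
}

/// Add an edge `from -> to`. The mutable borrow is checked at run time.
fn add_edge(from: &BlockRef, to: &BlockRef) {
    from.borrow_mut().successors.push(Rc::downgrade(to));
}

/// Collect the ids of all successors that are still alive.
fn successor_ids(block: &BlockRef) -> Vec<usize> {
    block
        .borrow()
        .successors
        .iter()
        .filter_map(|edge| edge.upgrade())
        .map(|succ| succ.borrow().id)
        .collect()
}
```

The ergonomic cost shows up at every access: each read or write goes through borrow() or borrow_mut(), and a borrow_mut() on a block that is already borrowed panics at run time rather than failing to compile.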

The Rc and RefCell types would likely work fine for a compiler that was written purely in Rust and compiles code ahead of time instead of just in time, but we ran into a subtle bug with our reference-counted memory management because we generate machine code that has to keep a reference to blocks in the CFG. Eventually, we will implement a Garbage Collector (GC) for machine code that interfaces with the Ruby GC, which makes questions of ownership even more complicated. We concluded that the best way to do what we wanted was to use the Box type to manage our CFG. Box is Rust’s owning heap-allocation type, and its into_raw and from_raw methods let you manage the lifetime of an allocation manually. Sometimes there’s just no other choice but to manage memory manually. Thankfully, Rust provides tools for that too.
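As an illustration of manual management with Box, the sketch below hands an allocation over to a raw pointer and later takes it back. The Block struct and the helper names are hypothetical.

```rust
/// Hypothetical block metadata referenced from generated machine code.
struct Block {
    start_addr: usize,
    exec_count: u64,
}

/// Give up automatic ownership and return a raw pointer that could be
/// embedded in machine code. Nothing frees the block until `free_block`.
fn leak_block(block: Block) -> *mut Block {
    Box::into_raw(Box::new(block))
}

/// Increment the execution counter through the raw pointer.
/// Safety: `ptr` must come from `leak_block` and not have been freed.
unsafe fn bump_exec_count(ptr: *mut Block) -> u64 {
    unsafe {
        (*ptr).exec_count += 1;
        (*ptr).exec_count
    }
}

/// Take ownership back; the heap allocation is released when the
/// temporary Box is consumed here.
/// Safety: `ptr` must come from `leak_block` and must not be used again.
unsafe fn free_block(ptr: *mut Block) -> Block {
    let boxed = unsafe { Box::from_raw(ptr) };
    *boxed
}
```

Between leak_block and free_block, the compiler no longer tracks the allocation, so avoiding leaks, double frees, and dangling pointers is entirely up to the programmer.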

String Manipulation

String manipulation in Rust is a clear improvement over C. Manipulating strings in C is a minefield in terms of potential buffer overflows and other memory-safety bugs. Rust can automatically manage the memory allocation and deallocation, greatly reducing the risk of buffer overflows, out-of-bound accesses or memory leaks, and provides more string manipulation functions than the C standard library provides.

There are many good ideas in the way Rust does string manipulation. What seems less optimal is that there are many different string types in Rust. There’s the owned string type String, and there’s the unicode string slice type &str. Because we’re interfacing with a C codebase, we also need to deal with the owned CString type, its borrowed counterpart &CStr, and of course, raw c_char pointers. Rust string operations require different combinations of string types, and converting between these different types is not always straightforward. Each time I want to do something with Rust strings, I need to browse many pages of documentation and look for the right formula.

This is how to convert a C string to a Rust string according to Stack Overflow:
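A commonly cited idiom for this conversion goes through CStr. The sketch below is my own illustration of that idiom, not necessarily the exact snippet referred to above, and the helper names are made up.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Copy a NUL-terminated C string into an owned Rust `String`.
/// Invalid UTF-8 sequences are replaced with U+FFFD.
/// Safety: `ptr` must be non-null and point to a valid NUL-terminated string.
unsafe fn c_str_to_string(ptr: *const c_char) -> String {
    // Borrow the C buffer as a &CStr (this scans for the terminator).
    let c_str: &CStr = unsafe { CStr::from_ptr(ptr) };
    // Convert to UTF-8 and copy into a heap-allocated String.
    c_str.to_string_lossy().into_owned()
}

/// The opposite direction: build an owned C string from a Rust string.
/// Returns None if the input contains an interior NUL byte.
fn string_to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}
```

Even this short round trip touches four of the string types mentioned above: *const c_char, &CStr, String, and CString.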

Having done string tokenization and parsing in C, I think Rust’s approach is an improvement. However, it’s definitely an area where Rust prioritizes safety above ergonomics and user-friendliness, which seemed like a recurrent theme for me while learning Rust.

Unsafe Everywhere

Rust has strict rules about aliasing, mutability, and type safety. These rules are there to reduce the risk of crashes, deadlocks, and security vulnerabilities and to enable the Rust compiler to perform various optimizations. This gets tricky when it comes to interfacing with C code or doing some of the memory manipulation that a JIT compiler has to do.

In Rust, calls to a C function and accesses to exported C globals need to be wrapped in an unsafe block. Raw pointer manipulations also need to happen in an unsafe block. The unsafe blocks act as a trap door into a universe where the rules of the Rust type system aren’t enforced. What was a little frustrating for us was that we needed to do a lot of C calls, and these calls often coincide with pointer manipulation, so we needed to wrap hundreds of short snippets of code into unsafe blocks.
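The sketch below shows what this looks like in practice. To keep it self-contained, c_style_strlen is a stand-in written in Rust with the C calling convention; a real binding would instead be declared in an extern "C" block and linked against the C code. Wrapping the call in a safe function, as safe_len does, is one common way to cut down on the repetition, shown here as an option rather than as what we did.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Stand-in for a function exported by a C codebase (hypothetical).
/// Safety: `s` must point to a valid NUL-terminated string.
unsafe extern "C" fn c_style_strlen(s: *const c_char) -> usize {
    let mut len = 0;
    // Reading through a raw pointer is only allowed inside `unsafe`.
    while unsafe { *s.add(len) } != 0 {
        len += 1;
    }
    len
}

/// Safe wrapper: callers no longer need their own `unsafe` block.
fn safe_len(s: &CStr) -> usize {
    // Sound because a `CStr` always refers to a valid,
    // NUL-terminated buffer.
    unsafe { c_style_strlen(s.as_ptr()) }
}
```

Without the wrapper, every call site has to spell out unsafe { c_style_strlen(ptr) }, which is the pattern that ends up repeated hundreds of times.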

At some point, the unsafe blocks everywhere feel like visual noise. Why do I need to wrap every C function call into an unsafe block? The Rust compiler knows I’m calling a C function and that this function doesn’t follow the Rust typing rules. Am I really telling the compiler anything by wrapping each individual C function call into an unsafe block? A C function call is by definition “unsafe”, I shouldn’t have to tell the Rust compiler that. Having to write unsafe every time I call a C function seems like it adds unnecessary friction. A constant reminder that the Rust compiler is silently judging me for relying on C.

Rust offers many trap doors to work around the type system. There are unsafe blocks and the integer “as” cast, which silently truncates or wraps out-of-range values instead of checking bounds. Many Rust types, like Rc, also have methods such as into_raw and from_raw. This can feel inconsistent and strange. On the one hand, you have a language with a sometimes painfully strict type system and a compiler that issues warnings about “incorrect” code style, but you also have a wide assortment of ways to tell the compiler to hold your beer so you can selectively bend and potentially break the safety assumptions the compiler makes whenever you want. Rust is perfectly safe as long as you don’t do anything unsafe. To be completely sure that you’re not doing anything unsafe requires extensive knowledge of the assumptions the Rust compiler makes. If C++ is a chainsaw, then Rust is an electric nail gun that comes with safety goggles and a 400-page safety booklet that, realistically, you probably didn’t quite take the time to fully read.
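To make the point about the "as" cast concrete, here is a small sketch contrasting it with the checked conversion from the standard library. The function names are invented for the example.

```rust
use std::convert::TryFrom;

/// Narrow a 64-bit offset with `as`: the upper bits are silently dropped.
fn truncating_narrow(offset: i64) -> i8 {
    offset as i8
}

/// Narrow with a range check: out-of-range values yield None.
fn checked_narrow(offset: i64) -> Option<i8> {
    i8::try_from(offset).ok()
}

/// Reinterpret a signed value as unsigned: the bit pattern is kept,
/// so negative numbers become large positive ones.
fn reinterpret(value: i32) -> u32 {
    value as u32
}
```

For example, truncating_narrow(300) returns 44 (the low byte of 0x12C) with no warning, whereas checked_narrow(300) returns None.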

Room for Growth

In my opinion, Rust is, in many ways, a huge improvement over C or C++. If I were faced with a choice between Rust and C/C++, I think I’d pick Rust 9 times out of 10. The main reason I wanted to use Rust was that I thought it would offer us much better tools to manage code complexity than C, and I think Rust delivers on that front. We’ve encountered small challenges during our port, but we’ve been able to either find solutions or work around them. Still, in my opinion, there are areas where Rust could be improved, and I hope that some of the feedback from this blog post will prompt constructive discussions about the language or other systems programming languages under development.

One important thing to remember is that some of the challenges that we faced porting YJIT are problems you probably wouldn’t have to deal with, or could work around more easily if you were starting a new Rust project from scratch. In particular, there’s an impedance mismatch between Rust and C code. Realistically, if you were starting a new Rust project from scratch, you would probably try to directly interface with as little C code as possible, and you would make better use of Rust idioms than we have at this time, which might lead to cleaner code.

There’s still room for the Rust ecosystem to mature. We’ve mentioned issues that we ran into with the cargo and bindgen tools. There are also many experimental APIs in Rust that provide various usability improvements but aren’t yet standardized. In particular, we could have used SyncLazy in YJIT, but it’s only available as part of nightly builds for the time being.

It took us three months to complete the port of YJIT from C to Rust, and we feel very satisfied with the result. The end product is, in my opinion, much more maintainable and pleasant to work with than the original C codebase. I feel that the port has brought fresh energy into the project and that Rust, flaws and all, provides very good abstraction and code organization mechanisms. In the next few months, we’ll deliver new features and improvements to YJIT, and we’ll also spend some time refactoring the YJIT source code to be more idiomatically Rusty and less like a direct translation from C, so that we can better leverage the strengths of this powerful language.

Maxime Chevalier-Boisvert obtained a PhD in compiler design at the University of Montreal in 2016, where she developed Basic Block Versioning (BBV), a JIT compiler architecture optimized for dynamically-typed programming languages. She is currently leading a project at Shopify to build YJIT, a new JIT compiler built inside CRuby.

We’re hiring a Staff Compiler Engineer (Rust YJIT) to join our team! If working on a Just-In-Time compiler for CRuby is something that interests you, we'd love to chat. Check out the job posting to learn more.
