Which programming language to choose may seem like a rather geeky and boring topic. But in fact it can have a huge impact on how fast, reliable, and safe your software is. In this blog post, I will explore the rationale behind my decision to migrate our whole AI ecosystem, SAIBRE, to Rust. But first, let’s start with a brief 101 on programming languages.
What is a programming language
A programming language is a specialist language that allows a human developer to instruct a computer to perform a task (e.g. to run a program). Traditionally, programming languages have been classified in two ways. The first is the level of abstraction they offer. The second is the programming paradigm they support. Both of these classifications are useful to understand, so let’s explore them.
At the hardware level, any and every program a developer writes is actually a set of simple machine code instructions. These tell the CPU and memory what to do. For instance, if you want to add the value in register 3 to the value in register 4 and store the result, you might have the following code.
Writing machine code is not advisable for anyone wanting to maintain their sanity. So pretty early on, people realized that they needed to abstract away what the machine code is actually doing. They introduced assembly code, which gives each machine instruction a short, readable name. This was (marginally) easier for people to understand. For instance, the instructions above would become:
STORE 0x6a2e, 01
As you can see, this is still not that intuitive. As a result, there has been a steady move to increase the level of abstraction and move to a more human-readable form.
Low-level languages like C introduced concepts like named variables, if-else statements, for loops, etc. But they still allow you to manipulate the memory at the bit level. Higher-level languages like Java forgo the ability to do low-level manipulation in favor of being easier to understand. The higher up the abstraction layer you go, the easier the language becomes for a human to understand. Historically, you had to trade performance for ease of comprehension, but with modern compiled languages that trade-off has largely disappeared.
The other common classification is according to the programming paradigm(s) a language supports. There are lots of these paradigms, so we will just look at the most common. At the highest level, languages can be divided into imperative or declarative families.
- In imperative languages, the programmer tells the computer what to do. Two of the most common forms of imperative language are:
  - Procedural languages like C, where the programmer precisely defines every step of the program and the whole program is a single entity.
  - Object-oriented languages like C++ or Java, where the program is broken down into a number of separate objects that have their own code and data.
- In declarative languages, the program specifies what the result should be, but not how to reach that result. This includes functional languages like Haskell, which treat the problem as a series of mathematical functions.
In reality, many of today’s languages have evolved to be multi-paradigm. For instance, Python supports functional, object-oriented, procedural (imperative), reflective, and structured paradigms. Your choice of paradigm is often dictated by personal preference, but some paradigms are particularly suited to some use cases. For instance, procedural programming has long been associated with systems programming.
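To make the difference between paradigms concrete, here is a minimal sketch in Rust (the language this post goes on to discuss) of the same small task written in two styles. The function names and the task itself are made up for illustration. The first version spells out every step imperatively; the second describes the result as a chain of transformations in a more functional style.

```rust
/// Imperative style: spell out each step and mutate a running total.
fn sum_even_squares_imperative(values: &[i64]) -> i64 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

/// Functional style: describe the result as a pipeline of transformations.
fn sum_even_squares_functional(values: &[i64]) -> i64 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0) // keep only the even values
        .map(|&v| v * v)          // square each one
        .sum()                    // add them up
}
```

For the input `[1, 2, 3, 4, 5, 6]`, both functions return 56. Which one reads better is largely a matter of taste, which is why multi-paradigm languages let you pick.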
Why these classifications aren’t the whole story
Trying to classify languages by level of abstraction or paradigm helps you to compare some properties of different languages. However, there’s one more classification that is really significant when it comes to safety or performance. That is the difference between interpreted and compiled languages.
Interpreted languages like Python are executed within a runtime environment. That environment loads your code, interprets it directly, and acts upon it. This means you can go straight from writing your code to executing it. In turn, that means you can actually write and modify your code on the fly. However, the code gets little or no optimization before it runs. If you don’t write the code efficiently, it won’t run efficiently. These properties make Python easier to learn, but harder to optimize.
By contrast, languages like C++ require you to perform an additional step called compilation. This converts your code into machine code optimized specifically for the target CPU. In fact, the process is a bit more complex than this, but we don’t need to go into details. If you want a good primer, check out this tutorial. The important thing to note is that a modern compiler is able to do a lot of clever things to ensure your code runs optimally. These include collapsing unnecessary extra steps such as using an intermediate variable to store a result. This means you can write code that is easy for you to understand as a human but which will then be executed optimally by the computer.
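As a rough illustration of the intermediate-variable point, here is a sketch in Rust with two hypothetical functions that convert Celsius to Fahrenheit. One names every intermediate result for the benefit of the reader; the other is a single expression. I would expect an optimizing build to produce essentially the same machine code for both, although that is an assumption about the compiler rather than something the language guarantees.

```rust
/// Written for the human reader: every intermediate result gets a name.
fn fahrenheit_readable(celsius: f64) -> f64 {
    let scaled = celsius * 9.0;
    let divided = scaled / 5.0;
    let shifted = divided + 32.0;
    shifted
}

/// Written tersely: the same arithmetic in the same order, with no named steps.
fn fahrenheit_terse(celsius: f64) -> f64 {
    celsius * 9.0 / 5.0 + 32.0
}
```

Both functions perform identical arithmetic, so they return identical results. The extra names in the first version cost the reader nothing and, after optimization, should cost the CPU nothing either.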
One of the most important things a compiler can enforce is safety. Many of the vulnerabilities in software come about because of flaws in memory management. And a common cause of crashes is when the computer tries to interpret a variable as the wrong type. Compilers are especially good at dealing with these problems. Some languages are described as having strong type-safety because the compiler prevents you from accidentally treating an int as a float, for instance. This can make writing the code more painful but it prevents problems at runtime. Equally, many languages enforce robust memory management. That means you aren’t able to accidentally read or write to the wrong piece of memory. In turn, that prevents a whole class of hacking attacks.
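Here is a small sketch of what those two kinds of safety can look like in Rust. The helper names are invented for this example. Numeric conversions have to be written out explicitly, a conversion that might not fit returns an `Option` instead of silently wrapping, and reading a slice through `get` returns `None` for an out-of-range index instead of touching memory it shouldn’t.

```rust
use std::convert::TryFrom;

/// Integers must be converted to floats explicitly before floating-point division.
/// Writing `total / 2.5` with an `i32` total would be rejected by the compiler.
fn average(total: i32, count: i32) -> f64 {
    f64::from(total) / f64::from(count)
}

/// A narrowing conversion that might not fit is checked, not silently truncated.
fn to_small(value: i64) -> Option<i32> {
    i32::try_from(value).ok()
}

/// Bounds-checked access: an out-of-range index yields `None`.
fn reading_at(readings: &[f64], index: usize) -> Option<f64> {
    readings.get(index).copied()
}
```

With these definitions, `average(7, 2)` gives 3.5, `to_small(3_000_000_000)` gives `None` because the value does not fit in an `i32`, and `reading_at(&[1.0, 2.0], 5)` gives `None` rather than reading past the end of the data.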
But what about Rust?
So, having been through all that background, we can move on to discussing Rust. And more specifically, why I chose to migrate our entire SAIBRE stack to Rust.
Rust is a relatively new language compared with Python, C++ or Java. That means that its developers were able to learn from all the good and bad decisions made for other languages. The result is a language that is optimized for performance and safety, making it the ideal systems programming language for building backends. Like most modern languages, it is defined as multi-paradigm. It’s frequently used to write both functional and object-oriented code, often intermixing the best of both. Importantly, it has strong type-safety and very strict memory management. But what does this mean in practice?
The impact for a developer
For a developer, the Rust compiler can be a bit irritating at first. It will give you many more compile errors than, say, a standard C++ compiler would. That’s because it simply won’t allow many of the lazy shortcuts you may be used to (like implicitly converting a float to a string). However, those compiler errors tend to be more helpful, and once you are used to its quirks, Rust quickly becomes a rewarding language to work in. Indeed, Rust topped the “most loved programming language” category in Stack Overflow’s annual developer survey for several years running. That’s because code that compiles in Rust will generally just work. There will be way fewer bugs at runtime, meaning the code passes QA faster. In turn, once you deploy it to the backend, your code will be more reliable and robust. That means less pain for DevOps. Another great aspect of using Rust is the large and generally friendly community, always ready to help each other out.
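To give a flavor of what “no lazy shortcuts” means, here is a minimal sketch built around a made-up sensor reading. The function names are hypothetical. Going from text to a number returns a `Result` that you have to deal with, and going from a number back to text is an explicit formatting step.

```rust
use std::num::ParseFloatError;

/// Text to number: parsing can fail, so the failure is part of the return type.
fn parse_reading(raw: &str) -> Result<f64, ParseFloatError> {
    raw.trim().parse::<f64>()
}

/// Number to text: the conversion is an explicit formatting step.
fn label(reading: f64) -> String {
    format!("{:.1} C", reading)
}

/// The compiler insists that both the success and the failure case are handled.
fn describe(raw: &str) -> String {
    match parse_reading(raw) {
        Ok(value) => label(value),
        Err(_) => String::from("invalid reading"),
    }
}
```

Here `describe(" 21.5 ")` produces `"21.5 C"`, while `describe("abc")` produces `"invalid reading"`. It is more typing than in a loosely typed language, but the failure path is handled where the data comes in, not discovered in production.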
What else makes Rust special?
One of the biggest pluses for Rust is the “cargo” command. This single tool handles project management, building, code generation, linting, formatting, and testing. This sets Rust apart from most other systems programming languages. Like its inspiration, NPM, cargo allows us to easily define dependencies and packages, language versions, and sub-applications. We can then reliably recreate executables across multiple systems and operating systems. This has been a true godsend for our heterogeneous team, who use multiple flavors of Windows, macOS, and Linux across their machines. As a final awesome benefit, the cargo utility allows instant generation of code documentation.
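As a small sketch of how testing and documentation fit together, assume a library project created with `cargo new --lib` and the following code in its `src/lib.rs`. The `mean` function is just an invented example. The `///` comment is what `cargo doc` turns into HTML documentation, and the function marked `#[test]` is what `cargo test` discovers and runs.

```rust
/// Returns the arithmetic mean of `values`, or `None` for an empty slice.
///
/// This doc comment is the text that `cargo doc` renders into the generated docs.
pub fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

#[cfg(test)]
mod tests {
    use super::*;

    // `cargo test` finds and runs every function marked with #[test].
    #[test]
    fn mean_of_known_values() {
        assert_eq!(mean(&[1.0, 2.0, 3.0]), Some(2.0));
        assert_eq!(mean(&[]), None);
    }
}
```

The tests live right next to the code they check and are only compiled when testing, so there is no separate test framework or documentation generator to install and configure.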
Going hand in hand with “cargo” is the “crate” ecosystem for packaging code. This makes it incredibly easy to package up libraries, modules, and even applications, much like Python’s PyPI package index. And thanks to the growing Rust community, there is a wide range of useful crates freely available to download and install. All in all, Rust saves us a heap of time and effort compared with chasing down and learning different disparate tools and frameworks.
Finally, let’s consider performance. While it can be hard to compare the relative performance of languages, some things are clear. For starters, interpreted languages like Python struggle when it comes to raw performance. They are SLOW. That’s because first, the code is typically not optimized ahead of time and second, the computer is having to dynamically interpret the code as it goes. By contrast, extremely low-level systems languages like C can be used to write remarkably fast and efficient programs. However, that’s only the case for the top developers. A novice can still easily write C code that won’t execute efficiently. Then we come to modern languages like Rust. Here, the compiler is taking on the lion’s share of the optimization work. As a result, the code will run about as fast as the expert-written C code in many cases. That really matters when you are writing code to be deployed at scale.
Migrating SAIBRE to Rust has been well worth it. We saw an order-of-magnitude improvement in the performance of our data and modeling pipelines compared to the previous Python implementation. Our runtime is faster, safer, and way more efficient. This means it should be able to run even on low-power edge devices like IoT sensors. Above all, it makes the codebase far cleaner and easier to maintain because it significantly reduces the accumulation of technical debt.