Why Rust enums are so cool

2022-01-04

Introduction

OOP enums are pretty boring. They are just a way to give names to some integral constants. Rust enums are much more powerful. They solve some design problems much more elegantly than was possible in OOP languages. Read on to find out what Rust enums can do that OOP enums can't.

OOP Enums

Let's say you have an enum in C#:

enum Colour {
    Red,
    Green,
    Blue
}

What can you do with it? You can compare it with other values of the same type:

var colour1 = Colour.Red;
var colour2 = Colour.Green;

//Compare colour1 and colour2 and do something if equal
if (colour1 == colour2) {
    //do something
}

Or maybe you can convert them into an integer:

var colourInt = (int)colour1;

And that's pretty much it.

Now, I'm not completely dismissing enums in OOP languages. They do improve type safety and limit the possible values that a variable can have. Not to mention readability over plain integers. And that makes the code cleaner and reduces chances of bugs. But OOP enums stop short of realizing their full potential.

Rust Enums

The real power of enums starts showing when programming languages allow enum variants to carry data around. Let's take a look at the Option enum to understand what I mean.

Option

An Option in Rust is a container for a value. The container could be empty, or it could hold some value. This is how it is defined:

enum Option<T> {
    None,
    Some(T),
}

In an Option, the None variant represents the empty state and the Some variant carries a value of type T. Why is that useful? Think about how you would implement something similar in C#. It would likely be a class with a flag indicating whether a value is present or not. And indeed, Nullable in C# is conceptually close to this (the real Nullable<T> is a struct restricted to value types, so treat the class below as a simplified sketch):

class Nullable<T> {
    public bool hasValue;
    public T value;
}

Now think about the usability of this class. How would you get the value out of a Nullable instance?

var mightBeNull = new Nullable<string>();
if (mightBeNull.hasValue) {
    //do something with mightBeNull.value
}

The if check in the code above is critical. If you omit it and access the value directly, an expression like mightBeNull.value.Length will throw a NullReferenceException:

var mightBeNull = new Nullable<string>();
//no compiler error but still a NullReferenceException
var length = mightBeNull.value.Length;

In stark contrast, you can't directly access the value in Rust:

let might_be_null: Option<String> = Option::None;
//error[E0609]: no field `value` on type `Option<String>`
let some_other_var = might_be_null.value;

Instead, the Rust compiler forces you to check whether the might_be_null variable is the Some variant before you can get your hands on the value wrapped inside:

let might_be_null: Option<String> = Some(String::from("hello")); //stand-in for an Option<String> you got from somewhere
if let Some(value) = might_be_null {
    //do something with value
}
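
Here is a slightly fuller sketch of the same idea that you can compile and run. The first_word and describe functions are names I made up for illustration; they are not part of the standard library. Note that match has to cover both variants, otherwise the code does not compile:

```rust
// Hypothetical helper: returns the first word of a sentence, if there is one.
fn first_word(text: &str) -> Option<String> {
    text.split_whitespace().next().map(|w| w.to_string())
}

// `match` must handle both `Some` and `None`; leaving one out is a compile error.
fn describe(word: Option<String>) -> String {
    match word {
        Some(value) => format!("first word has {} bytes", value.len()),
        None => String::from("no word found"),
    }
}

fn main() {
    println!("{}", describe(first_word("hello world")));
    println!("{}", describe(first_word("   ")));
}
```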

Pretty cool, isn't it? While you could easily shoot yourself in the foot in C#, Rust prevents you from making such silly mistakes. Let's take a look at another example.

Result

Result is the centerpiece of error handling in Rust. Any fallible function can either successfully return a value or fail with an error. This is how Result is defined:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Here both variants of Result carry some data. The Ok(T) variant carries the return value if a function succeeds. The Err(E) variant carries the error value if it fails.

If you have ever written some C, you must have seen a pattern of error handling in which the return value doubles up as both the return value for success and for error. For example the atof function is declared like this in C:

double atof(const char* str);

This function will try to parse a double from str. If the function succeeds, it returns the parsed value. But if it can't parse a value it returns zero. Do you know what happens if the input string can be parsed into a zero (e.g. "0")? This will also return zero. It means if atof returns zero, you can't tell if that was because the input string was "0" or some unparseable gibberish.

In Rust, the same function would have a much cleaner return type:

fn atof(str: &str) -> Result<f64, u8> {
    //...
}

This atof will return an Ok(f64) when it can successfully parse a number but will return an Err(u8) if it can't. There is no chance of using some valid value as an error code because the Ok and Err variants carry separate values.
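
For the curious, here is one way such a function could be filled in. This is only a sketch: it leans on the standard library's str::parse, and the error code 1 is an arbitrary value I picked for "could not parse". Unlike C's atof, it also rejects inputs with trailing garbage such as "12abc":

```rust
// A possible body for the Rust `atof` from the text.
// Assumption: any parse failure is reported as the arbitrary error code 1.
fn atof(s: &str) -> Result<f64, u8> {
    s.trim().parse::<f64>().map_err(|_| 1)
}

fn main() {
    // A real zero and an unparseable string are now clearly different results.
    println!("{:?}", atof("0"));
    println!("{:?}", atof("abcd"));
}
```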

As with an Option you can't directly get the value of a Result:

let result = atof("abcd");
//error[E0609]: no field `value` on type `Result<f64, u8>`
let value = result.value;

A safe way to get the value out is to pattern match on atof's return value:

match atof("123.56") {
    Ok(val) => println!("Parsed value is: {}", val),
    Err(e) => println!("Error is {}", e),
}

And lastly, there is one more safety feature enabled by Result. In C it is too easy to forget to check an error code. In Rust the compiler warns you if you don't use a Result:

//a call which throws away a Result
atof("123.56");

The above line will issue this warning:

warning: unused `Result` that must be used
 --> src\main.rs:6:5
  |
6 |     atof("123.56");
  |     ^^^^^^^^^^^^^^^
  |
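
The warning goes away once you do something with the Result. Below is a small sketch of three common ways to do that. The atof body and the parse_or_default helper are my own assumptions for illustration, with 1 as an arbitrary error code:

```rust
// Assumed implementation of the `atof` from the text; 1 is an arbitrary error code.
fn atof(s: &str) -> Result<f64, u8> {
    s.trim().parse::<f64>().map_err(|_| 1)
}

// Hypothetical helper: fall back to 0.0 when parsing fails.
fn parse_or_default(s: &str) -> f64 {
    atof(s).unwrap_or(0.0)
}

fn main() {
    // 1. Inspect the Result and react to both outcomes.
    match atof("abcd") {
        Ok(val) => println!("Parsed value is: {}", val),
        Err(code) => eprintln!("Parsing failed with code {}", code),
    }

    // 2. Replace the error with a default value.
    println!("{}", parse_or_default("abcd"));

    // 3. Discard the Result on purpose; `let _ =` tells the compiler it is intentional.
    let _ = atof("123.56");
}
```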

That's all about Result for now. Next, let's talk about when you should write your own enums in Rust.

When to use Rust enums

The built-in Option and Result types are great, but how do you design your own enums? In general, whenever a variable can be in exactly one of a few possible states, an enum might be a good fit. Consider the following example from the serde-yaml crate:

pub enum Value {
    /// Represents a YAML null value.
    Null,
    /// Represents a YAML boolean.
    Bool(bool),
    /// Represents a YAML numerical value, whether integer or floating point.
    Number(Number),
    /// Represents a YAML string.
    String(String),
    /// Represents a YAML sequence in which the elements are
    /// `serde_yaml::Value`.
    Sequence(Sequence),
    /// Represents a YAML mapping in which the keys and values are both
    /// `serde_yaml::Value`.
    Mapping(Mapping),
}

Here the Value enum represents a value in a YAML file. A YAML value can be either null or a bool or a number and so on. Hence, this is a perfect place to use an enum.

Tackling the same problem in an OOP language leaves you with just two broad options: either shoehorn everything into a single Value class, or make one class for each type of value (Null, Bool etc.) derived from a base Value class. The first option is just an ugly mishmash of unrelated member variables. The second is better but still needs a lot of boilerplate. Luckily, in Rust you don't have to make this tradeoff.
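
To get a feel for how little code the enum approach needs, here is a cut-down sketch. The Value enum below is a simplified stand-in I wrote for illustration (it is not serde_yaml's actual definition), and type_name is a made-up helper:

```rust
// Simplified, hypothetical stand-in for a YAML value.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Sequence(Vec<Value>),
}

// Every variant must be handled here. If a new variant is added to `Value`,
// this `match` stops compiling until the new case is covered.
fn type_name(value: &Value) -> &'static str {
    match value {
        Value::Null => "null",
        Value::Bool(_) => "bool",
        Value::Number(_) => "number",
        Value::String(_) => "string",
        Value::Sequence(_) => "sequence",
    }
}

fn main() {
    let doc = Value::Sequence(vec![
        Value::Null,
        Value::Bool(true),
        Value::Number(1.5),
        Value::String("hi".into()),
    ]);

    println!("{}", type_name(&doc));
    if let Value::Sequence(items) = &doc {
        for item in items {
            println!("{}", type_name(item));
        }
    }
}
```

One enum and one match replace both the mishmash class and the class hierarchy, and the compiler checks that no case is forgotten.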

Before I wrap up, a small section on some terms from type theory.

Sum (and Product) Types

Rust enums are what type theory calls sum types. Why that name? Let's consider how many distinct values an Option<bool> can have. It is the number of distinct values for Option::None (1) plus the number of distinct values for Option::Some (2). Their sum is 3. Because we add up the possible values of the variants, enums are called sum types.
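
If you want to convince yourself of the count, you can simply list the values. The all_option_bools function is a made-up name for this sketch:

```rust
// Collects every value an Option<bool> can take.
fn all_option_bools() -> Vec<Option<bool>> {
    // 1 value comes from the None variant...
    let mut values = vec![None];
    // ...and 2 values come from the Some variant, one per bool.
    for b in [false, true] {
        values.push(Some(b));
    }
    values
}

fn main() {
    let values = all_option_bools();
    println!("{:?}", values);
    println!("{} distinct values", values.len());
}
```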

Now consider a user defined type in C++, a Point struct:

struct Point {
    int x;
    int y;
};

How many distinct Points can be created? If an int is 32 bits wide, there are 2^32 possible values for x and as many for y. So the total number of distinct Points is the number of distinct values for x multiplied by the number of distinct values for y, that is 2^32 × 2^32 = 2^64. That is why types like Point are called product types.
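
The same counting exercise works for product types. 2^64 Points are too many to list, so this sketch uses a hypothetical Flags struct with two bool fields instead, and then just computes the number for Point:

```rust
// Hypothetical struct with two bool fields: 2 * 2 = 4 possible values.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flags {
    bold: bool,
    italic: bool,
}

// Builds every combination of the two fields.
fn all_flags() -> Vec<Flags> {
    let mut values = Vec::new();
    for bold in [false, true] {
        for italic in [false, true] {
            values.push(Flags { bold, italic });
        }
    }
    values
}

fn main() {
    let values = all_flags();
    for v in &values {
        println!("bold: {}, italic: {}", v.bold, v.italic);
    }
    println!("{} distinct Flags values", values.len());

    // The same rule for a Point with two 32-bit fields: 2^32 * 2^32 = 2^64.
    let point_count: u128 = (1u128 << 32) * (1u128 << 32);
    println!("{} distinct Points", point_count);
}
```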

Conclusion

The last section above was there just to make you aware of the terms some people throw around when talking about types. In reality, the esoteric, mathy-sounding names are the least interesting aspect of enums in Rust. Enums are a tool that solves some design problems better than their OOP counterparts. And once you start using them, you will wish other languages had them too.