Why Rust enums are so cool
2022-01-04
Introduction
OOP enums are pretty boring. They are just a way to give names to some integral constants. Rust enums are much more powerful. They solve some design problems much more elegantly than was possible in OOP languages. Read on to find out what Rust enums can do that OOP enums can't.
OOP Enums
Let's say you have an enum in C#:
enum Colour {
    Red,
    Green,
    Blue
}
What can you do with it? You can compare it with other values of the same type:
var colour1 = Colour.Red;
var colour2 = Colour.Green;
//Compare colour1 and colour2 and do something if equal
if (colour1 == colour2) {
    //do something
}
Or you can convert it into an integer with an explicit cast:
var colourInt = (int)colour1;
And that's pretty much it.
Now, I'm not completely dismissing enums in OOP languages. They improve type safety, limit the possible values that a variable can have, and are more readable than plain integers. That makes the code cleaner and reduces the chance of bugs. But OOP enums stop short of realizing their full potential.
Rust Enums
The real power of enums starts showing when programming languages allow enum variants to carry data around. Let's take a look at the Option enum to understand what I mean.
Option
An Option<T> in Rust is a container for a value. The container could be empty, or it could hold some value. This is how it is defined:
enum Option<T> {
    None,
    Some(T),
}
In an Option<T>, the None variant represents an empty state and the Some variant carries a value of type T. Why is that useful? Think about how you would implement something similar in C#. It would likely be a class with a flag indicating whether a value is present or not. And indeed, Nullable<T> in C# is defined (almost) like this:
public class Nullable<T> {
    public bool hasValue;
    public T value;
}
Now think about the usability of this class. How would you get the value out of a Nullable<T> instance?
var mightBeNull = /* get a Nullable<string> from somewhere */;
if (mightBeNull.hasValue) {
    //do something with mightBeNull.value
}
The if check in the code above is critical. If you omit it and no value is present, the mightBeNull.value.Length expression will throw a NullReferenceException:
var mightBeNull = /* get a Nullable<string> from somewhere */;
//no compiler error but still a NullReferenceException
var length = mightBeNull.value.Length;
In stark contrast, you can't directly access the value in Rust:
let might_be_null: Option<String> = Option::None;
//error[E0609]: no field `value` on type `Option<String>`
let some_other_var = might_be_null.value;
Instead, the Rust compiler forces you to check that the might_be_null variable is the Some variant before you can get your hands on the value wrapped inside:
let might_be_null: Option<String> = //get an Option<String> from somewhere
if let Some(value) = might_be_null {
    //do something with value
}
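Here is the same idea as a small self-contained sketch. The find_user and name_length functions and their data are made up for illustration; only Option and if let come from Rust itself.

```rust
/// Hypothetical lookup: returns a name for a known id, or None otherwise.
fn find_user(id: u32) -> Option<String> {
    match id {
        1 => Some(String::from("alice")),
        2 => Some(String::from("bob")),
        _ => None,
    }
}

/// Returns the length of the user's name, or 0 when there is no such user.
fn name_length(id: u32) -> usize {
    // The wrapped String is only reachable inside the `if let` branch,
    // so the "no value" case cannot be forgotten.
    if let Some(name) = find_user(id) {
        name.len()
    } else {
        0
    }
}
```

Calling name_length with an unknown id takes the else branch instead of blowing up at runtime.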
Pretty cool, isn't it? While you could easily shoot yourself in the foot in C#, Rust prevents you from making such silly mistakes. Let's take a look at another example.
Result
Result<T, E> is the centerpiece of error handling in Rust. Any fallible function can either successfully return a value or fail with an error. This is how Result is defined:
enum Result<T, E> {
    Ok(T),
    Err(E),
}
Here both variants of Result carry some data. The Ok(T) variant carries the return value if a function succeeds. The Err(E) variant carries the error value if it fails.
If you have ever written some C, you must have seen a pattern of error handling in which a single return value doubles as both the result on success and the error indicator on failure. For example, the atof function is declared like this in C:
double atof(const char *str);
This function will try to parse a double from str. If the function succeeds, it returns the parsed value. But if it can't parse a value, it returns zero. Do you know what happens if the input string can be parsed into a zero (e.g. "0")? This will also return zero. It means that if atof returns zero, you can't tell whether that was because the input string was "0" or some unparseable gibberish.
In Rust, the same function would have a much cleaner return type:
fn atof(s: &str) -> Result<f64, u8> {
    //...
}
This atof will return an Ok(f64) when it can successfully parse a number but will return an Err(u8) if it can't. There is no chance of using some valid value as an error code because the Ok and Err variants carry separate values.
As with an Option, you can't directly get the value of a Result:
let result = atof("123.56");
//error[E0609]: no field `value` on type `Result<f64, u8>`
let value = result.value;
The only safe way to get the value out is to pattern match on atof's return value:
match atof("123.56") {
    Ok(value) => { /* use the parsed value */ }
    Err(error) => { /* handle the error */ }
}
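To make this concrete, here is one way such an atof could be sketched on top of the standard library's str::parse. The error code 1 and the describe helper are arbitrary choices for illustration, not part of any real API.

```rust
/// Sketch of an `atof` that reports failure through `Err` instead of
/// returning zero. The error code 1 is an arbitrary placeholder.
fn atof(s: &str) -> Result<f64, u8> {
    // Like C's atof, ignore surrounding whitespace before parsing.
    // `str::parse::<f64>` returns Err for text that is not a number.
    s.trim().parse::<f64>().map_err(|_| 1u8)
}

/// Hypothetical helper that turns the outcome into a message.
fn describe(s: &str) -> String {
    // Both variants must be handled before the value can be used.
    match atof(s) {
        Ok(value) => format!("parsed {}", value),
        Err(code) => format!("failed with code {}", code),
    }
}
```

With this version, atof("0") yields Ok(0.0) while atof("gibberish") yields Err(1), so the two cases that C's atof conflates are now distinguishable.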
And lastly, there is one more safety feature enabled by Result. In C it is too easy to forget to check an error code. In Rust the compiler warns you if you don't use a Result:
//a call which throws away a Result
atof("123.56");
The above line will issue this warning:
warning: unused `Result` that must be used
 --> src\main.rs:6:5
  |
6 |     atof("123.56");
  |     ^^^^^^^^^^^^^^^
  |
That's all about Result for now. Next, let's talk about when you should write your own enums in Rust.
When to use Rust enums
The built-in Option and Result types are great, but how do you design your own enums? In general, whenever a variable can be in exactly one of a few possible states, an enum might be a good fit. Consider the following example from the serde_yaml crate:
pub enum Value {
    /// Represents a YAML null value.
    Null,
    /// Represents a YAML boolean.
    Bool(bool),
    /// Represents a YAML numerical value, whether integer or floating point.
    Number(Number),
    /// Represents a YAML string.
    String(String),
    /// Represents a YAML sequence in which the elements are
    /// `serde_yaml::Value`.
    Sequence(Sequence),
    /// Represents a YAML mapping in which the keys and values are both
    /// `serde_yaml::Value`.
    Mapping(Mapping),
}
Here the Value enum represents a value in a YAML file. A YAML value can be either null, a bool, a number and so on. Hence, this is a perfect place to use an enum.
Tackling the same problem in an OOP language leaves you with just two broad options: either try to shoehorn everything into a single Value class, or make one class for each type of value (Null, Bool etc.) derived from a base Value class. The first option is just an ugly mishmash of unrelated member variables. The second is better but still a lot of boilerplate. Luckily, in Rust you don't have to make this tradeoff.
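As a rough illustration of designing your own enum, here is a cut-down value type in the same spirit. SimpleValue and both functions are hypothetical and much simpler than the real serde_yaml definition.

```rust
/// Hypothetical, cut-down value type; not the serde_yaml definition.
#[derive(Debug, Clone, PartialEq)]
enum SimpleValue {
    Null,
    Bool(bool),
    Number(f64),
    Text(String),
    List(Vec<SimpleValue>),
}

/// Names the kind of a value. The match has to cover every variant,
/// so adding a new variant later makes this stop compiling until it
/// is handled here too.
fn kind(value: &SimpleValue) -> &'static str {
    match value {
        SimpleValue::Null => "null",
        SimpleValue::Bool(_) => "bool",
        SimpleValue::Number(_) => "number",
        SimpleValue::Text(_) => "string",
        SimpleValue::List(_) => "sequence",
    }
}

/// Counts the non-list values, recursing into nested lists.
fn count_scalars(value: &SimpleValue) -> usize {
    match value {
        SimpleValue::List(items) => items.iter().map(count_scalars).sum(),
        _ => 1,
    }
}
```

One type and one match per operation cover every case, with no base class and no pile of unrelated member variables.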
Before I wrap up, a small section on some terms from type theory.
Sum (and Product) Types
Rust enums are what are called sum types in type theory. Why that name? Let's consider how many possible distinct values an Option<bool> enum can have. It is the total number of distinct values for Option::None (1) plus the total number of distinct values for Option::Some (2). Their sum is 3. Since we add up the possible values of the variants of an enum, enums are called sum types.
Now consider a user-defined type in C++, a Point struct:
struct Point {
    int x;
    int y;
};
How many distinct Points can be created? If an int is 32 bits wide, there are 2^32 possible values for x and as many for y. So the total number of distinct Points is the number of distinct values for x multiplied by the number of distinct values for y, which is 2^32 * 2^32 = 2^64. That is why types like Point are called product types.
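The counting argument can be checked on types small enough to enumerate. In the sketch below, Flags is a hypothetical two-field struct standing in for Point, since nobody wants to list 2^64 values by hand.

```rust
/// All values of the sum type Option<bool>: 1 (None) + 2 (Some) = 3.
fn option_bool_values() -> Vec<Option<bool>> {
    let mut values = vec![None];
    for b in [false, true] {
        values.push(Some(b));
    }
    values
}

/// Hypothetical product type with two bool fields.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flags {
    a: bool,
    b: bool,
}

/// All values of the product type Flags: 2 * 2 = 4.
fn flags_values() -> Vec<Flags> {
    let mut values = Vec::new();
    for a in [false, true] {
        for b in [false, true] {
            values.push(Flags { a, b });
        }
    }
    values
}

/// Number of distinct Points with two 32-bit fields: 2^32 * 2^32.
fn point_count() -> u128 {
    let per_field: u128 = 1 << 32;
    per_field * per_field
}
```

The enum's count comes from adding the variants together, while the struct's count comes from multiplying the fields.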
Conclusion
The last section above was there just to make you aware of the terms some people throw around when talking about types. In reality, the esoteric, mathy-sounding names are the least interesting aspect of enums in Rust. Enums are a tool that solves some design problems better than their OOP counterparts. And once you start using them, you will wish other languages had them too.