Typestate
Let's say we're implementing an API for a read-only file. It should adhere to the following state machine:
A file is opened into a reading state and is read until it reaches end-of-file (EOF). In either state, the file can be closed. Through the lens of access control, our API design goal is to allow access to certain operations (e.g. read()) only when an object is in the corresponding state (e.g. reading).
As a first attempt, here's a perfectly reasonable API that's similar to std::fs::File:
struct File {
    reached_eof: bool,
    /* .. */
}

impl File {
    pub fn open(path: String) -> Option<File> { /* .. */ }

    // Returns None if reached EOF
    pub fn read(&mut self) -> Option<Vec<u8>> {
        if self.reached_eof {
            None
        } else {
            // read the file ..
        }
    }

    // Takes ownership, so the file cannot be used after closing
    pub fn close(self) { /* .. */ }
}

fn main() {
    let mut f = File::open("test.txt".to_owned()).unwrap();
    while let Some(bytes) = f.read() {
        println!("{:?}", bytes);
    }
    f.read(); // This compiles! It just returns None
    f.close();
}
The key question for this API: how does it prevent calling read() in an EOF state? Here, the answer is: it doesn't. Rather, calling read() after EOF is a runtime error, represented by the None branch of the Option type. The state machine is only contained internally (reached_eof).
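To make the runtime behavior concrete, here is a minimal runnable sketch of the same design. It assumes a hypothetical in-memory stand-in for the file: the `chunks` queue and the `from_chunks` constructor are illustrative and do not appear in the API above.

```rust
use std::collections::VecDeque;

// Hypothetical in-memory stand-in for the `File` above: the "file" is a
// queue of byte chunks, and `reached_eof` is the hidden state machine.
struct File {
    chunks: VecDeque<Vec<u8>>,
    reached_eof: bool,
}

impl File {
    // Assumed constructor for illustration; a real `open` would take a path.
    pub fn from_chunks(chunks: Vec<Vec<u8>>) -> File {
        File { chunks: chunks.into(), reached_eof: false }
    }

    // Returns None once EOF has been reached, and on every call after that.
    pub fn read(&mut self) -> Option<Vec<u8>> {
        if self.reached_eof {
            return None;
        }
        match self.chunks.pop_front() {
            Some(bytes) => Some(bytes),
            None => {
                self.reached_eof = true;
                None
            }
        }
    }

    pub fn close(self) {}
}

fn main() {
    let mut f = File::from_chunks(vec![b"ab".to_vec(), b"cd".to_vec()]);
    while let Some(bytes) = f.read() {
        println!("{:?}", bytes);
    }
    // Still compiles: the misuse is only observable at runtime, as None.
    assert_eq!(f.read(), None);
    f.close();
}
```

Nothing in the types distinguishes a file that can still be read from one that cannot, so the compiler has no way to reject the extra `read()` call.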
Representing states as structs
Let's say the goal is to prevent the user from ever calling read() in an EOF state. That is, the call to f.read() after the while let should not compile in the first place. Now we arrive at the concept of typestate: encoding the states of the state machine into the type system.
// 1. Each state is its own struct
struct ReadingFile { inner: File }
struct EofFile { inner: File }

enum ReadResult {
    Read(ReadingFile, Vec<u8>),
    Eof(EofFile),
}

impl ReadingFile {
    pub fn open(path: String) -> Option<ReadingFile> { /* .. */ }

    // 2. Calling `read()` takes ownership of ReadingFile
    //    (`mut` because the inner `File::read` needs `&mut self`)
    pub fn read(mut self) -> ReadResult {
        match self.inner.read() {
            // 3. Access to `ReadingFile` is only given back if not at EOF
            Some(bytes) => ReadResult::Read(self, bytes),
            None => ReadResult::Eof(EofFile { inner: self.inner }),
        }
    }

    pub fn close(self) {
        self.inner.close();
    }
}

impl EofFile {
    pub fn close(self) {
        self.inner.close();
    }
}

fn main() {
    let mut file = ReadingFile::open("test.txt".to_owned()).unwrap();
    loop {
        match file.read() {
            ReadResult::Read(f, bytes) => {
                println!("{:?}", bytes);
                file = f;
            }
            ReadResult::Eof(f) => {
                f.close();
                break;
            }
        }
    }
    // file has been moved, can't call file.read() here
}
This API design has three key ideas:
- Each state is a struct: the ReadingFile and EofFile states are represented as distinct structs. This way, we can associate different methods with each.
- Each state has only its transitions implemented: the read() method is implemented for ReadingFile, but not for EofFile, while both can call close().
- Transitions between states consume ownership: the read() method consumes ReadingFile and returns whichever state comes after. This prevents calling methods on an old state.
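The three ideas above can be sketched as a self-contained program. This is only an illustration under stated assumptions: the states wrap a hypothetical in-memory queue of chunks rather than a real file, and `from_chunks` and `read_all` are names invented for the example.

```rust
use std::collections::VecDeque;

// Each state is its own struct. Both stand in for a real file handle;
// the in-memory `chunks` queue is an assumption made for this sketch.
struct ReadingFile { chunks: VecDeque<Vec<u8>> }
struct EofFile;

enum ReadResult {
    Read(ReadingFile, Vec<u8>),
    Eof(EofFile),
}

impl ReadingFile {
    pub fn from_chunks(chunks: Vec<Vec<u8>>) -> ReadingFile {
        ReadingFile { chunks: chunks.into() }
    }

    // Consumes the reading state and returns whichever state comes next.
    pub fn read(mut self) -> ReadResult {
        match self.chunks.pop_front() {
            Some(bytes) => ReadResult::Read(self, bytes),
            None => ReadResult::Eof(EofFile),
        }
    }

    pub fn close(self) {}
}

impl EofFile {
    // Only `close` exists here; there is deliberately no `read`.
    pub fn close(self) {}
}

// Drives the state machine to EOF, collecting every chunk along the way.
fn read_all(mut file: ReadingFile) -> (Vec<Vec<u8>>, EofFile) {
    let mut out = Vec::new();
    loop {
        match file.read() {
            ReadResult::Read(f, bytes) => {
                out.push(bytes);
                file = f;
            }
            ReadResult::Eof(eof) => return (out, eof),
        }
    }
}

fn main() {
    let file = ReadingFile::from_chunks(vec![b"ab".to_vec(), b"cd".to_vec()]);
    let (chunks, eof) = read_all(file);
    println!("{:?}", chunks);
    // eof.read(); // would not compile: `EofFile` has no `read` method
    eof.close();
}
```

Uncommenting the `eof.read()` line turns the earlier runtime `None` into a compile-time error, which is the point of the design.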
Moving states into a type parameter
One ugliness in this design is that the user manages a single logical object (the file) that is split across multiple actual types (the states). An alternative design is to represent states as a type parameter to a single object, like so:
use std::marker::PhantomData;

struct Reading;
struct Eof;

struct File2<State> {
    inner: File,
    _state: PhantomData<State>,
}

enum ReadResult {
    Read(File2<Reading>, Vec<u8>),
    Eof(File2<Eof>),
}

impl File2<Reading> {
    pub fn open(path: String) -> Option<File2<Reading>> { /* .. */ }

    pub fn read(mut self) -> ReadResult {
        match self.inner.read() {
            // Access to `File2<Reading>` is only given back if not at EOF
            Some(bytes) => ReadResult::Read(self, bytes),
            None => ReadResult::Eof(File2 { inner: self.inner, _state: PhantomData }),
        }
    }

    pub fn close(self) {
        self.inner.close();
    }
}

impl File2<Eof> {
    pub fn close(self) {
        self.inner.close();
    }
}
The API is largely the same, and the usage example from before still works. But on the implementation side, we now have to carry around this PhantomData marker or else Rust complains about an unused type parameter.
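As a runnable illustration of the type-parameter design, here is a sketch that again assumes a hypothetical in-memory `chunks` queue and a `from_chunks` constructor in place of a real file. It also shows one possible refinement that the text above does not prescribe: because the state is now a parameter, `close()` can be written once in a generic `impl<State>` block instead of once per state.

```rust
use std::collections::VecDeque;
use std::marker::PhantomData;

// Zero-sized marker types naming the states.
struct Reading;
struct Eof;

struct File2<State> {
    chunks: VecDeque<Vec<u8>>, // hypothetical in-memory stand-in for the file
    _state: PhantomData<State>,
}

enum ReadResult {
    Read(File2<Reading>, Vec<u8>),
    Eof(File2<Eof>),
}

// Available in every state, so it is written once for any `State`.
impl<State> File2<State> {
    pub fn close(self) {}
}

// Only the Reading state can be constructed directly or read from.
impl File2<Reading> {
    pub fn from_chunks(chunks: Vec<Vec<u8>>) -> File2<Reading> {
        File2 { chunks: chunks.into(), _state: PhantomData }
    }

    pub fn read(mut self) -> ReadResult {
        match self.chunks.pop_front() {
            Some(bytes) => ReadResult::Read(self, bytes),
            // Rebuild the value with a different marker type to change state.
            None => ReadResult::Eof(File2 { chunks: self.chunks, _state: PhantomData }),
        }
    }
}

fn main() {
    let mut file = File2::from_chunks(vec![b"ab".to_vec()]);
    loop {
        match file.read() {
            ReadResult::Read(f, bytes) => {
                println!("{:?}", bytes);
                file = f;
            }
            ReadResult::Eof(f) => {
                // f.read(); // would not compile: no `read` on File2<Eof>
                f.close();
                break;
            }
        }
    }
}
```

The marker field costs nothing at runtime, since `PhantomData<State>` is a zero-sized type; it exists only so the compiler can tell `File2<Reading>` and `File2<Eof>` apart.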