Typestate
Let's say we're implementing an API for a read-only file. It should adhere to the following state machine:
A file is opened into a reading state and is read until it reaches end-of-file (EOF). In either state, the file can be closed. Through the lens of access control, our API design goal is to allow access to certain operations (e.g. read()) only when an object is in the corresponding state (e.g. reading).
As a first attempt, here's a perfectly reasonable API that's similar to std::fs::File:
struct File {
  reached_eof: bool,
  /* .. */
}
impl File {
  pub fn open(path: String) -> Option<File> { /* .. */ }
  // Returns None if reached EOF
  pub fn read(&mut self) -> Option<Vec<u8>> {
    if self.reached_eof {
      None
    } else {
      // read the file ..
    }
  }
  pub fn close(self) { /* .. */ }
}
fn main() {
  let mut f = File::open("test.txt".to_owned()).unwrap();
  while let Some(bytes) = f.read() {
    println!("{:?}", bytes);
  }
  f.read(); // This compiles! It just returns None
  f.close();
}
The key question for this API: how does it prevent calling read() in an EOF state? Here, the answer is: it doesn't. Rather, calling read() after EOF is a runtime condition, represented by the None branch of the Option type. The state machine exists only internally, in the reached_eof flag, and the compiler knows nothing about it.
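To make the point concrete, here is a minimal runnable sketch of the same runtime-checked design. It assumes an in-memory byte buffer in place of a real file; the data and pos fields, the CHUNK size, and the read_past_eof helper are hypothetical names introduced only for illustration.

```rust
// Assumption: an in-memory buffer stands in for the real file.
struct File {
    data: Vec<u8>,
    pos: usize,
    reached_eof: bool,
}

// Assumption: reads hand back at most CHUNK bytes at a time.
const CHUNK: usize = 4;

impl File {
    pub fn open(data: Vec<u8>) -> File {
        File { data, pos: 0, reached_eof: false }
    }

    // Returns None if reached EOF
    pub fn read(&mut self) -> Option<Vec<u8>> {
        if self.reached_eof {
            return None;
        }
        let end = (self.pos + CHUNK).min(self.data.len());
        let bytes = self.data[self.pos..end].to_vec();
        self.pos = end;
        if bytes.is_empty() {
            // The state change is recorded only in this runtime flag.
            self.reached_eof = true;
            None
        } else {
            Some(bytes)
        }
    }

    pub fn close(self) {}
}

// Reads to EOF, then calls read() once more. Returns the number of
// chunks read and the result of the extra call.
fn read_past_eof(data: Vec<u8>) -> (usize, Option<Vec<u8>>) {
    let mut f = File::open(data);
    let mut chunks = 0;
    while let Some(_bytes) = f.read() {
        chunks += 1;
    }
    let extra = f.read(); // accepted by the compiler; misuse surfaces only as None
    f.close();
    (chunks, extra)
}
```

Calling read_past_eof on an 11-byte buffer yields three chunks followed by None for the extra call. Nothing at compile time distinguishes the extra call from the ones inside the loop.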
Representing states as structs
Let's say the goal is to prevent the user from ever calling read() in an EOF state. That is, the call to f.read() after the while let should not compile in the first place. Now we arrive at the concept of typestate: encoding the states of the state machine into the type system.
// 1. Each state is its own struct
struct ReadingFile { inner: File }
struct EofFile { inner: File }
enum ReadResult {
  Read(ReadingFile, Vec<u8>),
  Eof(EofFile)
}
impl ReadingFile {
  pub fn open(path: String) -> Option<ReadingFile> { /* .. */ }
  // 2. Calling `read()` takes ownership of ReadingFile
  // `mut self` is needed because the inner `read()` takes `&mut self`
  pub fn read(mut self) -> ReadResult {
    match self.inner.read() {
      // 3. Access to `ReadingFile` is only given back if not at EOF
      Some(bytes) => ReadResult::Read(self, bytes),
      None => ReadResult::Eof(EofFile { inner: self.inner }),
    }
  }
  pub fn close(self) {
    self.inner.close();
  }
}
impl EofFile {
  pub fn close(self) {
    self.inner.close();
  }
}
fn main() {
  let mut file = ReadingFile::open("test.txt".to_owned()).unwrap();
  loop {
    match file.read() {
      ReadResult::Read(f, bytes) => {
        println!("{:?}", bytes);
        file = f;
      }
      ReadResult::Eof(f) => {
        f.close();
        break;
      }
    }
  }
  // file has been moved, can't call file.read() here
}
This API design has three key ideas:
- Each state is a struct: the ReadingFile and EofFile states are represented as distinct structs. This way, we can associate different methods with each.
- Each state has only its transitions implemented: the read() method is implemented for ReadingFile, but not for EofFile, while both can call close().
- Transitions between states consume ownership: the read() method consumes ReadingFile and returns whichever state comes after. This prevents calling methods on an old state.
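The three ideas can be exercised end to end with a self-contained sketch. As an assumption for illustration, it replaces the inner File with an in-memory buffer; the data and pos fields, the CHUNK size, and the read_all helper are hypothetical and not part of the API above.

```rust
// Idea 1: each state is its own struct.
// Assumption: an in-memory buffer stands in for the real file.
struct ReadingFile {
    data: Vec<u8>,
    pos: usize,
}
struct EofFile;

enum ReadResult {
    Read(ReadingFile, Vec<u8>),
    Eof(EofFile),
}

// Assumption: reads hand back at most CHUNK bytes at a time.
const CHUNK: usize = 4;

// Idea 2: read() exists only on ReadingFile.
impl ReadingFile {
    pub fn open(data: Vec<u8>) -> ReadingFile {
        ReadingFile { data, pos: 0 }
    }

    // Idea 3: the transition takes `self` by value and returns the next state.
    pub fn read(mut self) -> ReadResult {
        if self.pos >= self.data.len() {
            return ReadResult::Eof(EofFile);
        }
        let end = (self.pos + CHUNK).min(self.data.len());
        let bytes = self.data[self.pos..end].to_vec();
        self.pos = end;
        ReadResult::Read(self, bytes)
    }

    pub fn close(self) {}
}

impl EofFile {
    pub fn close(self) {}
}

// Drains the file and returns every byte that was read.
fn read_all(data: Vec<u8>) -> Vec<u8> {
    let mut file = ReadingFile::open(data);
    let mut out = Vec::new();
    loop {
        match file.read() {
            ReadResult::Read(f, bytes) => {
                out.extend(bytes);
                file = f; // regain access to the reading state
            }
            ReadResult::Eof(f) => {
                f.close();
                break;
            }
        }
    }
    // `file` was moved by the last read(); calling file.read() here is
    // rejected as a use of a moved value (E0382).
    out
}
```

Since EofFile has no read() method at all, there is no way to spell the invalid call on the value returned in the Eof branch either.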
Moving states into a type parameter
One ugliness in this design is that the user has to manage a single logical object (the file) that is split across multiple actual types (the states). An alternative design is to represent states as a type parameter to a single object, like so:
use std::marker::PhantomData;
// Marker types: never constructed, used only as type-level labels
struct Reading;
struct Eof;
struct File2<State> {
  inner: File,
  _state: PhantomData<State>
}
enum ReadResult {
  Read(File2<Reading>, Vec<u8>),
  Eof(File2<Eof>)
}
impl File2<Reading> {
  pub fn open(path: String) -> Option<File2<Reading>> { /* .. */ }
  // `mut self` is needed because the inner `read()` takes `&mut self`
  pub fn read(mut self) -> ReadResult {
    match self.inner.read() {
      // Access to `File2<Reading>` is only given back if not at EOF
      Some(bytes) => ReadResult::Read(self, bytes),
      None => ReadResult::Eof(File2 { inner: self.inner, _state: PhantomData }),
    }
  }
  pub fn close(self) {
    self.inner.close();
  }
}
impl File2<Eof> {
  pub fn close(self) {
    self.inner.close();
  }
}
The API is largely the same, and the usage example from before still works once ReadingFile::open is replaced with File2::open. But on the implementation side, we now have to carry around this PhantomData marker, or else Rust rejects the struct because the type parameter State is never used (error E0392).
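A self-contained sketch of the type-parameter design follows. As before, it assumes an in-memory buffer in place of the inner File; the data and pos fields, the CHUNK size, and the count_chunks helper are hypothetical. It also moves close() into a single generic impl block shared by all states, which is one possible simplification rather than something the design requires.

```rust
use std::marker::PhantomData;

// Marker types: never constructed, used only as type-level labels.
struct Reading;
struct Eof;

// Assumption: an in-memory buffer stands in for the real file.
struct File2<State> {
    data: Vec<u8>,
    pos: usize,
    _state: PhantomData<State>,
}

enum ReadResult {
    Read(File2<Reading>, Vec<u8>),
    Eof(File2<Eof>),
}

// Assumption: reads hand back at most CHUNK bytes at a time.
const CHUNK: usize = 4;

impl File2<Reading> {
    pub fn open(data: Vec<u8>) -> File2<Reading> {
        File2 { data, pos: 0, _state: PhantomData }
    }

    pub fn read(mut self) -> ReadResult {
        if self.pos >= self.data.len() {
            // Changing state means rebuilding the struct with a new marker type.
            return ReadResult::Eof(File2 {
                data: self.data,
                pos: self.pos,
                _state: PhantomData,
            });
        }
        let end = (self.pos + CHUNK).min(self.data.len());
        let bytes = self.data[self.pos..end].to_vec();
        self.pos = end;
        ReadResult::Read(self, bytes)
    }
}

// Operations valid in every state can be written once, generically.
impl<State> File2<State> {
    pub fn close(self) {}
}

// Drains the file and returns how many chunks were read.
fn count_chunks(data: Vec<u8>) -> usize {
    let mut file = File2::<Reading>::open(data);
    let mut chunks = 0;
    loop {
        match file.read() {
            ReadResult::Read(f, _bytes) => {
                chunks += 1;
                file = f;
            }
            ReadResult::Eof(f) => {
                f.close();
                break;
            }
        }
    }
    chunks
}
```

PhantomData<State> is a zero-sized type, so the marker field adds no bytes to File2. Its cost is purely syntactic: every place that constructs a File2 has to mention _state.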