$$ % Typography and symbols \newcommand{\msf}[1]{\mathsf{#1}} \newcommand{\ctx}{\Gamma} \newcommand{\qamp}{&\quad} \newcommand{\qqamp}{&&\quad} \newcommand{\Coloneqq}{::=} \newcommand{\proves}{\vdash} \newcommand{\star}[1]{#1^{*}} \newcommand{\eps}{\varepsilon} \newcommand{\nul}{\varnothing} \newcommand{\brc}[1]{\{{#1}\}} \newcommand{\binopm}[2]{#1~\bar{\oplus}~#2} \newcommand{\mag}[1]{|{#1}|} \newcommand{\aequiv}{\equiv_\alpha} \newcommand{\semi}[2]{{#1};~{#2}} % Untyped lambda calculus \newcommand{\fun}[2]{\lambda ~ {#1} ~ . ~ {#2}} \newcommand{\app}[2]{#1 ~ #2} \newcommand{\fix}[3]{\msf{fix}~({#1} : {#2}) ~ . ~ #3 } \newcommand{\truet}{\msf{true}} \newcommand{\falset}{\msf{false}} \newcommand{\define}[2]{{#1} \triangleq {#2}} % Typed lambda calculus - expressions \newcommand{\funt}[3]{\lambda ~ \left(#1 : #2\right) ~ . ~ #3} \newcommand{\lett}[4]{\msf{let} ~ \hasType{#1}{#2} = #3 ~ \msf{in} ~ #4} \newcommand{\letrec}[4]{\msf{letrec} ~ \hasType{#1}{#2} = #3 ~ \msf{in} ~ #4}a \newcommand{\ift}[3]{\msf{if} ~ {#1} ~ \msf{then} ~ {#2} ~ \msf{else} ~ {#3}} \newcommand{\rec}[5]{\msf{rec}(#1; ~ #2.#3.#4)(#5)} \newcommand{\case}[5]{\msf{case} ~ {#1} ~ \{ L(#2) \to #3 \mid R(#4) \to #5 \}} \newcommand{\pair}[2]{\left({#1},~{#2}\right)} \newcommand{\proj}[2]{#1 . #2} \newcommand{\inj}[3]{\msf{inj} ~ #1 = #2 ~ \msf{as} ~ #3} \newcommand{\letv}[3]{\msf{let} ~ {#1} = {#2} ~ \msf{in} ~ {#3}} \newcommand{\fold}[2]{\msf{fold}~{#1}~\msf{as}~{#2}} \newcommand{\unfold}[1]{\msf{unfold}~{#1}} \newcommand{\poly}[2]{\Lambda~{#1}~.~ #2} \newcommand{\polyapp}[2]{{#1}~\left[{#2}\right]} \newcommand{\export}[3]{\msf{export}~ #1 ~\msf{without}~{#2}~\msf{as}~ #3} \newcommand{\import}[4]{\msf{import} ~ ({#1}, {#2}) = {#3} ~ \msf{in} ~ #4} % Typed lambda calculus - types \newcommand{\tnum}{\msf{num}} \newcommand{\tstr}{\msf{string}} \newcommand{\tint}{\msf{int}} \newcommand{\tbool}{\msf{bool}} \newcommand{\tfun}[2]{#1 \rightarrow #2} \newcommand{\tprod}[2]{#1 \times #2} \newcommand{\tsum}[2]{#1 + #2} \newcommand{\trec}[2]{\mu~{#1}~.~{#2}} \newcommand{\tvoid}{\msf{void}} \newcommand{\tunit}{\msf{unit}} \newcommand{\tpoly}[2]{\forall~{#1}~.~{#2}} \newcommand{\tmod}[2]{\exists ~ {#1} ~ . ~ #2} % WebAssembly \newcommand{\wconst}[1]{\msf{i32.const}~{#1}} \newcommand{\wbinop}[1]{\msf{i32}.{#1}} \newcommand{\wgetlocal}[1]{\msf{get\_local}~{#1}} \newcommand{\wsetlocal}[1]{\msf{set\_local}~{#1}} \newcommand{\wgetglobal}[1]{\msf{get\_global}~{#1}} \newcommand{\wsetglobal}[1]{\msf{set\_global}~{#1}} \newcommand{\wload}{\msf{i32.load}} \newcommand{\wstore}{\msf{i32.store}} \newcommand{\wsize}{\msf{memory.size}} \newcommand{\wgrow}{\msf{memory.grow}} \newcommand{\wunreachable}{\msf{unreachable}} \newcommand{\wblock}[1]{\msf{block}~{#1}} \newcommand{\wloop}[1]{\msf{loop}~{#1}} \newcommand{\wbr}[1]{\msf{br}~{#1}} \newcommand{\wbrif}[1]{\msf{br\_if}~{#1}} \newcommand{\wreturn}{\msf{return}} \newcommand{\wcall}[1]{\msf{call}~{#1}} \newcommand{\wlabel}[2]{\msf{label}~\{#1\}~{#2}} \newcommand{\wframe}[2]{\msf{frame}~({#1}, {#2})} \newcommand{\wtrapping}{\msf{trapping}} \newcommand{\wbreaking}[1]{\msf{breaking}~{#1}} \newcommand{\wreturning}[1]{\msf{returning}~{#1}} \newcommand{\wconfig}[5]{\{\msf{module}{:}~{#1};~\msf{mem}{:}~{#2};~\msf{locals}{:}~{#3};~\msf{stack}{:}~{#4};~\msf{instrs}{:}~{#5}\}} \newcommand{\wfunc}[4]{\{\msf{params}{:}~{#1};~\msf{locals}{:}~{#2};~\msf{return}~{#3};~\msf{body}{:}~{#4}\}} \newcommand{\wmodule}[1]{\{\msf{funcs}{:}~{#1}\}} \newcommand{\wcg}{\msf{globals}} \newcommand{\wcf}{\msf{funcs}} \newcommand{\wci}{\msf{instrs}} \newcommand{\wcs}{\msf{stack}} \newcommand{\wcl}{\msf{locals}} \newcommand{\wclab}{\msf{labels}} \newcommand{\wcm}{\msf{mem}} \newcommand{\wcmod}{\msf{module}} \newcommand{\wsteps}[2]{\steps{\brc{#1}}{\brc{#2}}} \newcommand{\with}{\underline{\msf{with}}} \newcommand{\wvalid}[2]{{#1} \vdash {#2}~\msf{valid}} \newcommand{\wif}[2]{\msf{if}~{#1}~{\msf{else}}~{#2}} \newcommand{\wfor}[4]{\msf{for}~(\msf{init}~{#1})~(\msf{cond}~{#2})~(\msf{post}~{#3})~{#4}} % assign4.3 custom \newcommand{\wtry}[2]{\msf{try}~{#1}~\msf{catch}~{#2}} \newcommand{\wraise}{\msf{raise}} \newcommand{\wraising}[1]{\msf{raising}~{#1}} \newcommand{\wconst}[1]{\msf{i32.const}~{#1}} \newcommand{\wbinop}[1]{\msf{i32}.{#1}} \newcommand{\wgetlocal}[1]{\msf{get\_local}~{#1}} \newcommand{\wsetlocal}[1]{\msf{set\_local}~{#1}} \newcommand{\wgetglobal}[1]{\msf{get\_global}~{#1}} \newcommand{\wsetglobal}[1]{\msf{set\_global}~{#1}} \newcommand{\wload}{\msf{i32.load}} \newcommand{\wstore}{\msf{i32.store}} \newcommand{\wsize}{\msf{memory.size}} \newcommand{\wgrow}{\msf{memory.grow}} \newcommand{\wunreachable}{\msf{unreachable}} \newcommand{\wblock}[1]{\msf{block}~{#1}} \newcommand{\wloop}[1]{\msf{loop}~{#1}} \newcommand{\wbr}[1]{\msf{br}~{#1}} \newcommand{\wbrif}[1]{\msf{br\_if}~{#1}} \newcommand{\wreturn}{\msf{return}} \newcommand{\wcall}[1]{\msf{call}~{#1}} \newcommand{\wlabel}[2]{\msf{label}~\{#1\}~{#2}} \newcommand{\wframe}[2]{\msf{frame}~({#1}, {#2})} \newcommand{\wtrapping}{\msf{trapping}} \newcommand{\wbreaking}[1]{\msf{breaking}~{#1}} \newcommand{\wreturning}[1]{\msf{returning}~{#1}} \newcommand{\wconfig}[5]{\{\msf{module}{:}~{#1};~\msf{mem}{:}~{#2};~\msf{locals}{:}~{#3};~\msf{stack}{:}~{#4};~\msf{instrs}{:}~{#5}\}} \newcommand{\wfunc}[4]{\{\msf{params}{:}~{#1};~\msf{locals}{:}~{#2};~\msf{return}~{#3};~\msf{body}{:}~{#4}\}} \newcommand{\wmodule}[1]{\{\msf{funcs}{:}~{#1}\}} \newcommand{\wcg}{\msf{globals}} \newcommand{\wcf}{\msf{funcs}} \newcommand{\wci}{\msf{instrs}} \newcommand{\wcs}{\msf{stack}} \newcommand{\wcl}{\msf{locals}} \newcommand{\wcm}{\msf{mem}} \newcommand{\wcmod}{\msf{module}} \newcommand{\wsteps}[2]{\steps{\brc{#1}}{\brc{#2}}} \newcommand{\with}{\underline{\msf{with}}} \newcommand{\wvalid}[2]{{#1} \vdash {#2}~\msf{valid}} % assign4.3 custom \newcommand{\wtry}[2]{\msf{try}~{#1}~\msf{catch}~{#2}} \newcommand{\wraise}{\msf{raise}} \newcommand{\wraising}[1]{\msf{raising}~{#1}} \newcommand{\wif}[2]{\msf{if}~{#1}~{\msf{else}}~{#2}} \newcommand{\wfor}[4]{\msf{for}~(\msf{init}~{#1})~(\msf{cond}~{#2})~(\msf{post}~{#3})~{#4}} \newcommand{\windirect}[1]{\msf{call\_indirect}~{#1}} % session types \newcommand{\ssend}[2]{\msf{send}~{#1};~{#2}} \newcommand{\srecv}[2]{\msf{recv}~{#1};~{#2}} \newcommand{\soffer}[4]{\msf{offer}~\{{#1}\colon({#2})\mid{#3}\colon({#4})\}} \newcommand{\schoose}[4]{\msf{choose}~\{{#1}\colon({#2})\mid{#3}\colon({#4})\}} \newcommand{\srec}[1]{\msf{label};~{#1}} \newcommand{\sgoto}[1]{\msf{goto}~{#1}} \newcommand{\dual}[1]{\overline{#1}} % Inference rules \newcommand{\inferrule}[3][]{\cfrac{#2}{#3}\;{#1}} \newcommand{\ir}[3]{\inferrule[\text{(#1)}]{#2}{#3}} \newcommand{\s}{\hspace{1em}} \newcommand{\nl}{\\[2em]} \newcommand{\evalto}{\boldsymbol{\overset{*}{\mapsto}}} \newcommand{\steps}[2]{#1 \boldsymbol{\mapsto} #2} \newcommand{\evals}[2]{#1 \evalto #2} \newcommand{\subst}[3]{[#1 \rightarrow #2] ~ #3} \newcommand{\dynJ}[2]{#1 \proves #2} \newcommand{\dynJC}[1]{\dynJ{\ctx}{#1}} \newcommand{\typeJ}[3]{#1 \proves \hasType{#2}{#3}} \newcommand{\typeJC}[2]{\typeJ{\ctx}{#1}{#2}} \newcommand{\hasType}[2]{#1 : #2} \newcommand{\val}[1]{#1~\msf{val}} \newcommand{\num}[1]{\msf{Int}(#1)} \newcommand{\err}[1]{#1~\msf{err}} \newcommand{\trans}[2]{#1 \leadsto #2} \newcommand{\size}[1]{\left|#1\right|} $$

&Notepad

Implementing a Type-safe printf in Rust

Will Crichton — June 21, 2020

I show how to use heterogeneous lists and traits to implement a type-safe printf in Rust. These mechanisms can ensure that two variadic argument lists share important properties, like the number of format string holes matches the number of printf arguments.

Part of an ongoing series about type-level programming in Rust. Consider reading part one first! All code in this note can be found in this gist.

Rust has a great printf function, println!. It has a rich formatting language, but also catches mistakes at compile-time. For example, println! will check that the number of arguments matches the number of holes in the format string:

error: 2 positional arguments in format string, but there is 1 argument
  --> src/printf.rs:36:13
   |
36 |   println!("{} {}", "Hello");

How does the Rust compiler make this check? Because both the format string and the argument are within the macro, Rust will check that the number of holes matches the number of arguments. Consequently, format strings have to be within the macro. If we write:

let s = "{} {}";
println!(s, "hi")

This code fails to compile with the error:

error: format argument must be a string literal
  --> src/printf.rs:42:12
   |
42 |   println!(s, "hi");

I’ll show you how to implement type-safe printf without using procedural macros. More generally, this note contains a Rust recipe for functions where:

Arguments are variadic
Each argument can be a different type (Vec<T> not allowed)
Multiple variadic inputs share a parallel property, e.g. the number of arguments should match the number of format holes

Core mechanism: HList

First, we need to understand the main type-level mechanism: a heterogeneous list (or H-list). An H-list is a sequence of values of potentially different types. For example, [1, "a", true] is an H-list, but not a valid Rust vector. H-lists are implemented in Rust using a linked-list style:

struct HNil;
struct HCons<Head, Tail> {
  head: Head,
  tail: Tail
}

let example: HCons<i32, HCons<bool, HNil>> =
  HCons{head: 1, tail: HCons{head: true, tail: HNil}};

The key idea is that the type of an H-list changes every time you make a change to it. By contrast, if you push to a Vec<T>, the type of the vector stays the same.

Just like Rust has vec![], we can use the frunk crate to get an hlist! macro.

let example = hlist![1, true]; // same as above

Setting up printf

Let’s go back to the ingredients of printf. We need a format string and an argument list. The key idea is to represent both with an H-list, and carefully use Rust’s traits to ensure our desired property: the number of arguments should match the number of holes.

First, to represent format strings, we will have a sequence of structs that represent each part of the string.

pub struct FString(&'static str);
pub struct FVar;

// Assume that we compile "Hello {}! The first prime is {}" into this code.
// That would be a simple syntactic transformation.
let example = hlist![
  FString("Hello "), FVar, FString("! The first prime is "), FVar
];

To represent arguments, we will use a matching H-list of values. For example:

let args = hlist!["world", 2];

Then, our goal is to create a function format such that this is true:

assert_eq!(
  example.format(args),
  "Hello world! The first prime is 2"
);

And this should be a compile-time (NOT run-time) error:

example.format(hlist!["Only one arg"]);

The Format trait

In the spirit of type-level computation, we start by defining a trait.

trait Format<ArgList> {
  fn format(&self, args: ArgList) -> String;
}

Here, self is the H-list of the format directives, and ArgList is the H-list of the variadic arguments. Format need to take ArgList as a type parameter, because its type will change as we remove elements from the ArgList list.

Now, we proceed to implement the Format trait by cases. First, the base case for reaching the end of the format list HNil:

impl Format<HNil> for HNil {
  fn format(&self, _args: HNil) -> String {
    "".to_string()
  }
}

This impl says that when we reach the end of a format list, just return the empty string. And the only argument we will accept is an empty argument list. Combined with the next impls, this inductively ensures that extra arguments are not accepted.

Next, we will implement FString. This implementation should use the string constant contained in the FString struct, and combine it recursively with the rest of the format list. We don’t use variadic arguments for FString, so they get passed along. In Rust, this English specification becomes:

impl<ArgList, FmtList> Format<ArgList>
for HCons<FString, FmtList>
where FmtList: Format<ArgList>
{
  fn format(&self, args: ArgList) -> String {
    self.head.0.to_owned() + &self.tail.format(args)
  }
}

Note that we have to add FmtList: Format<ArgList> to ensure the recursive call to self.tail.format works. Also note that we aren’t implementing Format directly on FString, but rather on an H-list containing FString.

Finally, the most complex case, FVar. We want this impl to take an argument from the ArgList, then format the remaining format list with the remaining arguments.

impl<T, ArgList, FmtList> Format<HCons<T, ArgList>>
for HCons<FVar, FmtList>
where
  FmtList: Format<ArgList>,
  T: ToString,
{
  fn format(&self, args: HCons<T, ArgList>) -> String {
    args.head.to_string() + &self.tail.format(args.tail)
  }
}

Be careful to observe which H-list is being accessed by head and tail. Here, the args H-list provides the data to fill the hole via args.head.

Checking our properties

With this implementation, our correct example successfully compiles and runs:

let example = hlist![
  FString("Hello "), FVar, FString("! The first prime is "), FVar
];
assert_eq!(
  example.format(hlist!["world", 2]),
  "Hello world! The first prime is 2"
);

What about our incorrect example? If we write this:

example.format(hlist!["just one arg"]);

This code fails to compile with the error:

error[E0308]: mismatched types
  --> src/printf.rs:48:18
   |
48 |   example.format(hlist!["just one arg"]);
   |                  ^^^^^^^^^^^^^^^^^^^^^^
   |                  expected struct `Cons`, found struct `HNil`
   |
   = note: expected struct `HCons<_, HNil>`
              found struct `HNil`

While the error is enigmatic, our mistake is at least correctly caught at compile-time. This is because Rust deduces that example.format() expects an H-list of the shape HCons<_, HCons<_, HNil>>, but it finds HNil too soon in our 1-element H-list. A similar error occurs when providing too many args.

Stupendous! We have successfully implemented a type-safe printf using H-lists and traits.

Extending our abstraction

Right now, our Format function just checks that the format list and argument list are the same length. We could extend our format structures, for example to ensure that an FVar must be a particular type, or must use Debug vs. Display. Here’s the sketch of such a strategy:

use std::marker::PhantomData;

// Add flags for whether using Display or Debug
pub struct FDisplay;
pub struct FDebug;

// Use a type parameter with PhantomData to represent the intended type
pub struct FVar<T, Flag>(PhantomData<(T, Flag)>);

// Now, T has to be the same between the format list and arg list
// Also, FDisplay flag requires that `T: Display`
impl<T, ArgList, FmtList> Format<HCons<T, ArgList>>
for HCons<FVar<T, FDisplay>, FmtList>
where
  FmtList: Format<ArgList>,
  T: Display,
{
  fn format(&self, args: HCons<T, ArgList>) -> String {
    // using format! is cheating, but you get the idea
    format!("{}", args) + &self.tail.format(args.tail)
  }
}

// Similar impl for `T: Debug` when `FDebug` is used

With this approach, if our format list and arg list differ in type:

let fmt = hlist![FString("n: "), FVar::<i32, FDisplay>(PhantomData)];
fmt.format(hlist!["not a number"]);

Then the code will not compile with the error, &'static str is not i32.

Shout-out to this blog post from Haskell’s servant framework which got me thinking about adapting their strategy into Rust.