$$ % Typography and symbols \newcommand{\msf}[1]{\mathsf{#1}} \newcommand{\ctx}{\Gamma} \newcommand{\qamp}{&\quad} \newcommand{\qqamp}{&&\quad} \newcommand{\Coloneqq}{::=} \newcommand{\proves}{\vdash} \newcommand{\star}[1]{#1^{*}} \newcommand{\eps}{\varepsilon} \newcommand{\brc}[1]{\{{#1}\}} % Untyped lambda calculus \newcommand{\fun}[2]{\lambda ~ {#1} ~ . ~ {#2}} \newcommand{\app}[2]{#1 ~ #2} % Typed lambda calculus - expressions \newcommand{\funt}[3]{\lambda ~ \left(#1 : #2\right) ~ . ~ #3} \newcommand{\lett}[4]{\msf{let} ~ \hasType{#1}{#2} = #3 ~ \msf{in} ~ #4} \newcommand{\rec}[3]{\msf{rec}(#1; ~ x.y.#2)(#3)} \newcommand{\case}[5]{\msf{case} ~ {#1} ~ \{ L(#2) \to #3 \mid R(#4) \to #5 \}} \newcommand{\pair}[2]{\left({#1},{#2}\right)} \newcommand{\proj}[2]{#1 . #2} \newcommand{\inj}[3]{\msf{inj} ~ #1 = #2 ~ \msf{as} ~ #3} \newcommand{\letv}[3]{\msf{let} ~ {#1} = {#2} ~ \msf{in} ~ {#3}} % Typed lambda calculus - types \newcommand{\tprod}[2]{#1 \times #2} \newcommand{\tsum}[2]{#1 + #2} % WebAssembly \newcommand{\wconst}[1]{\msf{i32.const}~{#1}} \newcommand{\wbinop}[1]{\msf{i32}.{#1}} \newcommand{\wgetlocal}[1]{\msf{get\_local}~{#1}} \newcommand{\wsetlocal}[1]{\msf{set\_local}~{#1}} \newcommand{\wload}{\msf{i32.load}} \newcommand{\wstore}{\msf{i32.store}} \newcommand{\wsize}{\msf{memory.size}} \newcommand{\wgrow}{\msf{memory.grow}} \newcommand{\wunreachable}{\msf{unreachable}} \newcommand{\wblock}[1]{\msf{block}~{#1}} \newcommand{\wblockr}[2]{\msf{block}~{#1}~{#2}} \newcommand{\wloop}[1]{\msf{loop}~{#1}} \newcommand{\wbr}[1]{\msf{br}~{#1}} \newcommand{\wbrif}[1]{\msf{br\_if}~{#1}} \newcommand{\wreturn}{\msf{return}} \newcommand{\wcall}[1]{\msf{call}~{#1}} \newcommand{\wlabel}[3]{\msf{label}_{#1}~\{#2\}~{#3}} \newcommand{\wframe}[1]{\msf{frame}~{#1}} \newcommand{\wtrapping}{\msf{trapping}} \newcommand{\wbreaking}[2]{\msf{breaking}_{#1}~{#2}} \newcommand{\wreturning}[1]{\msf{returning}~{#1}} \newcommand{\wconfig}[5]{\{\msf{module}~{#1};~\msf{mem}~{#2};~\msf{locals}~{#3};~\msf{stack}~{#4};~\msf{instrs}~{#5}\}} \newcommand{\wfunc}[3]{\{\msf{params}~{#1};~\msf{locals}~{#2};~\msf{body}~{#3}\}} \newcommand{\wmodule}[1]{\{\msf{funcs}~{#1}\}} \newcommand{\semi}[2]{{#1};~{#2}} \newcommand{\semii}[3]{{#1};~{#2};~{#3}} \newcommand{\semiii}[4]{{#1};~{#2};~{#3};~{#4}} \newcommand{\semiiii}[5]{{#1};~{#2};~{#3};~{#4};~{#5}} \newcommand{\wci}{\msf{instrs}} \newcommand{\wcs}{\msf{stack}} \newcommand{\wcl}{\msf{locals}} \newcommand{\wcm}{\msf{mem}} \newcommand{\wcmod}{\msf{module}} \newcommand{\wsteps}[2]{\steps{\brc{#1}}{\brc{#2}}} % Inference rules \newcommand{\inferrule}[3][]{\cfrac{#2}{#3}\;{#1}} \newcommand{\ir}[3]{\inferrule[\text{(#1)}]{#2}{#3}} \newcommand{\s}{\hspace{1em}} \newcommand{\nl}{\\[2em]} \newcommand{\steps}[2]{#1 \boldsymbol{\mapsto} #2} \newcommand{\subst}[3]{[#1 \rightarrow #2] ~ #3} \newcommand{\dynJ}[2]{#1 \proves #2} \newcommand{\dynJC}[1]{\dynJ{\ctx}{#1}} \newcommand{\typeJ}[3]{#1 \proves \hasType{#2}{#3}} \newcommand{\typeJC}[2]{\typeJ{\ctx}{#1}{#2}} \newcommand{\hasType}[2]{#1 : #2} \newcommand{\val}[1]{#1~\msf{val}} \newcommand{\num}[1]{\msf{Int}(#1)} \newcommand{\err}[1]{#1~\msf{err}} \newcommand{\trans}[2]{#1 \leadsto #2} \newcommand{\size}[1]{\left|#1\right|} $$

&Notepad

Gradual Programming

Will Crichton   —   March 30, 2018
Programming is a fundamentally incremental (or gradual) process, and our programming languages should reflect that. I show several ways in which program models transition over time and discuss how future research can further a human-centric vision for programming languages.

Picking the right problem

What are the big problems in programming languages in 2018? The ones which, if we solve them, will have the greatest impact on the next generation of programmers? 1 This question, to me, is the allure of programming language research, that the tools and theories we develop don’t affect just a single domain, but potentially everybody who programs. But therein lies another problem: how on earth are we supposed to know the needs of every single programmer? It’s easy enough to work on a language for X new type theory or Y new language feature that I personally think is interesting, but what about everyone else?

This is one of the great failings of programming languages as a modern research field. A lot of the research is driven by the intuitions of the researchers, which are in turn shaped by their specific experiences with programming tools, languages, environments, and so on. An intuitionist approach has clearly taken us quite far to our current standing—smart people tend to have good intuitions—but I have to speculate that the apparent stagnation in widespread adoption of modern PL research is due in part to a lack of focus on the end user. A sentiment I’ve seen repeated is that there have been no new big ideas since Prolog.

I believe, then, that viewing programming languages (PL) through a lens of human-computer interaction (HCI) is the most critical meta-problem in the field today. More than ever, we need surveys, interviews, user studies, sociologists, psychologists, and so on to provide data-driven hypotheses about the hard parts of programming. Not just for learners, but for everyone, from the grizzled low-level systems developers to the rising tide of web developers. We’re starting to see this in the Usability of Programming Languages Special Interest Group at CHI (HCI conference), in papers like an Empirical Analysis of Programming Language Adoption, and in emerging working groups for language usability.

However, in the meantime, we can still make progress on core PL problems that we believe to be impactful. The manifesto presented in the remainder of this note stems mostly from my personal experience—I’ve been programming for just over a decade now in games (Lua, C), websites (PHP, JS), high-performance/distributed systems (C++, Go, Rust), compilers (Standard ML, OCaml), and data science (Python, R). In the course of these experiences, I’ve worked on small scripts, personal projects, open-source software, products at tiny (2 people), small (15), medium (500), and big (2000+) companies, and nowadays on academic research. I studied programming language theory at Carnegie Mellon, and these days I teach the programming language course at Stanford.

All of which is to say, although we need more data, I have done my best to form an educated opinion on problems in programming languages that matter across many domains and actually occur in the real world. Yet still there is much I do not know, and as always, I encourage you to read on with a critical eye.

Thinking gradually

I hold this fundamental belief: programming languages should be designed to match the human programming process.2 We should seek to understand how people think about programs and determine what programming processes come intuitively to the human mind. There are all sorts of fascinating questions here, like:

A basic observation about the human programming process is that it is incremental. No one writes the entirety of their program in a single go, clicks compile + publish, and never looks at the code again. Programming is a long slog of trial and error, where the length of the trial and the severity of the error depend heavily on the domain and the tooling. This is why inspectable output and fast compile times matter, e.g. changing an HTML document and refreshing the page shows you instantly what happened. Bret Victor’s Learnable Programming discusses this idea in detail.

I call this process “gradual programming” 3 4. While paradigms like imperative or functional programming characterize certain underlying aspects of our mental model of program, gradual programming describes a process by which a mental model is formed. In that sense, gradual programming is just… programming, but I think a new name is helpful in having a clear discussion.

Gradual programming is the programmer tracking the co-evolution of two things: 1) the syntactic representation of the program, as expressed to the computer via a programming language, and 2) a conceptual representation of the program, inside the mind. In the beginning of the process, the programmer starts with no syntax (an empty file) and usually a fuzzy idea of how the final program should work. From this point, she takes small steps in building components of the program until the final version is complete.

If you do programming, you have most certainly gone through this process before and can likely relate, but usually most of the thought process occurs implicitly (i.e. inside your head) and is never reified into a communicable form. To be concrete about this gradual process, let’s walk through an example in detail. Let’s say I want to write a program to append a line of text to a file. In my head, I have a model of the program that looks like this:

input file = some other input
input line = some input
write input line to the end of input file

Then I pick a language to work with, in this case Python. To start, rather than try to implement the whole program, I pick just the first line, and attempt to concretely encode it in Python.

$ cat append.py
input_file = input()
print(input_file)

Here, I made a number of decisions. First, I decided the input was going to come from stdin (for simplicity), and used Python’s input() standard library function. I had to come up with a name for that value, input_file, and that name had to conform to Python’s syntactic conventions. I also added a print statement not part of my original program model, but instead part of a temporary program model intended to debug my small program. Then if I try to run it:

$ echo "test.txt" | python append.py
Traceback (most recent call last):
  File "append.py", line 1, in <module>
    input_file = input()
  File "<string>", line 1, in <module>
NameError: name 'test' is not defined

Whoops, I mixed up input() and raw_input(). This wasn’t an issue with my programming model—I’m still thinking about the program the same way—just with my encoding in Python. Fixing my mistake:

$ cat append.py
input_file = raw_input()
input_line = raw_input()
print(input_file, input_line)

$ echo "test.txt\ntest" | python append.py
('test.txt', 'test')

Next, I need to figure out how to append the line to the file. In my initial mental model, this was all encapsulated with “write input line to the end of input file,” but now I need to turn that fuzzy idea into more concrete steps that I can easily encode in Python. Specifically, if I understand how a file system works, then I know I need to first open the file in append mode, write the string, and then close the file. After thinking about these details, my new mental model is:

input file = some other input
input line = some input
file = open the input file for appending
write input line to the end of input file
close file

This then translates into Python:

$ cat append.py
input_file = raw_input()
input_line = raw_input()
file = open(input_file, 'a')
file.write(input_line)
file.close()

$ echo "test.txt\ntest" | python append.py

$ cat test.txt
test

Success! Again, the purpose of this is to demonstrate the co-evolution of the syntactic and conceptual model of a program over time. Based on my experience programming as well as teaching others to program, I believe it exemplifies a common process underlying the way many people program.

Axes of evolution

The example above demonstrates the gradual nature of the programming process, but it’s not clear yet how we should be creating tools to match that process. To simplify the problem, we want to break down the evolution of a program along many smaller axes of evolution. Essentially, we ask: what kinds of information do people gradually learn and/or write down about their program? Then we can consider how programming languages can help optimize each axis individually.

1. Concrete / abstract
In building programs, it’s commonplace to start with an example of what you’re trying to make, and then generalize (or “abstract”) that example to cover a wider range of use cases. Abstraction is the backbone of programming languages, usually provided through functions and interfaces. For example, we can abstract our script above into a function:

def append_line(input_file, input_line):
   # our code above

append_line('test.txt', 'test')
append_line('test.txt', 'test again')

However, the fuzzier your program model is initially, the harder it is to immediately jump to an abstract solution, so this evolution from concrete to abstract occurs frequently in today’s programming languages (again, see “Create by Abstracting” in Learnable Programming).

2. Anonymous / named
In the beginning of the programming process during iteration/experimentation, we as programmers naturally want to optimize for the speed of writing code, not reading code. One form of write-optimized code is short-named and anonymous values. For example, a shortened first pass at our script above might look like:

s = raw_input()
f = open(s, 'a')
f.write(raw_input())
f.close()

Here, the variable names are less informative: s instead of input_file, f instead of file, and no name provided to the input_line. However, it takes less time to write, and if the script will never be read again, then why not? If we intend to use the script in a bigger codebase, then we might decide to incrementally change the names to be more informative for the sake of our code reviewers. This is another good example of gradual change that’s easy and commonplace in today’s programming languages.

3. Imperative / declarative
For a multitude of reasons, straight-line, sequential imperative code appears to come more naturally to programmers than functional/declarative code in their conceptual program model. For example, a simple list transformation will likely use for loops:

in_l = [1, 2, 3, 4]
out_l = []
for x in in_l:
  if x % 2 == 0:
    out_l.append(x * 2)

Whereas a more declarative version will abstract away the control flow into domain-specific primitives:

in_l = [1, 2, 3, 4]
out_l = map(lambda x: x * 2, filter(lambda x: x % 2 == 0, in_l))

The distinction between the two is not just stylistic—declarative code is usually much more easily analyzed for structure, e.g. a map is trivially parallelizable whereas a general for loop less so. This transformation occurs most often in languages which support mixed imperative/functional code (at the very least closures).

4. Dynamically typed / statically typed
The rise of dynamically typed languages in the last two decades (Python, Javascript, R, Lua, …) should suffice as evidence that people find dynamic typing useful, regardless of which side of the debate you take. While there are many advantages of dynamic typing (heterogeneous data structures, free duck typing, …), the simplest one is that of productivity by omission: the types of variables aren’t required to be known at compile time, so the programmer doesn’t have to expend mental energy to write them down.

However, types are still immensely useful tools for ensuring correctness and performance, so a programmer may well want to gradually add type annotations to an untyped program as she becomes convinced that a variable should be of a certain type. This is a relatively nascent idea called gradual typing that’s caught on in Python, Javascript, Julia, Clojure, Racket, Ruby, Hack, and others. For example, our program above could look like:

input_file: String = raw_input()
input_line: String = raw_input()
file: File = open(input_file, 'a')
file.write(input_line)
file.close()

5. Dynamically deallocated / statically deallocated
You can view memory management, or lifetimes, through a similar lens as types. In 2018, all programming languages should be memory safe, with the only question being whether memory deallocation is determined at compile time (i.e. like Rust, with a borrow checker) or at run time (i.e. like every other language, with a garbage collector). Garbage collection is unquestionably a productivity boost for programmers, as it’s natural that our initial program model shouldn’t have to consider exactly how long each value should live before deallocation.

However, as before, having fine-grained control over value lifetimes is still useful both for correctness and performance. Ownership and borrowing like in Rust can help structure a system to avoid data races in concurrent programming, as well as avoid the use of a costly garbage collector at runtime. Deterministic destruction can be useful for avoiding mistakes with things like mutexes. Continuing our typed example, it could look like:

input_file: String = raw_input()
input_line: String = raw_input()
file: mut File = open(&input_file, 'a')
file.write(&input_line)
file.close()

Unlike gradual typing, to my knowledge, there is little work on gradual memory management (except this whitepaper).

6. General-purpose / domain-specific
When beginning to write a program, the programmer wants to have every feature in her language of choice available for use in the implementation in order to maximize the speed of prototyping and the productivity of the creative process. This is not usually on the minds of most as they develop software, except perhaps from a code style perspective (“what subset of Python should I be using?”).

However, the emerging wave of high-performance domain-specific languages like TensorFlow, Halide, Ebb, and Weld, have identified that if the programmer only uses a small subset of general-purpose programs, e.g. differentiable pure functions on tensors, then an optimizer can produce substantially more efficient code or automatically perform backpropagation. From a gradual perspective, this suggests a possible future workflow where as the programmer gradually shrinks the subset of the language she uses in a particular part of her program, the compiler is able to provide more features or more optimized generated code.

A vision for gradual programming

The axes identified are not novel in the sense that, for example, static vs. dynamic typing is a well-worn trade-off. However, what I am demonstrating is that these are not one-time decisions that never change over the course of a program. Instead, all of these axes likely a) change over the evolution of an individual program, and b) change in a fine-grained manner, e.g. typed and untyped code should be mixed in the same system. This is anathema to the all-or-nothing approach that most languages take today: everything should be typed, or nothing should be typed. Everything is garbage collected, or nothing is garbage collected. That requires programmers to face absurd tradeoffs, like changing entire language ecosystems just to get the benefits of static typing.

In that light, advancing gradual programming entails the following research process:

  1. Identify parts of the programming process that change gradually over time, but currently require undue overhead or switching languages to adapt.
  2. Develop language mechanisms that enable programmers to gradually move along a particular axis within a homogeneous programming environment.
  3. Empirically validate against real programmers whether the proposed mechanisms match the hypothesized programming process in practice.

Each of these steps needs further investigation. I have provided an initial breakdown of my perspective on the important incremental parts of the programming process, but I have inevitably omitted many others. Several of the axes mentioned (memory management, language specialization) lack any documented attempts to systematize their treatment at the language level. I think that work on extensible compilation will help speed development of language extensions on these fronts.

Even for more well-trodden frontiers like gradual typing, papers as recent as 2016 were asking “is sound gradual typing dead?” (it’s alive and well, thank you very much). CircleCI dropped Clojure’s gradual typing after two years of use, while TypeScript is beloved by many. Although the theory is well understood and the performance increasingly so, little advice or knowledge exists on how programmers interact with gradual types. Are they easy to program? Is a partially typed program more confusing than fully typed/untyped? Can IDEs solve any such issues? And so on.

Another big question in gradual programming is inference vs. annotation. As our compilers get smarter, it becomes easier for information like types, lifetimes, and so on to be inferred by the compiler if not explicitly annotated by the programmer. However, inference engines are rarely perfect, and when they don’t work, every inference-based language feature (to my knowledge) will require an explicit annotation from the user, as opposed to inserting the appropriate dynamic checks. In the limit, I envision gradual systems have three modes of operation: for any particular kind of program information, e.g. a type, it is either explicitly annotated, inferred, or deferred to runtime. This is in itself an interesting HCI question–how can people most effectively program in a system where an omitted annotation may or may not be inferred? How does that impact usability, performance, and correctness? This will likely be another important avenue of research for gradual programming.

Overall, I’m very excited by the promise of gradual programming techniques. As they catch on, I believe programmers of all skill levels will benefit from languages which better match the way they think.

Please direct comments and criqitues to my inbox or Hacker News.

  1. See Graydon Hoare’s “What’s next?” and Stephen Diehl’s “Near Future of Programming Languages” for further discussion on this topic. 

  2. The “human” part of this may seem obvious, but there is a tradeoff in designing a language intended for consumption by computers vs. by people. Languages written by humans have the problem of being in both camps, although there are languages (e.g. LLVM) which are, for the most part, exclusively for machines. 

  3. I would prefer “incremental” programming, but incremental computation already has a different, well-defined meaning, and the PL community has coalesced on “gradual” as the appropriate term. 

  4. To my knowledge, the only prior use of the term “gradual programming” is in this position paper, and they have a somewhat similar motivation but substantively different perspective on the solution. Also note the last author, Jeremy Siek, is one of the inventors of gradual typing.