
Chapter 4 Control inversion: iterators, generators, coroutines

4.1 Iterators

Programs often need to traverse a data structure and perform some action on each element of the structure. For example, here is C code that prints the elements of a linked list of integers:

for (list l = lst; l != NULL; l = l->tail) printf("%d\n", l->head);

This code mixes two concerns: traversing the list and performing the “print” action on each element. Iterators are a way to better separate these two concerns. They abstract over the traversal of the data structure, making it easier to reuse with different actions.

Iterators come in two flavors: internal iterators, favored by functional languages, and external iterators, favored by imperative and object-oriented languages.

Internal iterators are higher-order functions that take as arguments the data structure to iterate over and the function to apply to elements of the data structure. The simplest iterator over lists, called List.iter in OCaml, applies a function to each list element and discards the results of the function. We can use it to print a list of integers:

List.iter (fun n -> printf "%d\n" n) lst

There are many other useful iterators over lists, such as List.map, which applies a function to each element of a list and returns the list of results, and List.fold_left, which performs a reduction over a list using a two-argument function. Their OCaml types and definitions follow:

let rec iter : ('a -> unit) -> 'a list -> unit =
  fun f l -> match l with [] -> () | h :: t -> f h; iter f t

let rec map : ('a -> 'b) -> 'a list -> 'b list =
  fun f l -> match l with [] -> [] | h :: t -> f h :: map f t

let rec fold_left : ('res -> 'a -> 'res) -> 'res -> 'a list -> 'res =
  fun f accu l -> match l with [] -> accu | h :: t -> fold_left f (f accu h) t
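For comparison, the same three internal iterators can be sketched in Python (the names iter_, map_ and fold_left are ours, chosen to avoid shadowing Python's builtins):

```python
def iter_(f, l):
    # apply f to each element, discarding the results
    for x in l:
        f(x)

def map_(f, l):
    # apply f to each element, collecting the results
    return [f(x) for x in l]

def fold_left(f, accu, l):
    # reduce l with the two-argument function f, starting from accu
    for x in l:
        accu = f(accu, x)
    return accu
```

For example, fold_left(lambda a, x: a + x, 0, [1, 2, 3]) computes the sum 6.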

External iterators are “cursor” objects that represent positions within data structures, in an abstract, data structure-independent way. In Java for example, iterator objects have two methods: next, which returns the value at the current position and moves the iterator to the next position, and hasNext, which indicates whether there is a next element or we have reached the end of the structure. Iterators are typically used within loops that repeat while hasNext is true:

for (Iterator<Integer> i = lst.iterator(); i.hasNext(); ) {
    System.out.println(i.next());
}

This programming idiom is so common that Java (version 1.5 and later) provides special support for it:

for (int n : lst) { System.out.println(n); }

Python provides many constructs and library functions that use iterators: not only for loops, but also comprehensions and many collective operations such as sum, min, max, and map.
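For instance, all of the following operations are driven by iterators (a small sketch over ordinary lists and ranges):

```python
# a comprehension and several collective operations, all driven by iterators
squares = [n * n for n in range(4)]         # list comprehension
total = sum(range(5))                       # 0+1+2+3+4
smallest = min(x * x for x in [-2, 1, 3])   # generator expression argument
doubled = list(map(lambda n: 2 * n, [1, 2, 3]))
```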

Control inversion. There is a fundamental difference between the two types of iterators: internal iterators are functions that call the user-provided action, while external iterators are objects whose methods are called from user-provided code. This difference is called an inversion of control, often summarized by the catch phrase “don’t call us, we’ll call you!”.

External iterators make it easy to traverse several data structures at the same time, since the user code remains in control of each traversal. Doing this in the internal iteration style usually requires writing new iterators, and they can be difficult to implement. For example, consider the “same fringe problem”: determining whether two binary search trees contain the same sets of values, even if the two trees are balanced differently. Here is a simple solution in Java using external iterators:

boolean same_fringe(TreeSet<T> s1, TreeSet<T> s2) {
    Iterator<T> i1 = s1.iterator();
    Iterator<T> i2 = s2.iterator();
    while (i1.hasNext() && i2.hasNext()) {
        if (! i1.next().equals(i2.next())) return false;
    }
    return ! i1.hasNext() && ! i2.hasNext();
}

It is much more difficult to solve the same fringe problem using a direct recursion over the two trees. Readers are encouraged to give it a try.
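For comparison, here is a hedged Python sketch of the same external-iterator approach, using the built-in iterator protocol; sorted lists stand in for the sorted traversals of the two trees:

```python
def same_fringe(s1, s2):
    # compare two iterables element by element, using external iterators
    SENTINEL = object()          # unique marker for the end of an iteration
    i1, i2 = iter(s1), iter(s2)
    while True:
        x1 = next(i1, SENTINEL)
        x2 = next(i2, SENTINEL)
        if x1 is SENTINEL or x2 is SENTINEL:
            # equal iff both iterations end at the same time
            return x1 is SENTINEL and x2 is SENTINEL
        if x1 != x2:
            return False
```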

Implementing external iterators is easy in an object-oriented language: just use instance variables of the iterator object to “remember where we are” as we traverse the data structure.

For example, here is a Java iterator over an array:

class ArrayIterator<T> {
    private T[] arr;
    private int i;
    boolean hasNext() { return i < arr.length; }
    T next() { T res = arr[i]; i++; return res; }
    ArrayIterator(T [] arr) { this.arr = arr; this.i = 0; }
}

Note the parts of this code (the test i < arr.length, the access arr[i], and the increment i++) that would also occur in a direct traversal of the array, using a regular for loop:

for (int i = 0; i < arr.length; i++) { process(arr[i]); }

Iterators are also easy to implement in functional-imperative languages such as Scheme and OCaml. For example, here is an iterator over an array written in OCaml:

let array_iterator (arr: 'a array) : unit -> 'a option =
  let i = ref 0 in
  fun () ->
    if !i >= Array.length arr
    then None
    else (let res = arr.(!i) in incr i; Some res)

The result of array_iterator arr is a function that returns the next array element (if any) on each call. The position in the iteration is maintained in internal mutable state (the i reference, which is free in the function).
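The same closure-based technique works in Python, where a closure can capture mutable state through nonlocal; in this sketch, None plays the role of OCaml's None case:

```python
def array_iterator(arr):
    i = 0                        # mutable state captured by the closure
    def nxt():
        nonlocal i
        if i >= len(arr):
            return None          # end of the iteration
        res = arr[i]
        i += 1
        return res
    return nxt
```

For example, after it = array_iterator([10, 20]), successive calls it() return 10, then 20, then None.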

We can also define a Python-style next function that applies to any iterator and raises an exception to signal the end of the iteration:

exception StopIteration

let next (iter: unit -> 'a option) : 'a =
  match iter() with Some x -> x | None -> raise StopIteration

In a pure functional language, the next operation cannot update the cursor passed as an argument; instead, it should return a cursor pointing to the next element along with the current element. This leads to a presentation of iterators as enumerations, i.e. lists of elements that are evaluated on demand. For example, in OCaml:

type 'a iter = unit -> 'a enum
and 'a enum = Done | More of 'a * 'a iter

If i is an iterator, i () triggers the evaluation of the head of the enumeration, which is either Done if we reached the end of the iteration, or More(x, j) if the next element in the iteration is x and the rest of the iteration is j. We can write iterators over arrays and trees in this style:

let array_iterator (arr: 'a array) : 'a iter =
  let rec iter i () =
    if i >= Array.length arr then Done else More(arr.(i), iter (i+1))
  in iter 0

type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

let tree_iterator (t: 'a tree) : 'a iter =
  let rec iter t k () =
    match t with
    | Leaf -> k
    | Node(l, x, r) -> iter l (More(x, iter r k)) ()
  in iter t Done
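The same pure-iterator encoding can be sketched in Python, representing an enumeration as either None (for Done) or a pair (value, rest) (for More), and an iterator as a zero-argument function:

```python
def array_iterator(arr):
    # an iterator is a thunk returning None (Done) or (value, iterator) (More)
    def it(i):
        return lambda: None if i >= len(arr) else (arr[i], it(i + 1))
    return it(0)

def tree_iterator(t):
    # a tree is either None (Leaf) or a (left, value, right) triple (Node)
    def it(t, k):                # k is the enumeration to continue with
        if t is None:
            return lambda: k
        l, x, r = t
        return lambda: it(l, (x, it(r, k)))()
    return it(t, None)
```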

Programming with pure iterators is similar to programming with lists. For example, here is the “same fringe” problem again, expressed as comparing two lists that are evaluated on demand:

let rec same_iter (i1: 'a iter) (i2: 'a iter) : bool =
  match i1(), i2() with
  | Done, Done -> true
  | More(h1, t1), More(h2, t2) -> h1 = h2 && same_iter t1 t2
  | _, _ -> false

let same_fringe (t1: 'a tree) (t2: 'a tree): bool =
  same_iter (tree_iterator t1) (tree_iterator t2)

Section 7.2 shows a systematic way to derive pure external iterators from internal iterators, using the so-called CPS transformation of functional programs. Control operators can also be used for this purpose, as shown in sections 8.7 and 10.3.

4.2 Generators

Generators are iterators written in “direct style”, as opposed to the explicit style described in section 4.1 above, where instance variables or other mutable state explicitly carry the iteration state from one call to the next. In direct style, a generator is written as a normal function that yields multiple results, one after the other.

For example, here is a Python generator that iterates over an array:

def elements(a):
    for i in range(len(a)):
        yield a[i]

The yield statement produces the next value in the iteration, and transfers control back to the user of the iterator. However, unlike a return statement in a normal function, yield also saves the execution state of elements (namely, the local variables a and i, and the program point just after the yield). When another value of the iteration is requested, the execution of elements resumes where it left off, just after the yield statement. The iteration ends when the elements function returns, after the end of the for loop.

In typical Python style, a generator can be used in a for loop:

for i in elements([1,2,3]):
    print(i)

It can also be used more explicitly, calling the built-in next function to retrieve the next value in the iteration:

g = elements([1,2])   # returns a generator
print(next(g))        # prints "1"
print(next(g))        # prints "2"
print(next(g))        # raises the StopIteration exception

This style of generators, with the yield keyword that produces one value at a time, was introduced in 1975 with the CLU language. CLU generators can only be used in for loops and are not first-class values, but enjoy an efficient, stack-based implementation (Atkinson et al., 1978). The Icon string processing language makes intensive use of generators, which are presented as expressions with multiple values and are usable as first-class values (Griswold, 1988). The Python language popularized yield-style generators, even though they were added late in the design process of Python (Schemenauer et al., 2001).

Generating an infinite sequence on demand is a typical use of generators. For example, here is a Python generator that produces the sequence of prime numbers:

def primes():
    """Generator for prime numbers"""
    p = [2]; yield 2
    m = 3
    while True:
        i = 0
        while i < len(p) and p[i] * p[i] <= m:
            if m % p[i] == 0: break
            i += 1
        else:
            p.append(m); yield m
        m += 2

This is a sieve algorithm that tests successive odd numbers 3, 5, 7, … for primality, using the local array p to hold the primes already found. Note the infinite loop (while True): the iteration never ends by itself; it is the user of the generator who eventually stops calling next. For example, here is a use of the primes generator to perform the prime factorization of a number:

def factor(n):
    """Print the prime number factorization of n"""
    pr = primes()
    while n > 1:
        p = next(pr)
        while n % p == 0:
            print(p); n = n // p

Compiling generators. Generators are typically implemented as iterator objects. Each local variable of the generator becomes an instance variable of the object, so that it persists between calls to the next method. yield statements become return statements from the next method. However, we need to remember “where we are” in the generator code, so that the next call to next will resume where the previous call left off. To do this, we need one additional instance variable, called pc in the example below, which contains the address of a label, if the target language supports Fortran I-style computed jumps, or some other identifier for program points.

For example, consider the following Python generator, with two yield statements:

def generator():
    n = 0
    while True:
        yield n
        yield (-n)
        n += 1

Here is an implementation as an iterator object in C++ with GNU extensions (first-class labels and computed goto):

class Generator {
    int n;
    void * pc = NULL;
public:
    int next(void) {
        if (pc) goto *pc;
        n = 0;
        while (1) {
            pc = &&next1; return n;
          next1:
            pc = &&next2; return (-n);
          next2:
            n += 1;
        }
    }
};

Each yield statement becomes a return statement followed by a unique label, e.g. next1. The variable pc is set to the address of this label, as in pc = &&next1, just before returning. At the next call, the dispatcher if (pc) goto *pc at the beginning of the next method branches to the label following the previously executed return. When next is called for the first time, pc is null and the code of the generator is entered normally and executed from the top until it reaches a yield statement.

This implementation can also be written in plain C++ without labels as first-class values, using unique integers for labels and a switch statement as the dispatcher:

class Generator {
    int n;
    enum { START, NEXT1, NEXT2 } pc = START;
public:
    int next(void) {
        switch (pc) {
        case START:
            n = 0;
            while (1) {
                pc = NEXT1; return n;
              case NEXT1:
                pc = NEXT2; return (-n);
              case NEXT2:
                n += 1;
            }
        }
    }
};

This code takes advantage of a peculiar feature of switch statements in C and C++: case and default declarations are treated like goto labels and can be declared within other control structures, such as the while (1) loop in this example. This programming idiom is known as “Duff’s device”.
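In languages without such features, the same hand-compilation can be written as an explicit state machine; here is a Python sketch of an iterator object equivalent to the two-yield generator above (the names START, NEXT1 and NEXT2 mirror the C++ version):

```python
class Generator:
    START, NEXT1, NEXT2 = range(3)   # the possible values of pc

    def __init__(self):
        self.pc = self.START
        self.n = 0

    def next(self):
        if self.pc == self.START:    # first call: run from the top
            self.n = 0
            self.pc = self.NEXT1
            return self.n            # first yield
        elif self.pc == self.NEXT1:  # resume after the first yield
            self.pc = self.NEXT2
            return -self.n           # second yield
        else:                        # resume after the second yield
            self.n += 1
            self.pc = self.NEXT1
            return self.n            # back around the loop to the first yield
```

Successive calls to next produce 0, 0, 1, -1, 2, -2, …, exactly like the original generator.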

Stackful generators vs. stackless generators. We say that a generator is stackless if yield statements occur only in the body of the generator, and stackful if yield statements can also occur in functions that are called from the body of the generator. All the examples of generators seen so far are stackless. As an example of a stackful generator, consider traversing a binary tree to enumerate the values at its nodes. In pseudocode:

        tree_elements(t) = traverse(t)
             where rec traverse(t) =
                 if t is a node then
                     traverse(t.left)
                     yield t.value
                     traverse(t.right)
                 endif

The statement yield t.value is executed inside n recursive calls to traverse, where n is the depth of the corresponding node.

The implementation of generators described above is limited to stackless generators. To support stackful generators, we would need to somehow store the call stack that goes from the generator to the yield in persistent storage before returning a value, and then restore that call stack when the next function is called. The “stackful” terminology comes from this need to persist part of the call stack between runs of the generator. Implementing stackful generators requires either specific support in the target language, such as control operators (chapter 8 and section 10.3), or a more complex program transformation, such as translation to CPS (section 6.5).

Python generators are always stackless: syntactically, any function containing a yield statement is itself a generator, independent of any generator that may have called that function. For example, consider the naive Python transcription of the binary tree traversal above:

def tree_elements(t):
    if t:
        tree_elements(t.left)
        yield t.value
        tree_elements(t.right)

This does not traverse the tree: the recursive calls tree_elements(t.left) and tree_elements(t.right) return two new generators that are not used; therefore, no values from t.left or t.right are emitted, only the value at the top node t.

This example can be made to work by using the Python construct yield from on the recursive calls to tree_elements:

def tree_elements(t):
    if t:
        yield from tree_elements(t.left)
        yield t.value
        yield from tree_elements(t.right)

yield from g behaves like for v in g: yield v. The values produced by the generator g are forwarded and inserted in the sequence of values produced by the current generator. The value of a node at depth n in the tree is thus forwarded n times. This is potentially less efficient than a proper implementation of stackful generators, but remains compatible with the simple stackless implementation as iterator objects described above.
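As a quick check, here is the yield from version again, made self-contained with a minimal Node class (hypothetical; any record with left, value and right fields would do):

```python
class Node:
    def __init__(self, left, value, right):
        self.left, self.value, self.right = left, value, right

def tree_elements(t):
    if t:
        yield from tree_elements(t.left)
        yield t.value
        yield from tree_elements(t.right)

# a three-node tree; the whole fringe is enumerated, in order
t = Node(Node(None, 1, None), 2, Node(None, 3, None))
```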

4.3 Coroutines and cooperative threads

The word “coroutine” was introduced by Conway (1963) to describe “subroutines who act as the master program”, and later used by Dahl et al. (1972) to refer to “a method of combining two pieces of program […] operating in some sense at the same level”, as opposed to Algol procedures, which impose a “form of master/subordinate relationship”. Today, “coroutining” refers to all forms of interleaved execution of multiple code fragments that go beyond function calls and returns. More specifically, we can distinguish between:

- asymmetric coroutines, such as generators, where yield suspends the current coroutine and returns control to the code that started or resumed it;
- symmetric coroutines, where yield to C transfers control to an explicitly designated coroutine C;
- cooperative threads, where yield returns control to a scheduler, which selects the next thread to run.

Symmetric coroutines vs. threads. Here is an example of symmetric coroutines, written in Python-like syntax. It is a producer-consumer scheme, where one coroutine produces items and adds them to a queue, and the other coroutine consumes the items.

q = Queue(maxsize = 100)

coroutine produce():
    while True:
        while not q.full():
            item = build(); q.put(item)
        yield to consume

coroutine consume():
    while True:
        while not q.empty():
            item = q.get(); use(item)
        yield to produce

produce()

Compare this to the same producer-consumer scheme written using cooperative threads:

thread produce():
    while True:
        if not q.full():
            item = build(); q.put(item)
        yield

thread consume():
    while True:
        if not q.empty():
            item = q.get(); use(item)
        yield

spawn(produce); spawn(consume)

The coroutine-based code fully specifies the interleaving of computations between the producer and the consumer: first the producer fills the queue with items; only then does it transfer control to the consumer, which empties the queue and then restarts the producer. In contrast, the thread-based code leaves some of the interleaving of computations to the scheduler: at each iteration of one of the while loops, the yield statement can pass control to the other thread, or continue within the current thread. A round-robin scheduling strategy ensures that the program makes progress, but other, less fair strategies can result in “livelock”, where one thread never makes progress. (The code can also be rewritten to use condition variables so as to prevent livelock regardless of the scheduling strategy.)

Expressiveness of coroutines. At first glance, it might seem that asymmetric coroutines, symmetric coroutines and cooperative threads implement different types of control transfers, and that no type subsumes the other two. The analysis by de Moura and Ierusalimschy (2009) shows that this is not the case. They identify three design dimensions for coroutines: asymmetric vs. symmetric (depending on the meaning of yield); stackful vs. stackless (depending on where yield can occur); and first-class values vs. limited uses (such as within for loops only). Then, they prove that asymmetric, stackful, first-class coroutines have the expressiveness of one-shot delimited continuations and can, therefore, encode symmetric coroutines, cooperative threads, and many other advanced control structures. Delimited continuations and the one-shot (linearity) restriction are described in sections 8.5 and 10.2. For now, we will describe only two of the encodings.

To encode symmetric coroutines using asymmetric coroutines, we replace each yield to C statement, where C is a coroutine identifier, with a yield of the value C. This yield gives control back to a trampoline: a trivial scheduler that immediately gives control to the value C produced by the yield. For example, the producer-consumer implementation using symmetric coroutines shown above becomes:

def produce_gen():
    global consume
    while True:
        while not q.full():
            item = build(); q.put(item)
        yield consume

def consume_gen():
    global produce
    while True:
        while not q.empty():
            item = q.get(); use(item)
        yield produce

produce = produce_gen()
consume = consume_gen()
c = produce
while True: c = next(c)

The variable c holds the value of the coroutine to be executed next. It is initialized with the first coroutine to run, which is produce in our example. Then, the trampoline repeatedly passes control to coroutine c, by calling next(c), and updates c with the value returned by next(c), which is the next coroutine to execute.

To encode cooperative threads using asymmetric coroutines, we write a scheduler that maintains a set of coroutines, one per thread, and repeatedly selects one coroutine c from this set, restarts it with next(c), and puts it back into the set. For example, here is a round-robin scheduler:

q = Queue()
q.put(produce); q.put(consume)   # spawning two threads
while not q.empty():
    c = q.get()
    try:
        next(c)
        q.put(c)
    except StopIteration:
        pass

Our scheduler handles the case where a thread returns normally instead of by a yield statement: in this case, it considers the thread to have terminated and does not push it back into the queue.
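To make the scheduler self-contained, here is a runnable sketch with two toy threads; the generator worker and the list out are ours, standing in for produce, consume and their side effects:

```python
from queue import Queue

out = []

def worker(name, steps):
    # a tiny cooperative thread: one unit of work per resumption
    for i in range(steps):
        out.append((name, i))
        yield

q = Queue()
q.put(worker("a", 2)); q.put(worker("b", 3))   # spawning two threads
while not q.empty():
    c = q.get()
    try:
        next(c)
        q.put(c)                 # still running: put it back in the queue
    except StopIteration:
        pass                     # terminated: drop it
```

Round-robin scheduling interleaves the two threads: out ends up as [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("b", 2)].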

4.4 Further reading

Scott and Aldrich (2025) gives a good introduction to iterators (section 6.5) and coroutines (section 9.5). Chapter 3 of Dahl et al. (1972) describes the “hierarchical program structures” of Simula, the language that introduced both coroutines and object-oriented classes. Johnson and Foote (1988) is one of the first papers to point out the role of inversion of control in well-designed object-oriented frameworks. de Moura and Ierusalimschy (2009) describe various types of coroutines in a unified framework based on delimited continuations.

