Chapter 5 Functional languages

Thus far, we have taken for granted that source code must explicitly describe the sequencing of computations in a program, and that programming languages must provide control structures to express this sequencing. However, this assumption is questioned by declarative languages, which leave the sequencing of computations mostly implicit in the source code and rely on the compiler to determine the correct order of computations. In other words, declarative languages emphasize the what (what is to be computed?) rather than the how (how to break the computation in elementary steps? in which order to perform these steps?).

An example of the declarative approach is database query languages such as SQL: the query describes which records to pull from the database; how to search the database is left to the database management system to decide. Other declarative programming paradigms include logic programming (Prolog, Datalog), pure functional programming (Haskell, Agda), and dataflow programming (Simulink, Lustre). Hardware description languages such as Verilog and VHDL are also declarative in nature.

Declarative languages were introduced in the 1970s with the goal of simplifying programming and “liberating it from the von Neumann style”, as Backus (1978) put it. In the 1980s, the focus shifted to parallel computation: it was hoped that declarative languages would be easier to execute in parallel than standard imperative languages, precisely because the former give the compiler more flexibility in scheduling computations. Since the 1990s, declarative languages have been recognized for their safety and their proximity to formal verification.

Can declarative languages do without control structures? Do they “liberate” programmers from the burden of expressing control in their programs? This chapter attempts to answer these questions in the context of pure functional programming, by examining three little languages with increasing expressiveness: XL, a spreadsheet language; APP, an applicative language with functions distinct from values; and FUN, a functional language with functions as values.

5.2 XL: expressions and spreadsheets

Expressions with sharing. Consider the following language of arithmetic expressions and equations, nicknamed “XL”:

Expressions: e	::=	0 ∣ 1.2 ∣ 3.1415 ∣ …	constants
	∣	x ∣ y ∣ z ∣ …	variables
	∣	op(e₁, …, e_n)	operations
Programs: p	::=	{ x₁ = e₁; …; x_n = e_n }	sets of equations

Arithmetic expressions such as x + 1 and y × z are built from constants and variables using operations such as +, −, ×, /, etc. Programs are sets of equations between variables and expressions.

The use of variables that are bound to expressions by equations captures a notion of sharing of computations. For instance, the two programs

compute the same results (in a sense to be made precise later), but p₁ can evaluate 2 × 3 only once and share the result 6 in the evaluation of x + x, while p₂ evaluates 2 × 3 three times.

Spreadsheets such as the one shown in figure 5.1 can be viewed as visual displays of sets of equations:

The variables are the spreadsheet cells B2, C6, etc. Cells containing manually-entered values represent trivial equations variable = constant. Cells containing formulas give rise to nontrivial equations variable = formula.

The acyclicity condition. We want XL programs to be easy to evaluate. This means avoiding equations that are difficult to solve, such as x = x³ − 2x² + 2. We go much further and rule out all equations x = e where e depends on x, either directly or indirectly through other equations, as in the following examples:

In other words, the equations must not contain a dependency cycle. This acyclicity condition holds if and only if the program can be written not just as a set of equations, but as an ordered list of equations

Circuits and dataflow graphs. Figures 5.2 and 5.3 show two alternative presentations of the spreadsheet of figure 5.1. The first is a circuit, with variables (cells) represented by wires and operators by gates. The acyclicity condition corresponds to the absence of loops in the circuit. In other words, this must be a combinatorial circuit.

The other alternative presentation is a dataflow graph: a directed acyclic graph (DAG) with constants and operators as nodes, and edges representing the flow of data (from a node to an operator that requires its value). The acyclicity condition corresponds to the absence of cycles in the graph.

Program evaluation. Evaluating an XL program { x₁ = e₁; …; x_n = e_n } consists in determining the numerical value v_i of each variable x_i. These are the values that are displayed in the cells of the corresponding spreadsheet.

Without committing to a specific evaluation algorithm, we can describe all possible evaluations as successive reductions of the program. Each reduction rewrites the program as another, simpler program. The evaluation process is complete when the program has been rewritten as a set of trivial equations { x₁ = v₁; …; x_n = v_n }, from which the value v_i of the variable x_i is readily apparent. Here are the two reduction rules that can be used:

These rules can be applied anywhere in the right-hand side of an equation. Rule (unfold) is analogous to the “copy rule” of Algol (section 3.2): it states that a variable x can always be replaced by the expression e associated with x by an equation.

Rule (prim) describes the evaluation of an operator op applied to values v₁, …, v_n, which is replaced with its result value v. We write op^* for the interpretation of the operator op. For example, if we take +^* to be the floating-point addition of 32-bit floating-point numbers, we have 123000 + 0.005 → 123000.008.

In a given program, reduction rules can often be applied to different subexpressions, resulting in multiple reduction sequences for this program:

However, all reduction sequences terminate with the same fully-evaluated program. This is a consequence of the following confluence property of rules (prim) and (unfold):

Theorem 1 (confluence) If p → p₁ and p → p₂, there exists a program p′ such that p₁ →^* p′ and p₂ →^* p′.

Evaluation strategies. Although all reduction sequences produce the same result, some sequences are more costly than others — sometimes exponentially so. Consider the following program:

Using a “call by name” evaluation strategy, we replace each x_i with its definition before performing any arithmetic evaluations. This results in an intermediate state

of size 𝒪(2ⁿ), which then needs 𝒪(2ⁿ) reductions by rule (prim) before evaluation is complete.

Using a “call by value” evaluation strategy, we reduce the expression that defines x_i to a value v_i before replacing x_i with v_i. This results in the following intermediate states:

For the XL language in this section, the “call by value” strategy is optimal. This strategy is captured by the following restriction of rule (unfold):

If the program is represented as an ordered list that respect dependencies, this strategy can be implemented by the following evaluation function, which returns the final, fully-evaluated program:

We write eval(e) for the value of a closed expression e (one that does not contain any variables), obtained in linear time by repeated applications of rule (prim).

5.3 APP: expressions and user-defined functions

Adding function definitions. We now extend the XL language of section 5.2 with the ability to define functions F with one or more parameters x₁, …, x_n, and to call these functions within expressions.

Expressions: e	::=	true ∣ false ∣ 0 ∣ 1.2 ∣ …	constants
	∣	x ∣ y ∣ z ∣ …	variables
	∣	if e₁ then e₂ else e₃	conditional
	∣	op(e₁, …, e_n)	built-in operations
	∣	F(e₁, …, e_n)	function application
Definitions: def	::=	x = e	variable definition
	∣	F(x₁, …, x_n) = e	function definition
Programs: p	::=	def ; … ; def

To make examples more interesting, we also added Boolean values and a conditional expression if e₁ then e₂ else e₃, which returns the value of e₂ or the value of e₃ depending on the truth value of e₁.

To facilitate evaluation, we write programs as ordered lists of definitions, with the constraint that a definition can only mention function parameters and variables defined by previous definitions, but can mention functions defined later. This criterion ensures the absence of dependency cycles that do not go through a function application, while still allowing for recursive functions.

For example, here is a function F that computes the Fibonacci sequence, and a definition x for the 20th number in this sequence.

Here is an alternate definition of the Fibonacci sequence, closer to an iterative algorithm:

Notice that the APP language supports general recursion. Computations may not terminate (e.g. G(−1, 0, 1) in the example above). Indeed, APP contains Kleene’s partial recursive functions and is therefore Turing-complete.

Control structures in APP. On the one hand, the APP language is undoubtedly declarative in nature. In particular, there is no explicit sequencing of computations, only data dependencies that force certain operators to be evaluated before others. For instance, in the expression (1 + 2) × 3, the addition must be performed before the multiplication.

On the other hand, APP definitely supports two essential control structures: conditionals and loops. Conditionals are supported by the expression if e₁ then e₂ else e₃, which is similar to an if…then…else statement in an imperative language (i.e. the then or the else branch is executed depending on the value of the Boolean expression e₁).

Loops are supported by tail-recursive functions, i.e. functions where the recursive calls are performed as the final action of the function. For example, the following tail-recursive function

Many types of loops can be expressed as tail-recursive functions, including for loops, while loops, do…while loop, with single-level break and continue.

Mutually-recursive functions can express even more complex forms of loops, which would require goto jumps in imperative languages. For example,

is similar to the following unstructured imperative code, whose control-flow graph is irreducible:

if (n < 0) { n = -n - 1; goto Odd; } else { goto Even; } Even: if (n == 0) return true; n = n - 1; Odd: if (n == 0) return false; n = n - 1; goto Even;

More generally, any flowchart can be expressed as a set of mutually-recursive functions, as observed by McCarthy (1960, section 6).

Evaluation strategy. For conditionals and loops to work as expected, we must constrain the evaluation strategy beyond just “any order of evaluation that satisfies the data dependencies”. As a trivial example, if…then…else conditional expressions must not evaluate both the then and else branches before selecting one of their values; otherwise, a loop such as

would diverge (not terminate) for any value of n, evaluating infinitely many recursive calls to G with n−1, n−2, …

Clearly, the conditional expression if e₁ then e₂ else e₃ must not be “strict” in its three arguments: e₁ should first be evaluated to a Boolean b before starting the evaluation of e₂ (if b is true) or of e₃ (if b is false). It is less obvious how to evaluate function applications: should they be “strict” in their arguments or not? We will return to this question in the next section, where it will be discussed in the context of the richer FUN language.

5.4 FUN: functions as first-class values

Function abstractions in expressions. Another way to extend the XL language of section 5.2 with user-defined functions is to add function abstractions λ x . e and function applications e₁ e₂ to the language of expressions e. There is no need to distinguish between function names and variable names: a variable x can represent any value, including functions. Unlike in APP, functions can be nested and refer to free variables of enclosing functions.

The syntax for this extension, called FUN, is summarized in figure 5.4. The expression λ x . e represents the function that maps the argument x to the result e. A function that takes n arguments x₁, …, x_n can be defined as λ x₁ . λ x₂ … λ x_n . e. Applying such a function f to n arguments e₁, …, e_n is written as n−1 applications to a single argument

A multi-argument function like f is called a curried function, in honor of logician H. B. Curry. If we add operators (+, −, etc) as constants, we can also view arithmetic operations such as x + y as curried function applications such as ((+) x) y.

The syntax in figure 5.4 does not include definitions and appears to lack support for recursion. As will be explained later, these features can be expressed in terms of the core constructs of the FUN language. Nevertheless, the following examples will use recursive definitions.

Higher-order functions. In the FUN language, functions are first-class values: they can be used anywhere a numerical or Boolean value can be used, including as arguments to other functions.

Functions that take functions as arguments are called higher-order functions. They provide new ways to compose and reuse sub-programs. For instance, function composition f ∘ g is definable as a higher-order function, as is function exponentiation f ∘ ⋯ ∘ f (n times):

Here is a generic loop that iterates the function step, starting with x, until the predicate stop becomes true:

Higher-order functions can express generic traversals of data structures such as lists. For instance, Map applies a function to each element of a list and Fold reduces a list using a two-argument function.

Encoding definitions. The examples above show FUN programs as lists of definitions x₁ = e₁ ; … ; x_n = e_n. However, it is not necessary to include these definitions in the abstract syntax of FUN, since they can be encoded as expressions.

A non-recursive definition x = e ; … becomes a let binding let x = e in …, which is just a function application:

A simply-recursive definition f = e becomes a let rec binding let rec f = e in e′, which is itself defined using a fixed-point combinator Y:

Finally, mutually-recursive definitions can be expressed in terms of simply-recursive definitions. Here, we show the encoding for two mutually recursive definitions, f = e₁; g = e₂.

Reduction rules. As in section 5.2, we view the evaluation of a FUN expression as a sequence of reductions. Each reduction rewrites the expression in order to perform one step of computation. Here are the reduction rules we need:

The rule (λ x . e) e′ → e [x := e′] is known as the beta reduction of the lambda calculus. It captures the essence of function application: the formal parameter x is replaced with the effective argument e′ in the body e of the function.

In rule (prim) and elsewhere, we write v for a value, which can be either a constant or a function abstraction.

Values: v

::=

c ∣ λ x . e

Here are some examples of reduction sequences. We take Δ = λ x . x x and Ω = Δ Δ.

As shown by examples (ii) and (iv), some terms have infinite reduction sequences, which means divergence (non-termination). As shown by examples (iii) and (iv), a given term can have both finite and infinite reduction sequences, depending on which subterm we choose to reduce at each step.

These examples demonstrate that the reduction rules alone do not uniquely determine the meaning of a FUN program. Therefore, we must specify a reduction strategy that constrains where and when the reduction rules can be applied to evaluate an expression.

Evaluation strategies. All functional programming languages use only weak reductions: they prohibit reductions within function abstractions (“under a lambda”) and in the then and else branches of a conditional; instead, they wait until the function is applied or the Boolean condition of the if-then-else is known before reducing further. As mentioned at the end of section 5.3, non-weak reductions, also called strong reductions, are problematic in combination with general recursion, since they often lead to non-termination.

Restricting ourselves to weak reductions still leaves us with a choice of strategy for function applications:

Call by name avoids divergence as much as possible, but may duplicate computations. For example, writing Fib for the recursive Fibonacci function and Ω for an expression that diverges, we have:

Call by value avoids repeated evaluations, such as the one of Fib 20 in the last example above, but it can diverge unnecessarily:

Call by need attempts to offer the best of both worlds: it avoids divergence just like call by name, and it avoids repeated evaluations just like call by value. However, the sharing of evaluations comes with conceptual difficulties and implementation costs.

The two occurrences of Fib 20 come from the substitution of the same variable x. Therefore, their evaluations are shared: when the evaluation of one occurrence terminates with value 6765, the other occurrence is known to have value 6765 as well.

Haskell uses call by need as the default strategy for function application. Most other functional programming languages (Lisp, Scheme, SML, OCaml, etc.) use call by value, as do all modern imperative and object-oriented languages. Call by name played an important role in Algol 60 (see section 3.2); today, it is only used for compile-time macro expansion and template expansion.

Encoding evaluation strategies. Even if our language is call by value, we can implement call by name semantics by passing arguments e as thunks: functions λ z . e where z is a fresh variable not used in e. As long as we stick to weak reduction, e will not be reduced in the thunk λ z . e until it is applied to a dummy argument.

This idea gives rise to a systematic program transformation 𝒩(e) that ensures that all function applications in e use call by name:

We obtain call by need instead of call by name if we use memoized thunks (lazy e in OCaml) instead of plain thunks λ z . e.

Implementing call by value in a call by name or call by need language is more challenging. One approach is to use a CPS (continuation-passing style) transformation, as explained in section 6.5. Haskell provides primitive operations that force the evaluation of a function argument, such as the seq operator and the $! strict application operator.

5.5 Reductions under contexts

How can we formalize and define an evaluation strategy with mathematical precision? The approach we follow throughout this book is to define evaluation in terms of reductions under restricted contexts.

Contexts. A context is an expression with a hole, written as [ ]. For example, (1 + [ ]) × 3 is a context.

If C is a context and e an expression, C[e] denotes the expression obtained by replacing the hole [ ] in C with e. For example, if C = (1 + [ ]) × 3 and e = 2/5, then C[e] = (1 + 2/5) × 3.

Reductions under contexts. Using contexts, we can more precisely explain what it means to apply a reduction rule “anywhere in an expression”, as we have done so far. We define whole-program reductions p → p′ as head reductions e →^ε e′ under reduction contexts C, as follows.

For example, if 1 + 2 →^ε 3 is a head reduction, then 4 × (1 + 2) → 4 × 3, using the context C = 4 × [ ].

If we allow any context to be used as the reduction context C, whole-program reduction can take place anywhere in the program. However, we can constrain where reductions can take place by restricting which contexts C can be used as reduction contexts, thus specifying a reduction strategy.

This grammar does not allow λ x . C nor if e₁ then C else e₂ nor if e₁ then e₂ else C as reduction contexts. Therefore, reductions cannot take place in a function body (“under a lambda”) or in the then or else branches of a conditional.

This prevents the reduction of an application until its argument has been evaluated to a value v. This is the essence of call by value.

Optionally, we may want to specify an evaluation order for applications e₁ e₂. Left-to-right evaluation, where we evaluate the function part e₁ before the argument part e₂, is enforced by the following grammar of reduction contexts:

We allow v C as a context but not e C: the argument part of the application can only be reduced if the function part has already been evaluated.

This makes it possible to pass arguments e′ unevaluated to functions. Additionally, call by name ensures that arguments cannot be evaluated before being passed by restricting evaluation contexts as follows:

There is no context that allows reducing e′ in (λ x . e) e′; the only reduction that can take place here is the (β) rule.

Call by need is difficult to describe in terms of reduction rules. The problem lies in the β rule (λ x . e) e′ →^ε e [x := e′]: all occurrences of the parameter x in the function body e must share the same argument e′, to ensure that when one occurrence of x is replaced by the value v of e′, all other occurrences are replaced as well. One solution is to represent expressions by directed acyclic graphs (DAGs) instead of terms, and view reduction rules as graph rewriting rules instead of term rewriting rules. This approach uses the same reduction rules and the same reduction contexts as for call by name.

Ariola et al. (1995) develop another approach to call by need, using variable bindings in terms to represent sharing. The language of expressions is enriched with explicit let bindings:

Expressions: e

::=

… ∣ let x = e₁ in e₂

Function application let-binds the function argument to the function parameter, instead of substituting one for the other:

It is only when the argument e′ has been reduced to a value v that this value v can be substituted for x:

Reduction contexts are those of call by name, extended with two cases for let bindings:

The context let x = C in C′[x] captures the essence of call by need: we can reduce the expression bound to a variable x only if the value of this variable is needed for the rest of the program to make progress, i.e. only if the rest of the program can be written as C′[x] for some evaluation context C′. Since contexts are recursive, these two types of let contexts can capture arbitrarily long chains of dependencies, as in let x = C in let y = C′[x] in C″[y].

5.6 Further reading

Chapter 1 of Structure and Interpretation of Computer Programs, the textbook by Abelson and Sussman (1996), is a classic introduction to declarative programming from a functional perspective. The textbook by Van Roy and Haridi (2004) discusses declarative programming in greater details, particularly from the perspectives of logic and relational programming.

Part 1 of Pierce (2002) explains untyped functional languages (such as our FUN toy language) and their formal semantics. The use of reduction contexts was introduced by Wright and Felleisen (1994) and is explained again and compared with other styles of operational semantics in chapter 5 of Harper (2016).

Part 2 of Peyton Jones (1987) describes the implementation of lazy evaluation using graph reduction.