Chapter 11 Monads
Monads are a popular programming pattern in functional languages: they allow programmers to express effects that the language does not natively support, such as mutable state in Haskell and undelimited continuations in OCaml. Monads are also a powerful semantic device: they provide a generic account of the propagation and sequencing of effects in programming languages. The presentation of monads given in this chapter emphasizes the semantic aspects, since they serve as the starting point for the theory of algebraic effects presented in chapter 12.
11.1 Commonalities in semantics and program transformations
Denotational semantics
proceed by associating mathematical objects ⟦ s ⟧ with syntactic objects such as the statements s of an imperative language. As shown in sections 6.2 and 6.3, the domain of denotations depends on the features of the programming language we need to describe. For example, for the simple imperative language IMP, we have considered four different domains:
⟦ s ⟧ : Store → Store                                                  (1)

⟦ s ⟧ : Store → Store + {⊥}                                            (2)

⟦ s ⟧ : Store → (Store → Store + {⊥}) → Store + {⊥}                    (3)

⟦ s ⟧ : Env → Store → (Store → Store + {⊥}) → Store + {⊥}              (4)
In (1), we start with the simple idea that a statement s is a store transformer, where a store maps variables to values. In (2), we add “bottom” (⊥) results to account for loops that do not terminate. In (3), we switch to continuation-passing style (CPS), which makes the continuation of the statement s explicit. Finally, in (4), we add an environment as a parameter, e.g. to associate continuations with code labels.
As the domain of denotation changes, so do the denotations of the basic statements, even though their intuitive semantics remain the same. For example, here are the denotations of skip (“do nothing”) and x := e (assignment) in the four domains above:
⟦ skip ⟧ σ = σ
⟦ x := e ⟧ σ = σ[x := ⟦ e ⟧ σ]                                         (1), (2)

⟦ skip ⟧ σ k = k σ
⟦ x := e ⟧ σ k = k (σ[x := ⟦ e ⟧ σ])                                   (3)

⟦ skip ⟧ ρ σ k = k σ
⟦ x := e ⟧ ρ σ k = k (σ[x := ⟦ e ⟧ σ])                                 (4)
In all cases, ⟦ skip ⟧ returns the initial store σ unchanged, and ⟦ x:=e ⟧ returns σ after updating the value of x, but the way to return these stores differs.
This phenomenon is even more striking in the case of control structures such as sequencing s1; s2.
⟦ s1; s2 ⟧ σ = ⟦ s2 ⟧ (⟦ s1 ⟧ σ)                                       (1)

⟦ s1; s2 ⟧ σ = ⊥                        if ⟦ s1 ⟧ σ = ⊥
⟦ s1; s2 ⟧ σ = ⟦ s2 ⟧ (⟦ s1 ⟧ σ)        otherwise                      (2)

⟦ s1; s2 ⟧ σ k = ⟦ s1 ⟧ σ (λ σ′. ⟦ s2 ⟧ σ′ k)                          (3)

⟦ s1; s2 ⟧ ρ σ k = ⟦ s1 ⟧ ρ σ (λ σ′. ⟦ s2 ⟧ ρ σ′ k)                    (4)
In all cases, the store at the end of the execution of s1, if it terminates, becomes the initial store for the execution of s2. However, there are four different ways to express this intuition in mathematical terms.
At this point, the question is: can we give denotational semantics to basic constructs such as assignment and sequencing in such a way that the semantic equations can remain the same when the semantic domain changes?
Program transformations are another way of giving semantics to programming languages, by translating them into simpler, better understood languages. For example, the CPS transformations of section 6.5 clarify the differences between call by name and call by value in functional languages.
As in the case of denotational semantics, program transformations for a functional language such as FUN share some commonalities. Consider the call-by-value CPS transformation 𝒞 from section 6.5, the double-barreled CPS transformation 𝒞2 from section 8.4, and the ERS (exception-returning style) transformation ℰ also from section 8.4. Here are the translations for constants, variables, and function abstractions:
𝒞(c) = λ k . k c
𝒞(x) = λ k . k x
𝒞(λ x . e) = λ k . k (λ x . 𝒞(e))

𝒞2(c) = λ k . λ k′ . k c
𝒞2(x) = λ k . λ k′ . k x
𝒞2(λ x . e) = λ k . λ k′ . k (λ x . 𝒞2(e))

ℰ(c) = V c
ℰ(x) = V x
ℰ(λ x . e) = V (λ x . ℰ(e))
In all cases, the value for the expression is immediately returned as the result, but how a value is returned differs between the three transformations: by passing it to the continuation (𝒞), or to the first continuation (𝒞2), or by wrapping it in a V constructor (ℰ).
Similarly, here are the translations for function applications:
𝒞(e1 e2) = λ k . 𝒞(e1) (λ f . 𝒞(e2) (λ v . f v k))

𝒞2(e1 e2) = λ k . λ k′ . 𝒞2(e1) (λ f . 𝒞2(e2) (λ v . f v k k′) k′) k′

ℰ(e1 e2) = match ℰ(e1) with
           | E x → E x
           | V f → match ℰ(e2) with
                   | E x → E x
                   | V v → f v
All three transformations first evaluate the function e1, binding its value to a variable f, then evaluate the argument e2, binding its value to v, and finally apply f to v. The transformations differ in how they bind the value produced by an evaluation, and how they handle evaluations that produce exceptions.
At this point, the question is: can we share the definitions of the transformations for basic constructs such as function abstraction and function application between several program transformations, even though these transformations represent computations differently and handle different language extensions?
11.2 Monads and the computational lambda-calculus
Eugenio Moggi answered the two questions above in the affirmative by introducing the computational lambda-calculus and its categorical semantics based on monads (Moggi, 1989, Moggi, 1991).
The computational lambda-calculus is intended to serve as a “meta-language” for denotational semantics and for program transformations: it is the language in which the right-hand sides of denotational equations ⟦ s ⟧ = … and transformation equations 𝒞(e) = … are written.
Syntax. We use the following syntax for the computational lambda-calculus:
Values:
    v ::= c                        constants
        | x                        variables
        | λ x . M                  function abstraction

Computations:
    M, N ::= v1 v2                 function application
           | if v then M else N    conditional
           | val v                 trivial computation
           | do x ⇐ M in N         sequencing of computations
           | …                     monad-specific operations
This syntax maintains a strict separation between values and computations. Values are the end results produced by computations; computations are processes that ultimately produce values, but may perform a variety of effects along the way, such as divergence (non-termination), input-output, exceptions, backtracking (non-determinism), etc. In the memorable words of Paul Blain Levy: “values are; computations do.”
Values include constants (Booleans, integers, etc.), variables, and function abstractions. Computations include function applications and if-then-else conditionals. Since the function part and the argument part of an application are both forced to be values, applications use call by value and are insensitive to evaluation order. val v is the trivial computation that immediately produces the value v. do x ⇐ M in N executes the computations M and N in sequence, with x bound to the value produced by M during the execution of N. This takes care of propagating effects. Different kinds of effects can be produced by adding more constructors for computations, as shown in section 11.3.
Typing.
The separation between values and computations is also apparent in the type system of the computational lambda-calculus, with a type T A denoting computations that produce values of type A.
Value types:
    A ::= ι                        base types (bool, int, …)
        | A → B                    function types

Computation types:
    B ::= T A                      computation producing A values
Figure 11.1 summarizes the typing rules.
Γ, x : A ⊢ M : B
──────────────────────
Γ ⊢ λ x . M : A → B

Γ ⊢ v1 : A → B    Γ ⊢ v2 : A
──────────────────────────────
Γ ⊢ v1 v2 : B

Γ ⊢ v : bool    Γ ⊢ M : B    Γ ⊢ N : B
────────────────────────────────────────
Γ ⊢ if v then M else N : B

Γ ⊢ v : A
──────────────────
Γ ⊢ val v : T A

Γ ⊢ M : T A    Γ, x : A ⊢ N : B
─────────────────────────────────
Γ ⊢ do x ⇐ M in N : B

Figure 11.1: Typing rules for the computational lambda-calculus.
Kleisli triples.
Different types of effects can be modeled in the computational lambda-calculus by different interpretations of the type constructor T. For example, T X = X gives a pure language without effects, where a computation is just a value, while T X = 𝒫(X) (sets of X) models non-determinism, where a computation produces a set of possible values.
Any interpretation of T must be accompanied by appropriate ret and bind operations:
for all types X, Y,
ret  : X → T X
bind : T X → (X → T Y) → T Y
ret v, also written η(v) in Moggi (1991), injects a value into the world of computations. It defines the semantics of the val v construct of the computational lambda-calculus.
bind M F, also written F*(M) in Moggi (1991), transforms a function F: X → T Y from values to computations into a function F*: T X → T Y from computations to computations, and applies it to the computation M. The semantics of do x ⇐ M in N is, then, given by bind M (λ x . N).
Grouped together, (T, ret, bind) must form a so-called Kleisli triple. In particular, ret and bind must satisfy the following three laws:
bind (ret v) F = F v                                                   (M1)
bind M (λ x . ret x) = M                                               (M2)
bind (bind M F) G = bind M (λ x . bind (F x) G)                        (M3)
Roughly speaking, (M1) and (M2) state that ret is a neutral element for bind, and (M3) states that bind is associative.
These three laws of Kleisli triples are known as “the three monadic laws” in the world of programming languages, for the reasons given below.
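In a programming language, a Kleisli triple is naturally packaged as an interface. Here is a minimal OCaml sketch (the signature name and layout are our own, not fixed by the text); the laws (M1)–(M3) cannot be expressed in the signature and must be verified separately for each implementation.

module type MONAD = sig
  type 'a t                                  (* the type constructor T *)
  val ret  : 'a -> 'a t                      (* ret  : X → T X *)
  val bind : 'a t -> ('a -> 'b t) -> 'b t    (* bind : T X → (X → T Y) → T Y *)
end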
Connections with monads.
In category theory, a monad is a triple (T, η, µ) where T is a functor and
T(f) : T X → T Y     if f : X → Y
η : X → T X
µ : T (T X) → T X

η; µ = id = T(η); µ
T(µ); µ = µ; µ
For intuition, think of T as a container type like T X = 𝒫(X) (sets of X). Then, η is the injection of an element x into a singleton container {x}, and µ is the “flattening” of a container of containers into a container: for example, µ({{1,2},{3}}) = {1,2,3}.
Given a monad (T, η, µ), we can define a Kleisli triple over T by taking
ret v    = η(v)
bind a f = µ(T(f) a)
Symmetrically, given a Kleisli triple (T, ret, bind), we can define a monad over T by taking
T(f) = λ M . bind M (λ x . ret (f x))
η(v) = ret v
µ(M) = bind M (λ x . x)
The connection between monads and Kleisli triples is so close that programming language people use the word “monad” for Kleisli triples: a programming language monad is a type constructor T equipped with ret and bind operations that satisfy the three laws (M1), (M2) and (M3) above.
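As an executable illustration, one direction of this correspondence, deriving T(f) and µ from a Kleisli triple, can be written as an OCaml functor over the MONAD signature sketched above (a sketch under that assumption, not code from the text):

module Categorical (M : MONAD) = struct
  (* T(f): apply f to the value(s) produced by a computation. *)
  let map (f : 'a -> 'b) (m : 'a M.t) : 'b M.t =
    M.bind m (fun x -> M.ret (f x))
  (* µ: flatten a computation of computations. *)
  let join (mm : 'a M.t M.t) : 'a M.t =
    M.bind mm (fun m -> m)
end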
Equational theory.
It is not really useful to give a reduction semantics to the computational lambda-calculus: such a semantics would depend a lot on the particular monad used to interpret val and do. Instead, Moggi (1991) develops an equational theory for the computational lambda calculus that can prove that two terms are computationally equivalent. Equivalence between terms, written ≡, is a congruence relation that includes the following rules:
(λ x . M) v ≡ M [x := v]                                               (β)
if true then M else N ≡ M
if false then M else N ≡ N
do x ⇐ val v in M ≡ M [x := v]                                         (M1)
do x ⇐ M in val x ≡ M                                                  (M2)
do x ⇐ (do y ⇐ M in N) in P ≡ do y ⇐ M in (do x ⇐ N in P)              (M3)
The first three equivalences are “symmetrized” versions of the familiar reduction rules for function applications and for conditionals. The last three equivalences are the three monadic laws.
11.3 Examples of monads
We now describe several monads that implement a variety of effects. Each monad is described by its Kleisli triple (T, ret, bind) and by the specific operations it adds to the computational lambda-calculus, such as exception raising and exception handling for the exception monad.
Partiality.
T X      = X + {⊥}
ret v    = v
bind M F = ⊥        if M = ⊥
bind M F = F M      otherwise
The partiality monad supports computations that do not terminate, representing them by the special “bottom” result ⊥. The typical operation in this monad is a fixed-point combinator
fix : ((A → T B) → (A → T B)) → (A → T B)
This monad can also be used to represent fatal errors, with ⊥ representing an uncatchable exception.
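Non-termination itself cannot be observed from inside a program, but the “fatal error” reading of this monad has a direct executable sketch in OCaml, with None standing in for ⊥ (an encoding chosen here for illustration):

type 'a partial = 'a option

let ret (v : 'a) : 'a partial = Some v

let bind (m : 'a partial) (f : 'a -> 'b partial) : 'b partial =
  match m with
  | None -> None        (* the bottom result propagates; f is never called *)
  | Some v -> f v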
Nondeterminism.
T X      = 𝒫(X)   (sets of X)
ret v    = {v}
bind M F = ⋃ { F v ∣ v ∈ M }
The nondeterminism monad supports computations that can produce several different results, such as a random draw, as well as computations that fail and produce no result. Computations are represented as sets of values. Operations of this monad include choose (non-deterministic choice) and fail (failure).
choose : T A → T A → T A
choose M N = M ∪ N

fail : T A
fail = ∅
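In an executable setting, the sets 𝒫(X) are conveniently approximated by lists. Here is a minimal OCaml sketch along these lines (the list representation is our choice, not the text's):

type 'a nondet = 'a list

let ret (v : 'a) : 'a nondet = [ v ]

(* Union of the results of f applied to every possible value of m. *)
let bind (m : 'a nondet) (f : 'a -> 'b nondet) : 'b nondet =
  List.concat_map f m

let choose (m : 'a nondet) (n : 'a nondet) : 'a nondet = m @ n
let fail : 'a nondet = []

(* Example: all pairs built from two binary choices. *)
let pairs =
  bind (choose (ret 1) (ret 2)) (fun x ->
  bind (choose (ret 10) (ret 20)) (fun y ->
  ret (x, y)))
(* pairs = [(1, 10); (1, 20); (2, 10); (2, 20)] *)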
Exceptions.
T X              = X + Exn
ret v            = inj1(v)
bind (inj1(v)) F = F v
bind (inj2(e)) F = inj2(e)
The exception monad supports computations that either produce a value or raise an exception. bind M F propagates exceptions raised by M, skipping the evaluation of F. The main operations are raise, to raise an exception, and handle, to catch and handle exceptions arising from a computation:
raise : Exn → T A
raise e = inj2(e)

handle : T A → (Exn → T A) → T A
handle M F = inj1(v)     if M = inj1(v)
handle M F = F e         if M = inj2(e)
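A direct OCaml sketch of this monad uses the built-in result type, with Ok and Error playing the roles of inj1 and inj2; taking strings for Exn is an assumption made for the example:

type exn_msg = string
type 'a exc = ('a, exn_msg) result

let ret (v : 'a) : 'a exc = Ok v

let bind (m : 'a exc) (f : 'a -> 'b exc) : 'b exc =
  match m with
  | Ok v -> f v
  | Error e -> Error e      (* exceptions propagate, skipping f *)

let raise_exc (e : exn_msg) : 'a exc = Error e

let handle (m : 'a exc) (f : exn_msg -> 'a exc) : 'a exc =
  match m with
  | Ok v -> Ok v
  | Error e -> f e

(* Example: a division that raises on zero, then is handled. *)
let div (a : int) (b : int) : int exc =
  if b = 0 then raise_exc "Division_by_zero" else ret (a / b)
let safe = handle (div 1 0) (fun _ -> ret 0)    (* safe = Ok 0 *)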
Continuations.
T X      = (X → Res) → Res
ret v    = λ k . k v
bind M F = λ k . M (λ x . F x k)
The continuation monad makes the continuation of a computation of type T A explicit, as a function A → Res, where Res is the type of the result of the whole program. This monad supports control operators such as callcc:
callcc : (Cont A → T A) → T A
callcc F = λ k . F k k

throw : Cont A → A → T B
throw k v = λ k′ . k v
We write Cont A for A → Res, the type of continuations expecting a value of type A.
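Here is an executable OCaml sketch of the continuation monad, fixing the answer type Res to int for concreteness (that choice, and the run function, are assumptions of the sketch):

type res = int
type 'a cont = 'a -> res          (* Cont A *)
type 'a cps = 'a cont -> res      (* T A *)

let ret (v : 'a) : 'a cps = fun k -> k v
let bind (m : 'a cps) (f : 'a -> 'b cps) : 'b cps =
  fun k -> m (fun x -> f x k)

let callcc (f : 'a cont -> 'a cps) : 'a cps = fun k -> f k k
let throw (k : 'a cont) (v : 'a) : 'b cps = fun _ -> k v

(* Example: throwing to the captured continuation discards the rest
   of the computation. *)
let run (m : int cps) : res = m (fun x -> x)
let forty_two = run (callcc (fun k -> bind (throw k 42) (fun _ -> ret 0)))
(* forty_two = 42 *)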
Mutable state.
T X      = Store → X × Store
ret v    = λ σ . (v, σ)
bind M F = λ σ . let (x, σ′) = M σ in F x σ′
The state monad supports imperative variables and mutable data structures. The memory state is represented by a store: a mapping from memory locations to their current values. A computation takes its initial store as an extra parameter and returns its final store as an extra result. The bind M F operation threads the store through the evaluation, giving M’s final store to F as initial store.
The typical operations in the state monad are get, to read the current value of a reference (memory location), and set, to update the value of a reference.
get : Ref A → T A
get ℓ = λ σ . (σ(ℓ), σ)

set : Ref A → A → T Unit
set ℓ v = λ σ . ((), σ[ℓ ↦ v])
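An OCaml sketch of the state monad; here the store is represented as an association list from integer locations to integer values, since the text leaves Store and Ref abstract:

type store = (int * int) list
type 'a state = store -> 'a * store

let ret (v : 'a) : 'a state = fun s -> (v, s)

(* Thread the store: M's final store becomes F's initial store. *)
let bind (m : 'a state) (f : 'a -> 'b state) : 'b state =
  fun s -> let (x, s') = m s in f x s'

let get (l : int) : int state = fun s -> (List.assoc l s, s)
let set (l : int) (v : int) : unit state =
  fun s -> ((), (l, v) :: List.remove_assoc l s)

(* Example: increment the value stored at location 0. *)
let incr0 : unit state = bind (get 0) (fun x -> set 0 (x + 1))
let ((), final) = incr0 [ (0, 41) ]     (* List.assoc 0 final = 42 *)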
Environment.
T X      = Env → X
ret v    = λ ρ . v
bind M F = λ ρ . F (M ρ) ρ
The environment monad, also called the reader monad, is a simpler version of the state monad where computations take an environment ρ (a mapping of references to values) as argument, but do not return a modified environment. Consequently, bind M F transmits the same environment ρ to M and to F.
The typical environment monad operations are lookup ℓ, to get the value associated with ℓ in the environment, and assign ℓ v M, to temporarily set ℓ to v in the environment used to evaluate M.
lookup : Ref A → T A
lookup ℓ = λ ρ . ρ(ℓ)

assign : Ref A → A → T B → T B
assign ℓ v M = λ ρ . M (ρ[ℓ ↦ v])
Output.
T X      = X × String
ret v    = (v, ε)
bind M F = let (x, w1) = M in let (y, w2) = F x in (y, w1 . w2)
The output monad, also called the writer monad or the logging monad, collects the character strings that are output during a computation. ret produces the empty output ε, while bind M F concatenates the outputs produced by M and by F. The typical operation in this monad is print, to output the given string:
print : String → T Unit
print s = ((), s)
The writer monad can use any monoid (a type equipped with a concatenation operation and an identity element) instead of character strings.
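A minimal OCaml sketch of the output monad over strings; generalizing to another monoid amounts to replacing the empty string and string concatenation below:

type 'a writer = 'a * string

let ret (v : 'a) : 'a writer = (v, "")            (* empty output *)

let bind (m : 'a writer) (f : 'a -> 'b writer) : 'b writer =
  let (x, w1) = m in
  let (y, w2) = f x in
  (y, w1 ^ w2)                                    (* concatenate the outputs *)

let print (s : string) : unit writer = ((), s)

(* Example: two prints in sequence. *)
let ((), output) = bind (print "hello ") (fun () -> print "world")
(* output = "hello world" *)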
Probabilities.
For nondeterministic computations, we can also assign probabilities to the possible outcomes of the computation. For example, we can provide a biased coin toss flip p that returns true with probability p and false with probability 1−p.
The distribution monad extends the nondeterminism monad by assigning a probability for each possible result value. Thus, the set 𝒫(X) of possible results becomes a function X → [0,1] that gives, for each x: X, the probability of getting x as a result. These probabilities must sum up to 1.
T X      = X → [0,1]
ret v    = λ x . if x = v then 1 else 0
bind M F = λ x . ∑_v (M v) · (F v x)
flip p   = λ x . if x then p else 1−p
In particular, the probability that bind M F returns the value x is the sum over all intermediate values v of the joint probability that M returns v and F v returns x.
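The functions X → [0,1] above are not directly computable in general. A common executable approximation restricts attention to finite-support distributions, represented here as lists of (value, probability) pairs; this representation is our choice for the sketch, and duplicate outcomes are not merged:

type 'a dist = ('a * float) list

let ret (v : 'a) : 'a dist = [ (v, 1.0) ]

(* Multiply the probability of each intermediate value v by the
   probabilities of the outcomes of f v, as in the summation above. *)
let bind (m : 'a dist) (f : 'a -> 'b dist) : 'b dist =
  List.concat_map (fun (v, p) ->
    List.map (fun (w, q) -> (w, p *. q)) (f v)) m

let flip (p : float) : bool dist = [ (true, p); (false, 1.0 -. p) ]

(* Example: probability of two heads with a fair coin. *)
let two_heads =
  bind (flip 0.5) (fun a -> bind (flip 0.5) (fun b -> ret (a && b)))
(* the pair (true, 0.25) appears exactly once in two_heads *)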
The expectation monad extends the continuation monad by treating the continuation as a measure µ: X → [0,1] for the set 𝒫(X) of possible results. (See section 7.6 for examples of this approach.)
T X      = (X → [0,1]) → [0,1]
ret v    = λ µ . µ v
bind M F = λ µ . M (λ x . F x µ)
flip p   = λ µ . p · µ true + (1−p) · µ false
Monads for combined effects.
There is no general construction for composing two monads. However, given two or more effects that we want to support at the same time, it is usually easy to come up with an appropriate monad. Here are some examples of monads for combined effects. (We leave the definitions of ret and bind to the reader as exercises.)
- State and exceptions:
    T X = State → (X + Exn) × State
- State and continuations:
    T X = State → (X → State → Res) → Res
- Exceptions and continuations, using the double-barreled CPS approach of section 8.4:
    T X = (X → Res) → (Exn → Res) → Res
- Nondeterminism and continuations, using the “success continuation and failure continuation” approach of section 7.5:
    T X = (X → (Unit → Res) → Res) → (Unit → Res) → Res
  Here, Unit → Res is the failure continuation, and X → (Unit → Res) → Res is the success continuation.
Alternatively, a monad transformer can be used to add the features of a monad to any monad given as argument to the transformer. See Newbern et al. (2025) for examples.
11.4 The monadic transformation
A monadic transformation translates a functional language into the computational lambda-calculus. Here is the translation ℳ(e) of an expression e of the core FUN language:
ℳ(c) = val c
ℳ(x) = val x
ℳ(λ x . e) = val (λ x . ℳ(e))
ℳ(e1 e2) = do f ⇐ ℳ(e1) in do v ⇐ ℳ(e2) in f v
ℳ(if e1 then e2 else e3) = do b ⇐ ℳ(e1) in if b then ℳ(e2) else ℳ(e3)
This translation is independent of the specific monad that is used to give meaning to the val and do constructs of the computational lambda-calculus. However, if we use the continuation monad, val v becomes λ k . k v and do x ⇐ M in N becomes λ k . M (λ x . N k), and we recover the call-by-value CPS transformation 𝒞. Similarly, if we use the exception monad, we recover the ERS transformation ℰ.
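To make the transformation concrete, here is an OCaml sketch that realizes ℳ as an interpreter for a tiny FUN-like language, parameterized by an arbitrary monad; the AST, the value type, and the module names are illustrative assumptions, and the MONAD signature is the one sketched in section 11.2:

type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Lam of string * expr
  | App of expr * expr
  | If of expr * expr * expr

module Interp (M : MONAD) = struct
  (* Values are constants or functions from values to computations. *)
  type value =
    | VInt of int
    | VBool of bool
    | VFun of (value -> value M.t)

  type env = (string * value) list

  (* [eval] follows ℳ(e) case by case: values are returned with ret;
     applications and conditionals sequence their subcomputations with bind. *)
  let rec eval (rho : env) (e : expr) : value M.t =
    match e with
    | Int n -> M.ret (VInt n)                  (* ℳ(c) = val c *)
    | Bool b -> M.ret (VBool b)
    | Var x -> M.ret (List.assoc x rho)        (* ℳ(x) = val x *)
    | Lam (x, body) ->                         (* ℳ(λx.e) = val (λx. ℳ(e)) *)
        M.ret (VFun (fun v -> eval ((x, v) :: rho) body))
    | App (e1, e2) ->                          (* do f ⇐ ℳ(e1) in do v ⇐ ℳ(e2) in f v *)
        M.bind (eval rho e1) (fun f ->
        M.bind (eval rho e2) (fun v ->
          match f with
          | VFun f -> f v
          | _ -> failwith "application of a non-function"))
    | If (e1, e2, e3) ->                       (* do b ⇐ ℳ(e1) in if b then ℳ(e2) else ℳ(e3) *)
        M.bind (eval rho e1) (fun b ->
          match b with
          | VBool true -> eval rho e2
          | VBool false -> eval rho e3
          | _ -> failwith "non-boolean condition")
end

Instantiating Interp with an exception monad or a continuation monad gives, operationally, the exception-returning and continuation-passing readings mentioned above.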
Call by value vs. call by name.
While independent of the target monad, a monadic transformation captures a specific evaluation strategy. The transformation ℳ above implements call by value: the argument e2 of a function application e1 e2 is reduced to a value (using a do binding) before being passed to the function
e1; consequently, variables x are bound to values. In contrast, here is a monadic translation 𝒩(e) that implements call by name:
𝒩(c) = val c
𝒩(x) = x ()
𝒩(λ x . e) = val (λ x . 𝒩(e))
𝒩(e1 e2) = do f ⇐ 𝒩(e1) in f (λ () . 𝒩(e2))
𝒩(if e1 then e2 else e3) = do b ⇐ 𝒩(e1) in if b then 𝒩(e2) else 𝒩(e3)
The essence of call by name is that variables are bound to computations, not values, and the argument e2 of a function application e1 e2 is passed unevaluated as 𝒩(e2). To accommodate the syntactic restrictions of our computational lambda calculus, we need to pass thunks λ () . 𝒩(e2) instead of computations 𝒩(e2), and apply these thunks to () to recover the underlying computation. This is the same trick we used in section 5.4 to encode call by name in a call by value language.
Adding effects to FUN. While leaving the translation of the core FUN language unchanged, we can add cases to the translation to handle extensions of FUN with new constructs. For example, to extend FUN with recursive function definitions, which can diverge, we add the following case:
ℳ(let rec f x = e1 in e2) = do f ⇐ fix (λ f . λ x . ℳ(e1)) in ℳ(e2)
and we interpret the transformed term in a partiality monad. Similarly, to extend FUN with exception handling, we add two cases to the translation:
ℳ(raise e) = do v ⇐ ℳ(e) in raise v
ℳ(try e1 with x → e2) = handle ℳ(e1) (λ x . ℳ(e2))
and we interpret the transformed term in an exception monad.
Typing. The monadic transformation ℳ preserves the simple types that can be assigned to source FUN terms, in the following sense:
if Γ ⊢ e : τ, then ↑Γ ⊢ ℳ(e) : T ↑τ
Here, ↑τ is the simple type τ where function result types are prefixed with T:
↑ι = ι
↑(σ → τ) = ↑σ → T ↑τ
In particular, if e is a closed program of type bool, for example, its monadic translation ℳ(e) is a closed program of type T bool.
The call-by-name transformation 𝒩 also preserves simple types.
The only difference with call by value is that free variables and function arguments are now suspended computations, so their types must be prefixed with unit → T. More precisely:
if Γ ⊢ e : τ, then ⇑Γ ⊢ 𝒩(e) : T ↑τ
where
↑ι = ι
↑(σ → τ) = ⇑σ → T ↑τ
⇑τ = unit → T ↑τ
Semantic preservation.
The monadic transformation ℳ preserves the call-by-value semantics of FUN, as does the 𝒩 transformation for the call-by-name semantics.
Theorem 1
For all terms e and constants c,
- if e →v* c then ℳ(e) ≡ val c;
- if e →n* c then 𝒩(e) ≡ val c.
We outline a proof for the call-by-value case. First, we define the translation v̄ of a FUN value v by:

v̄ = c               if v = c
v̄ = λ x . ℳ(e)      if v = λ x . e

This definition allows us to strengthen the claim: if e →v* v and v is a value, then ℳ(e) ≡ val v̄.
Second, we prove two useful syntactic properties of ℳ: if v is a value,

ℳ(v) = val v̄                                                          (A)
ℳ(e [x := v]) = ℳ(e) [x := v̄]                                         (B)
Finally, we prove that if e →v e′, then ℳ(e) ≡ ℳ(e′). The result follows from the transitivity of ≡ and properties (A) and (B). We show the case of a β-reduction.
ℳ((λ x . e) v)
   = do f ⇐ ℳ(λ x . e) in do y ⇐ ℳ(v) in f y
   = do f ⇐ val (λ x . ℳ(e)) in do y ⇐ val v̄ in f y                   (by A)
   ≡ do y ⇐ val v̄ in (λ x . ℳ(e)) y                                    (by M1)
   ≡ (λ x . ℳ(e)) v̄                                                    (by M1)
   ≡ ℳ(e) [x := v̄]                                                     (by β)
   = ℳ(e [x := v])                                                      (by B)
11.5 Programming with monads
Continuations were first introduced as a semantic tool (chapter 6), but quickly became popular for programming in functional languages using continuation-passing style (chapter 7). Similarly, monads were introduced by Moggi (1989) as a semantic tool, but quickly became an important functional programming technique, following the work of Wadler (1992) and of the Haskell language community. In this monadic style of programming, programs are written in pure functional style with references to the ret, bind and specific operations of a monad, and then linked with an appropriate implementation of the monad, often provided by standard libraries.
Using non-native effects.
A first reason to use monads in functional programming is to provide effects that are not natively supported by the functional language being used. For example, Haskell is a pure functional language, so it natively supports partiality (general recursive functions), but no other effect. Input/output, mutable state, errors/exceptions and control operators are provided through monads defined in the standard library (Newbern et al., 2025). Some of these monads are implemented in pure Haskell; others, such as the IO monad, use imperative implementations with special support from the compiler and runtime system, but the monad interface guarantees that they behave like a purely functional implementation.
As another example, OCaml natively supports I/O, mutable state and exceptions, but did not provide control operators until the introduction of effect handlers in OCaml 5. Previously, the Lwt and Async libraries for lightweight threads and asynchronous I/O relied on a monadic programming style using a continuation monad (Vouillon, 2008).
Finally, the F* proof-oriented functional language natively supports only pure, terminating programs. All effects, including partiality (general recursion), are provided through a hierarchy of monads, which also structure the way functions are specified and verified (Swamy et al., 2016).
Structuring programs with monads. Monads encourage a style of programming in which some of the low-level details are hidden by a monad, thus emphasizing the high-level structure of the code.
For example, many enumeration problems are best expressed as nondeterministic computations in a monad of lists. Consider the problem of constructing the list of all permutations of a given list. Here is a direct, non-monadic solution in Haskell:
permut :: [a] -> [[a]]
permut [] = [[]]
permut (x:xs) = concat (map (insert x) (permut xs))
insert :: a -> [a] -> [[a]]
insert x [] = [[x]]
insert x (y:ys) = (x:y:ys) : map (y:) (insert x ys)
It is not immediately obvious what this code does, with all the operations over lists of lists.
Here is the same algorithm, re-expressed as a nondeterministic computation (“find any permutation of the given list”) to be performed in the List monad:
permut :: [a] -> [[a]]
permut [] = [[]]
permut (x:xs) = do ys <- permut xs; insert x ys
The code above reads more easily: to find a permutation of x:xs, find a permutation ys of xs and insert x at any position within ys. We can also make insert clearer:
insert :: a -> [a] -> [[a]]
insert x ys =
return (x:ys) ++
case ys of
[] -> []
y:ys -> do zs <- insert x ys; return (y:zs)
This code reads as follows: to insert x at any position in ys, either insert it at the head (x:ys) or, if ys is not empty, insert x at any position in the tail of ys.
As another example, consider the problem of backtracking imperative actions such as assignments to references and arrays in an OCaml program. A solution (courtesy of Paul-Elliot Anglès d’Auriac) is to define a monad of “undoable” computations, viewed as pairs of the value of the computation and an “undo” function for reverting the imperative actions performed by the computation:
type 'a undo = 'a * (unit -> unit)
let ret (x: 'a) : 'a undo = (x, (fun () -> ()))
let bind (m: 'a undo) (f: 'a -> 'b undo) : 'b undo =
let (x, undo1) = m in
let (y, undo2) = f x in
(y, (fun () -> undo2(); undo1()))
let array_set (arr: 'a array) (idx: int) (new_v: 'a) : unit undo =
let old_v = arr.(idx) in
arr.(idx) <- new_v;
((), (fun () -> arr.(idx) <- old_v))
The undo monad is a variant of the writer monad, using undo functions of type unit -> unit as output instead of strings. The ret operation produces the identity undo function fun () -> (). An imperative operation such as array assignment array_set produces the function that restores the previous state of the array. The bind operation composes (in the correct order) the undo functions of the two computations that are executed in sequence.
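For example, here is a small hypothetical usage of the undo monad defined above: two array updates performed in sequence, then reverted by the composed undo function.

let () =
  let a = [| 1; 2; 3 |] in
  let ((), undo) =
    bind (array_set a 0 10) (fun () -> array_set a 2 30) in
  assert (a = [| 10; 2; 30 |]);
  undo ();                      (* undoes the second update, then the first *)
  assert (a = [| 1; 2; 3 |])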
Monad polymorphism.
Many monadic functions can be used with any monad, or with any monad that supports a particular effect. Such monad-polymorphic functions are easy to define and reuse in Haskell, thanks to its type class mechanism. For example, here is a generic monadic map function over lists:
mmap :: Monad m => (t -> m a) -> [t] -> m [a]
mmap f [] = return []
mmap f (x:xs) = do y <- f x; ys <- mmap f xs; return (y:ys)
Note the type of mmap: it applies to any instance m of the Monad type class. Used with an exception monad, it propagates exceptions raised by the mapped function. Used with the I/O monad, it sequences the I/O operations performed by that function. Used with the identity monad, it is equivalent to the standard, non-monadic map function over lists.
As another example, consider the following liftM2 operation, which lifts a pure function of two arguments into a monadic function:
liftM2 :: Monad m => (t1 -> t2 -> b) -> m t1 -> m t2 -> m b
liftM2 f xx yy = do x <- xx; y <- yy; return (f x y)
When used with the Maybe monad, liftM2 f returns Just (f x y) if its arguments are Just x and Just y, and Nothing if either argument is Nothing. But when used with the List monad, liftM2 f l1 l2 returns the list of all applications of f to an element of l1 and an element of l2. For example, liftM2 (+) [1,2,3] [4,5,6,7] is [5,6,7,8,6,7,8,9,7,8,9,10].
11.6 Further reading
There are many tutorials on programming with monads. Wadler (1995) is one of the earliest but is still a good introduction. Newbern et al. (2025) covers monadic programming in Haskell in depth, including the use of monad transformers to construct monads that support multiple effects. Clarkson et al. (2025, section 8.8) describes monadic programming in OCaml.
In addition to monads, there are other “design patterns” in functional programming that are inspired by category theory: comonads (Uustalu and Vene, 2008), applicative structures (McBride and Paterson, 2008), and arrows (Hughes, 2000). Applicative structures and arrows provide ways of composing effectful computations that are more flexible than the strict sequencing inherent in the bind operation of monads. Comonads describe “coeffects”, i.e. ways in which a computation depends on its environment.
The theory of algebraic effects, presented in chapter 12, can be seen as the next step after the theory of monads: monads only account for the propagation and sequencing of effects, while algebraic effects also account for the generation and the handling of effects.