Closures are often misunderstood. Beginners and experienced programmers alike tend to mistake them for anonymous functions, or completely forget their purpose and definition.

Here I will informally introduce the concept.

Edit of 2017-08-24: added appendices A and B

The following will use plain JavaScript syntax for the code samples, unless otherwise specified. The concepts are valid in general, regardless of programming language; where a concept does not apply to JavaScript itself, the syntax is used for illustration only.

Environment

An environment is a set of variable bindings.

Suppose we have the following code:

    let a = 0
    let b = 1
    function c() { ... }

The environment in which the function c is defined is the set of variables \{ a, b, c \}, including the function itself.

Environments can be nested, and inherit variable bindings from their parent environments. For example, in

    let a
    let b
    function c(d) {
        let e
        let a
    }

we have a global environment \{ a, b, c \}, where c also contains an environment \{d, e, a\}. The variable a in the inner environment does not refer to the variable of the same name in the outer environment; it is instead a fresh variable. This is called shadowing.
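As a small runnable sketch of the same layout (the string values are only there so that something can be printed), the inner a hides the outer one inside c, while b is still inherited from the parent environment:

```javascript
let a = "outer a"
let b = "outer b"
function c(d) {
    let e = "local e"
    let a = "inner a" // shadows the outer a, but only inside c
    return [a, b, d, e]
}
const seen = c("argument d")
console.log(seen) // the inner a, the inherited b, then d and e
console.log(a)    // the outer a is untouched
```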

Scope

A scope is the environment where a variable binding is visible.

The term environment is often conflated with scope, but they are different concepts. An environment is a set of variable bindings, while a scope is the particular environment in which a given binding is visible. Scope is a property of a binding: variables have scope.

The scope of a variable defines which code has access to it. For example, in

    let a
    function b() {
        let c
    }
    function d(k) {
        let e
    }

the variable a is in the global environment, and so has global scope: every function can see its value. The variables c and e are defined in nested environments and are only accessible in the scope of the functions they are defined in, and so have local scope. The function b cannot access the variable e in the body of d. Likewise, function arguments are local to the function, so the scope of the variable k is the whole d function.
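A minimal sketch of the same situation, with values added for illustration. Using typeof is just one way to probe for a binding without raising an error:

```javascript
let a = "global"
function b() {
    let c = "local to b"
    return typeof e // e belongs to d, so no binding named e is visible here
}
function d(k) {
    let e = "local to d"
    return [a, k, e] // a is global, k and e are local to d
}
console.log(b())           // "undefined"
console.log(d("argument")) // the global a, the argument k, the local e
```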

A variable is said to be in scope if there is a path from a local environment to that where the variable is defined. Generally, there are two variable lookup methods:

Lexical scoping

Lexical scoping determines the scope as the environment where a variable is defined. For example, in

    let a
    function b() {
        let c
        function d() {}
    }

the lexical scope of the variable c is the function b, and the function d has access to it.

This means that, since functions retain their lexical scope after being defined, calling them does not affect their variable bindings:

    let x = 1
    function a() {
        let x = 2
        return function b() {
            return x
        }
    }
    a()()

returns 2, since the variable x in the body of b has scope in the environment of the function a and, thus, the presence of an outer variable of the same name does not affect the result.

Lexical scoping can be fully performed by analysing the source code of a program and, thus, can be resolved at compile time.

Most languages feature lexical scoping. It originated in the early 60s, most notably with ALGOL 60, which supported nested functions with lexical scoping. It was subsequently adopted by most languages, including some without nested functions: C, C++, Common Lisp, Scheme, Smalltalk, JavaScript, Java, and the vast majority of modern languages are lexically scoped.

In some languages, such as C, lexical scoping is limited to blocks, and no nested functions are possible. For example, in this C99 snippet

    void a() {
        for (int i = 0; i != n; ++i) {
            for (int j = 0; j != n; ++j) {
                fn(m[i][j]);
            }
        }
    }

i is accessible by the inner for block, while j is not accessible by the outside block. In some extensions of the C language, such as GNU C[1], it is possible to write nested functions.

Dynamic scoping

Dynamic scoping determines the scope during execution, by looking up a variable in the environment where the code is being evaluated.

    let a
    function b() {
        let c
        function d() {}
    }

Here, no variable bindings are actually performed until execution, because they are resolved as they are evaluated. So, the environment which each variable refers to depends on the actual path of evaluation. This means that, for functions, the scope of dynamic variables in their body is the environment of the caller.

    let x = 4
    function printX() {
        print(x)
    }
    function doWith5(fn) {
        let x = 5
        fn()
    }
    function doWith6(fn) {
        let x = 6
        fn()
    }
    printX()
    doWith5(printX)
    doWith6(printX)

prints, in order, 4, 5, and 6[note 1].
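Since regular JavaScript is lexically scoped, the behaviour above can only be emulated. The following is a rough sketch, assuming an explicit stack of frames; the helpers lookup and withBindings are hypothetical names introduced here, not part of any library:

```javascript
// Hypothetical helpers emulating dynamic scoping with a stack of frames.
const frames = [{ x: 4 }] // the bottom frame plays the global environment

function lookup(name) {
    // walk from the most recent frame back to the global one
    for (let i = frames.length - 1; i >= 0; i--) {
        if (Object.prototype.hasOwnProperty.call(frames[i], name)) {
            return frames[i][name]
        }
    }
    throw new ReferenceError(name + " is not bound")
}

function withBindings(bindings, fn) {
    frames.push(bindings)
    try {
        return fn()
    } finally {
        frames.pop() // the binding disappears when the call returns
    }
}

const printed = []
function printX() {
    printed.push(lookup("x")) // resolved against the caller's frames
}
function doWith5(fn) {
    withBindings({ x: 5 }, fn)
}
function doWith6(fn) {
    withBindings({ x: 6 }, fn)
}
printX()
doWith5(printX)
doWith6(printX)
console.log(printed) // 4, 5, 6
```

The lookup happens at call time, so the same printX sees a different x depending on who called it.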

Dynamic scoping is used in a number of languages, notably Emacs Lisp[2], early Lisp dialects[3], and early versions of Perl. It is also found in Common Lisp as optional special variables, where it is commonly used to change the behaviour of a function without redefining it. Other languages have semantics similar to dynamic scoping, for example the C preprocessor:

    int c = 0;
    #define A(b) fn(b + c)
    void d() {
        int c = 1;
        A(1);
    }
    void e() {
        int c = 2;
        A(1);
    }

expands to

    int c = 0;

    void d() {
        int c = 1;
        fn(1 + c);
    }
    void e() {
        int c = 2;
        fn(1 + c);
    }

because C macros apply no scoping rules[note 2]. In the resulting C code, the identifier c used in the macro A takes a different meaning depending on where the macro is expanded, while the parameter b is simply replaced by the argument. Some Lisp macro systems exhibit similar behaviour; the ability of a macro system to preserve lexical scope is called hygiene.

This is also found in JavaScript, limited to the keyword this, whose binding is decided at call time by the way the function is called rather than by where it is defined.
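A short sketch of this behaviour, with made-up object names. The same function yields a different result depending on how it is called, because this is resolved at each call:

```javascript
function whoAmI() {
    return this.name // this is decided by each call, not by the definition
}
const alice = { name: "alice", whoAmI }
const bob = { name: "bob", whoAmI }
console.log(alice.whoAmI())                 // "alice"
console.log(bob.whoAmI())                   // "bob"
console.log(whoAmI.call({ name: "carol" })) // "carol"
```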

Free and bound variables

A bound variable is a formal function parameter or a local variable.
A free variable is a variable that is not bound.

The boundedness of a variable depends on the environment; a free variable in one environment may appear bound in another.

For example, regardless of scoping rules, in

    function a(b) {
        function c() {
            return b + d
        }
        return c()
    }

the variable b is bound in the environment of a. From the point of view of c, b is a free variable. d is free with respect to both.
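To make the snippet runnable, something has to give d a binding. In this sketch it is assumed to be a global variable, which is a choice made here for illustration only:

```javascript
let d = 10 // assumed global binding: d stays free in both a and c

function a(b) {          // b is bound in a, as a formal parameter
    function c() {
        return b + d     // both b and d are free in c
    }
    return c()
}
console.log(a(1)) // 11
```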

Capture

Variable capture is the strategy with which a free variable is bound to an environment.

When free variables are converted to bound variables in some environment, variable capture determines what happens. There are two main capture strategies:

Capture by value

Capture by value copies the value of a free variable to a local variable. The resulting local variable keeps no relationship to the original environment. For example, this C++11 snippet

    int a = 1;
    auto b = [a] { return a; };
    std::cout << a << std::endl;
    std::cout << b() << std::endl;
    a = 2;
    std::cout << b() << std::endl;

should always print 1. This is because a is copied in the closure environment, and any changes to the original variable do not affect it.

This can be effectively used by closures to create snapshots of the environments they were defined in.
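JavaScript has no by-value capture syntax, but a snapshot can be approximated by passing the value as an argument, since primitives are copied into the fresh parameter binding. A minimal sketch, where snapshot is a hypothetical helper name:

```javascript
function snapshot(value) {
    // value is a new binding holding a copy of the primitive passed in
    return function () {
        return value
    }
}
let a = 1
const b = snapshot(a)
a = 2
console.log(b()) // 1, the value at the time of the snapshot
console.log(a)   // 2
```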

Languages with by-value capture include SML, OCaml and C++[Appendix A].

Capture by reference

Capture by reference does not copy the value, but instead has a bound variable refer to the same location as the original free variable. In this C++ code,

    int a = 1;
    auto b = [&a] { return a; };
    std::cout << a << std::endl;
    std::cout << b() << std::endl;
    a = 2;
    std::cout << b() << std::endl;

After we change a, b returns the new value: the captured a refers to the same location as the original variable.

In this case, returning a closure that references automatic storage is problematic, as it leaves a dangling reference. In languages with garbage collection or region-based allocation, the referenced variable is kept in memory, and no dangling reference is allowed to exist.

Capture by reference allows closures to mutate their lexical environment.
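The same experiment can be sketched in JavaScript, where the sharing goes both ways: the closure sees assignments made outside, and the outside sees assignments made by the closure. The function names are illustrative:

```javascript
let a = 1
function read() {
    return a
}
function bump() {
    a = a + 1 // assigns to the variable in the enclosing environment
}
console.log(read()) // 1
a = 2
console.log(read()) // 2, the closure sees the assignment
bump()
console.log(a)      // 3, and the closure can assign too
```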

Languages with by-reference capture include JavaScript, Ruby, Python, Java, C++[Appendix A], and Scheme.

Closure

A closure is an instance of a function associated with an environment.

We can think of a closure as a record containing a function and a set of free variables. When the closure is created, the free variables are captured from the scope, and are bound according to the scoping rules.
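The record view can be made explicit. This is a sketch under the assumption that the function receives its environment as a first argument; makeClosure and callClosure are hypothetical names, and this is not how a JavaScript engine represents closures internally:

```javascript
// Hypothetical representation: a closure as a plain record.
function makeClosure(code, env) {
    return { code: code, env: env }
}
function callClosure(closure, ...args) {
    // the captured environment is handed to the code explicitly
    return closure.code(closure.env, ...args)
}
// an "open" function: its free variable x is taken from env
function addX(env, y) {
    return env.x + y
}
const addOne = makeClosure(addX, { x: 1 })
const addTen = makeClosure(addX, { x: 10 })
console.log(callClosure(addOne, 5)) // 6
console.log(callClosure(addTen, 5)) // 15
```

The same code paired with two different environments gives two different closures.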

Depending on the scoping rules enforced by the language, we can have lexical closures and dynamic closures. The term closure usually refers to lexical closures and, in practice, most closure implementations are lexical. Dynamic closures are seldom used, and some people may even argue they are not closures at all, but they are present in a number of languages and are an important theoretical concept.

Dynamic closures

Dynamic closures can refer to two things:

  • a function that is evaluated in the dynamic environment it is created in[4]

For example,

    function a() {
        return function b() {
            return c
        }
    }
    let c = 0
    let d = a()
    d()
    function e() {
        let c = 1
        d()
    }

Here the function b creates a dynamic closure when evaluated as a value. When called in the global environment, it returns 0. The same happens when it is called from e, as it captured the variable c from the (global) environment where it was in scope when created.

In this case, the function is bound during program execution to the environment where it was created, and retains its bindings when evaluated in a different environment.

  • a function that is evaluated in the dynamic environment of its caller

For example,

    function a() {
        return function b() {
            return c
        }
    }
    let c = 0
    let d = a()
    d()
    function e() {
        let c = 1
        d()
    }

Here the function b creates a dynamic closure when evaluated as a value. When called in the global environment, it returns 0. When called from e, d returns 1 because it is now evaluated in an environment where c has a different value.

In this case, the function is bound to the immediate dynamic environment of the caller and, thus, does not retain its bindings when evaluated in a different environment. The difference with a plain function, that does not keep an environment, is that a closure must obey the capture rules of its free variables.

If the variables are captured by value, a copy of the dynamic environment is created, and the function is evaluated; functions evaluated further within this function now refer to this environment, and not to that of the caller of the function itself. If the variables are captured by reference, no closure is created and it is instead an open function, as no record of the caller environment needs to be retained.

Modifying runtime behaviour

Dynamic binding is most useful to change the runtime behaviour of functions without redefining them. We can call a function in an environment where a dynamic variable has a specific value, and it will behave differently.

For example, in Common Lisp, we can introduce dynamic variables using defvar and defparameter:

    (defvar value 5)
    
    (defun get-value () value)
    
    (let ((value 7))
      (get-value)) ; returns 7
      
    (get-value) ; returns 5

This can be useful in many situations in Common Lisp. As an example, consider Drakma, a Common Lisp HTTP client.

Drakma can be told to dump every performed request to standard error, without changing or reconfiguring the function. Instead of passing different configuration parameters, we can simply assign to a dynamic variable, drakma:\*header-stream\*.

    (setf drakma:*header-stream* *error-output*)

We will see every request written to standard error. This is because Drakma internally checks whether the dynamic variable is defined and, if defined, acts accordingly.
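The idea can be approximated in JavaScript by temporarily rebinding a shared variable and restoring it afterwards. This is only a sketch: headerStream, request and withHeaderStream are hypothetical names made up for the example and have nothing to do with Drakma's actual API.

```javascript
// Hypothetical names, for illustration only.
let headerStream = null // plays the role of a dynamic variable

function request(url) {
    if (headerStream !== null) {
        headerStream.push("GET " + url) // dump the request when a stream is set
    }
    return "response for " + url
}

function withHeaderStream(stream, fn) {
    const saved = headerStream
    headerStream = stream
    try {
        return fn()
    } finally {
        headerStream = saved // restore the previous binding on the way out
    }
}

const log = []
request("/quiet")
withHeaderStream(log, function () {
    request("/logged")
})
request("/quiet-again")
console.log(log) // only the request made inside withHeaderStream
```

The request function never receives the stream as a parameter; its behaviour changes only because of the binding in effect while it runs.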

The runtime behaviour of the Common Lisp parser can also be configured in the same way. Parts of the code can specify a different readtable, and they will be parsed with different rules. Common Lisp exposes the current readtable as the dynamic variable \*readtable\*[5].

Lexical closures

Lexical closures are formed when a function is evaluated[note 3] in the lexical environment it was defined in. With lexical closures, free variables are immediately bound to the lexical environment, and no additional lookup is performed.

Most languages with nested functions allow returning functions as first-class objects:

    let b = 0
    function a() {
        let b = 1
        return function c() {
            return b
        }
    }
    console.log(a()())

This JavaScript snippet prints 1, because b in c refers to the b local to a, and thus c evaluates to a lexical closure.

Lexical closures, unlike their dynamic counterpart, are consistent across different environments, and are commonly used to keep state and to delay or store a computation to be performed later. They are popular in many languages, and exist wherever there are lexical scoping rules and functions can be returned as values.

Partial and delayed evaluation

For example, in this Scheme snippet

    (let ((x 1))
     (lambda (y) (+ x y)))

a lexical closure is returned. When called, the closure performs (+ 1 y), because x is evaluated to 1 when the closure is created. Lexical closures are commonly used this way to implement partial evaluation: when the closure is first created, every instance of a free variable is replaced with the value, or a reference, of the same variable in the lexical environment. When the closure is later called, its bound variables are evaluated, and the computation is performed.
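A JavaScript counterpart of the Scheme snippet might look like the following sketch, where add is an illustrative name. The first argument is fixed when the closure is created, and the addition is delayed until the second one arrives:

```javascript
function add(x) {
    return function (y) {
        return x + y // x was fixed when the closure was created
    }
}
const addOne = add(1)
console.log(addOne(2))  // 3
console.log(add(10)(5)) // 15
```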

Transformers

Suppose we have a function, and we want to modify its behaviour. We can wrap it in another function.

    function fn(x) {
        return x + 1
    }
    function a(x) {
        return fn(x * 2)
    }

This can be automated by writing a transformer. A transformer takes a function, and returns another function that modifies its behaviour. For example,

    function consolelogify(fn) {
        return function (...args) {
            let r = fn(...args)
            console.log(r)
            return r
        }
    }

takes any function fn, and returns a function that outputs the result of the original function via console.log.
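Another transformer in the same style, as a sketch: memoize (a name chosen here, limited to single-argument functions for brevity) wraps a function and keeps a cache hidden in the closure.

```javascript
function memoize(fn) {
    const cache = new Map() // hidden in the closure, one cache per wrapped function
    return function (x) {
        if (!cache.has(x)) {
            cache.set(x, fn(x))
        }
        return cache.get(x)
    }
}
let calls = 0
function square(x) {
    calls++
    return x * x
}
const fastSquare = memoize(square)
console.log(fastSquare(4)) // 16
console.log(fastSquare(4)) // 16, served from the cache
console.log(calls)         // 1
```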

State management

Lexical closures can be used to manage hidden state.

    function makeCounter(init) {
        let count = init
        return function () {
            return count++
        }
    }
    let counter = makeCounter(0)
    console.log(counter())
    console.log(counter())

prints 0 and then 1. State local to the closure is kept hidden, and accessed only through the closure itself.
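Each call to makeCounter creates a new environment, so separate counters do not interfere with each other. A short sketch repeating the definition above so that it runs on its own:

```javascript
function makeCounter(init) {
    let count = init
    return function () {
        return count++
    }
}
const first = makeCounter(0)
const second = makeCounter(10)
console.log(first())  // 0
console.log(first())  // 1
console.log(second()) // 10
console.log(first())  // 2
```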

Objects

Lexical closures are equivalent to objects. See Closures as JavaScript Objects.

In the same spirit as state management, we use closures to encapsulate the object state, and return an interface through which external code interacts.

Some people may argue that a closure can only represent an object with a single method, apply. This is only true in a language where we cannot return multiple values. We can use data structures to store methods for the same state; some languages, like Go and Common Lisp, natively support multiple return values.

    function makeCounter(init) {
        let count = init
        return {
            inc: function () {
                return count++
            },
            dec: function () {
                return count--
            },
            reset: function () {
                return (count = init) 
            }
        }
    }
    let counter = makeCounter(0)
    console.log(counter.inc())
    console.log(counter.dec())

If we think this is cheating because I am returning an object to describe the interface, we can see the same holds if we use any other kind of data structure, albeit a bit more verbosely:

    function makeCounter(init) {
        let count = init
        return [
            function () {
                return count++
            },
            function () {
                return count--
            },
            function () {
                return (count = init) 
            }
        ]
    }
    let counter = makeCounter(0)
    console.log(counter[0]())
    console.log(counter[1]())

Appendix A: C++ lambda capture

Since C++11, C++ supports lambda functions. Lambda functions are anonymous functions that are converted to lexical closures with optional variable capture that can be nested.

The general syntactic form of the C++ lambda is

    [capture-list] (argument-list)(optional) specifiers(optional) -> return-type(optional) { body }

For example,

    [a, b] (std::string const& c) -> size_t {
        return fn(a, b, c);
    }

We will focus on the capture list.

The capture list specifies which variables are captured. By default, no variables are captured. Variables without automatic storage duration (i.e. thread_local, static) or not ODR-used within the body of the lambda can be used without being captured.

If a variable name is specified, by default it is captured by value.

    int a = 0;
    auto f = [a] {
        return a; // returns a copy of a
    };
    a = 1;
    std::cout << f() << std::endl; // prints 0, not 1

Variables can be captured by reference by prepending the variable name with &:

    int a = 0;
    auto f = [&a] {
        return a;
    };
    a = 1;
    std::cout << f() << std::endl; // prints 1, not 0

Capture types can be mixed, as in

    int a = 0;
    int b = 0;
    auto f = [a, &b] {
        return a + b;
    };
    a = 1;
    b = 4;
    std::cout << f() << std::endl; // prints 4, not 0 nor 5

To capture every free variable by value or reference, we can specify, respectively, = or &:

    int a = 0;
    int b = 0;
    auto f = [&] {
        return a + b;
    };
    a = 1;
    b = 4;
    std::cout << f() << std::endl; // prints 5

and, by value,

    int a = 0;
    int b = 0;
    auto f = [=] {
        return a + b;
    };
    a = 1;
    b = 4;
    std::cout << f() << std::endl; // prints 0

It is possible to capture this, by reference by default:

    this->a = 9;
    auto f = [this] {
        return this->a;
    };
    std::cout << f() << std::endl; // prints 9

or by value, as a copy, with *this (since C++17).

Appendix B: JavaScript variable capture

JavaScript has lexical closures with by-reference capture. A common mistake when writing ES5 code involves closures created inside for loops:

    for (var i = 0; i < 10; i++) {
        setTimeout(function () {
            console.log(i)
        }, 100)
    }

Since var declares a variable in the function environment and not in the for block, the same variable is captured by every closure we pass to setTimeout (or use in any other place). This means that, by the time the loop has terminated, every closure sees the same variable with the same value (which is 10 at the end of the loop).

To avoid this, we can wrap the loop body in a function, and immediately pass the (primitive, copied by default) values by argument to manually capture them.

    for (var i = 0; i < 10; i++) {
        (function (k) {
            setTimeout(function () {
                console.log(k)
            }, 100)
        })(i)
    }

This guarantees that each closure receives a fresh instance of the value. Of course, this does not work with objects, as variables are always references to objects.
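The caveat about objects can be seen in this sketch, where capture and point are illustrative names. Passing the object as an argument copies only the reference, so an explicit copy is needed to get a real snapshot:

```javascript
function capture(p) {
    return function () {
        return p.x
    }
}
const point = { x: 0 }
const readers = []
for (var i = 0; i < 3; i++) {
    point.x = i
    readers.push(capture(point)) // every closure receives the same object
}
const copies = []
for (var j = 0; j < 3; j++) {
    point.x = j
    copies.push(capture({ x: point.x })) // explicit copy of the object
}
console.log(readers.map(function (f) { return f() })) // 2, 2, 2
console.log(copies.map(function (f) { return f() }))  // 0, 1, 2
```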

Since ES6, variables declared using let have block-scope, and each iteration creates a new environment. The same code can be written, trivially:

    for (let i = 0; i < 10; i++) {
        setTimeout(function () {
            console.log(i)
        }, 100)
    }
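The difference between the two declarations can also be observed without timers. In this sketch the closures are collected in arrays and called after the loops have finished:

```javascript
const withVar = []
for (var i = 0; i < 3; i++) {
    withVar.push(function () { return i }) // one shared i for the whole loop
}
const withLet = []
for (let j = 0; j < 3; j++) {
    withLet.push(function () { return j }) // a fresh j for each iteration
}
console.log(withVar.map(function (f) { return f() })) // 3, 3, 3
console.log(withLet.map(function (f) { return f() })) // 0, 1, 2
```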

Notes

  • ^ this is not regular JavaScript, but one with dynamic scoping.
  • ^ except respecting the #if directives, #include, #define and #undef
  • ^ evaluated here means as a value, not called

References

For questions, comments, and corrections contact me on Telegram
Last modified: 2017-12-28 14:44:28 +0000