
== Concepts ==

In this section, we compare how particular programming idioms are handled in a functional language with first-class functions ([[Haskell (programming language)|Haskell]]) and in an imperative language where functions are second-class citizens ([[C (programming language)|C]]).

=== Higher-order functions: passing functions as arguments ===
{{further information|Higher-order function}}
In languages where functions are first-class citizens, functions can be passed as arguments to other functions in the same way as other values (a function taking another function as an argument is called a higher-order function). In the language [[Haskell (programming language)|Haskell]]:
<syntaxhighlight lang="haskell">
import Prelude hiding (map)  -- avoid clashing with the Prelude's own map

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs
</syntaxhighlight>

Languages where functions are not first-class often still allow one to write higher-order functions through the use of features such as [[function pointer]]s or [[Delegate (CLI)|delegate]]s. In the language [[C (programming language)|C]]:
<syntaxhighlight lang="c">
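#include <stddef.h>  /* size_t */

/* Apply f to every element of the array x (of length n), in place. */
void map(int (*f)(int), int x[], size_t n) {
    for (size_t i = 0; i < n; i++)
        x[i] = f(x[i]);
}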
</syntaxhighlight>

There are a number of differences between the two approaches that are ''not'' directly related to the support of first-class functions. The Haskell sample operates on [[List (computing)|list]]s, while the C sample operates on [[Array data structure|arrays]]. Both are the most natural compound data structures in the respective languages, and making the C sample operate on linked lists would have made it unnecessarily complex. This also accounts for the fact that the C function needs an additional parameter (giving the size of the array). The C function updates the array [[in-place]], returning no value, whereas in Haskell data structures are [[persistent data structure|persistent]] (a new list is returned while the old one is left intact). The Haskell sample uses [[recursion]] to traverse the list, while the C sample uses [[iteration]]. Again, this is the most natural way to express this function in both languages, but the Haskell sample could easily have been expressed in terms of a [[fold (higher-order function)|fold]] and the C sample in terms of recursion. Finally, the Haskell function has a [[Polymorphism (computer science)|polymorphic]] type; as C does not support this kind of polymorphism, we have fixed all type variables to the type constant <code>int</code>.
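As noted above, the C sample could equally have been written recursively. A minimal sketch of such a version follows; the names <code>map_rec</code> and <code>triple_plus_one</code> are illustrative only. Each call updates the first element and then recurses on the remainder of the array:

<syntaxhighlight lang="c">
#include <stddef.h>  /* size_t */

/* Recursive variant of the array-based map: handle x[0], then recurse
   on the remaining n - 1 elements starting at x + 1. */
void map_rec(int (*f)(int), int x[], size_t n) {
    if (n == 0)
        return;              /* base case: empty array, nothing to do */
    x[0] = f(x[0]);
    map_rec(f, x + 1, n - 1);
}

/* Hypothetical example function to pass to map_rec. */
int triple_plus_one(int x) {
    return 3 * x + 1;
}
</syntaxhighlight>

The recursive call is a tail call, which optimizing C compilers often turn back into a loop, although the C standard does not require this.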

=== Anonymous and nested functions ===
{{further information|Anonymous function|Nested function}}
In languages supporting anonymous functions, we can pass such a function as an argument to a higher-order function:
<syntaxhighlight lang="haskell">
main = print (map (\x -> 3 * x + 1) [1, 2, 3, 4, 5])
</syntaxhighlight>

In a language which does not support anonymous functions, we have to bind it to a name instead:
<syntaxhighlight lang="c">
int f(int x) {
    return 3 * x + 1;
}

int main(void) {
    int list[] = {1, 2, 3, 4, 5};
    map(f, list, 5);  /* f can only be passed by name */
    return 0;
}
</syntaxhighlight>
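The C standard library shows the same constraint: <code>qsort</code> takes its comparison as a function pointer, so even a one-off comparison has to be defined and named at file scope. A minimal sketch, with hypothetical names <code>compare_desc</code> and <code>sort_descending</code>, that sorts an array in descending order:

<syntaxhighlight lang="c">
#include <stdlib.h>  /* qsort, size_t */

/* Named comparison function required by qsort: returns a negative value
   when *p should come before *q, so larger values are placed first. */
int compare_desc(const void *p, const void *q) {
    int x = *(const int *)p;
    int y = *(const int *)q;
    return (x < y) - (x > y);  /* avoids overflow from computing y - x */
}

void sort_descending(int xs[], size_t n) {
    qsort(xs, n, sizeof xs[0], compare_desc);
}
</syntaxhighlight>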

=== Non-local variables and closures ===
{{further information|Non-local variable|Closure (computer science)}}

Once we have anonymous or nested functions, it becomes natural for them to refer to variables outside of their body (called ''non-local variables''):
<syntaxhighlight lang="haskell">
main = let a = 3
           b = 1
        in print (map (\x -> a * x + b) [1, 2, 3, 4, 5])
</syntaxhighlight>

If functions are represented as bare function pointers, there is no way to pass the values of non-local variables along with the function, so a closure has to be built manually. Therefore we cannot speak of "first-class" functions here.

<syntaxhighlight lang="c">
#include <stddef.h>  /* size_t */

/* A hand-built closure: a function pointer plus the two captured values. */
typedef struct {
    int (*f)(int, int, int);
    int a;
    int b;
} closure_t;

void map(closure_t *closure, int x[], size_t n) {
    for (size_t i = 0; i < n; ++i)
        x[i] = (closure->f)(closure->a, closure->b, x[i]);
}

int f(int a, int b, int x) {
    return a * x + b;
}

int main(void) {
    int l[] = {1, 2, 3, 4, 5};
    int a = 3;
    int b = 1;
    closure_t closure = {f, a, b};
    map(&closure, l, 5);
    return 0;
}
</syntaxhighlight>

Also note that <code>map</code> is now specialized to functions referring to two <code>int</code>s outside of their body. This can be set up more generally, but doing so requires more [[boilerplate code]]. If <code>f</code> had been a [[nested function]], we would still have run into the same problem, and this is the reason nested functions are not supported in standard C.<ref>"If you try to call the nested function through its address after the containing function has exited, all hell will break loose." ([https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Nested-Functions.html#Nested-Functions GNU Compiler Collection: Nested Functions])</ref>
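One way to set this up more generally, sketched here with a hypothetical <code>closure</code> type, is the common C idiom of pairing the function pointer with an untyped environment pointer (<code>void *</code>), so that a single <code>map_closure</code> works for any captured state:

<syntaxhighlight lang="c">
#include <stddef.h>  /* size_t */

/* General closure: code that receives an opaque environment, plus that environment. */
typedef struct {
    int (*code)(void *env, int x);
    void *env;
} closure;

/* Works for any closure, whatever its environment contains. */
void map_closure(const closure *c, int x[], size_t n) {
    for (size_t i = 0; i < n; i++)
        x[i] = c->code(c->env, x[i]);
}

/* Hypothetical environment for the a * x + b example. */
typedef struct {
    int a;
    int b;
} affine_env;

int affine_code(void *env, int x) {
    const affine_env *e = env;  /* recover the concrete environment type */
    return e->a * x + e->b;
}

/* A different closure body whose environment is a single int. */
int add_code(void *env, int x) {
    return x + *(const int *)env;
}
</syntaxhighlight>

The cost of this generality is that the environment's type is no longer checked by the compiler: each code function must cast <code>env</code> back to the type it expects.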

=== Higher-order functions: returning functions as results ===
When returning a function, we are in fact returning its closure. In the C example, any local variables captured by the closure by address (for instance through pointers into the stack frame of the function that builds the closure) go out of scope once that function returns. Calling the closure at a later point then results in undefined behaviour, possibly corrupting the stack. This is known as the [[upwards funarg problem]].
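A minimal sketch of one common workaround in C, with hypothetical names (<code>closure</code>, <code>make_affine</code>): the captured values are copied into a heap-allocated closure, which outlives the function that builds it and must later be released with <code>free</code>:

<syntaxhighlight lang="c">
#include <stdlib.h>  /* malloc, free */

/* Hypothetical closure for x -> a * x + b: code pointer plus copied values. */
typedef struct closure closure;
struct closure {
    int (*code)(const closure *self, int x);
    int a;
    int b;
};

int affine_code(const closure *self, int x) {
    return self->a * x + self->b;
}

/* Build the closure on the heap so it outlives this call.
   Returns NULL if allocation fails; the caller must free() the result. */
closure *make_affine(int a, int b) {
    closure *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->code = affine_code;
    c->a = a;
    c->b = b;
    return c;
}

/* By contrast, this version would be wrong: c lives in the stack frame of
   make_affine_broken, so the returned pointer dangles and using it is
   undefined behaviour.

   closure *make_affine_broken(int a, int b) {
       closure c = {affine_code, a, b};
       return &c;
   }
*/
</syntaxhighlight>

Because <code>a</code> and <code>b</code> are copied, returning the closure struct by value would work just as well here; the problem arises only when a closure refers to the builder's local variables by address, as in the commented-out version.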

=== Assigning functions to variables ===
[[Assignment (computer science)|Assigning]] functions to [[variable (computer science)|variables]] and storing them inside (global) data structures potentially suffers from the same difficulties as returning functions.
<syntaxhighlight lang="haskell">
f :: [[Integer] -> [Integer]]
f = let a = 3
        b = 1
     in [map (\x -> a * x + b), map (\x -> b * x + a)]
</syntaxhighlight>
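A rough C counterpart of the Haskell list above, assuming a hypothetical <code>affine_closure</code> type, stores function pointers together with copies of <code>a</code> and <code>b</code> in a global array. Because the captured values are stored by value in a global, no entry depends on a stack frame that might disappear. Unlike the Haskell version, which returns new lists, this sketch updates arrays in place:

<syntaxhighlight lang="c">
#include <stddef.h>  /* size_t */

/* Hypothetical closure type: code pointer plus copies of the captured values. */
typedef struct {
    int (*code)(int a, int b, int x);
    int a;
    int b;
} affine_closure;

int affine(int a, int b, int x)         { return a * x + b; }
int swapped_affine(int a, int b, int x) { return b * x + a; }

/* Global table mirroring the Haskell list, with a = 3 and b = 1. */
affine_closure transforms[] = {
    {affine, 3, 1},          /* x -> 3 * x + 1 */
    {swapped_affine, 3, 1},  /* x -> 1 * x + 3 */
};

/* Apply the closure stored at transforms[k] to each element of x, in place. */
void apply_transform(size_t k, int x[], size_t n) {
    const affine_closure *c = &transforms[k];
    for (size_t i = 0; i < n; i++)
        x[i] = c->code(c->a, c->b, x[i]);
}
</syntaxhighlight>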

=== Equality of functions ===

As one can test most literals and values for equality, it is natural to ask whether a programming language can support testing functions for equality. On further inspection, this question appears more difficult and one has to distinguish between several types of function equality:<ref>[[Andrew W. Appel]] (1995). [http://www.cs.princeton.edu/~appel/papers/conteq.pdf "Intensional Equality ;=) for Continuations"].</ref>

; [[Extensional equality]]: Two functions ''f'' and ''g'' are considered extensionally equal if they agree on their outputs for all inputs (∀''x''. ''f''(''x'') = ''g''(''x'')). Under this definition of equality, for example, any two implementations of a [[stable sorting algorithm]], such as [[insertion sort]] and [[merge sort]], would be considered equal. Extensional equality is [[undecidable problem|undecidable]] in general and, even for functions with finite domains, often intractable. For this reason, no programming language implements function equality as extensional equality.

; [[Intensional equality]]: Under intensional equality, two functions ''f'' and ''g'' are considered equal if they have the same "internal structure". This kind of equality could be implemented in [[interpreted language]]s by comparing the [[source code]] of the function bodies (such as in Interpreted Lisp 1.5) or the [[object code]] in [[compiled language]]s. Intensional equality implies extensional equality (assuming the functions are deterministic and have no hidden inputs, such as the [[program counter]] or a mutable [[global variable]]).

; [[Reference equality]]: Given the impracticality of implementing extensional and intensional equality, most languages that support testing functions for equality use reference equality. All functions or closures are assigned a unique identifier (usually the address of the function body or the closure), and equality is decided based on equality of the identifier. Two separately defined but otherwise identical function definitions will be considered unequal. Reference equality implies intensional and extensional equality. Reference equality breaks [[referential transparency]] and is therefore not supported in [[purity (computer science)|pure]] languages, such as Haskell.
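The difference between extensional and reference equality can be sketched in C; the function names below are hypothetical. For functions on the 256-value type <code>uint8_t</code>, extensional equality can be decided by testing every input, whereas a two-argument function on 32-bit integers would already need 2<sup>64</sup> comparisons. Reference equality simply compares function pointers, so two differently written but extensionally equal functions compare unequal. Standard C requires pointers to distinct functions to compare unequal; the sketch assumes that no identical-code-folding linker option that merges such functions is enabled.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t (*u8_fn)(uint8_t);

/* Two separately defined functions with different bodies but the same
   output on every input (results are reduced modulo 256). */
uint8_t double_by_add(uint8_t x)   { return (uint8_t)(x + x); }
uint8_t double_by_shift(uint8_t x) { return (uint8_t)(x << 1); }

/* Extensional equality by brute force: feasible only because the
   domain has just 256 values. */
bool ext_equal_u8(u8_fn f, u8_fn g) {
    for (unsigned x = 0; x <= UINT8_MAX; x++)
        if (f((uint8_t)x) != g((uint8_t)x))
            return false;  /* found an input on which f and g differ */
    return true;
}

/* Reference equality: compares only the functions' addresses. */
bool ref_equal_u8(u8_fn f, u8_fn g) {
    return f == g;
}
</syntaxhighlight>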