Editing Tail call (section)

==Implementation methods==
Tail recursion is important to some [[high-level programming language|high-level languages]], especially [[functional programming|functional]] and [[logic programming|logic]] languages and members of the [[Lisp programming language|Lisp]] family. In these languages, tail recursion is the most commonly used way (and sometimes the only way available) of implementing iteration. The language specification of Scheme requires that tail calls are to be optimized so as not to grow the stack. Tail calls can be made explicitly in [[Perl]], with a variant of the "goto" statement that takes a function name: <code>goto &NAME;</code><ref>{{cite web|author=Contact details |url=http://perldoc.perl.org/functions/goto.html |title=goto |publisher=perldoc.perl.org |access-date=2013-03-21}}</ref>

However, for language implementations which store function arguments and local variables on a [[call stack]] (which is the default implementation for many languages, at least on systems with a [[hardware stack]], such as the [[x86]]), implementing generalized tail-call optimization (including mutual tail recursion) presents an issue: if the size of the callee's activation record is different from that of the caller, then additional cleanup or resizing of the stack frame may be required. For these cases, optimizing tail recursion remains trivial, but general tail-call optimization may be harder to implement efficiently.

For example, in the [[Java virtual machine]] (JVM), tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack).<ref>"[https://stackoverflow.com/questions/12045299/what-is-difference-between-tail-calls-and-tail-recursion What is difference between tail calls and tail recursion?]", ''Stack Overflow''</ref><ref>"[http://programmers.stackexchange.com/questions/157684/what-limitations-does-the-jvm-impose-on-tail-call-optimization What limitations does the JVM impose on tail-call optimization]", ''Programmers Stack Exchange''</ref> As a result, functional languages such as [[Scala (programming language)|Scala]] that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion.

The [[GNU Compiler Collection|GCC]], [[Clang|LLVM/Clang]], and [[Intel C Compiler|Intel]] compiler suites perform tail-call optimization for [[C (programming language)|C]] and other languages at higher optimization levels or when the <code>-foptimize-sibling-calls</code> option is passed.<ref name="llvm-documentation-tco">{{cite web |last1=Lattner |first1=Chris |title=LLVM Language Reference Manual, section: The LLVM Target-Independent Code Generator, sub: Tail Call Optimization |url=http://llvm.org/docs/CodeGenerator.html#tail-call-optimization |website=The LLVM Compiler Infrastructure |publisher=The LLVM Project |access-date=24 June 2018 |ref=llvm.org}}</ref><ref>{{cite web|url=https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html|title=Using the GNU Compiler Collection (GCC): Optimize Options|website=gcc.gnu.org}}</ref><ref>{{cite web |url=https://software.intel.com/en-us/node/522809 |title=foptimize-sibling-calls|website=software.intel.com}}</ref> Though the given language syntax may not explicitly support it, the compiler can make this optimization whenever it can determine that the return types for the caller and callee are equivalent, and that the argument types passed to both function are either the same, or require the same amount of total storage space on the call stack.<ref>{{cite web|url=http://www.drdobbs.com/tackling-c-tail-calls/184401756|title=Tackling C++ Tail Calls}}</ref>

Various implementation methods are available.

===In assembly===
{{unreferenced section|date=June 2014}}
Tail calls are often optimized by [[interpreter (computing)|interpreters]] and [[compiler]]s of [[functional programming]] and [[logic programming]] languages to more efficient forms of [[iteration]]. For example, [[Scheme (programming language)|Scheme]] programmers commonly express [[while loop]]s as calls to procedures in tail position and rely on the Scheme compiler or interpreter to substitute the tail calls with more efficient [[jump (computer science)|jump]] instructions.<ref>{{cite web | url=https://gcc.gnu.org/ml/gcc/2000-07/msg00595.html | title=proper tail recursion for gcc | publisher=GCC Project | date=20 July 2000 | access-date=10 March 2015 | author=Probst, Mark}}</ref>

For compilers generating assembly directly, tail-call elimination is easy: it suffices to replace a call opcode with a jump one, after fixing parameters on the stack. From a compiler's perspective, the first example above is initially translated into pseudo-[[assembly language]] (in fact, this is valid [[x86 assembly language|x86 assembly]]):

<syntaxhighlight lang="asm">
 foo:
  call B
  call A
  ret
</syntaxhighlight>

Tail-call elimination replaces the last two lines with a single jump instruction:

<syntaxhighlight lang="asm">

 foo:
  call B
  jmp  A
</syntaxhighlight>

After subroutine <code>A</code> completes, it will then return directly to the return address of <code>foo</code>, omitting the unnecessary <code>ret</code> statement.

Typically, the subroutines being called need to be supplied with [[parameter (computer science)|parameter]]s. The generated code thus needs to make sure that the [[call frame]] for A is properly set up before jumping to the tail-called subroutine. For instance, on [[computing platform|platform]]s where the [[call stack]] does not just contain the [[return statement|return address]], but also the parameters for the subroutine, the compiler may need to emit instructions to adjust the call stack. On such a platform, for the code:

 '''function''' foo(data1, data2)
    B(data1)
    '''return''' A(data2)

(where <code>data1</code> and <code>data2</code> are parameters) a compiler might translate that as:{{efn| The <code>call</code> instruction first pushes the current code location onto the stack and then performs an unconditional jump to the code location indicated by the label. The <code>ret</code> instruction first pops a code location off the stack, then performs an unconditional jump to the retrieved code location. }}

<syntaxhighlight lang="nasm" line>
 foo:
   mov  reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
   push reg            ; put data1 on stack where B expects it
   call B              ; B uses data1
   pop                 ; remove data1 from stack
   mov  reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
   push reg            ; put data2 on stack where A expects it
   call A              ; A uses data2
   pop                 ; remove data2 from stack.
   ret
</syntaxhighlight>

A tail-call optimizer could then change the code to:

<syntaxhighlight lang="nasm" line>
 foo:
   mov  reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
   push reg            ; put data1 on stack where B expects it
   call B              ; B uses data1
   pop                 ; remove data1 from stack
   mov  reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
   mov  [sp+data1],reg ; put data2 where A expects it
   jmp  A              ; A uses data2 and returns immediately to caller.
</syntaxhighlight>

This code is more efficient both in terms of execution speed and use of stack space.

===Through trampolining===
Since many [[Scheme (programming language)|Scheme]] compilers use [[C (programming language)|C]] as an intermediate target code, the tail recursion must be encoded in C without growing the stack, even if the C compiler does not optimize tail calls. Many implementations achieve this by using a device known as a [[Trampoline (computers)|trampoline]], a piece of code that repeatedly calls functions. All functions are entered via the trampoline. When a function has to tail-call another, instead of calling it directly and then returning the result, it returns the address of the function to be called and the call parameters back to the trampoline (from which it was called itself), and the trampoline takes care of calling this function next with the specified parameters. This ensures that the C stack does not grow and iteration can continue indefinitely.

It is possible to implement trampolines using [[higher-order function]]s in languages that support them, such as [[Groovy (programming language)|Groovy]], [[Visual Basic .NET]] and [[C Sharp (programming language)|C#]].<ref name="onyourtail">Samuel Jack, [http://blog.functionalfun.net/2008/04/bouncing-on-your-tail.html Bouncing on your tail]. ''Functional Fun''. April 9, 2008.</ref>

Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler, [[Chicken (Scheme implementation)|Chicken]], uses a technique first described by [[Henry Baker (computer scientist)|Henry Baker]] from an unpublished suggestion by [[Andrew Appel]],<ref name="Chicken">Henry Baker, [http://home.pipeline.com/~hbaker1/CheneyMTA.html "CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A."]</ref> in which normal C&nbsp;calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are [[garbage collection (computer science)|garbage-collected]] using the [[Cheney algorithm]] by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building."<ref name="Chicken" /> The garbage collection ensures that mutual tail recursion can continue indefinitely. However, this approach requires that no C function call ever returns, since there is no guarantee that its caller's stack frame still exists; therefore, it involves a much more dramatic internal rewriting of the program code: [[continuation-passing style]].