Tail Calls, Optimization, and ES6

In this penultimate post about the stack, we take a quick look at tail
calls, compiler optimizations, and the proper tail calls landing in the
newest version of JavaScript.

A tail call happens when a function F makes a function call as its final
action. At that point F will do absolutely no more work: it passes the ball to
whatever function is being called and vanishes from the game. This is notable
because it opens up the possibility of tail call optimization: instead of
creating a new stack frame for the function call, we can simply reuse
F‘s stack frame, thereby saving stack space and avoiding the work involved in
setting up a new frame. Here are some examples in C and their results compiled
with mild optimization:

Simple Tail Callsview raw

int add5(int a)
{
        return a + 5;
}

int add10(int a)
{
        int b = add5(a); // not tail
        return add5(b); // tail
}

int add5AndTriple(int a) {
        int b = add5(a); // not tail
        return 3 * add5(a); // not tail, doing work after the call
}

int finicky(int a) {
        if (a > 10) {
                return add5AndTriple(a); // tail
        }

        if (a > 5) {
                int b = add5(a); // not tail
                return finicky(b); // tail
        }

        return add10(a); // tail
}

You can normally spot tail call optimization (hereafter, TCO) in compiler output
by seeing a jump instruction where a call would have been
expected. At runtime TCO leads to a reduced call stack.

A common misconception is that tail calls are necessarily
recursive. That’s not the case: a tail call may be recursive,
such as in finicky() above, but it need not be. As long as caller F is
completely done at the call site, we’ve got ourselves a tail call. Whether it
can be optimized is a different question whose answer depends on your
programming environment.

“Yes, it can, always!” is the best answer we can hope for, which is famously the
case for Scheme, as discussed in SICP (by the way, if when you program
you don’t feel like “a Sorcerer conjuring the spirits of the computer with your
spells,” I urge you to read that book). It’s also the case for Lua. And
most importantly, it is the case for the next version of JavaScript, ES6, whose
spec does a good job defining tail position and clarifying the few
conditions required for optimization, such as strict mode. When
a language guarantees TCO, it supports proper tail calls.

Now some of us can’t kick that C habit, heart bleed and all, and the answer
there is a more complicated “sometimes” that takes us into compiler optimization
territory. We’ve seen the simple examples above; now let’s resurrect
our factorial from last post:

Recursive Factorialview raw

#include 

int factorial(int n)
{
	int previous = 0xdeadbeef;	

	if (n == 0 || n == 1) {
		return 1;
	}

	previous = factorial(n-1);
	return n * previous;
}

int main(int argc)
{
	int answer = factorial(5);
	printf("%d\n", answer);
}

So, is line 11 a tail call? It’s not, because of the multiplication by n
afterwards. But if you’re not used to optimizations, gcc’s
result with O2 optimization might shock you: not only it
transforms factorial into a recursion-free loop, but the
factorial(5) call is eliminated entirely and replaced by a
compile-time constant of 120 (5! == 120). This is why debugging optimized
code can be hard sometimes. On the plus side, if you call this function it will
use a single stack frame regardless of n’s initial value. Compiler algorithms
are pretty fun, and if you’re interested I suggest you check out
Building an Optimizing Compiler and ACDI.

However, what happened here was not tail call optimization, since there was
no tail call to begin with. gcc outsmarted us by analyzing what the function
does and optimizing away the needless recursion. The task was made easier by
the simple, deterministic nature of the operations being done. By adding a dash
of chaos (e.g., getpid()) we can throw gcc off:

Recursive PID Factorialview raw

#include 
#include 
#include 

int pidFactorial(int n)
{
        if (1 == n) {
                return getpid(); // tail
        }

        return n * pidFactorial(n-1) * getpid(); // not tail
}

int main(int argc)
{
        int answer = pidFactorial(5);
        printf("%d\n", answer);
}

Optimize that, unix fairies! So now we have a regular
recursive call and this function allocates O(n) stack
frames to do its work. Heroically, gcc still does TCO for getpid
in the recursion base case. If we now wished to make this function tail recursive,
we’d need a slight change:

tailPidFactorial.cview raw

#include 
#include 
#include 

int tailPidFactorial(int n, int acc)
{
        if (1 == n) {
                return acc * getpid(); // not tail
        }

        acc = (acc * getpid() * n);
        return tailPidFactorial(n-1, acc); // tail
}

int main(int argc)
{
        int answer = tailPidFactorial(5, 1);
        printf("%d\n", answer);
}

The accumulation of the result is now a loop and we’ve
achieved true TCO. But before you go out partying, what can we say about the
general case in C? Sadly, while good C compilers do TCO in a number of cases,
there are many situations where they cannot do it. For example, as we saw in our
function epilogues, the caller is responsible for cleaning up the
stack after a function call using the standard C calling convention. So if
function F takes two arguments, it can only make TCO calls to functions taking
two or fewer arguments. This is one among many restrictions. Mark Probst wrote
an excellent thesis discussing Proper Tail Recursion in C where he
discusses these issues along with C stack behavior. He also does
insanely cool juggling.

“Sometimes” is a rocky foundation for any relationship, so you can’t rely on TCO
in C. It’s a discrete optimization that may or may not take place, rather than
a language feature like proper tail calls, though in practice the compiler
will optimize the vast majority of cases. But if you must have it, say for
transpiling Scheme into C, you will suffer.

Since JavaScript is now the most popular transpilation target, proper tail calls
become even more important there. So kudos to ES6 for delivering it along with
many other significant improvements. It’s like Christmas for JS programmers.

This concludes our brief tour of tail calls and compiler optimization. Thanks
for reading and see you next time.

Print Share Comment Cite Upload Translate

APA

() » Tail Calls, Optimization, and ES6. Retrieved from https://www.truth.cx/2014/05/23/tail-calls-optimization-and-es6/.

MLA

" » Tail Calls, Optimization, and ES6." - , https://www.truth.cx/2014/05/23/tail-calls-optimization-and-es6/

HARVARD

» Tail Calls, Optimization, and ES6., viewed ,

VANCOUVER

- » Tail Calls, Optimization, and ES6. [Internet]. [Accessed ]. Available from: https://www.truth.cx/2014/05/23/tail-calls-optimization-and-es6/

CHICAGO

" » Tail Calls, Optimization, and ES6." - Accessed . https://www.truth.cx/2014/05/23/tail-calls-optimization-and-es6/

IEEE

" » Tail Calls, Optimization, and ES6." [Online]. Available: https://www.truth.cx/2014/05/23/tail-calls-optimization-and-es6/. [Accessed: ]

Select a language:

<html><head></head><body>In this penultimate post about the stack, we take a quick look at tail
calls, compiler optimizations, and the proper tail calls landing in the
newest version of JavaScript.
A tail call happens when a function <code>F</code> makes a function call as its final
action. At that point <code>F</code> will do absolutely no more work: it passes the ball to
whatever function is being called and vanishes from the game. This is notable
because it opens up the possibility of tail call optimization: instead of
<a href="https://manybutfinite.com/post/journey-to-the-stack" title="Journey to the Stack">creating a new stack frame</a> for the function call, we can simply reuse
<code>F</code>'s stack frame, thereby saving stack space and avoiding the work involved in
setting up a new frame. Here are some examples in C and their results compiled
with <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/asm-tco.sh" rel="noopener">mild optimization</a>:
<figure class="highlight c"><figcaption>Simple Tail Calls<a href="https://manybutfinite.com/code/x86-stack/tail.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 </pre></td><td class="code"><pre>int add5(int a) { return a + 5; } int add10(int a) { int b = add5(a); // not tail return add5(b); // tail } int add5AndTriple(int a) { int b = add5(a); // not tail return 3 * add5(a); // not tail, doing work after the call } int finicky(int a) { if (a > 10) { return add5AndTriple(a); // tail } if (a > 5) { int b = add5(a); // not tail return finicky(b); // tail } return add10(a); // tail } </pre></td></tr></tbody></table></figure>
You can normally spot tail call optimization (hereafter, TCO) in compiler output
by seeing a <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tail-tco.s#L27" rel="noopener">jump</a> instruction where a <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tail.s#L37-L39" rel="noopener">call</a> would have been
expected. At runtime TCO leads to a reduced call stack.
A common misconception is that tail calls are necessarily
<a href="https://manybutfinite.com/post/recursion/">recursive</a>. That’s not the case: a tail call may be recursive,
such as in <code>finicky()</code> above, but it need not be. As long as caller <code>F</code> is
completely done at the call site, we’ve got ourselves a tail call. Whether it
can be optimized is a different question whose answer depends on your
programming environment.
“Yes, it can, always!” is the best answer we can hope for, which is famously the
case for Scheme, as discussed in <a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-11.html" rel="noopener">SICP</a> (by the way, if when you program
you don’t feel like “a Sorcerer conjuring the spirits of the computer with your
spells,” I urge you to read that book). It’s also the case for <a href="http://www.lua.org/pil/6.3.html" rel="noopener">Lua</a>. And
most importantly, it is the case for the next version of JavaScript, ES6, whose
spec does a good job defining <a href="https://people.mozilla.org/~jorendorff/es6-draft.html#sec-tail-position-calls" rel="noopener">tail position</a> and clarifying the few
conditions required for optimization, such as <a href="https://people.mozilla.org/~jorendorff/es6-draft.html#sec-strict-mode-code" rel="noopener">strict mode</a>. When
a language guarantees TCO, it supports proper tail calls.
Now some of us can’t kick that C habit, heart bleed and all, and the answer
there is a more complicated “sometimes” that takes us into compiler optimization
territory. We’ve seen the <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tail.c" rel="noopener">simple examples</a> above; now let’s resurrect
our factorial from <a href="https://manybutfinite.com/post/recursion/">last post</a>:
<figure class="highlight c"><figcaption>Recursive Factorial<a href="https://manybutfinite.com/code/x86-stack/factorial.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 </pre></td><td class="code"><pre>#include <stdio.h> int factorial(int n) { 	int previous = 0xdeadbeef;	 	if (n == 0 || n == 1) { 		return 1; 	} 	previous = factorial(n-1); 	return n * previous; } int main(int argc) { 	int answer = factorial(5); 	printf("%d\n", answer); } </pre></td></tr></tbody></table></figure>
So, is line 11 a tail call? It’s not, because of the multiplication by <code>n</code>
afterwards. But if you’re not used to optimizations, gcc’s
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-o2.s" rel="noopener">result</a> with <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html" rel="noopener">O2 optimization</a> might shock you: not only it
transforms <code>factorial</code> into a <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-o2.s#L16-L19" rel="noopener">recursion-free loop</a>, but the
<code>factorial(5)</code> call is eliminated entirely and replaced by a
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-o2.s#L38" rel="noopener">compile-time constant</a> of 120 (5! == 120). This is why debugging optimized
code can be hard sometimes. On the plus side, if you call this function it will
use a single stack frame regardless of n’s initial value. Compiler algorithms
are pretty fun, and if you’re interested I suggest you check out
<a href="http://www.amazon.com/Building-Optimizing-Compiler-Bob-Morgan-ebook/dp/B008COCE9G/" rel="noopener">Building an Optimizing Compiler</a> and <a href="http://www.amazon.com/Advanced-Compiler-Design-Implementation-Muchnick-ebook/dp/B003VM7GGK/" rel="noopener">ACDI</a>.
However, what happened here was not tail call optimization, since there was
no tail call to begin with. gcc outsmarted us by analyzing what the function
does and optimizing away the needless recursion. The task was made easier by
the simple, deterministic nature of the operations being done. By adding a dash
of chaos (e.g., <code>getpid()</code>) we can throw gcc off:
<figure class="highlight c"><figcaption>Recursive PID Factorial<a href="https://manybutfinite.com/code/x86-stack/pidFactorial.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 </pre></td><td class="code"><pre>#include <stdio.h> #include <sys/types.h> #include <unistd.h> int pidFactorial(int n) { if (1 == n) { return getpid(); // tail } return n * pidFactorial(n-1) * getpid(); // not tail } int main(int argc) { int answer = pidFactorial(5); printf("%d\n", answer); } </pre></td></tr></tbody></table></figure>
Optimize that, unix fairies! So now we have a regular
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/pidFactorial-o2.s#L20" rel="noopener">recursive call</a> and this function allocates O(n) stack
frames to do its work. Heroically, gcc still does <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/pidFactorial-o2.s#L43" rel="noopener">TCO for getpid</a>
in the recursion base case. If we now wished to make this function tail recursive,
we’d need a slight change:
<figure class="highlight c"><figcaption>tailPidFactorial.c<a href="https://manybutfinite.com/code/x86-stack/tailPidFactorial.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 </pre></td><td class="code"><pre>#include <stdio.h> #include <sys/types.h> #include <unistd.h> int tailPidFactorial(int n, int acc) { if (1 == n) { return acc * getpid(); // not tail } acc = (acc * getpid() * n); return tailPidFactorial(n-1, acc); // tail } int main(int argc) { int answer = tailPidFactorial(5, 1); printf("%d\n", answer); } </pre></td></tr></tbody></table></figure>
The accumulation of the result is now <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tailPidFactorial-o2.s#L22-L27" rel="noopener">a loop</a> and we’ve
achieved true TCO. But before you go out partying, what can we say about the
general case in C? Sadly, while good C compilers do TCO in a number of cases,
there are many situations where they cannot do it. For example, as we saw in our
<a href="https://manybutfinite.com/post/epilogues-canaries-buffer-overflows/">function epilogues</a>, the caller is responsible for cleaning up the
stack after a function call using the standard C calling convention. So if
function <code>F</code> takes two arguments, it can only make TCO calls to functions taking
two or fewer arguments. This is one among many restrictions. Mark Probst wrote
an excellent thesis discussing <a href="http://www.complang.tuwien.ac.at/schani/diplarb.ps" rel="noopener">Proper Tail Recursion in C</a> where he
discusses these issues along with C stack behavior. He also does
<a href="http://www.complang.tuwien.ac.at/schani/jugglevids/index.html" rel="noopener">insanely cool juggling</a>.
“Sometimes” is a rocky foundation for any relationship, so you can’t rely on TCO
in C. It’s a discrete optimization that may or may not take place, rather than
a language feature like proper tail calls, though in practice the compiler
will optimize the vast majority of cases. But if you must have it, say for
transpiling Scheme into C, you will <a href="http://en.wikipedia.org/wiki/Tail_call#Through_trampolining" rel="noopener">suffer</a>.
Since JavaScript is now the most popular transpilation target, proper tail calls
become even more important there. So kudos to ES6 for delivering it along with
many other significant improvements. It’s like Christmas for JS programmers.
This concludes our brief tour of tail calls and compiler optimization. Thanks
for reading and see you next time.
</body></html>

First name:

Last name:

Related Posts