Recursion is magic, but it suffers from the most awkward introduction in
programming books. They’ll show you a recursive factorial implementation, then
warn you that while it sort of works, it’s terribly slow and might crash due to
stack overflows. “You could always dry your hair by sticking your head
into the microwave, but watch out for intracranial pressure and head explosions.
Or you can use a towel.” No wonder people are suspicious of it. Which is too
bad, because recursion is the single most powerful idea in algorithms.
Let’s take a look at the classic recursive factorial:
```c
int factorial(int n)
{
    int previous = 0xdeadbeef; /* a local, so each frame has contents to see */

    if (n == 0 || n == 1) {
        return 1;
    }

    previous = factorial(n - 1);
    return n * previous;
}
```
The idea of a function calling itself is mystifying at first. To make it
concrete, here is exactly what is on the stack when
factorial(5) is called and reaches n == 1:
[Diagram: the call stack for factorial(5) at the moment it reaches n == 1]
Each call to factorial generates a new stack frame. The creation and
destruction of these stack frames is what makes the recursive
factorial slower than its iterative counterpart. The accumulation of these
frames before the calls start returning is what can potentially exhaust stack
space and crash your program.
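You don’t have to take the diagram on faith: printing the address of a
parameter at each call shows the frames piling up. Here’s a minimal sketch
(the spacing between addresses depends on your compiler, platform, and
optimization flags):

```c
#include <stdio.h>

/* On x86 the stack grows downward, so each recursive call prints a lower
   address than the last; the gap between addresses is the frame size. */
int factorial(int n)
{
    printf("n = %d, frame near %p\n", n, (void *)&n);
    if (n == 0 || n == 1) {
        return 1;
    }
    return n * factorial(n - 1);
}

int main(void)
{
    printf("factorial(5) = %d\n", factorial(5));
    return 0;
}
```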
These concerns are often theoretical. For example, the stack frames for
factorial take 16 bytes each (this can vary depending on stack alignment and
other factors). If you are running a modern x86 Linux kernel on a typical desktop or server, you
normally have 8 megabytes of stack space, so factorial could handle n up to
~512,000. This is a monstrously large result that takes
8,971,833 bits to represent, so stack space is the least of our problems: a puny
integer – even a 64-bit one – will overflow tens of thousands of times over
before we run out of stack space.
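If you want to check that bit count without computing the monster itself, the
C library will do it: lgamma(n + 1) returns ln(n!), and dividing by ln(2)
converts to bits. A quick sketch (link with -lm):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 512000.0;
    /* lgamma(n + 1) == ln(n!), so this is log2(n!), i.e. bits needed.
       It should land within a bit or two of the 8,971,833 quoted above. */
    double bits = lgamma(n + 1.0) / log(2.0);
    printf("512000! takes about %.0f bits\n", bits);
    return 0;
}
```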
We’ll look at CPU usage in a moment, but for now let’s take a step back from the
bits and bytes and look at recursion as a general technique. Our factorial
algorithm boils down to pushing integers N, N-1, … 1 onto a stack, then
multiplying them in reverse order. The fact we’re using the program’s call stack
to do this is an implementation detail: we could allocate a stack on the heap
and use that instead. While the call stack does have special properties, it’s
just another data structure at your disposal. I hope the diagram makes that
clear.
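To make that concrete, here’s a sketch of the same push-then-multiply process
with the stack allocated on the heap (malloc checks omitted for brevity, and
as discussed above the result overflows a 64-bit integer past n = 20; the
point is the mechanics):

```c
#include <stdlib.h>

/* Push n, n-1, ..., 1 onto our own stack, then pop and multiply.
   The calls are gone, but the process is unchanged. */
unsigned long factorial(unsigned long n)
{
    unsigned long *stack = malloc((n + 1) * sizeof(*stack));
    unsigned long top = 0, result = 1;

    for (unsigned long i = n; i >= 1; i--) {
        stack[top++] = i;        /* the "recursive calls" */
    }
    while (top > 0) {
        result *= stack[--top];  /* the "returns", in reverse order */
    }
    free(stack);
    return result;
}
```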
Once you see the call stack as a data structure, something else becomes clear:
piling up all those integers to multiply them afterwards is one dumbass idea.
That is the real lameness of this implementation: it’s using a screwdriver to
hammer a nail. It’s far more sensible to use an iterative process to calculate
factorials.
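Something like this:

```c
/* The iterative factorial: one frame, one loop, no bookkeeping. */
unsigned long factorial(unsigned long n)
{
    unsigned long result = 1;
    for (unsigned long i = 2; i <= n; i++) {
        result *= i;
    }
    return result;
}
```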
But there are plenty of screws out there, so let’s pick one. There is
a traditional interview question where you’re given a mouse in a maze, and you
must help the mouse search for cheese. Suppose the mouse can turn either left
or right in the maze. How would you model and solve this problem?
Like most problems in life, you can reduce this rodent quest to a graph, in
particular a binary tree where the nodes represent positions in the maze.
You could then have the mouse attempt left turns whenever possible, and
backtrack to turn right when it reaches a dead end. Here’s the mouse walk in an
example maze:
[Diagram: the mouse’s walk through an example maze, modeled as a binary tree]
Each edge (line) is a left or right turn taking our mouse to a new position. If
either turn is blocked, the corresponding edge does not exist. Now we’re
talking! This process is inherently recursive whether you use the call stack
or another data structure. But using the call stack is just so easy:
```c
typedef struct node {
    struct node *left;   /* position after a left turn; null if blocked */
    struct node *right;  /* position after a right turn; null if blocked */
    int hasCheese;
} node;

int explore(node *position)
{
    if (!position) {
        return 0; /* blocked turn or dead end: backtrack */
    }
    if (position->hasCheese) {
        return 1; /* the cheese! */
    }
    return explore(position->left) || explore(position->right);
}
```
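To try it out, here’s a tiny hand-built maze; the layout is made up purely
for illustration:

```c
#include <stdio.h>

int main(void)
{
    /* Initializers are { left, right, hasCheese }: two left turns from
       the start reach the cheese, while the first right is a dead end. */
    node cheese  = { 0, 0, 1 };
    node deadEnd = { 0, 0, 0 };
    node second  = { &cheese, 0, 0 };
    node start   = { &second, &deadEnd, 0 };

    puts(explore(&start) ? "found the cheese!" : "no cheese");
    return 0;
}
```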
Below is the stack when we find the cheese in maze.c:13. You can also
see the detailed GDB output and commands
used to gather data.
[Diagram: the call stack at the moment the cheese is found]
This shows recursion in a much better light because it’s a suitable problem. And
that’s no oddity: when it comes to algorithms, recursion is the rule, not the
exception. It comes up when we search, when we traverse trees and other data
structures, when we parse, when we sort: it’s everywhere. You know how pi
or e come up in math all the time because they’re in the foundations of the
universe? Recursion is like that: it’s in the fabric of computation.
Steven Skiena’s excellent Algorithm Design Manual is a great place to
see that in action as he works through his “war stories” and shows the reasoning
behind algorithmic solutions to real-world problems. It’s the best resource
I know of to develop your intuition for algorithms. Another good read is
McCarthy’s original paper on LISP. Recursion is both in its title
and in the foundations of the language. The paper is readable and fun; it’s
always a pleasure to see a master at work.
Back to the maze. While it’s hard to get away from recursion here, it doesn’t
mean it must be done via the call stack. You could, for example, use a string like
RRLL to keep track of the turns, and rely on the string to decide on the
mouse’s next move. Or you can allocate something else to record the state of the
cheese hunt. You’d still be implementing a recursive process, but rolling your
own data structure.
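Here’s a sketch of that approach, reusing the node type from the earlier
listing (the names and the frame struct here are mine). Each hand-rolled
frame has to record the position plus how far we’ve gotten with it, and we
must guess an upper bound on the maze depth up front:

```c
#include <stdlib.h>

typedef struct frame {
    node *position;
    int triedLeft;   /* already descended into the left turn? */
} frame;

int explore_iterative(node *start, size_t maxDepth)
{
    frame *stack = malloc(maxDepth * sizeof(*stack));
    size_t top = 0;
    int found = 0;

    if (start) {
        stack[top++] = (frame){ start, 0 };
    }
    while (top > 0 && !found) {
        frame *f = &stack[top - 1];
        if (f->position->hasCheese) {
            found = 1;                        /* the cheese! */
        } else if (!f->triedLeft) {
            f->triedLeft = 1;                 /* attempt the left turn */
            if (f->position->left) {
                stack[top++] = (frame){ f->position->left, 0 };
            }
        } else {
            node *right = f->position->right;
            top--;                            /* dead end: backtrack */
            if (right) {
                stack[top++] = (frame){ right, 0 };
            }
        }
    }
    free(stack);
    return found;
}
```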
That’s likely to be more complex because the call stack fits like a glove.
Each stack frame records not only the current node, but also the state of
computation in that node (in this case, whether we’ve taken only the left, or
are already attempting the right). Hence the code becomes trivial. Yet we
sometimes give up this sweetness for fear of overflows and hopes of performance.
That can be foolish.
As we’ve seen, the stack is large and frequently other constraints kick
in before stack space does. One can also check the problem size and ensure it
can be handled safely. The CPU worry is instilled chiefly by two widespread
pathological examples: the dumb factorial and the hideous O(2^n)
recursive Fibonacci without memoization. These are not indicative of
sane stack-recursive algorithms.
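For the record, here’s the offender; each call spawns two more, so the work
explodes exponentially:

```c
/* Naive Fibonacci: the call tree grows on the order of 2^n. The
   algorithm is the problem, not recursion: memoize it (or iterate)
   and the blowup disappears. */
unsigned long fib(unsigned int n)
{
    if (n < 2) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
```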
The reality is that stack operations are fast. Often the offsets to data are
known exactly, the stack is hot in the caches, and there are dedicated
instructions to get things done. Meanwhile, there is substantial overhead
involved in using your own heap-allocated data structures. It’s not uncommon to
see people write something that ends up more complex and less performant than
call-stack recursion. Finally, modern CPUs are fast, and the CPU is often not
the bottleneck anyway. Be careful about sacrificing simplicity, and as always
with performance, measure.
The next post is the last in this stack series, and we’ll look at Tail Calls,
Closures, and Other Fauna. Then it’ll be time to visit our old friend, the Linux
kernel. Thanks for reading!
