# Is C actually Turing-complete?

I was trying to explain to someone that C is Turing-complete, and realized that I don’t actually know if it is, indeed, technically Turing-complete. (C as in the abstract semantics, not as in an actual implementation.)

The “obvious” answer (roughly: it can address an arbitrary amount of memory, so it can emulate a RAM machine, so it’s Turing-complete) isn’t actually correct, as far as I can tell, as although the C standard allows for size_t to be arbitrarily large, it must be fixed at some length, and no matter what length it is fixed at it is still finite. (In other words, although you could, given an arbitrary halting Turing machine, pick a length of size_t such that it will run “properly”, there is no way to pick a length of size_t such that all halting Turing machines will run properly)

So: is C99 Turing-complete?

I’m not sure but I think the answer is no, for rather subtle reasons. I asked on Theoretical Computer Science a few years ago and didn’t get an answer that goes beyond what I’ll present here.

In most programming languages, you can simulate a Turing machine by:

• simulating the finite automaton with a program that uses a finite amount of memory;
• simulating the tape with a pair of linked lists of integers, representing the content of the tape before and after the current position. Moving the pointer means transferring the head of one of the lists onto the other list.

A concrete implementation running on a computer would run out of memory if the tape got too long, but an ideal implementation could execute the Turing machine program faithfully. This can be done with pen and paper, or by buying a computer with more memory, and a compiler targeting an architecture with more bits per word and so on if the program ever runs out of memory.

This doesn’t work in C because it’s impossible to have a linked list that can grow forever: there’s always some limit on the number of nodes.

To explain why, I first need to explain what a C implementation is. C is actually a family of programming languages. The ISO C standard (more precisely, a specific version of this standard) defines (with the level of formality that English allows) the syntax and semantics a family of programming languages. C has a lot of undefined behavior and implementation-defined behavior. An “implementation” of C codifies all the implementation-defined behavior (the list of things to codify is in appendix J for C99). Each implementation of C is a separate programming language. Note that the meaning of the word “implementation” is a bit peculiar: what it really means is a language variant, there can be multiple different compiler programs that implement the same language variant.

In a given implementation of C, a byte has $2^{\texttt{CHAR_BIT}}$ possible values. All data can represented as an array of bytes: a type t has at most
$2^{\texttt{CHAR_BIT} \times \texttt{sizeof(t)}}$ possible values. This number varies in different implementations of C, but for a given implementation of C, it’s a constant.

In particular, pointers can only take at most $2^{\texttt{CHAR_BIT} \times \texttt{sizeof(void*)}}$ values. This means that there is a finite maximum number of addressable objects.

The values of CHAR_BIT and sizeof(void*) are observable, so if you run out of memory, you can’t just resume running your program with larger values for those parameters. You would be running the program under a different programming language — a different C implementation.

If programs in a language can only have a bounded number of states, then the programming language is no more expressive than finite automata. The fragment of C that’s restricted to addressable storage only allows at most $n \times 2^{\texttt{CHAR_BIT} \times \texttt{sizeof(void*)}}$ program states where $n$ is the size of the abstract syntax tree of the program (representing the state of the control flow), therefore this program can be simulated by a finite automaton with that many states. If C is more expressive, it has to be through the use of other features.

C does not directly impose a maximum recursion depth. An implementation is allowed to have a maximum, but it’s also allowed not to have one. But how do we communicate between a function call and its parent? Arguments are no good if they’re addressable, because that would indirectly limit the depth of recursion: if you have a function int f(int x) { … f(…) …} then all the occurrences of x on active frames of f have their own address and so the number of nested calls is bounded by the number of possible addresses for x.

A C program can use non-addressable storage in the form of register variables. “Normal” implementations can only have a small, finite number of variables that don’t have an address, but in theory an implementation could allow an unbounded amount of register storage. In such an implementation, you can make an unbounded amount of recursive calls to a function, as long as its argument are register. But since the arguments are register, you can’t make a pointer to them, and so you need to copy their data around explicitly: you can only pass around a finite amount of data, not an arbitrary-sized data structure that’s made of pointers.

With unbounded recursion depth, and the restriction that a function can only get data from its direct caller (register arguments) and return data to its direct caller (the function return value), you get the power of deterministic pushdown automata.

I can’t find a way to go further.

(Of course you could make the program store the tape content externally, through file input/output functions. But then you wouldn’t be asking whether C is Turing-complete, but whether C plus an infinite storage system is Turing-complete, to which the answer is a boring “yes”. You might as well define the storage to be a Turing oracle — call fopen("oracle", "r+"), fwrite the initial tape content to it and fread back the final tape content.)