So I was thinking about how garbage collectors work and I thought of an interesting issue. Presumably garbage collectors have to traverse all structures in the same way. They can't know whether they are traversing a linked list or a balanced tree or whatever. They also can't use up too much memory in their search. The only way I can think of to traverse ALL structures is to traverse them recursively, the way you would a binary tree. This, however, would give a stack overflow on a linked list, or even just on a poorly balanced binary tree. But all the garbage collected languages I have ever used seem to have no problem dealing with such cases.
The dragon book uses an "Unscanned" queue of sorts: rather than traversing the structure recursively, it adds the things that need to be marked to a queue, and anything left unmarked at the end is deleted. But wouldn't this queue get very large?
So, how do garbage collectors traverse arbitrary structures? How does this traversal technique avoid overflow?
In a nutshell: garbage collectors do
not use recursion. They control tracing by keeping track of
essentially two sets (which may be combined). The order of tracing and cell
processing is irrelevant, which gives considerable implementation
freedom to represent the sets. Hence there are many solutions that are
actually very cheap in memory usage. This is essential, since the GC is
called precisely when the heap runs out of memory. Things are a bit
different with large virtual memories, as new pages can be easily
allocated, and the enemy is not lack of space but lack of data locality.
I assume you are considering tracing garbage collectors, not reference counting, to which your question does not seem to apply.
The question focuses on the memory cost of tracing for keeping
track of a set: the set U (for untraced) of accessible memory cells that still contain
pointers that have not yet been traced. This is only half the memory problem
for garbage collection. The GC must also keep track of
another set: the set V (for visited) of all cells that have been found to be
accessible, so as to reclaim all the other cells at the end of the
process. Discussing one and not the other makes limited sense, as they may have
similar cost, use similar solutions, and even be combined.
The first thing to note is that all tracing GCs follow the same
abstract model, based on a systematic exploration of the directed graph
of cells in memory accessible from the program, where memory cells are
vertices and pointers are the directed edges. For this it uses
the set V (visited) of cells already found to be accessible by
the mutator, i.e. the program or algorithm for which the GC is
performed. The set V is partitioned into two disjoint subsets:
the set U (untraced) of visited cells with pointers that have not
been followed yet;
the set T (traced) of visited cells that had all their pointers traced.
We also write H for the set of all cells in the heap, whether in use or not.
Only V and U, or U and T, need to be represented somehow for the algorithm to work.
The algorithm starts from root pointers that are known to the
run-time system (usually pointers into stack allocated memory), and puts
all the cells they point to into the untraced set U (hence into V too).
Then the collector takes cells in U one by one, and checks for each
cell c all its pointers. For each pointer, if the pointed cell is in V,
then nothing is done, else the pointed cell is added to U, since its
pointers have yet to be checked. When all its pointers have been
processed, the cell c is transferred from the untraced set U to
the traced set T.
The tracing terminates when U is empty. This is bound to happen,
since no cell goes through U more than once. At that point, V=T,
and all cells in V are known to be accessible to the program, thus not
reclaimable. The complement H−V of V in the heap determines what
cells are unreachable by the mutator program and can be reclaimed by
the collector for future allocation to the mutator.
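The abstract algorithm above can be sketched in a few lines of Python. This is only an illustration of the model, not of any particular implementation; the names `roots` and `pointers_of` are placeholders for whatever the run-time system actually provides:

```python
def trace(roots, pointers_of):
    """Mark phase of a tracing GC, following the U/V/T description above.

    `roots` is an iterable of cells directly known to the run-time system;
    `pointers_of(cell)` yields the cells a given cell points to.
    """
    visited = set()    # V: every cell found to be accessible
    untraced = []      # U: cells whose pointers have not been followed yet
    for r in roots:
        if r not in visited:
            visited.add(r)
            untraced.append(r)
    while untraced:                # terminates: no cell enters U twice
        cell = untraced.pop()      # order is irrelevant; pop() is cheapest here
        for child in pointers_of(cell):
            if child not in visited:
                visited.add(child)
                untraced.append(child)
    return visited                 # at this point T = V; H - V can be reclaimed
```

Any container would do for U, since the processing order does not matter; a Python list used as a stack is just the simplest choice here.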
Of course, details vary depending on how the sets are implemented, and on
whether it is V and U, or U and T, that are effectively represented.
I also skip details about what a cell is, whether cells come in one size or
many, how we find the pointers in them, how they may be compacted, and a
host of other technical issues that you can find in books and surveys
on garbage collection.
You may have noticed that this is an extremely simple algorithm. There
is no recursion, only a loop on the elements of the set U, which can
grow as it is being processed until it eventually empties. There is no a priori assumption about extra memory.
Whatever allows identifying the sets and performing the needed
operations cheaply enough will do. Note that the order in which cells are
processed is irrelevant (there is no specific need for a pushdown stack), which gives a lot of freedom to choose the means to represent the sets efficiently.
Where known implementations differ is in the way these sets are actually
represented. Many techniques have been actually used:
bit map: some memory space is reserved for a map that has one bit
for each memory cell, which can be found using the address of the
cell. The bit is on when the corresponding cell is in the set
defined by the map. If only bit maps are used, you need only 2 bits
per cell.
alternatively, you may have space for a special tag bit (or 2) in
each cell to mark it.
list: you make a list of the cells that are in the set. You do not need a stack or a specific data structure. In some
systems, an astute pointer-reversal technique allows building the list
with very little extra memory: precisely log2(p) bits per cell, where p is the number of pointers per cell, and this can be further reduced by means of stacks of bits.
you may test a predicate on the content of the cell, and its pointers.
you may relocate the cell into a free part of memory intended for all
and only the cells belonging to the set represented.
one interesting special case is having the visited cells relocated
into another contiguous area of memory (representing the set V), and representing the set T of traced cells by a single, but changing, boundary address that is greater than the addresses of cells in T and less than those of cells in U.
you may actually combine these techniques, even for a single set.
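To make the bit-map variant concrete, here is a sketch of marking driven purely by a map, with no list or stack at all: the collector repeatedly sweeps the map looking for untraced cells. Cells are modeled as lists of indices, and one byte per cell stands in for the 2 bits a real map would pack; the encoding into three "colors" is my choice for the sketch, not a fixed convention:

```python
WHITE, GRAY, BLACK = 0, 1, 2   # not visited / in U (untraced) / in T (traced)

def mark_with_bitmap(heap, roots):
    """heap: list of cells, each cell a list of indices of cells it points to.
    Returns the indices of reclaimable (unreachable) cells."""
    color = bytearray(len(heap))   # the "map": one byte per cell for clarity
    for r in roots:
        color[r] = GRAY
    found = True
    while found:                   # sweep the map until no untraced cell remains
        found = False
        for i in range(len(heap)):
            if color[i] == GRAY:
                found = True
                for j in heap[i]:            # trace the cell's pointers
                    if color[j] == WHITE:
                        color[j] = GRAY      # newly visited, enters U
                color[i] = BLACK             # all pointers traced, moves to T
    return [i for i in range(len(heap)) if color[i] == WHITE]
```

The repeated sweeps make this slow in the worst case, which is why real collectors usually combine the map with some list or ordering policy; but it shows that 2 bits per cell are enough to represent both sets.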
As said, all of the above have been used by some implemented garbage
collector, strange as some may seem. It all depends on the various
constraints of the implementation. And they can be rather cheap in memory usage, possibly helped by processing-order policies that can be freely chosen for that purpose, since the order does not matter for the end result.
What may seem the strangest one, transferring cells to a new area, is
actually very common: it is called copy collection. It is mostly used with virtual memory.
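A minimal sketch of such a copy collector, in the style of Cheney's algorithm: the set T is the region of to-space below the `scan` index, the set U is the region between `scan` and the allocation end, so both sets cost nothing beyond one boundary index. A dictionary stands in for the forwarding pointers a real collector would store inside the evacuated cells themselves:

```python
def copy_collect(heap, roots):
    """heap: list of cells, each cell a list of indices of cells it points to.
    Returns (to_space, new_roots): the compacted live cells and the
    relocated root indices."""
    to_space = []
    forward = {}      # old index -> new index; stands in for forwarding pointers

    def copy(i):
        """Evacuate cell i to to-space (once) and return its new address."""
        if i not in forward:
            forward[i] = len(to_space)
            to_space.append(list(heap[i]))   # fields still hold old addresses
        return forward[i]

    new_roots = [copy(r) for r in roots]
    scan = 0                          # boundary: cells below scan are in T
    while scan < len(to_space):       # cells in [scan, end) are the set U
        cell = to_space[scan]
        for k, ptr in enumerate(cell):
            cell[k] = copy(ptr)       # evacuate the target, rewrite the field
        scan += 1
    return to_space, new_roots
```

Garbage is never touched: when the loop ends, everything left in from-space is reclaimable wholesale, and the survivors are compacted, which is precisely why this technique suits virtual memory well.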
Clearly there is no recursion, and the mutator algorithm stack does
not have to be used.
Another important point is that many modern GCs are implemented for large virtual
memories. Then getting space for an extra list or stack is
not an issue, as new pages can be easily allocated. However, in large virtual
memories the enemy is not lack of space but lack of locality. The
structures representing the sets, and their use, must then be geared
towards preserving the locality of the data structures and of the GC execution.
The problem is not space but time: inadequate implementations are more likely to show unacceptable slowdown than memory overflow.
I did not give references to the many specific algorithms resulting from various combinations of these techniques, as this answer seems long enough.