The Design and Implementation of the
Sport Model Garbage Collector

Warren Harris
Netscape Communications, Corp.
1/19/98

1. Introduction

The Sport Model garbage collector was designed to support the memory management requirements of the Java language. Sport Model is packaged as a library with a C-language API that supports a wide range of garbage collection and memory management needs, and is more widely applicable than to just Java virtual machine implementations. This paper describes Sport Model's design and implementation, and illustrates how it can be integrated into C and C++ applications.

In garbage collector parlance, Sport Model can be described as a hybrid generational, mark and deferred sweep, mostly copying garbage collector with an integrated malloc/free system. However it goes farther than that in that numerous subtle and interacting details factor into a coherent and efficient system. The design goals of Sport Model, simply stated, are:

to support the features required by Java,
minimize garbage collection pause times,
minimize object allocation time,
minimize heap size,
interoperability with native code (JIT) compilers,
interoperability with C and C++,
debuggability of client programs, and
portability.

Each of these will be discussed in detail in the next section, along with the supporting rationale.

2. Design

2.1 Features required by Java

The Java language requires features to be supported by the garbage collector that go beyond the basic automatic reclamation of objects:

Circular references: The Java language requires all objects not reachable from a set of VM roots to be automatically freed. Some garbage collection schemes, notably reference counting, cannot reclaim circular chains of otherwise unreferenced objects. We can rule out this class of garbage collection algorithms.

Finalization: An object's finalize method must be run whenever no more references to the object can be found, but before the object's memory is freed. In addition to basic object finalization, the Java VM requires the ability to finalize all remaining finalizable objects on system shut-down.

Weak references: The sun.misc.WeakRef class requires weak reference support. Weak references are like normal references to objects, except they do not factor into liveness considerations. If no other reference to an object can be found except a weak reference, the Java language requires the weak reference to be set to null. Weak references are useful in implementing caches.

Support for JNI: The Java Native Method Interface (JNI) allows for code written in C and C++ to interact with the Java VM. JNI specifies that all local object references given out to native code and held on the C runtime stack must be retained and not garbage collected. This may be implemented by either employing a conservative garbage collection technique, retaining any objects that are potentially referred to by ambiguous stack references, or by introducing a level of indirection between local references and the objects to which they refer.

Although conservative garbage collection is not strictly required by JNI, the way in which array elements are manipulated also factors into whether it is an appropriate implementation technique. The GetArrayElements operation must be called to obtain a native vector of elements, and the ReleaseArrayElements operation must be called when the native code is done manipulating the contents. One of two implementation strategies may be used for these operations:

Return a direct pointer to the array's contents. This requires the object to not relocate in memory throughout the duration of its manipulation. If a copying collection strategy is used, the object must be pinned (forced to not be copied or otherwised moved). Also this requires the object to remain live even if the only reference to it is the native code's pointer to the array's interior.
Make a copy of the array's contents, and commit changes made to the copy when released. Although commiting the copy can become troublesome if multiple threads are involved, the main drawback of this strategy is that memory must be available for potentially large copies of arrays, and the time spent making the copy and then copying it back can become prohibitive.

These JNI considerations caused the first major design decision for Sport Model -- that it should support conservative collection. By supporting conservative collection, copies of array contents need not be made. Also, conservative collection simplifies the JNI logic which would otherwise have to keep track of which object references were currently in use by native code. Direct pointers to objects can be given out to native code provided that the native code doesn't store them directly in the heap (a requirement of JNI).

If copying collection is employed, the garbage collector would need the ability to pin array objects when their contents are given out to native code. Sport Model does indeed support copying (for reasons described below) and consequently it also supports pinning.

Per-object hash-codes and monitors: Java requires all objects to support hash-codes (per-object unique IDs used for hashing purposes) and to support monitor operations. These can be supported by storage extrinsic to the object (e.g. a monitor cache) or by allocating fields for them within the object itself.

Adding fields to every object for these seldom-used operations can be very costly in terms of space. Data was taken in Communicator while visiting numerous pages containing applets, and the following graph of numbers of objects distributed by size was produced:

From this graph we can see that most objects are small, and the percentage cost of per-object fields is high (moreover, this graph shows skewed results due to the 2 word (8 bytes on a 32-bit machine) object-handle overhead used by our current VM). A clever design by the Electrical Fire team can reduce both the monitor and hash-code overhead to a single word per object. Nevertheless, this can be all be handled external to the garbage collector and is therefore not dealt with directly by Sport Model.

2.2 Minimizing garbage collection pause times

Reducing the pause time incurred by garbage collection was the number one goal after supporting the Java language. Although pause time hasn't been a significant problem in the interactive client (applet) environment, it is a particular problem for the web server environment where it is critical that we not pause all threads while some particular thread forces a garbage collection.

Generation scavenging

The following graphs illustrate data accumulated during the same run of Communicator as the previous graph and show the accumulated lifetimes of objects broken out by size. Units of lifetime are in number of collections -- a collection was forced after every object allocation: Object Lifetime Distribution

From these graphs we can see that a very large percentage of objects become garbage almost immediately -- indeed before the next object is created. This data supports the generational hypothesis which states that the probability that an object will become garbage is inversely proportional to the number of collections it has survived, i.e. that newly created objects are highly likely to be temporary.

A technique known as generation scavenging attempts to exploit this statistic by partitioning the heap into a series of generations, and promoting objects from generation to generation as they survive some number of collections. This allows long-lived objects to be examined much less frequently by the collector, minimizing the working set of objects which must be examined by the collector, and thereby minimizing the amount of time spent collecting. The organization of generations can be depicted as follows:

Sport Model uses a copying style generation scavenging technique to factor long-lived objects from those newly created. Objects are promoted by copying them into sub-heaps associated with a generation after they have survived some number of collections. Since conservative scanning of C runtime stacks is also performed, a mostly-copying technique is used in which only those objects that are not referred to by ambiguous roots may be copied.

Although copying garbage collection is a simple and effective technique for both partitioning the heap into sub-heaps, and maintaining the set of visited objects which must be retained for a collection, it has the negative effect of requiring additional heap space during the course of a collection. The set of old objects must be retained until the copying is complete as well as enough space for the newly made copies. Generation scavenging minimizes this impact by usually focusing on a single generation, but in the worse-case scenario as much as twice as much heap space is required. Obviously this can fail for data-intensive applications running under low-memory conditions.

For Sport Model, the hybrid nature of its collector comes to the rescue during these low-memory conditions. Objects which cannot be copied are simply left in place and marked instead, just as are objects referred to by conservative roots. This allows for near-100% memory utilization, modulo fragmentation concerns which will be discussed below.

Object promotion

A key concern when using generation scavenging is the premature promotion of objects. By requiring objects to survive some number of collections before promoting them to the next generation, we can be assured of a lower-bound on their lifetime, and have a good indication that examining them less frequently will not lead to excessive amounts of garbage being retained for long periods of time. One technique for measuring the number of survivals is to actually place a counter in the object that is incremented for each collection.

When the counter exceeds some predetermined value, the object may be promoted. However, this leads to additional storage in the object for the counter, and is more expensive during the collection process. The counter must be incremented and tested for each live object.

Sport Model uses a low-overhead technique to determine when an object should be promoted. Two versions of the collection algorithm are used: a mark-and-sweep version that is performed most frequently, and a copying version that is used far less frequently. The mark-and-sweep algorithm has no concern for whether it is time to promote an object or not. It is fast and low-overhead. After N uses of the mark-and-sweep algorithm, a copy cycle takes place. The copying algorithm is then used which will promote some objects to the next generation.

In order to not promote objects which were created just before the copying algorithm is run, a bit is used in the object header to indicate that at least one copying cycle has already transpired for the object. When it is time to copy the object, if that bit is not set, it sets it and leaves the object in place in the current generation. If the bit is set, the object is copied and the bit is cleared in the copy in the new generation. This ensures that the object has lived between N and 2N collections, a sufficient criterion for promotion to the next generation. The parameter N may be tuned dynamically by the system (although currently fixed at 4 for the current implementation).

Level-two cache concerns

In addition to minimizing the cost associated with promoting an object, use of the mark-and-sweep algorithm has excellent characteristics with respect to minimizing level-two cache miss rates experienced during collection. Mark-and-sweep far outperforms copying in this regard because it minimizes the working set of pages which must be visited during the collection process [Zorn 91]. Copying requires both the old page to be read and updated with a forwarding pointer as well as the page containing the new copy. In contrast, mark-and-sweep only requires an object's mark-bit to be set.

Sport Model is further optimized for level-two cache concerns in that it stores all object headers together on pages wholly dedicated to object headers, and minimizes them to one byte per object. Objects are associated with their corresponding headers by calculating the index of an object on its containing page (pages contain objects all of the same size). The index is used to obtain the object header in the object header array associated with a page descriptor for that page. Although the cost of looking up the object header is greater than just accessing the first word of the object as is done in most systems, it is relatively inexpensive as the division is performed by a table-lookup algorithm, and saving occurs because of the effects of increased locality of reference on the level-two cache.

Multiprocessor considerations

A primary concern in a multiprocessor environment is that garbage collection not suspend all threads while a collection takes place, particularly those threads that are not participating in computations involving garbage collected objects. For a web server environment, hundreds of threads may be servicing requests a given instance, while only a very small number are executing Java code in response to a request.

Although Sport Model currently makes no provision for multiprocessor environments, it is possible with some minor rearchitecting to allow for multiple youngest generation sub-heaps, one per processor. This would allow Java threads assigned to different processors to allocate objects independently, without sharing a global lock on the sub-heap, and allow the youngest generation for each processor to be collected independently. Only when a copying collection occurs (which is far less frequent) would a lock need to be obtained on the globally shared second generation sub-heap.

One consideration with this sort of architecture is that additional rules would come into play about passing pointers between threads running on different processors. Because the write barrier works by recording modified pages and then uses this information to search for older objects that point to younger ones, modifications would have to be made to accommodate pointers made between youngest generations associated with different processors. This should be doable with only a minor additional penalty at garbage collection time.

This multiprocessor work has not been done due to lack of support for multiple processors in the low-level thread runtime system (NSPR).

2.3 Minimizing object allocation time

lazy sweeping
minimizing collect time

2.4 Minimizing heap size

per-object overhead
- power-of-2 + buddy-system allocation
- discontiguous object headers
  - division algorithms
fragmentation
- copying to compact heap

2.5 Interoperability with native code (JIT) compilers

scavenging stack frames
scavenging registers
pausing threads at safe points

2.6 Interoperability with C and C++

integral malloc/free implementation
write barrier macros

2.7 Debuggability of client programs

debugging write barrier
memory visualization and analysis tools
memory dumping/object traversal tools

2.8 Portability

3. Implementation

Organization of Generations

architecture

low-level page manager
high-level heap manager
application program
- responsible for locking, roots, write barrier, etc.

implementation

data structures
- pages
- page descriptors
  - large objects
- pools
  - malloc pool
- generations
- heap
- classes
  - object classes
  - array classes
  - primitive classes
- objects and arrays
- object descriptors
- card descriptors
collection algorithm
- collection triggered by amount allocated or allocation failure
- some number of mark-and-sweep collections
- followed by less-frequent promoting copying collections
- collection steps

4. The Sport Model API

later

The Design and Implementation of the Sport Model Garbage Collector