Among the facilities that a Java virtual machine has to provide is a storage manager. The storage manager is responsible for the life-cycle of Java objects: allocation of new objects, collection of unreachable objects, and notifications of unreachability if requested. A virtual machine can distinguish itself in the qualities of service the storage manager provides. The HotSpot virtual machine provides several storage managers, to meet the needs of different kinds of applications. The two main storage managers are one to provide short pauses to the application, and one to provide high throughput.
The Klass Hierarchy
A central data structure of the HotSpot virtual machine is the “klass hierarchy”: the data structure that describes objects in the managed storage of the virtual machine, and provides all the operations on those objects. (We use “klass” with a “k” because “class” is a keyword in C++ so we couldn't use that.) The klass hierarchy is self-referential, in that it also contains the descriptions of the descriptions of objects, and so on. Java objects are not C++ objects, so we can not directly invoke methods on them. But we like object oriented programming, so the klass hierarchy provides a mechanism for methods (and virtual methods) on objects.
Perhaps a simple example will suffice. Consider a Java object instance. The storage for that instance consists of the fields declared by the Java programmer. In addition to those, we add a header, the important part of which, for our purposes here, is a pointer to the representation of the Java class that defined this instance. The representation of the Java class is itself an object (but not a Java object; an “instanceKlass”), containing obvious things like the static fields of the class. But in addition, it includes a description of the types of the fields of every instance of this Java class, so that the storage manager can find and adjust reference fields, for example, if it moves objects in memory. Since the representation of the Java class contains references to other objects, it must itself have a header with a pointer to something (another object, also not a Java object; an “instanceKlassKlass”) that describes its fields. This chain of classes would continue indefinitely if not for the fact that instanceKlassKlass's are described by klassKlass, which can describe itself. Perhaps a simple example won't suffice. There are subclasses of klass for each of the types of object managed by the storage manager.
Most Java applications allocate objects frequently, and abandon them almost as frequently. So a performant Java virtual machine should support fast allocation of objects, and be prepared for most objects to become unreachable fairly quickly. If you look in the code you will see code that allocates and initializes objects. But that code is what we call the “slow-path” for allocation, in which a running program calling in the virtual machine runtimes to allocate an object. Most object allocation is performed in the “fast-path”, which is compiled into the application, so no calls to the runtime need to be made. In fact, the fast-path is little more than bumping a pointer and checking to make sure it hasn't run off the end of an allocation region. In order to keep that pointer from becoming a scalability bottleneck, each thread has its own allocation region, and it's own pointer. These are what we call “thread-local allocation buffers” (TLABs). Each thread can allocate from its TLAB without coordinating with other threads, except when it needs a new TLAB. Much work has gone into the sizing of TLABs, and adjusting their sizes based on the application thread performance, to balance the number of TLABs needed, the fragmentation lost due to space held in TLABs by inactive threads, etc.
Objects eventually become unreferenced, and the storage they occupy can be reclaimed for use by other objects. The basic operation of the collector is to traverse the object graph, finding all the reachable objects and preserving them, while identifying all the objects that are unreachable and recovering their storage. It would be prohibitively expensive to traverse the entire object graph for each collection, so a variety of techniques have been applied to make collections more efficient.
We've observed that most Java programs follow the “weak generational hypothesis”: that objects are allocated frequently, but most are not referenced for long, and that few references exist between old objects and young objects. To exploit that, we partition the Java objects into “generations” so that we can apply different algorithms for managing newly-created objects and objects that objects that have remained referenced for a while. In the “young” generation we expect there to be a lot of object allocation, but also a high density of unreachable objects. In the old generation we expect there not to be a lot of object allocation (in fact the only object allocation in the old generation is the “promotion” of objects from the young generation by the storage manager, or a few objects that are allocated directly in the old generation for special reasons). But once objects in the old generation we expect them to remain referenced for a while, and so use different algorithms to manage them.
The Young Generation
The young generation must support fast allocation, but we expect most of those objects to become unreachable fairly quickly. We would like not to have to expend any effort on the unreachable objects, since there are so many of them. So when our young generations need to be collected, we identify the lreachable objects in them and copy them out of the young generation. Once we've copied out the reachable objects, the objects remaining in the young generation are unreachable, and the space for them can be recovered without attention to each individual unreachable object. This operation of copying the reachable objects out of a generation we call “scavenging”, so we often refer to our young generation collectors as “scavengers”. An advantage of scavenging is that it is fast. A disadvantage is that it requires room of a second copy of all the reachable objects from the scavenged generation.
Some Ancillary Data Structures
But wait: how can we identify the reachable objects in the young generation without traversing the whole object graph, which we said was expensive? To do that we take advantage of the second part of the weak generational hypothesis: that there are few references from old objects to young objects. We keep what is called a “remembered set” of references from the old generation to the young generation. (Recall that objects only get into the old generation under the control of the storage manager.) The current HotSpot collectors use a “card marking” remembered set, which bounds the size of the remembered set at the cost of some extra work during collections. That extra work comes from the fact that the remember set is imprecise, so we need to be able to walk backwards through the old generation looking for the start of objects. To facilitate that we have a “block start” table (also bounded in size) which you will see if you look in the code. In addition, the different collectors have some collector-specific ancillary data structures that will be explained in more detail later.
The Old Generation
Once objects have been scavenged out of the young generation we expect them to remain reachable for at least a while, and to reference and be referenced by other objects that remain reachable. To collect the space in the old generation we need algorithms that can't depend on high density of unreachable objects. In particular we can't count on having enough space to scavenge such objects somewhere else. Copying objects in the old generation can be expensive, both because one has to move the objects, and because one has to update all the references to the object to point to the new location of the object. On the other hand, copying objects means that one can accumulate the recovered space into one large region, from which allocation is faster (which speeds up scavenging of the young generation), and allows us to return excess storage to the operating system, which is polite.
The Permanent Generation
In addition to the objects created by the Java application, there are objects created and used by the HotSpot virtual machine whose storage it is convenient to have allocated and recovered by the storage manager. To avoid confusing things, such objects are allocated in a separate generation, the so-called “permanent” generation. In fact, the objects in it are not “permanent”, but that's what it has been called historically. For example, information about loaded classes is stored in the permanent generation, and recovered when those classes are no longer reachable from the application.
In order to provide a variety of qualities of service: e.g., short pauses for collection or high throughput for applications (no, we haven't figured out, yet, how to do both), the HotSpot virtual machine provides a variety of collectors.
The Short Pause Collector
The short pause collector (also called the “low-pause” collector, or the “mostly concurrent” collector) tries to reduce the interruptions the application sees due to garbage collection, while providing good performance recovering storage. Traversing the object graph is time-consuming, so this collector does almost all the traversal while the application is running. That's tricky, since the application is changing the object graph while we are walking it. Once the reachable objects are identified, the unreachable space is reclaimed by linking it onto free lists, merging where possible to avoid fragmentation. By avoiding moving reachable objects, we avoid the cost of updating references to those objects. That also avoids the complications that arise if you try to move objects while the application is using them.
The High Throughput Collector
The point of the high throughput collector (also called the “parallel compacting” collector) is to recover storage as quickly as possible, using the whole machine if possible, to get in there, recover storage and get the application running again. Toward that end, we have designed a parallel compacting collector. Since most of the work of tracing the object graph is waiting for the computer's memory system, the high throughput collector spins up multiple threads to try to hide that latency. The interesting part of the parallel collector is the way it uses multiple threads to copy objects, but still ends up with one large, unfragmented block of free space when it is done.
We have some other collectors in the code base: a serial mark-sweep-compacting collector, and parts of other collectors we were, or are, or will be researching. Those are not as interesting as the short pause collector or the high throughput collector, but they are in there, so you will see them if you look at the code.
There are two styles in which we've built collectors. At first we had a framework into which we could plug generations, each of which would have its own collector. The framework is general enough for us to have built several collectors on it, and does support a limited amount of mix-and-matching of “framework” generations. The framework has some inefficiencies due to the generality at allows. We've worked around some of the inefficiencies. When we built the high-throughput collector we decided not to use the framework, but instead designed an interface that a collector would support, with the high-throughput collector as an instance of that interface. That means that the “interface” collectors can't be mixed and matched, which implies some duplication of code. But it has the advantage that one can work on an “interface” collector without worrying about breaking any of the other collectors.
A Parallelization Framework
In building the parallel collector we built a framework for parallelizing parts of a collection: work queues, dispatching to worker threads, etc. We've used that framework to have parts of the various collectors run in parallel. For example, the young generation collectors run as multiple threads. The high throughput collector always runs with multiple threads, and even the short pause collector can use multiple threads if that won't seriously degrade the performance of the application.
A final responsibility of the storage manager is to provide notifications to the application of objects that have become unreachable. During the traversal of the object graph, when we find subclasses of java.lang.ref.Reference (e.g., FinalReference, WeakReference, SoftReference, or PhantomReference), we don't walk through those objects, but instead put them on lists to be looked at later. When the discovery of reachable objects (including References) is complete, we look at the referents of those References. Depending on the semantics of the Reference type and whether the referent is reachable or not, we may queue the Reference for notification, mark through the referent, or clear the referent field. This part of the storage manager is an interesting dance between the virtual machine and the code in the core libraries for handling References.
“Memory Management in the Java HotSpot
“Performance Through Parallelism: Java HotSpot Virtual
machine Garbage Collection Improvements”