7.7.5 The Train Algorithm

While the generational approach is very efficient for the handling of immature objects, it is less efficient for the mature objects, since mature objects are moved every time there is a collection involving them, and they are quite unlikely to be garbage. A different approach to incremental collection, called the train algorithm, was developed to improve the handling of mature objects. It can be used for collecting all garbage, but it is probably better to use the generational approach for immature objects and, only after they have survived a few rounds of collection, "promote" them to another heap, managed by the train algorithm. Another advantage of the train algorithm is that we never have to do a complete garbage collection, as we do occasionally for generational garbage collection.

To motivate the train algorithm, let us look at a simple example of why it is necessary, in the generational approach, to have occasional all-inclusive rounds of garbage collection. Figure 7.29 shows two mutually linked objects in two partitions i and j, where j > i. Since both objects have pointers from outside their partition, a collection of only partition i or only partition j could never collect either of these objects. Yet they may in fact be part of a cyclic garbage structure with no links from the outside. In general, the "links" between the objects shown may involve many objects and long chains of references.

Figure 7.29: A cyclic structure across partitions that may be cyclic garbage

In generational garbage collection, we eventually collect partition j, and since i < j, we also collect i at that time. Then, the cyclic structure will be completely contained in the portion of the heap being collected, and we can tell if it truly is garbage. However, if we never have a round of collection that includes both i and j, we would have a problem with cyclic garbage, just as we did with reference counting for garbage collection.

The train algorithm uses fixed-length partitions, called cars; a car might be a single disk block, provided there are no objects larger than disk blocks, or the car size could be larger, but it is fixed once and for all. Cars are organized into trains. There is no limit to the number of cars in a train, and no limit to the number of trains. There is a lexicographic order to cars: first order by train number, and within a train, order by car number, as in Fig. 7.30.

Figure 7.30: Organization of the heap for the train algorithm, with Train 1 holding cars 11 and 12, Train 2 holding cars 21 through 24, and Train 3 holding cars 31 through 33

There are two ways that garbage is collected by the train algorithm:

• The first car in lexicographic order (that is, the first remaining car of the first remaining train) is collected in one incremental garbage-collection step. This step is similar to collection of the first partition in the generational algorithm, since we maintain a "remembered" list of all pointers from outside the car. Here, we identify objects with no references at all, as well as garbage cycles that are contained completely within this car. Reachable objects in the car are always moved to some other car, so each garbage-collected car becomes empty and can be removed from the train.

• Sometimes, the first train has no external references. That is, there are no pointers from the root set to any car of the train, and the remembered sets for the cars contain only references from other cars in the train, not from other trains. In this situation, the train is a huge collection of cyclic garbage, and we delete the entire train.
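To make this organization concrete, we might represent cars, trains, and their remembered sets (described in detail below) in C as in the following minimal sketch. The names Car, Train, and RememberedSet, the constant CAR_SIZE, and the array representation of remembered sets are illustrative choices of ours, not part of the algorithm itself.

    #include <stddef.h>

    #define CAR_SIZE 4096            /* fixed once and for all; e.g., one disk block */

    /* A remembered set: the locations of references into a car from
       lexicographically higher cars and trains, kept as a growable array. */
    typedef struct {
        void  ***slots;              /* addresses of the remembered references */
        size_t   count;
        size_t   capacity;
    } RememberedSet;

    typedef struct Car {
        unsigned      train_no;      /* lexicographic order: (train_no, car_no) */
        unsigned      car_no;
        char          data[CAR_SIZE];/* the objects stored in this car          */
        size_t        used;          /* bump-allocation offset into data        */
        RememberedSet internal;      /* references from higher cars, same train */
        RememberedSet external;      /* references from higher trains; the union
                                        of these, over all cars, is the train's
                                        remembered set                          */
        struct Car   *next;          /* next (higher-numbered) car in the train */
    } Car;

    typedef struct Train {
        unsigned      train_no;
        Car          *first_car;     /* cars kept in increasing car_no order    */
        Car          *last_car;      /* new objects usually go here             */
        struct Train *next;          /* next (higher-numbered) train            */
    } Train;

Splitting each car's remembered set into an "internal" and an "external" portion lets us recover both the per-car and the per-train remembered sets without storing any reference twice, as explained next.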
Remembered Sets

We now give the details of the train algorithm. Each car has a remembered set consisting of all references to objects in the car from

a) Objects in higher-numbered cars of the same train, and

b) Objects in higher-numbered trains.

In addition, each train has a remembered set consisting of all references from higher-numbered trains. That is, the remembered set for a train is the union of the remembered sets for its cars, except for those references that are internal to the train. It is thus possible to represent both kinds of remembered sets by dividing the remembered sets for the cars into "internal" (same train) and "external" (other trains) portions.

Note that references to objects can come from anywhere, not just from lexicographically higher cars. However, the two garbage-collection processes deal with the first car of the first train, and the entire first train, respectively. Thus, when it is time to use the remembered sets in a garbage collection, there is nothing earlier from which references could come, and therefore there is no point in remembering references to higher cars at any time. We must be careful, of course, to manage the remembered sets properly, changing them whenever the mutator modifies references in any object.

Managing Trains

Our objective is to draw out of the first train all objects that are not cyclic garbage. Then, the first train either becomes nothing but cyclic garbage and is therefore collected at the next round of garbage collection, or if the garbage is not cyclic, then its cars may be collected one at a time.

We therefore need to start new trains occasionally, even though there is no limit on the number of cars in one train, and we could in principle simply add new cars to a single train, every time we needed more space. For example, we could start a new train after every k object creations, for some k. That is, in general, a new object is placed in the last car of the last train, if there is room, or in a new car that is added to the end of the last train, if there is no room. However, periodically, we instead start a new train with one car, and place the new object there.

Garbage Collecting a Car

The heart of the train algorithm is how we process the first car of the first train during a round of garbage collection. Initially, the reachable set is taken to be the objects of that car with references from the root set and those with references in the remembered set for that car. We then scan these objects as in a mark-and-sweep collector, but we do not scan any reached objects outside the one car being collected.

After this tracing, some objects in the car may be identified as garbage. There is no need to reclaim their space, because the entire car is going to disappear anyway. However, there are likely to be some reachable objects in the car, and these must be moved somewhere else. The rules for moving an object are:

1. If there is a reference in the remembered set from any other train (which will be higher-numbered than the train of the car being collected), then move the object to one of those trains. If there is room, the object can go in some existing car of the train from which a reference emanates, or it can go in a new, last car if there is no room.

2. If there is no reference from other trains, but there are references from the root set or from the first train, then move the object to any other car of the same train, creating a new, last car if there is no room. If possible, pick a car from which there is a reference, to help bring cyclic structures to a single car.

After moving all the reachable objects from the first car, we delete that car.
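The following is a minimal C sketch of this per-car collection step. It refers to cars and trains only through pointers, and all of the extern helper routines it declares are hypothetical; a real implementation would also have to update remembered sets and leave forwarding information as objects move.

    #include <stddef.h>

    typedef struct Object Object;
    typedef struct Car    Car;
    typedef struct Train  Train;

    /* Hypothetical helpers; their exact form depends on the heap layout. */
    extern Car    *first_car_of(Train *t);
    extern void    trace_within_car(Car *c);          /* trace from roots + remembered set,
                                                         never leaving the car            */
    extern size_t  reachable_count(Car *c);           /* objects marked by the trace      */
    extern Object *reachable_object(Car *c, size_t i);
    extern Train  *referencing_train(Object *o);      /* some other train with a reference
                                                         to o in the remembered set, or NULL */
    extern void    move_object_to_train(Object *o, Train *t);     /* rule (1) */
    extern void    move_object_within_train(Object *o, Train *t); /* rule (2) */
    extern void    remove_car(Train *t, Car *c);

    /* One incremental step: collect the first car of the first train. */
    void collect_first_car(Train *first_train)
    {
        Car *car = first_car_of(first_train);

        /* Trace, starting from the root set and the car's remembered set,
           but do not scan any reached object outside this one car. */
        trace_within_car(car);

        /* Every reachable object must leave the car; garbage needs no work,
           since the whole car is about to disappear. */
        for (size_t i = 0; i < reachable_count(car); i++) {
            Object *o = reachable_object(car, i);
            Train  *t = referencing_train(o);
            if (t != NULL) {
                move_object_to_train(o, t);            /* rule (1) */
            } else {
                /* Rule (2): o is referenced only by the root set, by the rest of
                   the first train, or by other objects in this car; keep it in
                   the first train, preferably in a car that references it. */
                move_object_within_train(o, first_train);
            }
        }
        remove_car(first_train, car);   /* the emptied car leaves the train */
    }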
Panic Mode

There is one problem with the rules above. In order to be sure that all garbage will eventually be collected, we need to be sure that every train eventually becomes the first train, and if this train is not cyclic garbage, then eventually all cars of that train are removed and the train disappears one car at a time. However, by rule (2) above, collecting the first car of the first train can produce a new last car. It cannot produce two or more new cars, since surely all the objects of the first car can fit in the new, last car. However, could we be in a situation where each collection step for a train results in a new car being added, and we never get finished with this train and move on to the other trains?

The answer is, unfortunately, that such a situation is possible. The problem arises if we have a large, cyclic, nongarbage structure, and the mutator manages to change references in such a way that we never see, at the time we collect a car, any references from higher trains in the remembered set. If even one object is removed from the train during the collection of a car, then we are OK, since no new objects are added to the first train, and therefore the first train will surely run out of objects eventually. However, there may be no garbage at all that we can collect at a stage, and we run the risk of a loop where we perpetually garbage collect only the current first train.

To avoid this problem, we need to behave differently whenever we encounter a futile garbage collection, that is, a car from which not even one object can be deleted as garbage or moved to another train. In this "panic mode," we make two changes:

1. When a reference to an object in the first train is rewritten, we maintain the reference as a new member of the root set.

2. When garbage collecting, if an object in the first car has a reference from the root set, including dummy references set up by point (1), then we move that object to another train, even if it has no references from other trains. It is not important which train we move it to, as long as it is not the first train.

In this way, if there are any references from outside the first train to objects in the first train, these references are considered as we collect every car, and eventually some object will be removed from that train. We can then leave panic mode and proceed normally, sure that the current first train is now smaller than it was.
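A minimal sketch of the write barrier needed for change (1) might look as follows. The names panic_mode, in_first_train, add_dummy_root, and update_remembered_sets are illustrative assumptions of ours, not part of the algorithm's specification.

    #include <stdbool.h>

    typedef struct Object Object;

    /* Illustrative globals and helpers. */
    extern bool panic_mode;                 /* set after a futile collection             */
    extern bool in_first_train(Object *o);  /* does o currently live in the first train? */
    extern void add_dummy_root(Object *o);  /* record a dummy member of the root set     */
    extern void update_remembered_sets(Object *src, Object *old_target, Object *new_target);

    /* Write barrier used by the mutator for every reference store.  Change (1):
       while in panic mode, any reference written to an object of the first train
       is also recorded as a dummy root, so that change (2) can later move that
       object to another train. */
    void write_reference(Object *src, Object **field, Object *new_target)
    {
        update_remembered_sets(src, *field, new_target);   /* normal bookkeeping */
        if (panic_mode && new_target != NULL && in_first_train(new_target))
            add_dummy_root(new_target);
        *field = new_target;
    }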
7.7.6 Exercises for Section 7.7

Exercise 7.7.1: Suppose that the network of objects from Fig. 7.20 is managed by an incremental algorithm that uses the four lists Unreached, Unscanned, Scanned, and Free, as in Baker's algorithm. To be specific, the Unscanned list is managed as a queue, and when more than one object is to be placed on this list due to the scanning of one object, we do so in alphabetical order. Suppose also that we use write barriers to assure that no reachable object is made garbage. Starting with A and B on the Unscanned list, suppose the following events occur:

i. A is scanned.

ii. The pointer A → D is rewritten to be A → H.

iii. B is scanned.

iv. D is scanned.

v. The pointer B → C is rewritten to be B → I.

Simulate the entire incremental garbage collection, assuming no more pointers are rewritten. Which objects are garbage? Which objects are placed on the Free list?

Exercise 7.7.2: Repeat Exercise 7.7.1 on the assumption that

a) Events (ii) and (v) are interchanged in order.

b) Events (ii) and (v) occur before (i), (iii), and (iv).

Exercise 7.7.3: Suppose the heap consists of exactly the nine cars on three trains shown in Fig. 7.30 (i.e., ignore the ellipses). Object o in car 11 has references from cars 12, 23, and 32. When we garbage collect car 11, where might o wind up?

Exercise 7.7.4: Repeat Exercise 7.7.3 for the cases that o has

a) Only references from cars 22 and 31.

b) No references other than from car 11.

Exercise 7.7.5: Suppose the heap consists of exactly the nine cars on three trains shown in Fig. 7.30 (i.e., ignore the ellipses). We are currently in panic mode. Object o1 in car 11 has only one reference, from object o2 in car 12. That reference is rewritten. When we garbage collect car 11, what could happen to o1?

7.8 Advanced Topics in Garbage Collection

We close our investigation of garbage collection with brief treatments of four additional topics:

1. Garbage collection in parallel environments.

2. Partial relocations of objects.

3. Garbage collection for languages that are not type-safe.

4. The interaction between programmer-controlled and automatic garbage collection.

7.8.1 Parallel and Concurrent Garbage Collection

Garbage collection becomes even more challenging when applied to applications running in parallel on a multiprocessor machine. It is not uncommon for server applications to have thousands of threads running at the same time; each of these threads is a mutator. Typically, the heap will consist of gigabytes of memory.

Scalable garbage-collection algorithms must take advantage of the presence of multiple processors. We say a garbage collector is parallel if it uses multiple threads; it is concurrent if it runs simultaneously with the mutator.

We shall describe a parallel, and mostly concurrent, collector that uses a concurrent and parallel phase that does most of the tracing work, and then a stop-the-world phase that guarantees all the reachable objects are found and reclaims the storage. This algorithm introduces no new basic concepts in garbage collection per se; it shows how we can combine the ideas described so far to create a full solution to the parallel-and-concurrent collection problem. However, there are some new implementation issues that arise due to the nature of parallel execution. We shall discuss how this algorithm coordinates multiple threads in a parallel computation using a rather common work-queue model.

To understand the design of the algorithm we must keep in mind the scale of the problem. Even the root set of a parallel application is much larger, consisting of every thread's stack, register set, and globally accessible variables. The amount of heap storage can be very large, and so is the amount of reachable data. The rate at which mutations take place is also much greater.

To reduce the pause time, we can adapt the basic ideas developed for incremental analysis to overlap garbage collection with mutation. Recall that an incremental analysis, as discussed in Section 7.7, performs the following three steps:

1. Find the root set. This step is normally performed atomically, that is, with the mutator(s) stopped.

2. Interleave the tracing of the reachable objects with the execution of the mutator(s). In this period, every time a mutator writes a reference that points from a Scanned object to an Unreached object, we remember that reference. As discussed in Section 7.7.2, we have options regarding the granularity with which these references are remembered. In this section, we shall assume the card-based scheme, where we divide the heap into sections called "cards" and maintain a bit map indicating which cards are dirty (have had one or more references within them rewritten). A sketch of such a card-marking barrier follows this list.

3. Stop the mutator(s) again to rescan all the cards that may hold references to unreached objects.
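As an illustration of the card-based scheme, here is a minimal C sketch of a card table together with the barrier the mutators would use. The card size, the byte-per-card table, and all the names (heap_base, card_table, mutator_write, and so on) are illustrative assumptions rather than fixed parts of the scheme.

    #include <stddef.h>
    #include <stdint.h>

    #define CARD_BITS 9                         /* assume 512-byte cards            */
    #define CARD_SIZE ((size_t)1 << CARD_BITS)
    #define HEAP_SIZE ((size_t)64 << 20)        /* illustrative 64 MB heap          */
    #define NUM_CARDS (HEAP_SIZE / CARD_SIZE)

    static char    *heap_base;                  /* start of the collected heap      */
    static uint8_t  card_table[NUM_CARDS];      /* one dirty flag per card; a byte
                                                   rather than a bit avoids atomic
                                                   read-modify-write by the many
                                                   mutator threads                  */

    /* Mark dirty the card containing the field that was just written. */
    static void dirty_card(void *field)
    {
        size_t card = (size_t)((char *)field - heap_base) >> CARD_BITS;
        card_table[card] = 1;
    }

    /* The barrier every mutator uses for a reference store: do the store and
       remember that this card may now hold a reference to an Unreached object. */
    void mutator_write(void **field, void *new_target)
    {
        *field = new_target;
        dirty_card(field);
    }

    /* Rescanning: clear each dirty mark and re-trace the objects on that card. */
    void rescan_dirty_cards(void (*trace_card)(char *start, char *end))
    {
        for (size_t i = 0; i < NUM_CARDS; i++) {
            if (card_table[i]) {
                card_table[i] = 0;
                trace_card(heap_base + i * CARD_SIZE,
                           heap_base + (i + 1) * CARD_SIZE);
            }
        }
    }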
For a large multithreaded application, the set of objects reached by the root set can be very large. It is infeasible to take the time and space to visit all such objects while all mutations cease. Also, due to the large heap and the large number of mutation threads, many cards may need to be rescanned after all objects have been scanned once. It is thus advisable to scan some of these cards in parallel, while the mutators are allowed to continue to execute concurrently.

To implement the tracing of step (2) above, in parallel, we shall use multiple garbage-collecting threads concurrently with the mutator threads to trace most of the reachable objects. Then, to implement step (3), we stop the mutators and use parallel threads to ensure that all reachable objects are found.

The tracing of step (2) is carried out by having each mutator thread perform part of the garbage collection, along with its own work. In addition, we use threads that are dedicated purely to collecting garbage. Once garbage collection has been initiated, whenever a mutator thread performs some memory-allocation operation, it also performs some tracing computation. The pure garbage-collecting threads are put to use only when a machine has idle cycles. As in incremental analysis, whenever a mutator writes a reference that points from a Scanned object to an Unreached object, the card that holds this reference is marked dirty and needs to be rescanned.

Here is an outline of the parallel, concurrent garbage-collection algorithm.

1. Scan the root set for each mutator thread, and put all objects directly reachable from that thread into the Unscanned state. The simplest incremental approach to this step is to wait until a mutator thread calls the memory manager, and have it scan its own root set if that has not already been done. If some mutator thread has not called a memory-allocation function, but all the rest of the tracing is done, then this thread must be interrupted to have its root set scanned.

2. Scan objects that are in the Unscanned state. To support parallel computation, we use a work queue of fixed-size work packets, each of which holds a number of Unscanned objects. Unscanned objects are placed in work packets as they are discovered. Threads looking for work will dequeue these work packets and trace the Unscanned objects therein. This strategy allows the work to be spread evenly among workers in the tracing process. If the system runs out of space, and we cannot find the space to create these work packets, we simply mark the cards holding the objects to force them to be scanned. The latter is always possible because the bit array holding the marks for the cards has already been allocated. A sketch of such a work-packet queue follows this list.

3. Scan the objects in dirty cards. When there are no more Unscanned objects left in the work queue, and all threads' root sets have been scanned, the cards are rescanned for reachable objects. As long as the mutators continue to execute, dirty cards continue to be produced. Thus, we need to stop the tracing process using some criterion, such as allowing cards to be rescanned only once or a fixed number of times, or when the number of outstanding cards is reduced to some threshold. As a result, this parallel and concurrent step normally terminates before completing the trace, which is finished by the final step, below.

4. The final step guarantees that all reachable objects are marked as reached. With all the mutators stopped, the root sets for all the threads can now be found quickly using all the processors in the system. Because the reachability of most objects has been traced, only a small number of objects are expected to be placed in the Unscanned state. All the threads then participate in tracing the rest of the reachable objects and rescanning all the cards.
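A minimal sketch of the work-packet queue used in step (2) might look as follows. The packet size, the mutex-protected list, and the helper names are our own illustrative assumptions; a production collector would more likely use lock-free pools of packets, and the order in which packets are taken does not affect correctness.

    #include <pthread.h>
    #include <stddef.h>

    #define PACKET_SIZE 256                 /* fixed number of Unscanned objects per packet */

    typedef struct Object Object;

    typedef struct WorkPacket {
        Object            *objs[PACKET_SIZE];
        size_t             count;
        struct WorkPacket *next;
    } WorkPacket;

    /* A shared pool of full packets.  Mutator threads (which trace a little on
       each allocation) and the dedicated collector threads all push and pop here. */
    static WorkPacket     *pending;
    static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

    void enqueue_packet(WorkPacket *p)
    {
        pthread_mutex_lock(&pending_lock);
        p->next = pending;
        pending = p;
        pthread_mutex_unlock(&pending_lock);
    }

    WorkPacket *dequeue_packet(void)
    {
        pthread_mutex_lock(&pending_lock);
        WorkPacket *p = pending;
        if (p != NULL)
            pending = p->next;
        pthread_mutex_unlock(&pending_lock);
        return p;                           /* NULL: no tracing work is pending */
    }

    extern void trace_object(Object *o);    /* assumed tracing routine; newly found
                                               Unscanned objects go into fresh packets */

    /* Each worker repeatedly grabs a packet and traces the objects in it. */
    void do_tracing_work(void)
    {
        WorkPacket *p;
        while ((p = dequeue_packet()) != NULL) {
            for (size_t i = 0; i < p->count; i++)
                trace_object(p->objs[i]);
            /* p can now be recycled into a pool of empty packets */
        }
    }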
It is important that we control the rate at which tracing takes place. The tracing phase is like a race. The mutators create new objects and new references that must be scanned, and the tracing tries to scan all the reachable objects and rescan the dirty cards generated in the meanwhile. It is not desirable to start the tracing too much before a garbage collection is needed, because that will increase the amount of floating garbage. On the other hand, we cannot wait until the memory is exhausted before the tracing starts, because then mutators will not be able to make forward progress and the situation degenerates to that of a stop-the-world collector. Thus, the algorithm must choose the time to commence the collection and the rate of tracing appropriately. An estimate of the mutation rate from previous cycles of collection can be used to help in the decision. The tracing rate is dynamically adjusted to account for the work performed by the pure garbage-collecting threads.

7.8.2 Partial Object Relocation

As discussed starting in Section 7.6.4, copying or compacting collectors are advantageous because they eliminate fragmentation. However, these collectors have nontrivial overheads. A compacting collector requires moving all objects and updating all the references at the end of garbage collection. A copying collector figures out where the reachable objects go as tracing proceeds; if tracing is performed incrementally, we need either to translate a mutator's every reference, or to move all the objects and update their references at the end. Both options are very expensive, especially for a large heap.

We can instead use a copying generational garbage collector. It is effective in collecting immature objects and reducing fragmentation, but can be expensive when collecting mature objects. We can use the train algorithm to limit the amount of mature data analyzed each time. However, the overhead of the train algorithm is sensitive to the size of the remembered set for each partition.

There is a hybrid collection scheme that uses concurrent tracing to reclaim all the unreachable objects and at the same time moves only a part of the objects. This method reduces fragmentation without incurring the full cost of relocation in each collection cycle.

1. Before tracing begins, choose a part of the heap that will be evacuated.

2. As the reachable objects are marked, also remember all the references pointing to objects in the designated area.

3. When tracing is complete, sweep the storage in parallel to reclaim the space occupied by unreachable objects.

4. Finally, evacuate the reachable objects occupying the designated area and fix up the references to the evacuated objects, as sketched below.
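Step (4) can be made concrete with a small sketch: the references remembered in step (2) are exactly the locations that must be redirected once the objects in the designated area have been copied out. The data structures and helpers below (EvacRef, evac_refs, in_evac_area, copy_out) are illustrative assumptions, not part of any particular collector.

    #include <stddef.h>

    typedef struct Object Object;

    /* One record per reference, gathered during marking (step 2), that points
       into the designated evacuation area. */
    typedef struct {
        Object **slot;                 /* address of the reference to fix up */
    } EvacRef;

    extern EvacRef *evac_refs;         /* filled in during step (2)          */
    extern size_t   evac_ref_count;
    extern int      in_evac_area(Object *o);
    extern Object  *copy_out(Object *o);   /* copy o out of the area and return its
                                              new address; repeated calls return the
                                              same address via a forwarding pointer */

    /* Step (4): evacuate the reachable objects in the designated area and
       redirect every recorded reference to the object's new location. */
    void evacuate_designated_area(void)
    {
        for (size_t i = 0; i < evac_ref_count; i++) {
            Object **slot = evac_refs[i].slot;
            Object  *old  = *slot;
            if (old != NULL && in_evac_area(old))
                *slot = copy_out(old);
        }
        /* The area now contains only unreachable data and forwarded copies,
           so the whole region can be reused. */
    }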
7.8.3 Conservative Collection for Unsafe Languages

As discussed in Section 7.5.1, it is impossible to build a garbage collector that is guaranteed to work for all C and C++ programs. Since we can always compute an address with arithmetic operations, no memory locations in C and C++ can ever be shown to be unreachable. However, many C or C++ programs never fabricate addresses in this way. It has been demonstrated that a conservative garbage collector (one that does not necessarily discard all garbage) can be built to work well in practice for this class of programs.

A conservative garbage collector assumes that we cannot fabricate an address, or derive the address of an allocated chunk of memory, without an address pointing somewhere in the same chunk. We can find all the garbage in programs satisfying such an assumption by treating as a valid address any bit pattern found anywhere in reachable memory, as long as that bit pattern may be construed as a memory location. This scheme may classify some data erroneously as addresses. It is correct, however, since it only causes the collector to be conservative and keep more data than necessary.

Object relocation, which requires that all references to the old locations be updated to point to the new locations, is incompatible with conservative garbage collection. Since a conservative garbage collector does not know whether a particular bit pattern refers to an actual address, it cannot change these patterns to point to new addresses.

Here is how a conservative garbage collector works. First, the memory manager is modified to keep a data map of all the allocated chunks of memory. This map allows us to find easily the starting and ending boundary of the chunk of memory that spans a certain address. The tracing starts by scanning the program's root set to find any bit pattern that looks like a memory location, without worrying about its type. By looking up these potential addresses in the data map, we can find the starting addresses of those chunks of memory that might be reached, and place them in the Unscanned state. We then scan all the unscanned chunks, find more (presumably) reachable chunks of memory, and place them on the work list until the work list becomes empty. After tracing is done, we sweep through the heap storage using the data map to locate and free all the unreachable chunks of memory.
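The scanning loop at the heart of such a collector might be sketched in C as follows; the chunk representation and the helpers find_chunk and push_unscanned are illustrative assumptions rather than the interface of any particular collector.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* One entry of the data map: an allocated chunk of memory. */
    typedef struct Chunk {
        char  *start;
        size_t size;
        int    marked;                 /* reached during this collection?   */
    } Chunk;

    extern Chunk *find_chunk(uintptr_t addr); /* data-map lookup: the chunk whose
                                                 [start, start+size) spans addr,
                                                 or NULL if addr is not in the heap */
    extern void   push_unscanned(Chunk *c);   /* add c to the work list            */

    /* Conservatively scan a region of memory (a thread stack, the globals, or
       the body of a reached chunk): any word-aligned bit pattern that falls
       inside an allocated chunk is treated as if it were a reference. */
    void scan_region_conservatively(char *lo, char *hi)
    {
        for (char *p = lo; p + sizeof(uintptr_t) <= hi; p += sizeof(uintptr_t)) {
            uintptr_t word;
            memcpy(&word, p, sizeof word);    /* read a possible pointer          */
            Chunk *c = find_chunk(word);
            if (c != NULL && !c->marked) {
                c->marked = 1;                /* maybe not a real pointer, but
                                                 erring toward "reachable" is safe */
                push_unscanned(c);
            }
        }
    }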
7.8.4 Weak References

Sometimes, programmers use a language with garbage collection, but also wish to manage memory, or parts of memory, themselves. That is, a programmer may know that certain objects are never going to be accessed again, even though references to the objects remain. An example from compiling will suggest the problem.

Example 7.17: We have seen that the lexical analyzer often manages a symbol table by creating an object for each identifier it sees. These objects may appear as lexical values attached to leaves of the parse tree representing those identifiers, for instance. However, it is also useful to create a hash table, keyed by the identifier's string, to locate these objects. That table makes it easier for the lexical analyzer to find the object when it encounters a lexeme that is an identifier.

When the compiler passes the scope of an identifier I, its symbol-table object no longer has any references from the parse tree, or probably any other intermediate structure used by the compiler. However, a reference to the object is still sitting in the hash table. Since the hash table is part of the root set of the compiler, the object cannot be garbage collected. If another identifier with the same lexeme as I is encountered, then it will be discovered that I is out of scope, and the reference to its object will be deleted. However, if no other identifier with this lexeme is encountered, then I's object may remain as uncollectable, yet useless, throughout compilation. □

If the problem suggested by Example 7.17 is important, then the compiler writer could arrange to delete from the hash table all references to objects as soon as their scope ends. However, a technique known as weak references allows the programmer to rely on automatic garbage collection, and yet not have the heap burdened with reachable, yet truly unused, objects. Such a system allows certain references to be declared "weak." An example would be all the references in the hash table we have been discussing. When the garbage collector scans an object, it does not follow weak references within that object, and does not make the objects they point to reachable. Of course, such an object may still be reachable if there is another reference to it that is not weak.

7.8.5 Exercises for Section 7.8

! Exercise 7.8.1: In Section 7.8.3 we suggested that it was possible to garbage collect for C programs that do not fabricate expressions that point to a place within a chunk unless there is an address that points somewhere within that same chunk. Thus, we rule out code like

    p = 12345;
    x = *p;

because, while p might point to some chunk accidentally, there could be no other pointer to that chunk. On the other hand, with the code above, it is more likely that p points nowhere, and executing that code will result in a segmentation fault. However, in C it is possible to write code such that a variable like p is guaranteed to point to some chunk, and yet there is no pointer to that chunk. Write such a program.