In computer science, garbage collection (also known as GC) is a form of automatic memory management. The garbage collector or collector attempts to reclaim garbage, or memory used by objects that will never again be accessed or mutated by the application. Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in his recently devised Lisp programming language.
Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of the two approaches, and there are other techniques being studied (such as region inference) to solve the same fundamental problem.
The basic principle of how a garbage collector works is:
By making manual memory deallocation unnecessary (and typically impossible), garbage collection frees the programmer from having to worry about releasing objects that are no longer needed, which can otherwise consume a significant amount of design effort. It also aids programmers in their efforts to make programs more stable, because it prevents several classes of runtime errors. For example, it prevents dangling pointer errors, where a reference to a deallocated object is used. (The pointer still points to the location in memory where the object or data was, even though the object or data has since been deleted and the memory may now be used for other purposes, creating a dangling pointer.)
Many computer languages require garbage collection, either as part of the language specification (e.g. Java, C#, and most scripting languages) or effectively for practical implementation (e.g. formal languages like lambda calculus); these are said to be garbage-collected languages. Other languages were designed for use with manual memory management, but have garbage collected implementations (e.g., C, C++). Some languages, like Modula-3, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects, or yet others like D, which is garbage-collected but allows the user to manually delete objects and also entirely disable garbage collection when speed is required. In any case, it is far easier to implement garbage collection as part of the language's compiler and runtime system, but post hoc GC systems exist, including ones that do not require recompilation. The garbage collector will almost always be closely integrated with the memory allocator.
Tracing garbage collectors are the most common type of garbage collector. They focus on determining which objects are reachable (or potentially reachable), and then discard all remaining objects.
Informally, a reachable object can be defined as an object for which there exists some name in the program environment that leads to it, either directly or through references from other reachable objects. More precisely, objects can be reachable in only two ways:
The reachability definition of "garbage" is not optimal, insofar as the last time a program uses an object could be long before that object falls out of the environment scope. A distinction is sometimes drawn between syntactic garbage, those objects the program cannot possibly reach, and semantic garbage , those objects the program will in fact never again use. The problem of precisely identifying semantic garbage can easily be shown to be undecidable: a program that allocates an object X, runs an arbitrary input program P, and uses X if and only if P finishes would require a semantic garbage collector to solve the halting problem. Although conservative heuristic methods for semantic garbage detection remain an active research area, essentially all practical garbage collectors focus on syntactic garbage as described here.
The tri-colour marking algorithm preserves an important invariant:
Some variations on the algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties hold.
Once the unreachable set has been determined, the garbage collector may simply release the unreachable objects and leave everything else as it is, or it may copy some or all of the reachable objects into a new area of memory, updating all references to those objects as needed. These are called "non-moving" and "moving" garbage collectors, respectively.
At first, a moving GC strategy may seem inefficient and costly compared to the non-moving approach, since much more work would appear to be required on each cycle. In fact, however, the moving GC strategy leads to several performance advantages, both during the garbage collection cycle itself and during actual program execution:
The most straightforward approach is the semi-space collector, which dates to 1969. In this moving GC scheme, memory is partitioned into a "from space" and "to space". Initially, objects are allocated into "to space", until it becomes full and a collection is triggered. At the start of a collection, the "to space" becomes the "from space", and vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects are scanned in turn, and all objects that they point to are copied to "to space", until all reachable objects have been copied to "to space". Once the program continues execution, new objects are once again allocated from the "to space" until it is once again full and the process is repeated. This approach has the advantage of conceptual simplicity (the three object color sets are implicitly constructed during the copying process), but the disadvantage that a (possibly) very large contiguous region of free memory is necessarily required on every collection cycle.
A "mark and sweep" garbage collector maintains a bit (or two) with each object to record whether it is white or black; the grey set is either maintained as a separate list or using another bit. As the reference tree is traversed during a collection cycle, these bits are manipulated by the collector to reflect the current state. The mark and sweep strategy has the advantage that, once the unreachable set is determined, either a moving or non-moving collection strategy can be pursued; this choice of strategy can even be made at runtime, as available memory permits. It has the disadvantage of "bloating" objects by a small amount.
In order to implement this concept, many generational garbage collectors use separate memory regions for different ages of objects. When a region becomes full, those few objects that are referenced from older memory regions are promoted (copied) up to the next highest region, and the entire region can then be overwritten with fresh objects. This technique permits very fast incremental garbage collection, since the garbage collection of only one region at a time is all that is typically required.
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space. In fact, runtime systems for modern programming languages (such as Java and the .NET Framework) usually use some hybrid of the various strategies that have been described thusfar; for example, most collection cycles might look only at a few generations, while occasionally a mark-and-sweep is performed, and even more rarely a full copying is performed to combat fragmentation. The terms "minor cycle" and "major cycle" are sometimes used to describe these different levels of collector aggressiveness.
Simple stop-the-world garbage collectors completely halt execution of the program to run a collection cycle, thus guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector is running. This has the obvious disadvantage that the program can perform no useful work while a collection cycle is running (sometimes called the "embarrassing pause").
Incremental garbage collectors are designed to reduce this disruption by interleaving their work with activity from the main program. Careful design is necessary to ensure that the main program does not interfere with the garbage collector and vice versa; for example, when the program needs to allocate a new object, the runtime system may either need to suspend it until the collection cycle is complete, or somehow notify the garbage collector that there exists a new, reachable object.
Finally, a concurrent garbage collector can run concurrently in real time with the main program on a symmetric multiprocessing machine. Complex locking regimes may be necessary in order to guarantee correctness, and cache issues also make this less helpful than one might imagine. Nonetheless, concurrent GC may be desirable for SMP applications with high performance requirements.
Some collectors can correctly identify all pointers (references) in an object; these are called "precise" (or "exact" or "accurate") collectors, the opposite being a "conservative" or "partly conservative" collector. "Conservative" collectors have to assume that any bit pattern in memory is a pointer if (when interpreted as a pointer) it would point into any allocated object. Thus, conservative collectors may have some false negatives, where storage is not released because of accidental fake pointers, but this is rarely a significant drawback in practice. Whether a precise collector is practical usually depends on type safety properties of the programming language.
A related issue concerns internal pointers, or pointers to fields within an object. If the semantics of a language allow internal pointers, then there may be many different addresses that can refer to the same object, which complicates determining whether an object is garbage or not.
A more fundamental issue is that garbage collectors violate locality of reference, since they deliberately go out of their way to find bits of memory that haven't been accessed recently. The performance of modern computer architectures is increasingly tied to caching, which depends on the assumption of locality of reference for its effectiveness. Some garbage collection methods result in better locality of reference than others. Generational garbage collection is relatively cache-friendly, and copying collectors automatically defragment memory helping to keep related data together. Nonetheless, poorly timed garbage collection cycles could have a severe performance impact on some computations, and for this reason many runtime systems provide mechanisms that allow the program to temporarily suspend, delay or activate garbage collection cycles.
Despite these issues, for many practical purposes, allocation/deallocation-intensive algorithms implemented in modern garbage collected languages can actually be faster than their equivalents using explicit memory management (at least without heroic optimizations by an expert programmer). A major reason for this is that the garbage collector allows the runtime system to amortize allocation and deallocation operations in a potentially advantageous fashion. For example, consider the following program in the (garbage-collected) C# language:
class A { private int x; public A() { x = 0; x++; } }
class Example { public static void Main() { for(int i = 0; i < 1000000000; i++) { A a = new A(); } System.Console.WriteLine("DING!"); } }
And a nearly equivalent C++ program:
#include
class A { int x; public: A() { x = 0; x++; } };
int main() { for(int i = 0; i < 1000000000; i++) { A* a = new A(); delete a; } std::cout << "DING!" << std::endl; }
Using standard libraries and typical compiler configurations, the C# program executes many times faster than the C++ program. This is because the C++ allocator must hunt for free blocks of memory in a potentially fragmented heap in order to allocate new instances of the A class. In contrast, the C# runtime system can allocate memory by incrementing a pointer from some region of memory previously set aside for new allocations, relying on the garbage collector to eventually release the unused objects and compact the heap. On the other hand, the garbage-collected program may have somewhat greater memory usage averaged over time, since instances of the A class are not deallocated as quickly as they could be.
Each has different sorts of overheads:
It is difficult to compare the two cases directly, as their behavior depends on the situation. For example, in the best case for a garbage-collecting system, allocation just increments a pointer; but in the best speed performance case for manual allocation, the allocator maintains freelists of specific sizes and allocation only requires following a pointer. However this size segregation usually cause a large degree of external fragmentation and this can impact cache behaviour.
The overhead of write barriers is more likely to be noticeable in an imperative-style program which frequently writes pointers into existing data structures than in a functional-style program which constructs data only once and never changes them.
Some advances in garbage collection can be understood as reactions to performance issues. Early collectors were stop-the-world collectors, but the performance of this approach was distracting in interactive applications. Incremental collection avoided this disruption, but at the cost of decreased efficiency due to the need for barriers. Generational collection techniques are used with both stop-the-world and incremental collectors to increase performance--the trade-off is that some garbage is not detected as garbage for longer than normal.
In contrast to tracing garbage collection, reference counting is a form of automatic memory management where each object has a count of the number of references to it. An object's reference count is incremented when a reference to it is created, and decremented when a reference is destroyed. The object's memory is reclaimed when the count reaches zero.
There are two major disadvantage to reference counting:
Functional programming languages, like ML, Haskell and Lisp, traditionally use garbage collection. Lisp, which introduced functional programming, is especially notable for using garbage collection before this technique was commonly used. Safe programming languages must use garbage collection: having a manual delete operation would introduce the possibility of having a reference to a non-existent object, which would violate the type safety guarantee.
Script languages like Perl, Ruby, Python and PHP tend to have built-in support of GC. Also, many object-oriented programming languages like Smalltalk, Java and ECMAScript usually provide integrated garbage collection, a notable exception being C++.
Automatische Speicherbereinigung | Recolección de basura | Ramasse-miettes | Garbage collection | איסוף זבל (תכנות) | Šiukšlių surinktuvas | Garbage collection | ガベージコレクション | Garbage collection | Coletor de lixo | Сборка мусора | Automaattinen roskienkeräys
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Garbage collection (computer science)".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world