Talk:Garbage collection (computer science)/Archive 2

Archive 1 Archive 2 Archive 3

Proposed move

I propose this article is moved to Garbage collection and an unequal disambiaguation to waste management is created. —Ruud 02:42, 11 February 2006 (UTC)

It was originally at something stupid like Automatic garbage collection. I would support this move, but I'm afraid that it would emphasise our CS bias. If we had a more quantitative way of establishing how many people are looking for each article, that would help. Deco 03:04, 11 February 2006 (UTC)
I want server logs :) But in absence of those my guess would be that most people looking for garbage collection here will be looking for the CS kind. —Ruud 03:09, 11 February 2006 (UTC)
We could query the hit counts for the two articles in the last database dump. I'll get on it ASAP. Deco 03:26, 11 February 2006 (UTC)
It looks to me like waste management and computer garbage collextion have about the same number of related articles and hits on a search. So I'm guessing the topic are similarly popular, although there is an undoubted CS bias at WP. I vote my CS bias and say, "move it". – Doug Bell talkcontrib 04:51, 11 February 2006 (UTC)
Given that, I'm opposed. It seems an unnecessary move, and with increasing recycling I suspect that garbage collection (waste management) is likely to be ever more important.WolfKeeper 05:24, 11 February 2006 (UTC)
I would agree except that Waste Management seems to be the dominant term in that field, not garbage collection. – Doug Bell talkcontrib 05:37, 11 February 2006 (UTC)
I will have page hit stats from the DB later tonight, if those help at all. Deco 05:40, 11 February 2006 (UTC)
I oppose. I feel the term on its own is sufficiently generic to warrant "(computer science)" for this highly specialized overloading. --Mike Lin 06:30, 11 February 2006 (UTC)
The counts from the page table of the latest dump: since the last reset of the page hit counters, there have been 165 views of Garbage collection, 494 views of Garbage collection (computer science), and zero views of Waste management. This suggests that most of our traffic is coming directly from Google rather than from the dab page, but a fair proportion of people are typing it right in. Nobody seems to care much about waste management, but then the people searching the web are more CS geeks than trash geeks. Deco 21:50, 11 February 2006 (UTC)

Merging with garbage (computer science)

Perhaps some migration of content might be in order, but the article garbage contains content which doesn't fall under this page. And garbage collection is an important enough topic to warrant its own article. --EngineerScotty 20:57, 11 February 2006 (UTC)

I agree. Many of the topics of the "Garbage" page refer to things unrelated to garbage collection itself. --metromoxie 19:37, 3 April 2006 (EST)
I've re-read the Garbage page twice, and I'm not sure I see which parts are "unrelated" to garbage collection. Could you please be more specific? (I went to the Garbage page looking for some info about uninitialized/undefined/wrong/corrupt values (as in Garbage In, Garbage Out) but there's nothing about that in there.) Ewlyahoocom 18:20, 10 April 2006 (UTC)
It seems to me that "garbage" and "garbage collection" are rather distinct topics. "Garbage" in the usual context is just any piece of data that has no meaning - for instance an uninitialised variable or pointer to nowhere in particular. "Garbage" in the sense of garbage collection, however, is a chunk of memory allocated on the heap that has no references to it - not "unmeaningful" at all, just never-to-be-used. Technically there is no real overlap between these. (In fact they are almost the opposite - the first sense is memory yet to have anything meaningful, and the second is memory which is finished being useful). Keep separate. —EatMyShortz 15:45, 14 May 2006 (UTC)
Oh actually, upon re-reading Garbage, it seems it does deal with both definitions. (I think in my above comment, the GC garbage is "syntactic" garbage and the other is "semantic". Still, this article is specifically about the quite major and important field of Garbage Collection, and the other is about what garbage is. If it is to be merged, it should really be the other way (merge into garbage) - but that's silly because GC certainly needs its own article. So still keep. —EatMyShortz 15:48, 14 May 2006 (UTC)

Reversible computing

Why is Reversible computing in the see also section? I don't see the connection except in perhaps an esoteric sense. – Doug Bell talkcontrib 21:11, 17 February 2006 (UTC)

There is a deep connection between the two; I forget the exact details off hand, but something like that garbage collection can be implemented using no energy on a reversible computer.WolfKeeper 17:08, 28 February 2006 (UTC)
Maybe because you're not allowed to remove anything since every datum has to be recallable? Wouter Lievens 08:39, 2 June 2006 (UTC)
The reversible model involved being able to undo a sub part of a calculation. So it was something like you could reverse the subcalculation that lead to the garbage being produced without affecting the end result in any way.WolfKeeper 12:06, 2 June 2006 (UTC)

Performance implication

Well, is the example a good example? Personally, for this kind of program in C++ I would use automatic allocation:

  #include <iostream>
  
  class A {
  	int x;
  public:
  	 A() { x = 0; x++; }
  };
  
  int main() {
  	for(int i = 0; i < 1000000000; i++) {
  		A a ;
  	}
  	std::cout << "DING!" << std::endl;
  }

This implementation is probably faster, or at least as fast as the program in C#, and uses less memory than both the C# and the other C++ program! Well, some languages like objective C don't come with garbage collection in their default implementation and cannot perform automatic allocation.

Nicolas (62.147.113.184), June 1st 2006, 23:15 CEST


The point of the original example was to emphasize the lack of a garbage collector in C++ as compared to C#, hence dynamic allocation is necessary as a part of the example... though IMHO the published results are skewed.

[Avi, 2006 June 03, utc - 1801]


I perfectly understand, but at least they could have found an example where dynamic allocation is useful!

Nicolas (62.147.112.212 21:05, 8 June 2006 (UTC))

It's a benchmark. It's not supposed to be useful, it's supposed to stress a particular aspect of a system.WolfKeeper 18:11, 14 June 2006 (UTC)


generational GC ... will place only ... a subset of generations into the initial white set

Maybe into the initial GREY set? -atagunov


Indeed the Nicolas (62.147.113.184), June 1st 2006, 23:15 CEST C++ code is pretty much exactly what modern Java garbage collectors do - adn I'd expect C# would as well. Java (around the hotspot time) talked about short-lived-objects allocated and freed in scopes taht can be recognised by the complier would be just as fast as stack allocations because they basically were stack allocations.

Yeah, the trouble here is getting an accurate example that's also short so we don't lead our readers into some unnecessarily large example. The point being illustrated, I think, is that many GC languages basically automatically pool allocations, which speeds them up versus one-at-a-time allocations (in C++ you can combat this by overriding the default allocator with a custom pooling allocator, but it's not the default behavior). --Delirium 16:52, 22 February 2007 (UTC)

Criticism

I have just added a gentle criticism section, it would probably good to also give external links for known problems with GC (both conceptional or for specfic implementations). However, I didn't want to put too much in at once, maybe a few critical links would help to balance this article. —The preceding unsigned comment was added by 212.209.176.2 (talk) 09:35, 28 March 2007 (UTC).

While the fact that garbage collection implies some overhead is well established in the main article, you're making a number of more worrisome claims without citations. After an hour of searching, I can't find anything more authoritative than questions (not answers) posted in coding forums and mailing lists to support these. Rather than putting in 5 citation needed tags and a weasel word tag, I've removed those statements. Unfortunately this doesn't leave much substance in the section. Please restore it with citations to your sources, I would be very interested in reading them. 67.105.142.34 19:56, 2 April 2007 (UTC)
Hm, I am not happy that the criticism section was radically shortened, on the other hand you are right to ask for more background information. For example see the german wiki article, quote "Eine automatische Speicherbereinigung verringert die Gefahr von Speicherlecks, kann sie aber – entgegen einer weit verbreiteten Ansicht – nicht völlig ausschließen"... translates to: "Garbage collection reduces the risk of memory leaks, but can't - contrary to popular belive - completely remove it". I suggest that someone reasearches the topic, extends the critisim section again and adds more links. Right now, I don't think the article gives a balanced view on the topic. 83.233.56.62 21:32, 3 April 2007 (UTC)
Some links to start with...
http://www-128.ibm.com/developerworks/java/library/j-leaks/ - Handling memory leaks in Java programs
http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html - The Truth About Garbage Collection
http://everything2.com/index.pl?node_id=1048400 - Garbage collection vs. manual memory management
http://portal.acm.org/citation.cfm?id=1094836&dl=GUIDE&coll=GUIDE - Quantifying the performance of garbage collection vs. explicit memory management
http://www.wildcard.demon.co.uk/dev/gc.html - Garbage Collection, 83.233.56.62 22:13, 3 April 2007 (UTC)

"A form" or "the form"? Garbage collection definition

The definition of "garbage collection" is currently:

In computer science, garbage collection (also known as GC) is a form of automatic memory management.

This definition seems problematic. Is there any form of automatic memory management that is not garbage collection? Cannot garbage collection be about managing other things than memory? -- Lilwik 06:57, 23 April 2007 (UTC)

This objection seems quite correct. The pushdown stack used for recursive procedure calls is another form of automatic memory management, which was not obvious at the time it was invented. Fortran and Cobol did not have that. Garbage collection is memory management for abitrarily linked structures.--Bernard Lang 16:31, 3 August 2007 (UTC)

Origin of tracing collection

Does anyone know when the concept and the term "tracing collection" was introduced ? --Bernard Lang 17:21, 3 August 2007 (UTC)

Benefits

The only things mentioned in the benefits section are elimination of certain kinds of bugs. Far more interesting, to my mind, is that garbage collection allows language features that would be nearly impossible to deal with safely absent garbage collection - currying, closures, and so forth. Not sure enough how this should be phrased to make the edit myself. —Preceding unsigned comment added by 141.212.108.138 (talk) 15:17, 2 April 2008 (UTC)

This is an important point - persistent data structures also depend heavily on garbage collection, and it simplifies sharing data across threads. I think the way I would put it is that garbage collection enables features involving memory management that is too complex and error-prone for a programmer to manage on their own. Dcoetzee 15:24, 2 April 2008 (UTC)

Disadvantages

It would be nice to have a disadvantages section. I'm not really knowledgable enough to add it, but everything has tradeoffs, GC is no exception it seems like this website is talking about that?

http://www.digitalmars.com/d/2.0/garbage.html —Preceding unsigned comment added by Walker9010 (talkcontribs) 05:14, 2 May 2008 (UTC)

Good idea! Visame (talk) 06:31, 8 June 2008 (UTC)

I agree that there needs to be more about disadvantages. I have years of experience with many languages, all without garbage collection before I began using C#. My personal opinion of garbage collection is that it supports sloppy programming, but I know many disagree with that and I don't expect the article to say that. I think that garbage collection makes it quite unclear when an object will be freed, and that uncertainty makes me uncomfortable. I think there are situations in which that would be a problem and the article needs more explanations such as that. Also, it seems that garbage collection can easily contradict object-oriented philosophy in that objects taht an object has should be deleted with the object but if the object is designed in a manner that something it uses persists after it exists then that could be the result of poor design. —Preceding unsigned comment added by 173.55.97.110 (talk) 00:29, 30 July 2009 (UTC)

[citation needed] and

The whole article would benefit from particular examples, but statements like "many generational garbage collectors use separate memory regions for different ages of objects" do not individually need tags questioning their veracity. Statements like "many people agree that George Bush sucks" warrant such tags, but purely factual statements about technical subjects are not weaselling. —Centrxtalk • 18:32, 14 June 2008 (UTC)

Object Leak

There should be a reference to a new topic called Object Leak and possibly a brief description of it in this document. Object Leak is a kind of logical memory leak that garbage collected languages are subject to.

One example of object leak is refered to by sun in this document http://www.ibm.com/developerworks/java/library/j-jtp01274.html under the stack implementation example.

Another tipical example is forgeting to unpin an object, in this example that object and its referenced object tree (that might not be pinned) will never be collected. In the .Net framework language this may happen if the programer forgets to invoke Dispose for some disposable objects.

Another example is forgeting to set references to null on the program main loop, for example:

  int main() {
  	object o = CreateSomeObject();
       DoSomethingWithObject(o);
       // Object o is no longer required but it will be alive until the program terminates.
       EnterProgramMainLoop();         
  }
This sounds like a good idea to me; this type of leak often comes up in discussions about garbage collection, and it's better to shift this information out to an article dedicated to this specific type of leak than to go into detail about it here. However, I'm not sure how I feel about the term "object leak" - this term seems to apply to all memory leaks. About the best title I've come up with so far is Memory leaks in garbage collected systems.
Also, I should remark that there's been a lot of research into collecting this type of leaked data, and that setting references to null doesn't always work due to compiler optimizations eliminating them. Dcoetzee 23:06, 2 January 2009 (UTC)

What is garbage collection

That article says garbage collection is a form of automatic memory management that uses reachability analysis. That's two times false in my opinion. First, reachability analysis is just one way to define what is garbage (which in that case would be unreachable memory). Second, garbage collection can be used for management of any resource.

For example, I can use garbage collection as a mechanism to manage sessions in a web application. Sessions aren't memory, they're an entry in some kind of database (which might be the filesystem or memory). To define whether a session is garbage, I don't use reachability but a time limit. A session which last access is too old is simply garbage, and thus may be deleted.

So garbage collection is just a resource management mechanism: you allocate resources, then at some time perform some analysis over all the resources that have been allocated, identify which ones are garbage, and free them, with a given garbage criterion. —Preceding unsigned comment added by 88.181.17.51 (talk) 14:30, 7 January 2009 (UTC)

Garbage collecting a circular reference

Don't you need some sort of circular reference to really exercise the garbage collector? Isn't garbage collecting a trivial task if you only need to wait until the reference count drops to zero in order to destroy the object? It seems if you are going to exercise the C# garbage collector you should not give it such a straw man.

For example you could have a City class, and a Building class. The City contains X number of Buildings, and each Building points to the City in which it belongs.

Now go ahead and create/destroy 1000000000 City instances in C# and C++ and see which is faster.

198.107.22.21pjc

No, you're confusing garb with reference counting. That is prob due to Java initially marketing reference counting as a garb, which it isn't. Garbs are immune to circular references, they remove them if they're unnused. Rursus dixit. (mbork3!) 15:20, 14 October 2010 (UTC)

Tri-colour marking

The section Tri-colour marking is unclear as what initially belongs to the white and the gray sets. The black session may start empty or filled with objects that should never be deleted. In fact, the startup of the sweep phase requires that all non-black items are white, and then that a specific root set is marked gray, otherwise the sweep cannot start, because the grey set would be empty. I think such an unprecise statement would suffice, but it must be worked into the current text. ... said: Rursus (bork²) 23:05, 20 January 2009 (UTC)

I've expanded this a bit after some research, I can provide a simple Java example if required Zebsy (talk) 14:17, 11 August 2009 (UTC)

Terminology: "garbage collection", "mark and sweep", "reference counting", and "automatic memory management"

These terms all have distinct meanings. Why do "automatic memory management" and "mark and sweep" redirect to "garbage collection"? That's not fair to reference counting. Also, there is more to automatic memory management than garbage collection. Although it's the hardest part, it's actually only a third of the battle. Also, garbage collection could refer to things unrelated to automatic memory management, as another commenter mentioned above.

Let me be precise. Here's what the terms mean:

"Automatic memory management" means:

  • Memory for new objects is sized and formatted automatically.
  • Memory for objects that are growing is resized automatically.
  • Memory for unused objects is freed automatically. That is, garbage collection.

"Garbage collection" means clearing resources that are no longer used. In the context of memory, it has two major categories:

  • Mark and sweep.
  • Reference counting.

It can also mean clearing unused sessions in a database based on a timeout or clearing old entries in a PF state table.

"Mark and sweep" would be the correct name for this article because it discusses little else, and naming it "garbage collection" represents a judgment as to which form of garbage collection is best or most common.

"Reference counting" already has its own article. The summary here is nice, except that it implies that reference counting is not garbage collection. Obviously, it actually is a form of garbage collection. Also, it's not uncommon, and many people consider it superior.

-- Bilbo1507 (talk) 20:23, 19 April 2009 (UTC)

If you went to people who have written software using both COM and .NET and told them that the former has garbage collection since it uses reference counting, I think most would look at you funny. Which is not to say that colloquial abuse of terminology is uncommon; perhaps it would help if you would give some kind of authoritative source for your specific definitions. --Mike Lin (talk) 20:59, 20 April 2009 (UTC)
They're confused by the marketroids. Reference counting is not garbage collection. Using marketroid "terminology", programmers would have to scream at each other all the time for all the confusions, and no programming would be done. Rursus dixit. (mbork3!) 15:37, 14 October 2010 (UTC)

The very basics of GC

This may be interesting and should possibly be worked into this article. —Preceding unsigned comment added by Parallelized (talkcontribs) 11:20, 21 May 2009 (UTC)

It is interesting, and a good intro for programming a garb, but it is too detailed for an encyclopedia which should give an overview, some rationales and methods. I'll add it first among the external links list. Maybe later some figures and text explaining those figures could be worked in the text some place. Rursus dixit. (mbork3!) 15:32, 14 October 2010 (UTC)

GC & Ref counting in multi threaded environment

I'd like to see some mention of performance in multi threaded environment. For example, a unmentioned disadvantage of reference counting is that it is usually implemented with atomic increment / decrement operations. These can pose serious performance issues in a multi threaded environment, the same is not necessarily true for GC used in the same scenario. (Just putting this out there for someone to consider writing about. I am a fan of both GC and ref counting as useful tools in native languages and managed language implementations.) — Preceding unsigned comment added by 202.173.164.146 (talk) 11:10, 30 September 2011 (UTC)

Neutral point of View?.

Some of the disadvantages do seem to be a bit biased... For example "Memory may leak despite the presence of a garbage collector".... Saying this is a disadvantage, is like saying a boat might sink despite the presence of a sealed hull... Even a manually memory managed environment, logical memory leaks can exist if care is not taken to correctly track objects to identify when they are eligible to be deallocated, or indeed end up as potential sources of memory corruption if the object is deallocated too soon.... "Programs that rely on Garbage Collectors" bullet, doesn't identify GC as been a problem, but instead programs that use GC, also tend to have other problems not related to GC. — Preceding unsigned comment added by 62.189.160.66 (talk) 13:51, 11 July 2011 (UTC)

I agree with your second statement. The problems pointed out seem too much of a stretch. I've removed the paragraph. Perhaps it would be interesting to understand what this sentence even meant. It seems to imply GC causes major layout changes in data... which does not make much sense.

I disagree with your first statement. The whole point is that you can have problems even if you play by the rules. Or at least, this is what I get out of it. I still not quite understand how this is supposed to happen, perhaps a citation needed would be appropriate.


MaxDZ8 talk 14:44, 22 August 2011 (UTC)


I think the RAII stuff is not just a matter of neutrality, but also of relevance. The (irritatingly) implicit claim is that RAII solves a more general problem than automatic memory management. However, no supporting evidence or links are given. The counter-arguments (that memory is not a "resource" in the same sense as a file handle because its behavior is not essential to the behavior of the program as a whole, and that because of this, memory can be abstracted fully and other "resources" cannot; that many languages include syntax or macros for this situation, eg "using" in C# and dynamic-wind in Scheme, unwind-protect in Common Lisp; that there are sophisticated (far moreso than RAII) means of reifying time and IO interactions that actually rely on GC, such as functional reactive programming) are also not addressed or mentioned. Instead we have extremely C++-myopic claims of RAII's superiority and distinguishing between "pointers" and "references", which I assume are referring to the C++ concepts (as if they were essential and relevant outside of C++.)

And that touches on another issue that I have with this article: it doesn't mention automatic memory management's central role with respect to functional-style programming, persistent data structures, and avoiding mutation, and the parts about the disadvantages are written from the perspective of a highly state-ful point of view. The parts of this article that mention RAII seem like they're totally missing the point. It's not that it's POV, it's that it's uninformed and C++-myopic, rather like saying an M16's lack of a bowstring is a flaw.

In order for it to continue in its current state, the implicit claim mentioned above should be made explicit and citations should be provided. Otherwise, I'll remove the parts that rely on it or that I judge as C++-myopic. I may also write a bit on automatic memory management's intimate relationship with functional programming, though I would prefer if there was a separate automatic memory management article and that this one stuck to things more specific to GC algorithms (as opposed to, say, reference-counting.) Cengelhard (talk) 21:10, 5 January 2012 (UTC)

There is no such claim either implicit or explicit. The disadvantage simply states that non deterministic garbage collection of only part of the resources effectively disables RAII based resource management of the remaining resources, and it simply illustrates the potential impact of this fact by focusing on the transitiveness for composition in the 'close' example. Simply stated, if you manage all you resources with GC or manage all your resources with RAII there is no such problem. If you manage part of your resources with deterministic GC (for example reference counted GC as in as the RAII page states c-Python) and the remaining resources with RAII there is no such problem. Only when you manage part of your resources with non deterministic GC than the rest of your resources fall outside of the scope of your GC, than these resources won't be managable with RAII either and will require alternative resource management at each level of composition. Please lets try to keep stuff objective here.I understand your dislike of C++, C++ is a hard to like language with many bad parts, but we must be objective about the fact that RAII is a completely valid alternative for GC that has very interesting compositional properties. You apparently don't like C++ and probably for this reason don't like RAII but please lets keep this page objective and be fair about the fact that its partial incompatibility with RAII can be seen as a disadvantages to some forms of GC especially given these compositional properties. Please read the "Disadvantages of scope bound resource management alternatives" of the page on RAII, hopefully that should give some perspective on why the inability to use RAII for 'remaining' resources is indeed something that can be seen as a disadvantage.

"semantic" versus "syntactic" garbage: poor terminology

The article uses the term syntactic garbage in reference to objects which are actually not reachable through the transitive closure of live references. This is a poor term, because this reachability very much has to do with the semantics of what computation has occured until now. Only in trivial examples is this linked to the syntactic structure of the program. If garbage was syntactic, it could be statically analyzed without the need for a tracing garbage collector! Note that any language which is compiled and designed for garbage collection from the ground up, the compiler can help the garbage collector. If a variable has no next use in a block of code (as determined by the compiler's data flow analysis), then that variable need not be considered as a source of live references for GC. The compiler can emit information so that no matter what execution point in the code triggers GC, the information about variable liveness in that scope is accurate. So it is not true what the article claims: that an object can become syntactic garbage long before it goes out of scope. That is precisely what liveness analysis is for. This is a problem of a poorly integrated garbage collector (e.g. a bolt-on GC library for C programs which blindly scans the stack.)192.139.122.42 (talk) 23:43, 2 November 2011 (UTC) 123 — Preceding unsigned comment added by 121.245.56.14 (talk) 06:53, 22 March 2012 (UTC)