Talk:Discounted cumulative gain

I created a small Python script that computes DCG6, for future reference:

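import math
rel = list(zip(range(1, 7), [3, 2, 3, 0, 1, 2]))  # (rank, relevance) pairs
# Rank 1 is not discounted; ranks 2..6 are divided by log2(rank).
dcg6 = rel[0][1] + sum(b / math.log(a, 2) for a, b in rel[1:])
print(dcg6)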

I think the first formula is wrong: it is odd that the second element in the ranking gets the same weight as the first (the example does this too). I modified the formula and then rolled the change back, since I am not 100% sure; I will check. If I am right, the example must be modified as well. — Preceding unsigned comment added by 2A00:1620:C0:50:5AB0:35FF:FEFB:88F8 (talk) 12:03, 2 August 2012 (UTC)


I guess the example is wrong. The DCG values in the sum should be: 3.0 + 2.0 + 1.89278926071 + 0.0 + 0.430676558073 + 0.773705614469 = 8.0971714332

Response: strictly speaking you are right. The numbers in the example differ because the log values were rounded. —Preceding unsigned comment added by 84.73.84.91 (talk) 09:41, 18 July 2010 (UTC)

Moreover, it is not clear which of the two DCG formulations is the "true" one.


IDCG is not defined or referred to. From http://learningtorankchallenge.yahoo.com/instructions.php it seems to be Ideal DCG — Preceding unsigned comment added by 128.227.2.176 (talk) 23:36, 21 June 2012 (UTC)

First formula for DCG is plainly wrong

The first formula for the DCG is wrong:

$$\mathrm{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$

Looking at the page history, I can see the error has been corrected many times, only to be reinstated not long after. This is likely because the incorrect formula has appeared, probably copied from Wikipedia, in many courses on information retrieval, and dedicated students keep "correcting" it.

Since $\log_2 2 = 1$, the current formula is equivalent to:

$$\mathrm{DCG}_p = rel_1 + rel_2 + \sum_{i=3}^{p} \frac{rel_i}{\log_2 i}$$

In other words, swapping the first and second elements of the ranked list would make no difference to the DCG, which makes no sense for a measure that is supposed to reward placing relevant documents higher.

The discounting should absolutely start at the second rank.
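The point above can be checked numerically. The following is a minimal Python sketch, not taken from the article: it assumes the disputed formula is rel_1 + Σ_{i=2..p} rel_i / log2(i) and the proposed one is Σ_{i=1..p} rel_i / log2(i + 1). The function names are hypothetical, and the relevance list is the one used in the script at the top of this page.

```python
import math

def dcg_log_i(rels):
    """Assumed disputed form: rel_1 + sum over i >= 2 of rel_i / log2(i)."""
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def dcg_log_i_plus_1(rels):
    """Assumed proposed form: sum over i >= 1 of rel_i / log2(i + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

ranking = [3, 2, 3, 0, 1, 2]
swapped = [2, 3, 3, 0, 1, 2]  # ranks 1 and 2 exchanged

# Under the disputed form the swap leaves the score unchanged (up to rounding).
print(dcg_log_i(ranking), dcg_log_i(swapped))
# Under the proposed form the original ranking scores higher than the swapped one.
print(dcg_log_i_plus_1(ranking), dcg_log_i_plus_1(swapped))
```

Under these assumptions, only the second function distinguishes the two orderings, because rank 2 is the first rank to receive a discount greater than 1.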

The correct formula, which was there before the edit [06:29, 23 March 2014 Tranhungnghiep], is:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2 (i+1)}$$

Or, if we prefer to emphasise that the first relevance is not discounted, we can write the equivalent:

$$\mathrm{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 (i+1)}$$

Hopefully, this will also clarify another discussion about a paper which "mistakenly states the two DCG are equivalent in the binary case", which is allegedly easily proven false with a counter-example based on the incorrect formula. The statement is in fact correct if the right formula is used.
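The binary-case claim can be illustrated with a short Python sketch. It assumes that "the two DCG" are the linear-gain form Σ rel_i / log2(i + 1) and the exponential-gain form Σ (2^rel_i − 1) / log2(i + 1); the function names are hypothetical. With relevance restricted to 0 or 1, 2^0 − 1 = 0 and 2^1 − 1 = 1, so the gains coincide term by term.

```python
import math
from itertools import product

def dcg_linear(rels):
    """Assumed first form: gain is the relevance grade itself."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

def dcg_exponential(rels):
    """Assumed second form: gain is 2**rel - 1."""
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels, start=1))

# Compare the two forms on every binary relevance list of length 5.
all_equal = all(
    math.isclose(dcg_linear(rels), dcg_exponential(rels))
    for rels in product([0, 1], repeat=5)
)
print(all_equal)

# With graded relevance the two forms differ.
graded = [3, 2, 3, 0, 1, 2]
print(dcg_linear(graded), dcg_exponential(graded))
```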

I am making the edit now, referring to this talk page. Please think twice about what I just said, and do not sheepishly quote course notes that are themselves based on this Wikipedia page. Only published, peer-reviewed papers are proper sources; let us avoid a circular proof, shall we?

Antoine-sac (talk) 17:26, 6 December 2016 (UTC)

@Antoine-sac: I discovered this thread when trying to answer a question about why the Wikipedia formula differs from that in the apparent original paper, the one cited by the Wikipedia page, which is "Cumulated Gain-based Evaluation of IR Techniques (2002)" by Kalervo Järvelin and Jaana Kekäläinen. See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.3161 for the text.
Quoting the paper:
  $\mathrm{DCG}[i] = \mathrm{CG}[i]$ when $i < b$, and
  $\mathrm{DCG}[i] = \mathrm{DCG}[i-1] + G[i] / \log_b i$ otherwise
This implies that, indeed, for b=2, the second rank is not discounted. I think this is corroborated by the example in the paper. Given gain G' = <3, 2, 3, 0, ...>, cumulative gain CG' = <3, 5, 8, 8, ...>, it gives DCG' = <3, 5, 6.89, 6.89, ...>.
You are right that we should appeal to primary sources, in peer-reviewed papers, but it looks like the source is consistent with the original formula that you changed. Am I missing something? srowen (talk) 14:32, 10 August 2018 (UTC)
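For reference, the recursive definition can be sketched in Python as follows. This assumes the paper's definition reads DCG[i] = CG[i] for i < b and DCG[i] = DCG[i − 1] + G[i] / log_b(i) otherwise; the function name `dcg_vector` is hypothetical.

```python
import math

def dcg_vector(gains, b=2):
    """Cumulated DCG vector: no discount for ranks i < b, G[i] / log_b(i) afterwards."""
    out = []
    total = 0.0
    for i, g in enumerate(gains, start=1):
        # Ranks below the log base are added undiscounted, so total equals CG[i] there.
        total += g if i < b else g / math.log(i, b)
        out.append(total)
    return out

print([round(x, 2) for x in dcg_vector([3, 2, 3, 0])])  # [3.0, 5.0, 6.89, 6.89]
```

With b = 2 the second rank is divided by log_2(2) = 1, which reproduces the DCG' vector quoted above.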

NDCG example

The example that had been present on this page up until the 1 December revision was confusing: it suggested that ideal DCG is based on only the relevance of the documents ranked by the system being evaluated. The ideal DCG should actually be based on all known relevant documents.

I have personally seen a lot of confusion from students and practitioners because of this (I'm a professor and teach and research IR). So I edited the page on 1 December to better explain how to compute ideal DCG, including some documents known to be relevant from another context.
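The difference between the two ways of building the ideal ranking can be sketched in Python. This is only an illustration: it assumes the DCG form Σ rel_i / log2(i + 1), and the grades of the two unretrieved documents are hypothetical values chosen for the example.

```python
import math

def dcg(rels, p):
    """DCG at cutoff p, assuming the form sum of rel_i / log2(i + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:p], start=1))

def ndcg(ranked_rels, ideal_pool, p):
    """ideal_pool holds the relevance grades used to build the ideal ranking."""
    ideal = sorted(ideal_pool, reverse=True)
    return dcg(ranked_rels, p) / dcg(ideal, p)

ranked = [3, 2, 3, 0, 1, 2]  # grades of the documents the system returned
unretrieved = [3, 2]         # hypothetical: known relevant documents the system missed

# Ideal ranking built only from the returned documents (the confusing variant).
ndcg_ranked_only = ndcg(ranked, ranked, p=6)
# Ideal ranking built from all known relevant documents.
ndcg_all_known = ndcg(ranked, ranked + unretrieved, p=6)
print(ndcg_ranked_only, ndcg_all_known)
```

In this sketch the second value is lower, because the ideal ranking now includes the relevant documents that the system failed to return.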

Anyway, part of my revised example was reverted on 11 January. I have restored it and am posting this comment for the record.

BenCarterette (talk) 17:44, 12 January 2018 (UTC)

Confusion regarding how the base of the log affects the NDCG

Citing from the article: "When computing NDCG with the second formulation of DCG, the base of the log does not matter, but the base of the log does affect the value of NDCG for the first formulation." This seems to go against what is said in this paper (which was also cited on the page) at page 6, after the NDCG definition: "Note that the base of the logarithm does not matter for NDCG, since constant scaling will cancel out due to normalization" — Preceding unsigned comment added by 37.160.61.46 (talk) 18:02, 4 February 2019 (UTC)

The sentence was wrong; it got the first and second formulations swapped. The correct statement should be: "When computing NDCG with the first formulation of DCG, the base of the log does not matter, but the base of the log does affect the value of NDCG for the second formulation", and I've modified it that way. MaigoAkisame (talk) 18:25, 15 May 2019 (UTC)
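The cancellation described in the quoted paper can be checked with a small Python sketch. It covers only the case where the "first formulation" is assumed to be Σ rel_i / log_b(i + 1); the helper names are hypothetical, and the ideal ranking is built from the same list purely for brevity.

```python
import math

def dcg(rels, base):
    """Assumed first formulation: sum of rel_i / log_base(i + 1)."""
    return sum(r / math.log(i + 1, base) for i, r in enumerate(rels, start=1))

def ndcg(rels, base):
    """Normalise by the DCG of the same grades sorted in decreasing order."""
    return dcg(rels, base) / dcg(sorted(rels, reverse=True), base)

rels = [3, 2, 3, 0, 1, 2]
for base in (2, math.e, 10):
    # DCG changes with the base; NDCG stays the same up to rounding.
    print(base, dcg(rels, base), ndcg(rels, base))
```

Changing the base multiplies every term of both the DCG and the ideal DCG by the same constant, so the ratio is unchanged.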

Cumulative gain vs discounted cumulative gain

I think this article should be titled "Cumulative gain", because the discounted cumulative gain measure is derived from it, not the other way round. Pa61302 (talk) 13:00, 10 January 2024 (UTC)