Talk:Chi-squared distribution/Archive 1

Untitled

I appreciated the way the chi-squared distribution was discussed. This page will greatly help students of statistics. Statistics is very hard to understand if one cannot give the topic full concentration, but the subject can greatly help students and researchers interpret what they have researched. I am a little confused, though, about how to make a table, for instance for the experimental and control groups' oral presentation skills. I used rating scales with the help of a rubric I got from one of the authors. What I am after is how to make such a table in relation to the statistical treatment, which is the chi-squared test. How should a researcher organize data that covers many skills? Can you give us useful tips that will guide us to a more comprehensive interpretation of our research? Knowing this will help the researcher get the degrees of freedom and eventually the interpretation of the study. (User-AL TAN... of Bicol University College of Education, English Major)

Thanks to the author for writing this article. However, although the article might be useful for students of maths, I can't understand it easily. I'm a researcher and use statistics for my work, but I need conceptual explanations; I don't care about the technicalities. User:xinelo

---

That phrase is absolutely correct--and really is only confusing if one doesn't know what affects the shape of the distribution: its parameters. And in the case of chi-squared, there is but one--degrees of freedom. User:pc 1:09, 18 April 2006 (UTC)

The phrase "The chi-squared distribution has one parameter: k - a positive integer which specifies the number of degrees of freedom (i.e. the number of Xi)" is perhaps confusing... the number of degrees of freedom being mostly related to the model to fit and not to the number of data points Meduz 14:18, 5 April 2006 (UTC)


This is too tough for those who don't know much about maths. Kokiri 10:28, 3 Jul 2004 (UTC) 203.47.84.40 04:15, 9 Mar 2005 (UTC)

I've tried to give it a less technical introduction, though the gradient of difficulty is still pretty steep (and, later, bumpy). seglea 23:34, 14 July 2005 (UTC)

But, MarkSweep, I dispute that the title is wrong. It is perhaps not ideal - but the great majority of references to this distribution, both in Wikipedia and in other literature, use the written-out form of "chi-squared". Many readers would not know to look it up under "χ²". It is our business as editors to be better informed than our readers, but not to say things that suggest they are stupid. Introducing the article by "The Chi-squared distribution, or χ² distribution..." does all that is necessary to tell people that they could refer to it symbolically. seglea 23:34, 14 July 2005 (UTC)

I think Mark's point is that the title is wrong because of the capital "C". Wikipedia wants titles to start with capitals, and sometimes (as in this case) this is wrong, in which case a wrongtitle template reference is added. I think the title "Chi-squared distribution" with the wrongtitle template, and then the reference to "the chi-squared or χ² distribution" is good. PAR 00:22, 15 July 2005 (UTC)
The upper/lower case distinction was indeed my point: The distribution is usually referred to as the "chi-squared distribution" with a lower-case c, or as the "χ² distribution" with a lower-case chi (never an upper-case Chi). If and when the MediaWiki software allows us to have lower-case for the first character of a title, this page should be called "chi-squared distribution". Meanwhile, we can use the wrongtitle template. FWIW, I'm very much in favor of PAR's recent version. --MarkSweep 01:32, 15 July 2005 (UTC)
Oh, right, fair enough - I agree no-one ever uses upper case C for chi... though I think they would if it came at the beginning of a sentence, and it wouldn't look all that wrong. It's not a total solecism like talking about t-tests with a capital T (something students let their wordprocessors do to them with tedious frequency). Really the standard wrongtitle template is too strong here - it risks misleading the reader; we need a more specific statement that explains what the title would be for preference. seglea 23:56, 15 July 2005 (UTC)
In my opinion, the disclaimer about the capital letter at the start of this page is unnecessary and looks silly. Many thousands of titles in Wikipedia are names that don't usually take a capital letter, yet they don't have this disclaimer. What about Digamma function and Beta distribution? For that matter, why is the title "Chi-squared distribution" any worse than "Normal distribution", or "Statistics"? When "chi" is put at the start of a sentence, it is written "Chi". Same for titles. --Zero 09:16, 10 January 2006 (UTC)
Because for familiar words like "Normal", everyone can be expected to know that it would ordinarily (that is, not in titles and not at the beginning of a sentence) be written as "normal". But if you see "Chi" in a title, you may not be able to tell whether it should be "Chi" in ordinary contexts, or "chi". The disclaimer tells you that it's the latter, rather than the former. --MarkSweep (call me collect) 09:05, 13 February 2006 (UTC)

In most recent textbooks as well as lectures, I've seen "chi-squared" used more often than "chi-square". I also came across Wolfram's page that used the title "Chi-Squared." Perhaps someone should at least make sure that chi-squared get redirected to this page. Iav 06:54, 23 September 2005 (UTC)

motivation

This article was useful for me. However, the claim that "under reasonable assumptions, easily calculated quantities can be proved to have distributions that approximate to the chi-squared distribution if the null hypothesis is true." could do with some sort of motivation (not the actual proof, that would be over the top, just some insightful commentary) further down the article, for example expounding on the meaning of 'easily calculated'. 14:30, 28 September 2005 (UTC)

noncentral chi-squared distribution, and a question

The chi-squared distribution is what you get if you take the sum of the squares of a bunch of normal distributions with mean zero and standard deviation one. If the standard deviations are all the same but not 1, rescaling allows the chi-squared to be used. If the means are not zero, however, you need a generalization, the noncentral chi-squared distribution (see [1] for Wolfram's description). If the standard deviations differ, the result can supposedly still be expressed in terms of noncentral chi-squared distributions ([2]) but how?

Further, there is what Wolfram calls the "chi distribution" (but which is more or less absent elsewhere on the Web) which is what you get if you take the square root of a chi-squared.

It might be valuable to mention these as generalizations.

Which brings me to my question: suppose you have a short list of numbers with uncertainties and you want to compute their root-mean-squared length. What is a good estimator? Put another way, suppose you have a small collection of normal random variables, with assorted means and standard deviations; what is a good estimator for the square root of the sum of the squares of the means?

The standard trick for estimating the sum of the squares is to take the sum of the squares of the values minus the sum of the squares of the standard deviations. This gives a probably unbiased but often negative answer. Taking its square root leads to problems.

If it helps, the uncertainties are probably about the same for all the random variables.

A few points:

It'd be handy to link to it from here!

  • If you have a set of random variates $X_i$, all drawn from a population with mean zero and standard deviation 1, then the sum of their squares will be chi-squared distributed. Alternatively, if you have a set of random variates $X_i$ such that $X_i$ is drawn from a population with different means $\mu_i$ and different standard deviations $\sigma_i$, then the sum of the $(X_i-\mu_i)/\sigma_i$ squared will be chi-squared distributed.
  • If you have a set of random variates, all drawn from a population with mean μ and standard deviation 1, then the sum of their squares will be non-central chi-squared distributed. Alternatively, if you have a set of random variates $X_i$ such that $X_i$ is drawn from a population with the same mean $\mu$ and different standard deviations $\sigma_i$, then the sum of the $X_i/\sigma_i$ squared will be non-central chi-squared distributed.
All of the above assumes you know beforehand the mean and standard deviation of the population(s) from which you draw your sample. It's not clear whether that is the situation for your case.
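The first bullet can be checked numerically. The following is only a sketch: the means, standard deviations, and function names are made up for illustration, and it relies on the standard facts that a chi-squared variable with k degrees of freedom has mean k and variance 2k.

```python
import random
import statistics


def standardized_sum_of_squares(means, sds, rng):
    """Draw X_i ~ N(mean_i, sd_i) and return the sum of ((X_i - mean_i) / sd_i)^2."""
    total = 0.0
    for mu, sd in zip(means, sds):
        x = rng.gauss(mu, sd)
        total += ((x - mu) / sd) ** 2
    return total


def simulate(means, sds, trials=20000, seed=0):
    """Repeat the draw many times with a fixed seed so the run is reproducible."""
    rng = random.Random(seed)
    return [standardized_sum_of_squares(means, sds, rng) for _ in range(trials)]


# Hypothetical populations: different means and different standard deviations.
means = [0.0, 3.0, -1.5, 10.0]
sds = [1.0, 0.5, 2.0, 4.0]

draws = simulate(means, sds)
k = len(means)
sample_mean = statistics.fmean(draws)     # should be close to k
sample_var = statistics.variance(draws)   # should be close to 2k
print(k, round(sample_mean, 2), round(sample_var, 2))
```

Matching the first two moments does not prove the distribution is chi-squared, but it is a quick sanity check that the standardization is what matters, not the individual means and standard deviations.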

No; in fact, I'm trying to estimate the means (well, actually, the root of the sum of their squares).

What exactly is your data? Do you have a bunch of random variates $X_i$ and each has a separate mean $\mu_i$ and std. deviation $\sigma_i$ which you know beforehand? With maybe the $\sigma_i$ all the same? Do you want to calculate an unbiased estimator of the square root of the sum of the $\mu_i$ squared? If so, the above two distributions won't do it. There's probably a name for the one that will, but I don't know what it is, but maybe we could calculate it. PAR 15:38, 6 October 2005 (UTC)

More or less, yes. If I knew the means and standard deviations beforehand, what would I be trying to estimate?

I have a collection of n random variables (n is around ten) $X_i$, each (approximately) normal with known standard deviation (not all equal) and unknown mean $\mu_i$. I want to estimate $\sqrt{\sum_i \mu_i^2}$.

Put another way, I have a point in n-space whose coordinates are physical measurements of known uncertainty, and I want to estimate its distance from the origin. n is not large, and the uncertainties are not small compared to the values.

There is a standard estimator for $\sum_i \mu_i^2$ (the square of the distance from the origin): it is $\sum_i X_i^2 - \sum_i \sigma_i^2$. This estimator is, I think, unbiased; but unfortunately it frequently takes negative values, which means that you can't simply take its square root to get an estimator for the square root (even if you don't mind some bias).
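To make the problem with the variance-subtraction estimator concrete, here is a small simulation. It is a sketch only: the means and standard deviations are invented, the function names are hypothetical, and the seed is fixed just so the run is repeatable.

```python
import random


def debiased_sum_of_squares(xs, sds):
    """Sum of squared measurements minus sum of squared standard deviations."""
    return sum(x * x - s * s for x, s in zip(xs, sds))


def simulate_estimator(mus, sds, trials=20000, seed=1):
    """Draw X_i ~ N(mu_i, sd_i) repeatedly and apply the estimator each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sd) for mu, sd in zip(mus, sds)]
        estimates.append(debiased_sum_of_squares(xs, sds))
    return estimates


# Hypothetical case where the uncertainties are not small compared to the values.
mus = [0.5, -0.3, 0.2]
sds = [1.0, 1.2, 0.8]

estimates = simulate_estimator(mus, sds)
true_value = sum(mu * mu for mu in mus)
mean_estimate = sum(estimates) / len(estimates)
negative_fraction = sum(e < 0 for e in estimates) / len(estimates)
print(round(true_value, 3), round(mean_estimate, 3), round(negative_fraction, 3))
```

In this setup the average of the estimates sits near the true sum of squared means, while a large share of the individual estimates are negative, which is exactly why a plain square root is not available.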

Ok, I don't know the answer to your question, but my first guess is that the best estimator is simply the square root of the sum of the squares of the $X_i$. I don't understand why anyone would want to subtract the variances from the squares.

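The reason is, if you want to estimate the square of the quantity, well, the expected value of $\sum_i X_i^2$ is

$$\sum_i E[X_i^2] = \sum_i \left(\mathrm{Var}(X_i) + E[X_i]^2\right)$$

which is equal to $\sum_i \sigma_i^2 + \sum_i \mu_i^2$.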

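The question is then, is my simple guess an unbiased estimator? I don't know, but I do know how to set up the equations to derive the answer. To start out with, let's just try to do the case where you have two random variables. If we can't do that, then forget it. Assuming each is normally distributed with its own mean and variance, and that they are independent, then the probability that the first has value $x_1$ to $x_1+dx_1$ AND the second has value $x_2$ to $x_2+dx_2$ is $P(x_1,x_2)\,dx_1\,dx_2$ where:
$$P(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)$$
Changing to circular coordinates with $x_1 = r\cos\theta$ and $x_2 = r\sin\theta$ gives
$$P(x_1,x_2)\,dx_1\,dx_2 = P(r,\theta)\,r\,dr\,d\theta$$
$$P(r,\theta) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{(r\cos\theta-\mu_1)^2}{2\sigma_1^2}-\frac{(r\sin\theta-\mu_2)^2}{2\sigma_2^2}\right)$$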
If you set the μ's to zero and the σ's to one, and integrate over θ from 0 to 2π you get the chi-squared distribution for 2 degrees of freedom, with $\chi^2 = r^2$. If you set all the μ's equal and all the σ's equal and integrate over θ you get the non-central chi-squared distribution, again with $\chi^2 = r^2$. For your problem, we have to integrate as it stands, or at best, set all the σ's equal. We will have to work on that. Anyway, once that is done, and we have P(r), we then want to calculate the expectation of r which is the integral of rP(r) from zero to infinity. If that turns out to be the square root of the sum of the squares of the μ's, then we have an unbiased estimator. On the other hand, we could seek a maximum likelihood estimator, which is usually easier to calculate. It's the value of r for which P(r) is maximum. PAR 23:50, 6 October 2005 (UTC)

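You can do better. Sort of. If what we want is the expected value of a function of a random variable having pdf $p(x)$, $E[f(X)] = \int f(x)\,p(x)\,dx$, so it suffices to compute

$$\int_{\mathbb{R}^n} \sqrt{\sum_i x_i^2}\;\prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)\,dx_1\cdots dx_n$$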

This is not an integral I know how to do, nor does MAPLE.

It's possible that one could express this distribution in terms of some sort of "noncentral chi distribution" whose pdf we could actually calculate; then a maximum likelihood estimator would be a reasonable thing to obtain. But actually finding the pdf will be very difficult:

$$p(r) = \int_{|x| = r} \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)\,dA$$

where $dA$ is the area measure on the sphere. I suppose we only need to find the derivative of this with respect to r and set that to zero, but it seems unlikely that that will be possible.

The problem arose because we noticed that our data was consistently biased upwards, especially when the variances were high. If we do the variance-subtraction trick, then the estimates of the square are no longer biased, but they are sometimes negative, which reminds you that simply taking the square root just won't cut it.

--Andrew 03:05, 7 October 2005 (UTC)


We are on exactly the same track. The integral that MAPLE can't do if it is reduced to n=2 is just my rP(r) from zero to infinity because by my definition $r = \sqrt{x_1^2+x_2^2}$. Your $dA$ is my $r\,d\theta$, the "area element" on the circle. What I want to do is convert from cartesian coordinates $(x_1,\dots,x_n)$ to spherical coordinates, because they are the natural coordinates to use. If you make the right assumptions and integrate over all angles, you get the chi-squared and the noncentral chi-squared, but your full problem makes no assumptions.
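Something I didn't realize before is that if you take your assumption that all std. deviations are the same, but the means are different, it does become the noncentral chi-squared distribution with $x = r^2/\sigma^2$ and $\lambda = \mu^2/\sigma^2$ where
$$r^2 = \sum_i x_i^2$$
$$\mu^2 = \sum_i \mu_i^2$$
That means you can solve the equation for the expectation of $r$:
$$E[r] = \sigma\int_0^\infty \sqrt{x}\,f(x;n,\lambda)\,dx$$
where f() is the noncentral chi-squared. Mathematica solves this to be:
$$E[r] = \sigma\sqrt{\frac{\pi}{2}}\,L_{1/2}^{(n/2-1)}\left(-\frac{\lambda}{2}\right)$$
where L is the generalized Laguerre polynomial. I guess that proves that just taking the rms is not unbiased.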

PAR 04:48, 7 October 2005 (UTC)
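The upward bias of the plain root-sum-of-squares can also be seen without Laguerre polynomials. The sketch below assumes equal standard deviations, as in the case just discussed; the function names and test values are made up. For zero means the expectation of r has the well-known chi-distribution form σ√2·Γ((n+1)/2)/Γ(n/2), which gives an exact reference point for the simulation.

```python
import math
import random


def chi_mean(n, sigma=1.0):
    """E[r] when every mean is zero: the mean of a scaled chi distribution."""
    log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    return sigma * math.sqrt(2.0) * math.exp(log_ratio)


def mean_root_sum_square(mus, sigma, trials=20000, seed=2):
    """Monte Carlo estimate of E[sqrt(sum X_i^2)] with X_i ~ N(mu_i, sigma)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += math.sqrt(sum(rng.gauss(mu, sigma) ** 2 for mu in mus))
    return total / trials


# Zero means: the true distance is 0, yet the naive estimate averages about 1.25.
zero_case = mean_root_sum_square([0.0, 0.0], 1.0)

# Non-zero means: the true distance is sqrt(2), the naive estimate averages higher.
shifted_case = mean_root_sum_square([1.0, 1.0], 1.0)
true_distance = math.sqrt(2.0)
print(round(chi_mean(2), 4), round(zero_case, 4), round(shifted_case, 4))
```

The bias is largest when the uncertainties are comparable to the values, which matches the observation that the data looked biased upwards when the variances were high.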

Lead paragraph

Hi

sorry to revert those well-meaning edits... but the old version was correct, terse, and reasonably clear. The new version was incorrect and (IMO) confusing. Nevertheless, I see your point: the article is deficient in that the (o-e)^2/e formula we all learned in school is not mentioned. Remember that this statistic is only asymptotically chi-squared. I will add a section on this specific application of the chi-squared distribution today (if I get a minute).

best wishes, Robinh 09:00, 13 February 2006 (UTC)

I completely agree. Incidentally, there is a separate article on Pearson's chi-squared test, which discusses the well-known chi-squared test statistic. --MarkSweep (call me collect) 09:08, 13 February 2006 (UTC)
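For readers looking for the (o-e)^2/e formula mentioned above, here is a minimal sketch of the statistic for a contingency table. The counts are invented, the function name is hypothetical, and the expected counts are computed under the assumption that rows and columns are independent.

```python
def pearson_chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Expected counts assume independence of rows and columns:
    expected[i][j] = row_total[i] * column_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2x2 table: rows are groups, columns are outcome categories.
observed = [[10, 20],
            [20, 10]]
statistic, dof = pearson_chi_squared(observed)
print(round(statistic, 4), dof)
```

As noted in the thread, the statistic is only asymptotically chi-squared, so the comparison against the chi-squared distribution with `dof` degrees of freedom is an approximation that works best when the expected counts are not too small.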

Removal of external link? Input please

Greetings all,

Recently User:128.135.17.105 removed an external link (Free Chi-Squared Calculator) that I had placed on this page to an online Chi-squared calculator that is available for free on my website. The reason given for this removal by User:128.135.17.105 is that the link "looks like an add" (sp). I believe that the free calculator adds a great deal of value to the page, and should therefore be reposted (perhaps in a less audacious "ad-like" form). Here's why:

1. The other external link to an online calculator (On-line calculator for the significance of chi-squared) computes probabilities for Chi square values, but not the Chi-squared values themselves.
2. The distribution calculator referenced by another external link (Distribution Calculator) is not online, but instead requires that the user download and install software on their computer in order to compute Chi-squared values.
3. The form of the external link that User:128.135.17.105 removed (which supposedly looks like an ad) was modeled after another external link on the page that User:128.135.17.105 had no problem with. I invite you to compare for yourselves:
Existing external link: On-line calculator for the significance of chi-squared, in Richard Lowry's statistical website at Vassar College.
Link removed by User:128.135.17.105: Free Chi-Squared Calculator from Daniel Soper's Free Statistics Calculators website. Computes chi-squared values given a probability value and the degrees of freedom.

Out of respect for the opinion of User:128.135.17.105, I will not repost the link right away. Also note that I wouldn't mind toning down the verbiage of the external link if it is restored. If anyone besides myself agrees that there is value in the external link that User:128.135.17.105 removed, please let the community know by posting your thoughts here. I would particularly enjoy discussing this issue further with User:128.135.17.105, as I believe that (in the spirit of Wikipedia) we can resolve this issue amicably.  :-)

--DanSoper 00:29, 23 June 2006 (UTC)

I agree with the author of the above message that the chi-squared calculator holds significant value for the readers of the Chi-squared distribution page. After reviewing the Wikipedia logs, I’ve discovered that the user who removed the link from this page also removed other links posted by the author of the above message on several additional statistics-related Wikipedia pages, perhaps indicating a personal issue between the two contributors. While the author of the above message makes solid arguments supporting the inclusion of his/her links on these pages, I feel that I should remind him/her that it is generally considered poor etiquette to post links on Wikipedia to one’s own web site. Nevertheless, I am of the opinion that the links would improve these pages, and as a neutral party, I will repost the links myself in the coming days if there are no objections. -J.K., Kings College

  • Note: Since writing this message, it looks like the anonymous user purporting to be "J.K., Kings College" has been blocked for one month for vandalism. Not sure if that makes sense, actually, but it's a shared IP address, maybe even an open proxy. See User talk:213.249.155.239 for the panoply of warnings, blocks and vandalism, though. Not a very credible source. · rodii · 01:54, 28 June 2006 (UTC)
The original text is, "Free Chi-Squared Calculator from Daniel Soper's Free Statistics Calculators website. Computes chi-squared values given a probability value and the degrees of freedom." which looks like advertisement because it contains the name of the poster, claims to be free twice, is the only link in statistics to a .com page, and page contains google ads. Removing a few of these and I would have just started a discussion.
While nice and perhaps useful, is the calculator encyclopedic? The external linking policy is that it should be linked externally if it would be linked internally. 128.135.226.222 00:00, 28 June 2006 (UTC)

Thought I should let you guys know: I was using this as a reference to calculate a p-value (the area from a point x -> infinity). I think the cumulative function listed here is actually calculating the area under the density curve from a point x -> infinity and NOT from -infinity -> x. Thought I'd let you know; maybe I'm just confused.
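The two areas in question are easy to tell apart numerically. The sketch below is an illustration only, with hypothetical function names: it evaluates the lower-tail area (0 to x, the usual CDF) through the textbook power series for the regularized lower incomplete gamma function, and takes the upper-tail area (x to infinity, the p-value) as one minus that.

```python
import math


def regularized_lower_gamma(a, x, max_terms=1000):
    """P(a, x) via the series e^-x * x^a * sum_n x^n / Gamma(a + n + 1)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum_n Gamma(a) * x^n / Gamma(a + n + 1)
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x, k):
    """Area under the chi-squared density from 0 to x."""
    return regularized_lower_gamma(k / 2.0, x / 2.0)


def chi2_upper_tail(x, k):
    """Area from x to infinity, i.e. the p-value for an observed statistic x."""
    return 1.0 - chi2_cdf(x, k)


# 3.841 is the familiar 5% critical value for 1 degree of freedom.
print(round(chi2_cdf(3.841, 1), 4), round(chi2_upper_tail(3.841, 1), 4))
```

If a table or formula returns roughly 0.05 at x = 3.841 with one degree of freedom, it is giving the upper tail; if it returns roughly 0.95, it is giving the cumulative distribution function.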

Citations and Historical Context

I appreciate the external links and support keeping them.

Some dates would be very useful in a historical context here. When did Fisher and Snedecor work together? When were the foundations laid for the chi-squared test?

It would also be great to see some citations - for instance to books (possibly Statistical Methods, Snedecor GW and Cochran WG, Iowa State University Press, 1960) and also to peer-reviewed journals, especially in the life sciences.

degrees of freedom

The current text reads: If $p$ independent linear homogeneous constraints are imposed on these variables, the distribution of $X$ conditional on these constraints is $\chi^2_{k-p}$, justifying the term "degrees of freedom".

I don't really understand this. It seems to me that some correction factor would be needed. Can somebody explicate that section a bit? Perhaps even add an example or reference? (And shouldn't $X$ be Q?) Thanks. Sander123 16:46, 13 March 2007 (UTC)

I've removed the part from the article. It is probably untrue and at least unsourced. As an example let x1 and x2 be independent N(0,1), and let x3 = (x1+x2)/√2. Then all xi are N(0,1) and the xi satisfy a homogeneous equation (x1+x2-√2·x3=0). As x1^2+x2^2 is chi-squared with 2 degrees of freedom, it cannot be that x1^2+x2^2+x3^2 is also chi-squared with 2 degrees since the x3^2 will have a positive contribution. Sander123 09:00, 26 March 2007 (UTC)
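A quick simulation supports the counterexample. This sketch assumes the scaling x3 = (x1 + x2)/√2, which is the choice that gives x3 unit variance; the function name and trial count are arbitrary. A chi-squared variable with 2 degrees of freedom has mean 2, so a sample mean near 3 rules that distribution out.

```python
import math
import random


def mean_constrained_sum(trials=20000, seed=3):
    """Average of x1^2 + x2^2 + x3^2 where x3 is a linear combination of x1, x2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        x3 = (x1 + x2) / math.sqrt(2.0)   # unit variance, but not independent
        total += x1 * x1 + x2 * x2 + x3 * x3
    return total / trials


sample_mean = mean_constrained_sum()
print(round(sample_mean, 3))
```

Each of the three squares has expectation 1, so the sum has expectation 3 regardless of the dependence between the variables.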

Mode of chi square (and noncentral chi square)

John Maddock and I have been extending the mathematical and statistical facilities in the Open Source Boost libraries (see for example Math Toolkit).

Question 1

We note that the mode is defined for the central chi square as k - 2 for k >= 2.

so for k = 2, the mode is zero (x = 0).

But the maximum of the pdf is also for x = zero for all 0 < k < 2.

In our implementation, we are considering adding this to avoid a discontinuity in the mode at k = 2.

(our implementation works with floating-point arithmetic, so it also permits non-integral degrees of freedom).

    if (k < 0) signal error
    else if (k >= 2) mode = degrees of freedom - 2
    else mode = 0    // k < 2

Is this reasonable, sensible, useful, mathematically correct/acceptable?
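For what it's worth, the proposed rule amounts to mode = max(k - 2, 0). The sketch below is in Python rather than C++ and is not the Boost implementation; the function names are made up, and rejecting k <= 0 (rather than only k < 0) is an assumption. Note that for 0 < k < 2 the density grows without bound as x approaches 0, so 0 is where the supremum sits rather than a point where a finite maximum is attained.

```python
import math


def chi_squared_mode(k):
    """Mode of the central chi-squared distribution, extended to 0 < k < 2."""
    if k <= 0:
        raise ValueError("degrees of freedom must be positive")
    return max(k - 2.0, 0.0)


def chi_squared_pdf(x, k):
    """Density at x > 0, computed in log space; k may be non-integral."""
    half_k = k / 2.0
    log_pdf = ((half_k - 1.0) * math.log(x) - x / 2.0
               - half_k * math.log(2.0) - math.lgamma(half_k))
    return math.exp(log_pdf)


# The rule is continuous at k = 2: both branches give 0 there.
print(chi_squared_mode(1.5), chi_squared_mode(2.0), chi_squared_mode(5.0))
```

A spot check of the density around the returned value (for example at k = 5, comparing x = 2.5, 3 and 3.5) agrees with the rule.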

Question 2

Does anyone know of methods to calculate the mode for the noncentral chi square? (Either exactly, or an approximation).

Paul A Bristow (talk) 14:22, 7 February 2008 (UTC)

More Explanation Needed

There is a great deal of assumption made, both on the Wiki page and the discussion page, that the reader will understand the jargon being used. This is very difficult for an average person to understand; I have a degree in math, and I still don't understand: 1. Exactly how one computes a Chi-Squared test. 2. Identifying a Chi-Squared distribution. 3. The significance of the distribution or the test result.

I admit, I didn't have the best teacher for Probably/Sadistics (what we called it), but still, years of study have not shed any light on the issue.

Could somebody please try to explain these things so that the non-statistician could understand them?

soundpreacher —Preceding unsigned comment added by 68.7.37.1 (talk) 00:10, 26 July 2008 (UTC)

Fact may be wrong

For one of the facts, it says that as k->infinity, the chi-squared converges in distribution to the Normal distribution. How can this be correct? The Chi-squared distribution never takes negative values, even when k is "extremely large". I think the fact meant the t-distribution, which can be derived from the chi-squared?

Thanks —Preceding unsigned comment added by 24.131.203.27 (talk) 05:56, 3 October 2008 (UTC)

It is not a fact; it is a ridiculous assertion and has been deleted. Thanks for pointing that out. --Nm420 (talk) 22:25, 7 October 2008 (UTC)

I reinstated it. The mean is not zero. Hopefully clarified in the new version. Robinh 07:10, 8 October 2008 (UTC)
How is the new version any more clear? It is stated that the chi-squared distribution "tends" to a normal distribution, without any indication what such terminology might mean. There are several convergence concepts on the linked page. Without a source or more clarification about what the statement means, I am disinclined to believe it and am hence removing it. The other rather loose statements about a chi-squared r.v. being "approximately" normal at least cite some source.--Nm420 (talk) 03:09, 11 December 2008 (UTC)
Hi. I reverted the removal; sorry to revert a good-faith edit. The result is bog-standard statistical theory. I believe it is taught as part of many undergraduate courses. I will improve the article along the lines you suggest today, but please don't just remove content (flag it with a citation request instead, or we can continue to discuss the matter here). Best wishes, Robinh (talk) 08:05, 11 December 2008 (UTC)

That looks quite a bit more acceptable, though it seems "asymptotic distribution" is still not quite the correct link. Rather, something like the difference between a chi-squared r.v. and a normal r.v. (with mean k and variance 2k) converges to 0 in probability. There is no single limiting distribution, but rather a sequence of distributions which well-approximate a chi-squared distribution as k -> \infty. There should be some specific terminology for such behavior, though I am unaware of what it is. An "asymptotic approximation" or something of the sort? Perhaps the good people at the reference desk may be able to help. --Nm420 (talk) 15:13, 12 December 2008 (UTC)

Disregard that last bit of nonsense. It definitely appears like an accurate statement now. Sorry for the trouble, though I find it much better to have precise and accurate statements over vague terminology. --Nm420 (talk) 15:21, 12 December 2008 (UTC)
Hello Nm420. No worries. The article is much improved, largely thanks to your pointing out sloppy wording in earlier versions. Very best wishes, Robinh (talk) 20:18, 12 December 2008 (UTC)
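For anyone who wants to see the statement numerically: the sketch below compares the CDF of the standardized variable (X - k)/sqrt(2k) with the standard normal CDF on a grid of points. It is an illustration, not a proof; the function names are hypothetical, and it uses the closed-form upper tail that holds only for even k.

```python
import math


def chi2_cdf_even(x, k):
    """CDF for even k: 1 - e^(-x/2) * sum_{j < k/2} (x/2)^j / j!, in log space."""
    if x <= 0:
        return 0.0
    t = x / 2.0
    upper_tail = sum(math.exp(-t + j * math.log(t) - math.lgamma(j + 1))
                     for j in range(k // 2))
    return 1.0 - upper_tail


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_gap(k):
    """Largest gap between the standardized chi-squared CDF and the normal CDF."""
    gap = 0.0
    for i in range(-30, 31):
        z = i / 10.0
        x = k + z * math.sqrt(2.0 * k)   # undo the standardization
        gap = max(gap, abs(chi2_cdf_even(x, k) - normal_cdf(z)))
    return gap


for k in (2, 200, 2000):
    print(k, round(max_cdf_gap(k), 4))
```

The gap shrinks as k grows, which is consistent with the point made above: there is no single limiting distribution for the unstandardized variable, but the normal distribution with mean k and variance 2k approximates it increasingly well.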