Issues from 2004

Untitled

Hello. I've reworked the opening to state a definition for "statistical hypothesis test". Hopefully it's an improvement over the previous, which stated only:

Many researchers wish to test a statistical hypothesis with their data.

Feel free to further improve it. -- BTW I see the article has a strong Karl "induction is impossible" Popper bias; I don't think that's necessary, even within the realm of frequentist probability. It would be interesting to trace the history of hypothesis testing as an implementation of scientific method; I don't know what Fisher, Neyman, & Pearson, Wald, etc., said about that. -- I'm aware that a distinction is made between "significance tests" and "hypothesis tests"; I guess that distinction should be clarified in this article. I don't know if separate articles are needed. Happy edits! Wile E. Heresiarch 14:51, 17 Feb 2004 (UTC)


A tyre manufacturing company's sales manager claims that all the tyres produced by the company have a tread life of at least 5000 kilometres. 64 tyres are sampled from a batch of the tyres and the tread-life mean of the sample is found to be 8000 kilometres. The standard deviation of the production of the tyres is 4000 kilometres. Can you call the company's sales manager an impostor based on the sample? Assume a 5% level of significance and a normal distribution of the tread life of the tyres in a two-tailed test.

and do you mean standard deviation of the sample? or of the entire production?
in all practicality, perhaps not an imposter, but the sales manager should certainly warranty replacement of any tyres that do not meet the claimed performance;
In other words, we're not going to do your homework for you. :-)
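
For anyone wondering how such a test is set up mechanically (the interpretation is still left as the exercise), here is a minimal one-sample z-test sketch in Python, assuming the 4000 km figure is a known population standard deviation:

    from scipy.stats import norm

    mu0 = 5000     # claimed minimum tread life (km), used as the null value
    xbar = 8000    # sample mean (km)
    sigma = 4000   # assumed known population standard deviation (km)
    n = 64         # sample size

    z = (xbar - mu0) / (sigma / n ** 0.5)   # = 3000 / 500 = 6.0
    p_two_tailed = 2 * norm.sf(abs(z))      # two-tailed p-value, far below 0.05
    print(z, p_two_tailed)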

The following appeared on the page but looks like a comment from User:Ted Dunning

Note: Statistics cannot "find the truth", but it can approximate it. The argument for the maximum likelihood principle illustrates this -- TedDunning

Should this article perhaps be filed under Statistical test or Statistical hypothesis test with Hypothesis testing, Testing statistical hypotheses and Statistical hypothesis testing all #REDIRECTing to it? Oh, and is the "to-do list" box at the top of this page really necessary? If so, maybe it should be explained a little more. It took me a while to understand its significance (no pun intended). In particular, the phrase "Here is" is ambiguous, and even misleading, since it's often used to refer to the links that follow. - dcljr 07:54, 6 Aug 2004 (UTC)

24-Oct-2007: Since December 2002, it's been the reverse, with all those articles redirecting to "Statistical hypothesis testing" and seems ok. The bigger problem is coordinating articles for parametric and "non-parametric statistics" since the article here has been focusing (for 5 years) on the mean/stdev statistics. -Wikid77 10:54, 24 October 2007 (UTC)

Distaste not criticism

"... surely, God loves the .06 nearly as much as the .05." (Rosnell and Rosenthal 1989)

"How has the virtually barren technique of hypothesis testing come to assume such importance in the process by which we arrive at our conclusions from our data?" (Loftus 1991)

"Despite the stranglehold that hypothesis testing has on experimental psychology, I find it difficult to imagine a less insightful means of transiting from data to conclusions." (Loftus 1991)

The above are not criticisms of hypothesis testing; they are statements expressing one’s distaste for hypothesis testing that offer nothing in the way of argument.

--Ivan 06:39, 8 September 2006 (UTC)

The criticism of hypothesis testing is principally that people have been doing hypothesis tests when it would be more useful to estimate the difference between effects and give a confidence interval. Blaise (talk) 09:19, 8 March 2009 (UTC)

Unpooled df formula

What's the source for the two-sample unpooled t-test formula? The formula for the degrees-of-freedom shown here is different from the Smith-Satterthwaite procedure, which is conventional, from what little I know. The S-S formula for df is

$$ \text{d.f.} \approx \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} $$

Did someone just try to simplify this and make a few errors along the way?

--Drpritch 19:40, 4 October 2006 (UTC)
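
As a cross-check, the Smith-Satterthwaite (Welch) degrees-of-freedom approximation is easy to evaluate numerically; a small Python sketch with made-up sample values (not taken from the article):

    def welch_satterthwaite_df(s1, n1, s2, n2):
        """Approximate df for the unpooled two-sample t-test."""
        v1 = s1 ** 2 / n1
        v2 = s2 ** 2 / n2
        return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # illustrative numbers only
    print(welch_satterthwaite_df(s1=2.0, n1=15, s2=3.5, n2=12))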

References?

There are a number of references in the criticism section, e.g., Cohen 1990, but these references are not specified anywhere in the article. Tpellman 04:43, 10 November 2006 (UTC)


Can we clarify this article to make it more readable?

The article should be made more accessible to lay users by an explanation of some of the symbols (or at least a table of variables) used in the formulas. 69.140.173.15 03:19, 10 December 2006 (UTC)

24-Oct-2007: Good idea. I have added a "Definition of symbols" row to the bottom of the table. Forgetting to define symbols is a common wiki problem. Please flag undefined symbols in other articles, as well. Thanks. -Wikid77 10:16, 24 October 2007 (UTC)

Issues from 2007

Sufficient statistic

The article states that the statistic used for testing a hypothesis is called a sufficient statistic. This is false. In some cases the test statistic happens to be a sufficient statistic. For most distributions a sufficient statistic does not even exist. This is especially so if there are nuisance parameters. When a test can be based on a sufficient statistic it is advantageous to do so. 203.97.74.172 22:19, 21 January 2007 (UTC)Terry Moore

What's inappropriate about the link?

Re this edit: "11:42, 14 February 2007 SiobhanHansa (Talk | contribs) m (Revert inappropriate addition of exernal link)" Please tell me what is inappropriate about the addition of the link. It looks like a book, available in full online, about statistics. This could be a useful resource for readers of this article. --Coppertwig 13:47, 14 February 2007 (UTC)

Sorry for the late response. The link has been spammed across multiple articles and European language Wikipedias by several IP addresses who have not made other edits. Standard practice is to revert mass additions that appear to be promoting a particular point of view or resource. If regular editors of this article think it is a good addition to the article then it should stay. -- Siobhan Hansa 15:02, 14 March 2007 (UTC)

question

Isn't the alternate degrees of freedom for a two sample unpooled t-test equal to min{n1,n2}-1? —The preceding unsigned comment was added by 68.57.50.210 (talk) 02:21, 19 March 2007 (UTC).

24-Oct-2007: Agreed. I am putting " - 1". -Wikid77 10:18, 24 October 2007 (UTC)

perm test

What is the application of the permutation test in the estimation of insurance claims? 12:12, 10 May 2007 (UTC) 41.204.52.52 bas

Oops.

I "broke" the article by an unclosed comment which effectively deleted the following sections. Sorry.

66.53.214.119 (talk) 00:36, 11 December 2007 (UTC)

Citations

Many of the remaining requests for citations can be answered by reference to other Wikipedia articles:

http://en.wikipedia.org/wiki/Karl_Popper for the philosophy of science

http://en.wikipedia.org/wiki/Mathematical_proof "A proof is a logical argument, not an empirical one." (so statistics don't prove anything).

http://en.wikipedia.org/wiki/Statistical_significance '"A statistically significant difference" simply means there is statistical evidence that there is a difference; it does not mean the difference is necessarily large, important or significant of the word.' (maybe - significant in the common meaning of the word?)

http://en.wikipedia.org/wiki/Effect_size "In inferential statistics, an effect size helps to determine whether a statistically significant difference is a difference of practical concern." "The effects size helps us to know whether the difference observed is a difference that matters."

67.150.7.139 (talk) 03:32, 11 December 2007 (UTC)

Organization

Shouldn't the Contents be the first section?

66.53.213.51 (talk) 04:47, 17 December 2007 (UTC)

Issues from 2008

The Definition of Terms Section

Few of the terms have utility in Fisher's original conception of null hypothesis significance testing. They do have utility in the Neyman-Pearson formulation. Should this article discuss both? If not, the section should be heavily edited, because the null hypothesis is never accepted in Fisher's formulation, which makes the "Region of acceptance" very difficult to understand.

See the fourth paragraph under Pedagogic criticism.

67.150.5.216 (talk) 01:59, 14 January 2008 (UTC)

One-Proportion Z-test not well-defined

The One-Proportion Z-test is not well defined because the symbol p-hat does not appear in the legend. Perhaps it should be made clearer that p-hat is the hypothesis proportion, and p is the sample proportion. —Preceding unsigned comment added by 140.247.250.91 (talk) 14:58, 5 December 2008 (UTC)
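
For reference, here is the statistic in the form most textbooks use, as a sketch (with p_hat as the sample proportion and p0 as the hypothesized value; the article's table may label the symbols differently, as noted above):

    from math import sqrt
    from scipy.stats import norm

    def one_proportion_z(p_hat, p0, n):
        # Standard error uses the null value p0, the usual textbook convention.
        return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

    # illustrative numbers only
    z = one_proportion_z(p_hat=0.56, p0=0.50, n=200)
    print(z, 2 * norm.sf(abs(z)))   # statistic and two-sided p-value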

Removed section

I've removed a newly placed section and am putting it here for reference. The new section read as follows:

== Test statistic ==
Central to statistical hypothesis testing is a test statistic, whose sampling distribution in general, and specific value on the observed data, drives the analysis.
For a given distribution with known parameters, a test statistic is easily computed, or one can compute the p-value directly – for example in Fisher's "Lady drinking tea", where the null hypothesis was a sequence of 8 coin flips, and the Lady correctly guessed the tea method 8 times, the (one-tailed) p-value was $1/2^8 = 1/256 \approx 0.0039$. Similarly, if one has a normal distribution with given mean and variance as null hypothesis, one can apply a z-test.
In general, however, a null hypothesis will have an unknown parameter, in which case one cannot proceed as easily. Test statistics are thus generally based on pivotal quantities – functions of the observations that do not depend on the parameters. A basic example is Student's t-statistic, which functions as a test statistic for a normal distribution with unknown variance (and mean): the t-test is based on the sample mean and sample variance, while the z-test requires the parameters to be known.

The Fisher example is already given in the section titled "Example". TBH, the proposed section doesn't explain much in an already difficult article. Perhaps some of the additional text can be integrated into the Example section? Phrases like "drives the analysis", "in general, however a null hypothesis will have an unknown parameter, in which case one cannot proceed as easily", etc., further confound this extra section in an already disorganized article. ... Kenosis (talk) 04:04, 24 April 2009 (UTC)

Tone and jargon

I've just moved the 'tone' tag, from its previous position at the top of a particular section, to the head of the whole article.

It looks as if one of the problems is a problem with jargon, in that some of the passages seem to say what the authors are unlikely to have meant in plain language.

Example: the article says "in other words, we can reject the null when it's virtually certain to be true", but this seems to read like nonsense: and I'd hazard a guess, that what the author actually meant was something else, that (when said plainly) might be along the lines "in other words, the standard test procedure may encourage us to reject the null (hypothesis) when it's virtually certain to be true" -- a very different thing.

There are also other passages that seem to need attention from an expert who can suss out what was really meant, and then make the article say plainly what it means, and mean what it says. Terry0051 (talk) 19:25, 13 May 2009 (UTC)

  • The tone tag is very important. This article is riddled with jargon. It needs to be laid out more simply.

Wikipedia needs to be conveyed with clear writing to explain matters to non-experts. Jargon is for experts. Experts do not need these articles; they already know the material. We need this article to be improved, to explain the material at hand on a lay-person's level of knowledge. Dogru144 (talk) 00:26, 23 July 2009 (UTC)

Actually, though there are some statements that are arguably erroneous in the article, it's not merely "jargon". Statistical hypothesis testing is a technical art the expression of which is heavily dependent upon a set of terms that are widely accepted "terms of art" which are often difficult to reduce to plain English in a way that can be readily agreed among knowledgeable persons in this complex, highly technical topic area. Given the formula-laden approach of most statistics texts, and the relative rarity of books that explain statistics in laypersons' terms, it might be a very tall order to explain this topic in "plain English" and still meet WP:V in a way that has a chance of becoming a consensused, stable article.
..... The example given by Terry005 above is one of several instances where students of statistics (and/or those already intimately familiar) attempt to reduce statistical terms of art to plain English: The statement "in other words, we can reject the null when it's virtually certain to be true" is correct only when one already knows what the word "it's" is intended to refer to-- the words "it's virtually certain to be true" refers to any hypothesized correlation between two factors under specified conditions, which can be reasonably said to be "true" only when the null hypothesis is rejected due to evidence of a correlation to within a very high degree of confidence-- (to within a very tight "confidence interval"). Of course something like this could, and should, be written in a way that doesn't look false on its face to those unfamiliar with the process. Perhaps something like "In other words, one can reject the 'null' when the hypothesized relationship is virtually certain to be true" might be clearer. But it isn't at all an easy task in a topic that is dominated by so much formal mathematics and so many specialized terms.
.....So, one approach might be to split the topic into another introductory article. There are many examples of this approach across the wiki, some of which can be seen at User_talk:Kenosis/Research. Having noted this possibility, the same issue seems to remain-- which is actually writing it in a way that will ring true to those who are knowledgeable in statistical testing as well as being helpful to laypersons. ... Kenosis (talk) 05:26, 23 July 2009 (UTC)
  • Thank you for your reply. A more ideal manner of writing this article would take breaking these ideas down to smaller sized chunks, as math or science teachers do with elementary school students.

This is an example of simpler language that would be more helpful to a broader audience: "In other words, one can reject the 'null' when the hypothesized relationship is virtually certain to be true" might be clearer. Yes, the ideas are packed with specialized terms. It will take much time and patience to break it down. I have faith that the task can be accomplished. Dogru144 (talk) 08:21, 23 July 2009 (UTC)

I briefly attempted to address this issue by inserting the proposed alternative sentence, but quickly realized that the source is making an argument about excessively lax rejections of null hypotheses which lead to excessive Type I errors. So I reverted back to the earlier text in the "Straw man" criticism section here, and removed the "citation needed" templates because the basic assertion seems to properly reflect the already given source for the assertion. Whether the criticism is valid is a whole separate discussion-- a rather complex one to say the least. ... Kenosis (talk) 15:19, 27 July 2009 (UTC)

Howto tag discussed

A 'howto' tag was recently added to the section on 'test procedure'. I've added an explanatory sentence without removing the 'howto' tag, but I propose here that the tag should be removed.

No disagreement is raised about the principle that "The purpose of Wikipedia is to present facts, not to train". But here, a summary of the test procedure can serve to reveal important characteristics of the subject-matter not otherwise effectively communicated. In that way, the role here of the section on test procedure is not to train, but to present facts about the nature of the testing which is subject-matter of this article. (I leave open the possibility that the current description of the procedure could be improved to present the facts more effectively.)

In this case, procedural presentation of facts can be seen as especially important, because an important part of the current state of this article is the multi-part section describing criticisms of frequentist hypothesis-testing. Some of these criticisms are connected with, or dependent upon, features of the test procedure. It can also be controversial, or at least argued, whether some of the effectively criticised features of the procedure represent its misapplication, rather than its proper application -- for example, various practices about what inferences to make from specific numerical levels of probability.

So when (and if) the article reaches a reasonably mature encyclopedic state, I would hope that it will show two characteristics:

(1) a consensus description of the true nature and content of the testing with enough detail to enable the criticisms to be properly understood -- and this would include an adequate note of any competing views of the nature and content of the testing, perhaps among them how the testing was proposed by its originators to be used, and how it is actually/often used in current practice, and
(2) in respect of each type of criticism, how it relates to features of the nature and content of the testing.

What I'd suggest is that if, at that point, a procedure section clearly duplicates facts about the nature of the testing accurately represented elsewhere, then it would look redundant. But otherwise, I suggest it has a genuinely encyclopedic role here.

Terry0051 (talk) 12:19, 7 July 2009 (UTC)

I think the section is a valuable part of the article. It might be renamed to "summary of hypothesis testing" or some such, and it might appear slightly better with the numbered points being formatted as such. Possibly the last 3 points might be re-phrased away from being "instructions", but it doesn't seem to qualify as a how-to list. Melcombe (talk) 14:24, 7 July 2009 (UTC)
I agree wholeheartedly. I think the tag police should step down on this one. When discussing a technique, how can you not delve a little bit into how the technique is performed? These rules need to be applied intelligently, not in the knee-jerk way that was applied here. It's completely appropriate to discuss how hypothesis testing is done, because that defines what it is. For example, it would be inappropriate to discuss how to shift in a manual transmission car in an article on transmissions. However, it's entirely appropriate in the heel-and-toe article. Can we just put this to rest and remove the tag? Birge (talk) 21:53, 4 September 2009 (UTC)

Issues from 2009

Bayesian criticism section

I am a little confused about this section. It talks about a situation in which P(Null) is close to 1, but we got some extremely unlikely data, so P(Data|Null) is close to 0 and therefore we reject Null, even though P(Null | Data) is close to 1 too. Every claim it makes is true, but what it fails to do is actually produce a valid criticism of hypothesis testing. If P(Data|Null) is close to 0, then this particular situation will happen very rarely, and therefore we will only make the mistake of rejecting an almost certain null hypothesis very rarely. If your goal is policy making, then that's a pretty good promise.

Don't get me wrong, I am not arguing that there's nothing wrong with hypothesis testing - just that in its current form the summary of criticism from Bayesian statisticians is rather weak and not very convincing.

Ivancho.was.here (talk) 15:31, 17 September 2009 (UTC)
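
For concreteness, the situation described above can be put into made-up numbers with Bayes' rule; none of these figures come from the article:

    # Prior: the null is almost certainly true; the data are unlikely under H0
    # (so H0 is rejected at conventional levels) but barely more likely under H1.
    p_h0 = 0.999
    p_data_given_h0 = 0.01
    p_data_given_h1 = 0.02

    p_data = p_data_given_h0 * p_h0 + p_data_given_h1 * (1 - p_h0)
    p_h0_given_data = p_data_given_h0 * p_h0 / p_data
    print(p_data, p_h0_given_data)   # roughly 0.01 and 0.998

As the comment above says, such data turn up only about 1% of the time, yet when they do occur, rejecting H0 is almost certainly a mistake.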

That could be because all criticism from Bayesian statisticians is rather weak and not very convincing. Melcombe (talk) 15:44, 17 September 2009 (UTC)

True vs. un-rejectable

It seems to me to be adequately clear that a proper focus should be not on whether or not the null hypothesis is true or false but, rather, on whether it is rejectable or not based on the results of the experiment conducted. That being said, I take issue with the following definition provided in the article for "p-value" "unbiased test":

or equal to the significance level when the null hypothesis is true.

In terms of the definitions given for α and β, the phrase incorrectly rejecting the null hypothesis appears consistent with the aforementioned focus, in that such rejection is improperly done not based on the hypothesis actually being true, but rather as a result of overemphasis of the evidence in favor of rejecting it. In other words, in appreciating a false negative result, we are not concerned that the hypothesis is true nearly as much as we are concerned with the results of the experiment insufficiently contesting the null hypothesis (yet similarly appreciating that it may be false, yet undetectable or at least undetectable by the study methods performed). My problem lies in the above quoted (indented) copy/paste in reference to the p-value unbiased test, in which an inordinate stress seems to be placed on the hypothesis being true. If I am not mistaken, we can never really know if the null hypothesis is true -- I'm therefore disturbed by this statement, which I believe should read:

or equal to the significance level when the null hypothesis is unrejectable as a function of the experimental results.

If what I wrote is not completely incoherent, I'd welcome comments to either bolster my argument or explain to me why my premise (i.e. my understanding of this definition of p-value unbiased test in terms of the "trueness" of the null hypothesis) is incorrect. DRosenbach (Talk | Contribs) 12:34, 25 September 2009 (UTC)

I don't see the phrase you are complaining about. The "definition" I see is "p-value: The probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic" which seems correct. I did look for your phrase specifically ... if it is still there, could you be more specific? However, the article is correct in working with probabilities calculated assuming the null hypothesis is true, while you are correct that "we can never really know if the null hypothesis is true". But that is the way significance tests work. There is a logical leap from the probabilities calculated to making a "decision" which is the subject of the so-called controversy about significance tests. Melcombe (talk) 13:19, 25 September 2009 (UTC)
Sorry -- I meant unbiased test. Your point is well taken, Melcombe, but if there is such a focus on pedanticism, I think it should be across the board. DRosenbach (Talk | Contribs) 13:27, 25 September 2009 (UTC)
OK found it. But I think it is correct. As before, the probabilities are calculated assuming that one does know the true state of affairs and the condition stated is a property that one would want a "good" test to have, given that one can work out the probabilities assuming a particular version of the truth. That is, there should be a balance between the probabilities of reaching the conclusion "reject the null hypothesis" between the cases where the null hypothesis is true and the cases where the null hypothesis is false. Melcombe (talk) 14:32, 25 September 2009 (UTC)

Popper or not Popper

As a suggestion because there has been considerable discourse on phrasing of the standard hypothesis test result.

A p-value is an estimate of the probability of getting the observed result given that the null hypothesis is true.

This phrasing emphasizes that a hypothesis test is incapable of proving the null hypothesis false, nor can it prove the alternative hypothesis true. Given the outcome of the test it is up to the researcher to make a judgement call based on the outcome of the test.

This phrasing also leaves open the possibility that one could use the outcome to justify claiming that the null hypothesis is true. If a p-value of 0.05 or less is accepted as a criterion for rejecting the null hypothesis, then by extension a p-value of 0.95 or greater should be sufficient to allow for a claim that two samples are the same.

for an example see: Ebert, T.A., W.S. Fargo, B. Cartwright, F.R. Hall. 1998. Randomization tests: an example using morphological differences in Aphis gossypii (Hemiptera: Aphididae). Annals of the Entomological Society of America. 91(6):761-770.

Eumenes89 (talk) 01:52, 20 December 2009 (UTC)

To a large extent I would agree except for one thing: As for a claim that the "two samples are the same", I think not. One can only say that in view of the result they are not distinguishable by this test. (Also, it's not apparent how they are any more indistinguishable when the p-value is close to 1 than when the result fails by a narrower margin to reach significance.) Terry0051 (talk) 10:58, 20 December 2009 (UTC)

Acceptance region

Don't redirect "Acceptance region" to this page if this phrase doesn't even show up ONCE! You're giving people the wrong impression, ie. that info on the topic already exists, when it doesn't. —Preceding unsigned comment added by 67.159.64.197 (talk) 03:25, 13 February 2010 (UTC)

Issues from 2010

Section: An introductory example, null hypothesis

Presently the null hypothesis in the example with the clairvoyant is:

$H_0 : p \le \tfrac{1}{4}$

but should be

$H_0 : p = \tfrac{1}{4}$

I was told this was a valid example of a composite hypothesis by Melcombe in a history revert. A null hypothesis is always a simple hypothesis, and only the alternative hypothesis can be composite. You cannot derive rejection regions from a null hypothesis which is composite.

The union of the null and alternative hypothesis does not need to contain everything in a hypothesis test. Even in later equations in this section it is actually implied that they are equal:

 

which under a composite null hypothesis should be:

 

but then you have only given a bound to the probabilities of rejection of the null hypothesis given it is true. Tank (talk) 09:44, 19 May 2010 (UTC)
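
To illustrate the point about only having a bound under a composite null, here is a small numerical sketch assuming the article's binomial set-up (25 trials, upper-tail rejection region X >= c, with c = 13 taken purely as an example): the rejection probability over a composite null p <= 1/4 is largest at the boundary p = 1/4, so that is where the level is actually attained.

    from scipy.stats import binom

    n, c = 25, 13                                # assumed trials and cutoff: reject when X >= c
    for p in (0.05, 0.10, 0.15, 0.20, 0.25):     # values inside a composite null p <= 1/4
        print(p, binom.sf(c - 1, n, p))          # P(X >= c | p), increasing in p
    # The maximum over the composite null occurs at p = 0.25, i.e. at the simple null.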

It is standard that null hypotheses can be composite. The requirement for a definition of "unbiased test" in the "definition of terms" section is based on the need to deal with composite tests. See Null hypothesis#Directionality for more discussion. For an explicit example, consider any of the tests in Category:Normality tests, where the null hypothesis is always composite (since the mean and variance are not specified under the null hypothesis). As discussed at Null hypothesis#Directionality, it may sometimes be possible to logically reduce a test of a composite null hypothesis to a test of a simple one, but this is not always so, and it is not so for normality tests. Other examples include likelihood ratio tests. Melcombe (talk) 10:05, 19 May 2010 (UTC)
I restored the null hypothesis into its original simple form. In this introductory example it is the simplest form of the given problem. Although a null hypothesis may be composite, there is no need in this example. Nijdam (talk) 10:08, 20 May 2010 (UTC)

It is worse than this, because this particular example demands the simpler, mutually-exclusive competing hypotheses. The null is p = 0.25, and the alternative is simply that the proportion is NOT EQUAL to 0.25; because if the subject were to guess NONE of the cards correctly, that itself is a rare event and suggests intentional misses, which is contrary to the assumption of no clairvoyance. [See Signal Detection theory, which is quite happy to acknowledge and deal with this situation.] You have to remember you are dealing with the highly-compressed proportion scale (0 to 1). To assume that the entire range from 0 to 0.25 is unimportant is ignorance. The fact that proportion data often impose floor and ceiling effects is no reason to goof up with this example. As it stands now, it is flat wrong in both principle and practice. — Preceding unsigned comment added by 61.31.150.138 (talk) 11:13, 9 November 2011 (UTC)

Section: Introduction/Examples

Hello Nijdam,

I would like to discuss your changes back on Sept 19, 2010. I felt the need to restore the tables for the following reasons.

  1. The tables were not redundant. The tables added new information for each example. They clarified what was written in paragraph form in a more structured form.
  2. Who is your audience? I suspect you are writing for fairly mature or expert level statisticians. I feel the audience should be freshman level undergraduates taking their first and possibly only statistics course. In the Wikipedia guidelines, the audience is not supposed to be scientists/experts but instead laypeople. I feel the tables added value. Maybe not to you personally but I think yes to the target audience I identified.
  3. My attempt was to make the process methodical. Steps 1,2,3... My attempt was to make the tables a sort of "tool" for newcomers.
  4. I borrowed the table from another page (Type I and type II errors) in an attempt to standardize it. You swapped rows and columns in your revised table for example 1 so that they were different from example 4. That inconsistency is usually quite confusing to newcomers.
  5. In Wikipedia guidelines it is generally discouraged to delete information -- unless of course it was just plain wrong or offensive which the tables were not.
  6. Finally, I put a lot of my own personal time which is scarce into writing those tables.

If you still disagree, after all these points then maybe we should engage others to bring this to a consensus.

Thank you for listening. Joe The-tenth-zdog (talk) 02:28, 22 September 2010 (UTC)

Frankly, yes, I disagree with you. I replaced one of your tables by a more conventional one. The other tables would be no more than a repetition of this one. The specific wording you use in the tables is, IMHO, not suited for an encyclopedia. Nijdam (talk) 22:59, 22 September 2010 (UTC)
I too disagree. Some of the principles for what Wikipedia should contain are at WP:NOTMANUAL: it is not a tutorial or textbook. All these examples are far too long, and things like "Send him home to Mama!" should not be appearing. Melcombe (talk) 14:38, 23 September 2010 (UTC)

The criticism § is an embarrassment

I hope it was not done by Americans, but I am not sanguine that it wasn't. Suggest that all that content be moved to a misuse of statistics article, creating it if necessary. It makes the wiki editorship as a whole look mathematically and scientifically illiterate/simple minded. 72.228.177.92 (talk) 23:20, 14 November 2010 (UTC)

Also looks like there may be a failure to distinguish between "criticism" and arguments that rest on different fundamental models of probability theory; choosing among such models is something that generally should be done for best fit with the problem at hand. So maybe a section on fundamental models and their effect should absorb the current content in the criticism section on Bayesian theory. 72.228.177.92 (talk) 02:09, 15 November 2010 (UTC)
I don't see why it's necessary to call Americans "simple minded" or "scientifically illiterate" to convey the point that you don't like a section. No more necessary than mentioning any other bigoted opinions, which are completely irrelevant to the matter at hand.Napkin65 (talk) 15:20, 17 May 2011 (UTC)
i don't think he's calling Americans such things; i think he meant to say "native English-speaker," since this section is so poorly written... as a statistically-minded American I agree. for example "The test is a flawed application of probability theory," is vague almost to the point of being meaningless (how is it `flawed'? i can think of 2 interpretations (and counter-arguments) off the top of my head), while "the test result is uninformative" can't possibly be true except in a few artificially pathological cases! Also, the "misuses and abuses"-section seems to not discriminate between the legitimate ("rigorously control the experimental design") and dishonest ("Publish the successful tests; Hide the unsuccessful tests."), while repeating points from the preceding "Selected Criticisms" section... On top of this, the grammar is just uniformly atrocious. It needs to be re-written. I would take a crack at it, except that i don't know what this section is even trying to say, and i don't want to step on toes. 209.2.238.169 (talk) 23:34, 5 July 2011 (UTC)
In fact, the original criticism above related to an earlier version that was mostly replaced as part of the contribution discussed below under "Major edit discussion invitation". However the points above about the present version seem valid. Melcombe (talk) 08:23, 6 July 2011 (UTC)

Issues from 2011

Suggesting: Null Hypothesis Statistical Significance Controversy

I read this article. Then I scanned Introductory Statistics, 5th Ed, Weiss, 1999; Statistical Methods for Psychology, 5th Ed, Howell, 2002; Statistics Unplugged, 2nd Ed, Caldwell, 2007. This article contains more criticism of the null hypothesis statistical significance test than all three books combined. Finally, I selectively read 11 papers from Research Methods in Psychology, Badia, Haber & Runyon, 1970. The majority of the criticism contained in this article was known 40 years ago.

On the basis of this modest literature survey, I suggest that the bulk of the criticism be moved to a new article entitled "Null Hypothesis Statistical Significance Test Controversy". It should receive the attention of a subject matter expert. Critical quotations have been selected for emotional impact because, "one can hardly avoid polemics when butchering sacred cows". The result does not have the appearance of impartiality. Significance tests are heavily used - half of an introductory statistics text may be dedicated to their description. While they may eventually be regarded as a mis-use of statistics, that is not the consensus opinion _today_. This article should not imply otherwise. This article should mention the controversy, summarize it and link to the more detailed article. The new article can contain the graduate level seminar while this one contains the undergraduate introduction to statistics.

Criticism of the null hypothesis statistical significance test originates from a long list of reputable researchers - many from the arena of experimental psychology. Some of the criticism may result from concerns unique to the discipline. The best available summary of the issues seems to be Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy, Nickerson, Psychological Methods, 2000. The article is 61 pages in length and contains a list of references more than 10 pages long (~300 references). For more detail, there are entire books on the controversy: What if there were no significance tests?, Harlow; The Significance Test Controversy, Morrison; The Cult of Statistical Significance, Ziliak; Statistical Significance: Rationale, Validity and Utility, Chow;... There is lots of material for a new article.

159.83.196.1 (talk) 21:06, 20 January 2011 (UTC)

Suggestion to Merge with p-value page

The wikipedia p-value page says someone has proposed merging that page with the hypothesis testing page. Please, please don't merge these pages. As explained by SN Goodman (Ann Intern Med. 1999;130:995-1004.), p-values & hypothesis tests spring from different, even opposing, heritages & purposes. RA Fisher proposed the p-value as a measure of (short-run) evidence, whereas Jerzy Neyman & Egon Pearson proposed hypothesis testing as a procedure for controlling long-run error rates. Neither Fisher, nor Neyman, nor Pearson would have agreed with combining the procedures into a single one. Both p-values & hypothesis tests are already confusing enough (given that they don't constitute the inference most frequentists claim they do); let us not further confuse the reader by merging these pages. — Preceding unsigned comment added by Khahstats (talkcontribs) 18:57, 24 December 2011 (UTC)

A few missing citations

A few missing citations:

Significance and practical importance:

References 16 & 17 are duplicates. Ziliak is the first author.

Meta-criticism:

Wilkinson, L., and the Task Force on Statistical Inference (1999). "Statistical methods in psychology journals: Guidelines and explanations". American Psychologist.

http://www.icmje.org/

Uniform Requirements for Manuscripts Submitted to Biomedical Journals: Publishing and Editorial Issues Related to Publication in Biomedical Journals: Obligation to Publish Negative Studies

"Editors should seriously consider for publication any carefully done study of an important question, relevant to their readers, whether the results for the primary or any additional outcome are statistically significant. Failure to submit or publish findings because of lack of statistical significance is an important cause of publication bias."

Practical criticism:

The "missing" citations are [26][27], found after the following sentence.

Straw man:

Most of the content of this section is repeated in the following section. Delete the duplication to silence the owl (Who? Who?).

Bayesian criticism:

This issue is discussed at length in: Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy RS Nickerson - Psychological Methods, 2000.

While Nickerson has lots of citations, I doubt that he has the best for the Bayesian approach(es).

He cites Cohen (our [29]) who has a very readable example - An accurate test for schizophrenia, which is rare, provides very different results from frequentist vs Bayesian analysis. The accuracy implies that a patient with a positive result is likely to be schizophrenic. The rarity of the disease implies that any positive result is likely to be false. The two implications are in conflict.
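
A worked version of that kind of example, with illustrative numbers that are not Cohen's, shows how the rarity of the condition overwhelms the accuracy of the test:

    # Hypothetical figures for illustration only.
    prevalence = 0.01     # P(schizophrenia)
    sensitivity = 0.95    # P(test positive | schizophrenia)
    specificity = 0.95    # P(test negative | no schizophrenia)

    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive   # P(schizophrenia | positive), Bayes' rule
    print(ppv)   # about 0.16: most positive results are false despite the accurate test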

Something is wrong with the Lehmann citations. [6] is incomplete. In Further Reading, Lehmann's 5th edition is mentioned (1970), but his 3rd edition was published in 2005 [8]. —Preceding unsigned comment added by 159.83.196.1 (talk) 23:46, 26 January 2011 (UTC)

159.83.196.228 (talk) 20:30, 22 January 2011 (UTC)

Error in Example 2 - Clairvoyant card game

Hi. The Probability of P(X>=10|P=1/4) is given as 9.5 x 10^-7. However, this number seems incorrect. Shouldn't it be ~0.0071 (See http://stattrek.com/Tables/Binomial.aspx)? Just wanted to check on the talk page first in case I am misinterpreting something. - Akamad (talk) 22:36, 24 January 2011 (UTC)

Thanks, you're right. Value has been changed. I have restored it. Nijdam (talk) 10:48, 25 January 2011 (UTC)

Disconnect between The testing process & Interpretation

Disconnect between The testing process & Interpretation:

Using an inconsistent formulation causes difficulties.

While the Interpretation says, "The direct interpretation is that if the p-value is less than the required significance level, then we say the null hypothesis is rejected at the given level of significance.", The testing process says nothing about the p-value or the required significance level. Both are in the Definition of terms section. While it is possible to formulate in terms of regions, p-values or confidence intervals, there is merit in consistency. The Criticism sections and many introductory statistics books use p-values.

159.83.196.130 (talk) 20:15, 3 February 2011 (UTC)

Major edit discussion invitation

The proposed major edit of March 1 addressed the embedded editorial comments in the existing article, sections Potential misuse through Publication bias. Comments? —Preceding unsigned comment added by 159.83.196.1 (talk) 21:02, 8 March 2011 (UTC)

I think you are going to have to divide up your proposed changes so that they can be discussed individually, rather than everyone having to try to see what the point of a whole group of changes was. However, you might start by noting the header of the article, which points readers to the Bayesian approach. Of course the promised content about the Bayesian version of what hypothesis testing tries to do is not really there or anywhere else as far as I know, but that is not the fault of this article. There have previously been suggestions for a separate article for a Bayes versus classical comparison on an equal basis, and that would have been far better than adding confusion to an article that tries to explain what the classical approach is. Melcombe (talk) 10:30, 9 March 2011 (UTC)


Alas, advertising or rabble-rousing for a Wikipedia edit proposal. Call it a summary.  :-)

Organization Before: ...

  • Potential misuse
  • Criticism
    • Significance and practical importance
    • Meta-criticism
    • Philosophical criticism (recently deleted)
    • Pedagogic criticism
    • Practical criticism
    • Straw man
    • Bayesian criticism
  • Publication bias

...

Organization After: ...

  • Radioactive suitcase example revisited
  • Types of Hypothesis Tests
  • Academic Status
  • Usage
  • Weakness
  • Controversy
    • Selected Criticisms
    • Misuses and abuses
    • Results of the controversy
    • Alternatives to significance testing
    • Future of the controversy

...

All of the listed original sections contained criticism. The volume of criticism was a substantial fraction of the article length. These sections were filled with quotations chosen for emotional impact rather than for factual content, and contained embedded Wikipedia editorial comments. Earlier discussion characterized these sections as containing expressions of distaste or as an embarrassment.

The replacement sections are more focused on the topic of the article. Several sections illuminate hypothesis testing by comparison with statistical decision theory. There is mention of historical terminology (significance testing) which should make it easier to understand the titles of some of the references. The criticism is tersely summarized in a list format with a reference to a lengthy (60 page) discussion of the issues. Six other sources of criticism are cited - 4 books and 2 journal articles.

My goals:

  • Focus on hypothesis testing.
  • Reduce the volume and the emotional impact of the criticism sections.
  • Summarize the criticism and provide references to it.

I managed to summarize Bayesian criticism in totally non-mathematical terms which answers one editorial comment requesting clarification.

The result is shorter than the original.

The new content generally has adequate citations to justify the claim of factual content from existing sources.

Several links to the criticism section are broken by the edit.

There is more reference to things Bayesian than desirable in an article on frequentist statistics. This is very bad since I regard the opening disambiguation as flawed. Most veterans of Statistics 101 haven't heard of Bayes Theorem, Conditional Probability or Frequentist vs. Bayesian Statistics. They will not understand the first sentence. I do not understand the nuances. I took Probability rather than Statistics and avoid Philosophy. —Preceding unsigned comment added by 159.83.196.133 (talk) 17:32, 12 March 2011 (UTC)

Melcombe, I found a reference that explains the differences between frequentist and Bayesian approaches more clearly. I will clean up the two affected sections appropriately. —Preceding unsigned comment added by 159.83.196.1 (talk) 23:33, 15 March 2011 (UTC)

I am certainly not an arbiter here, and it wasn't me who reverted your(?) attempt at a major revision in one go. I guess I was responsible for moving to this article most of the similar supposed-criticism text from other articles and, although I did do some changes to merge the stuff together, what is left certainly merits a major revamp. And a reduction in size if it is to be left in this article. There needs to be care in choosing references that claim to understand the so-called "frequentist" approach, as the "frequency probability" interpretation of probability is not needed to justify the classical approach to hypothesis testing. I do suggest that you get a wikipedia id, rather than using an IP address, as such contributions are taken more seriously, and that you learn to sign your contributions on these Talk pages. Melcombe (talk) 09:56, 16 March 2011 (UTC)

One-sample chi-squared test

I think this row in the table is messed up: it appears to be a standard test on variance but then the Assumptions or notes column looks like the assumptions for a test on the distribution of a categorical variable. 130.127.186.244 (talk) 12:32, 8 November 2011 (UTC)

Well spotted. I've deleted that row completely for now (from the table in the Statistical hypothesis testing#Common test statistics section). The formula was for the Chi-squared test#Chi-squared test for variance in a normal population which isn't commonly used, in my experience at least. I'm tempted to remove "Two-proportion z-test, unpooled for | d0 | > 0" as well for the same reason. Comments / objections? Qwfp (talk) 17:31, 8 November 2011 (UTC)

Issues from 2012

Duplicate references

References 20 & 25 appear to be duplicates. Is one citation better than the other? 159.83.196.1 (talk) 23:41, 4 February 2012 (UTC)

Now fixed. Melcombe (talk) 01:45, 5 February 2012 (UTC)

Sundry gripes - maybe opportunities for improvement

On January 30, 2012 the opinions of this article were:

Trustworthy: 3.9/5; Objective: 4.3/5; Complete: 4.0/5; Well-Written: 2.6/5.

In summary, the article has a lot of merit, but is poorly written?! The article suffers from several deficiencies:

  • Reader expectations are inconsistent.
    • The Examples are suitable for the novice.
    • The Definition of terms and The testing process are taken from texts for PhD candidates.
    • The Controversy section is most suitable for graduate school seminars, not Stat 101.
  • The sections are unbalanced.
    • The Interpretation and Importance sections are weak.
    • The Definition of terms and Controversy sections are too long.
  • The sections are not integrated.
    • The examples and test statistics are not further discussed in the text.
    • The defined terminology is little used.
    • The testing process uses regions, while p-values are used elsewhere
  • The article does not have the proper supporting figures.
    • Examples & Definition of terms would benefit.

159.83.196.1 (talk) 22:06, 7 February 2012 (UTC)


I am inclined to delete the following definitions: Similar test, Most powerful test, Uniformly most powerful test (UMP), Consistent test, Unbiased test, Conservative test, Uniformly most powerful unbiased (UMPU). They are Stat 801 rather than Stat 101 material. Only one of the definitions leads to another article. Only one has a reference. I cannot utilize the definitions in other sections, provide examples, etc. This article is better with less. Comments? 159.83.196.1 (talk) 01:52, 16 March 2012 (UTC)
I agree with deleting those with the exception of "Conservative test", as I seem to come across that concept reasonably often in my experience as an applied statistician, while the others seem of more theoretical interest. Qwfp (talk) 11:38, 16 March 2012 (UTC)
You should keep "most powerful" and "uniformly most powerful" tests, which appear in calculus-based statistics texts, such as Hogg & Ellis, or the Florida book (Schaeffer, Mendenhall, and a 3rd), along with the Neyman--Pearson lemma.  Kiefer.Wolfowitz 21:42, 17 March 2012 (UTC)
Done. 159.83.196.1 (talk) 22:43, 24 March 2012 (UTC)

Asterisks

What do the asterisks sprinkled throughout the Name column of the Common test statistics table mean? Examples: many, but not all, of the two-sample tests. 159.83.196.1 (talk) 00:36, 15 February 2012 (UTC)

These originate with this version, http://en.wikipedia.org/w/index.php?title=Statistical_hypothesis_testing&oldid=293990353 (June 2009), of the article. Neither that or immediately following edits gave a meaning for * anywhere close to the table, so far as I can see. Melcombe (talk) 03:07, 15 February 2012 (UTC)
Deleted. 159.83.196.1 (talk) 21:54, 21 February 2012 (UTC)

Clairvoyant example

Clearly, the clairvoyant example is badly formulated: the null hypothesis should be that P=0.25 and not that P<0.25. Indeed, a common argument in ESP studies is that getting all of the answers consistently wrong (far outside of chance) is just as sure a sign of clairvoyance as getting them all right. I don't know how to fix this example myself, though. linas (talk) 19:03, 11 April 2012 (UTC)

The example specifically addresses your issue, "But what if the subject did not guess any cards at all? ... While the subject can't guess the cards correctly, dismissing H0 in favour of H1 would be an error. In fact, the result would suggest a trait on the subject's part of avoiding calling the correct card. A test of this could be formulated: for a selected 1% error rate the subject would have to answer correctly at least twice, for us to believe that card calling is based purely on guessing." You clearly disagree with the editor of the example. 159.83.196.1 (talk) 21:30, 17 April 2012 (UTC)

All wrong

I have commented the part in the Clairvoyant example about the situation where all cards are identified wrong. What goal does it serve adding this? Nijdam (talk) 12:36, 29 April 2012 (UTC)

See section "Clairvoyant example" almost immediately above on this talk page.Melcombe (talk) 23:21, 30 April 2012 (UTC)
There would be some advantage to providing both one-tailed and two-tailed solutions to answer the question raised above. All current examples are of one-tailed tests and the sole remaining statement implies that the use of one-tailed tests is misleading. 159.83.196.1 (talk) 22:31, 1 May 2012 (UTC)
The cumulative probability for c=12: 0.0107 (by spreadsheet). Requirement: less than 1%. By the definition provided, c=13. Region of acceptance: 0-12. Region of rejection: 13-25. Resulting significance level: 0.0034. The cumulative probability for c=12 was confirmed by: http://stattrek.com/online-calculator/binomial.aspx 159.83.196.1 (talk) 19:24, 5 May 2012 (UTC)
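
Those figures are straightforward to reproduce, for example with scipy, assuming the example's 25 trials with success probability 1/4:

    from scipy.stats import binom

    n, p = 25, 0.25
    print(binom.sf(11, n, p))   # P(X >= 12), about 0.0107, just above the 1% requirement
    print(binom.sf(12, n, p))   # P(X >= 13), about 0.0034, the resulting significance level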

Split Controversy?

The controversy section has grown again. While the main topic remains of tolerable length, the controversy section probably belongs in a separate article.

Hypothesis testing and its controversies are typically covered in different texts intended for different audiences. Hypothesis testing is not controversial in all fields, but has been intensely controversial in some.

It has proven impossible to limit the size of the section. 159.83.196.1 (talk) 20:57, 7 December 2012 (UTC)

On the one hand, the controversy is an entire field of study unto itself and deserves its own page. On the other, an entire field of study exists criticizing NHST (not necessarily hypothesis testing per se), and this is important information that should be emphasized to any new statistics student who may only visit the main page and fail to follow the controversy link.
NHST continues to be conveyed in textbooks as a universally accepted (and objective) method on which all experiments should be based, and it is of utmost importance that people coming to this page are made aware that this is not true. The textbooks are not giving a balanced view, while Wikipedia has the opportunity to do so, which may be most effective by not creating a separate page. Personally I think it should have its own page. 207.229.179.97 (talk) 08:11, 8 December 2012 (UTC)

Issues from 2013

Bayesian Fanaticism in non Bayesian Methodologies Articles

In the Alternatives to significance testing section we can read that:

Most statisticians agree that:
The "null ritual" hybrid that is commonly used is fundamentally flawed. No researcher should ever perform this ritual.
P-values can be a useful (although confusing) metric if no prior information is available.
Neyman-Pearson hypothesis testing is the best way to control for long run error rates if certain criteria can be met.
Bayesian methods should be used if sufficient prior information is available.
Researchers should view the different approaches as components of a "toolbox", rather than attempt to use the same tool for all purposes

I find it extraordinary how shamelessly the Bayesian crowd maims article after article when the methodologies explained are not Bayesian.

Claiming that "most statisticians" agree with that list of statements is downright false, so I added a citation-needed tag; unless it is changed to "Most Bayesian Statisticians Agree", such citations are nowhere to be found.

I don't really know what to do, I would love to recommend Wikipedia to peers and students but every time I see this I have to think twice or warn them about the Bayesian editing.

Is there any procedure to mark non-Bayesian articles as possibly biased by Bayesians? Or to ask them to please not trash every Wikipedia entry that does not follow their path? The situation is becoming ridiculous. — Preceding unsigned comment added by Viraltux (talkcontribs) 11:17, 27 January 2013 (UTC)

Please explain what is wrong with those statements so that the page can be improved. — Preceding unsigned comment added by 207.229.179.97 (talk) 13:27, 27 January 2013 (UTC)
The statements are personal opinions without any citations whatsoever, especially the one saying "most statisticians agree"; saying that "Bayesian methods should be used if sufficient prior information is available" implies that this is the only proper way to proceed with prior information, which is false. Saying that p-values "can be useful (although confusing)" only shows, again, ill intention from the Bayesian guy that wrote this section; the p-value is a ubiquitous, simple and fundamental statistical concept in many areas of science, including the p-value offered at CERN when announcing the discovery of a new particle.
The list goes on, but this is not a problem just in this Wikipedia article; somehow these guys have managed to alter virtually every non-Bayesian article to promote their school, sometimes with downright false statements, other times with commercial-ad-style sections on how good the Bayesian alternatives are.
Objectivity is going out the window with this issue; somehow we should reach an agreement with these guys and ask them to please properly label these opinions for what they are. Maybe sections like "Bayesian point of view" or "Bayesian criticisms" would be appropriate. Presenting these opinions as facts is just wrong and only drives people trying to learn about the subject to confusion or error. — Preceding unsigned comment added by Viraltux (talkcontribs) 18:57, 27 January 2013 (UTC)
Yes, and the CERN use of p-values is controversial since they sampled to a foregone conclusion. Further, the way the news reported on it is a good example of how confusing p-values are. Fortunately, unlike other fields, particle physicists use realistic null hypotheses and very small p-values, so that result is unlikely to be a false positive anyway. Can you please provide one reference that goes against each of those claims so that I can read it? They are currently supported by the 40 or so references in the section, which I agree may be a biased sample. — Preceding unsigned comment added by 207.229.179.97 (talk) 22:02, 27 January 2013 (UTC)
NO, it is not controversial. It is only controversial in the fanatic mind of a Bayesian. That incompetent journalists report incorrectly what p-values are is a laughable argument against them; should we stop using Calculus if journalists do not understand it either? Anyhow, serious competent media like the BBC did explain it properly as anyone can check here: Higgs boson-like particle discovery claimed at LHC
Asking me for references to prove someone else's statements is, again, laughable. How about if I say "God exists" and, if anyone complains, I just do as you do and say "Can you please provide one reference that goes against this claim? No? Then I leave that statement there until you do". You say most statisticians believe so-and-so, so YOU prove it.
What do you make of this quote, it seems to support statement # 4 above: "Perhaps because of lack of clarity in some of my papers, certain authors appear to be under the impression that, for some reason, I condemn the use of Bayes' formula and that I am opposed to any consideration of probabilities a priori. This is a misunderstanding. What I am opposed to is the dogmatism which is occasionally apparent in the application of Bayes' formula when the probabilities a priori are not implied by the problem treated and an author attempts to impose on the consumer of statistical methods the particular a priori probabilities invented by himself for this particular purpose."-Neyman, J (1957). ""Inductive Behavior" as a Basic Concept of Philosophy of Science". Review of the International Statistical Institute 25 (1/3): 7–22.
I am a Statistician, I do not consider myself Frequentist or Bayesian, and by all means using Bayes' Theorem when I need it does not make me a Bayesian. In fact, Frequentist is the title the Bayesian crowd grants to anyone not professing their religion. I completely agree with Neyman in his statement. It is just a sad situation that these guys take their dogma to an extent that we could describe as trolling Wikipedia.
I don't know, maybe talking to them? asking them nicely to make separate articles with all their criticisms? It just seems to be their way or highway. We've reach a point in the Wikipedia non Bayesian statistical articles that I don't know anymore what is a legit criticism or Bayesian propaganda.
Ok, let's start with the first one. I have never seen a statistics paper that disagrees with this statement but have read many, many that do agree (for a small subset, with a "bayesian" bias, I agree, see those cited throughout this article): "The "null ritual" hybrid that is commonly used is fundamentally flawed. No researcher should ever perform this ritual." I would really like to know what the defense of this practice is so that we can include it. — Preceding unsigned comment added by 207.229.179.97 (talk) 21:16, 28 January 2013 (UTC)
Also, with regard to "Most statisticians agree...", I would like to clarify my position. My position is that this statement is supported by the 50 or so citations (and there are more out there, thousands upon thousands more) currently used here (roughly #25-75). If no one is willing or able to find a substantial number of references that contradict the positions of the ones already present, I don't see the point in just adding a long string of citations after that statement, but we can do that if you feel it is necessary. 207.229.179.97 (talk) 21:44, 28 January 2013 (UTC)
To be honest I did not know what you meant by the "null" ritual, so I checked the article and read "...this phenomenon has come to be known as the "null ritual".[43]..." Fair enough, so I checked the reference, and it turns out it took me to an article where the author, Mr. Gerd Gigerenzer, claims he coined the term; he is a psychologist linking Sigmund Freud's theories to statistical methodologies. He says things like Fisher's approach "gets papers published but left with feeling of guilt".
All right, I don't know whether to laugh or cry. So what is next? Non-Bayesian statisticians are sexually repressed? Right now I am laughing, but this might change soon. Listen, there are many ways people can abuse procedures and make mistakes, but would you consider it reasonable for someone to edit Bayesian articles saying "People can make mistakes and abuse Bayesian procedures, oh, by the way, there are non-Bayesian alternatives... link link link"?
Bayesian methods are mathematically sound; non-Bayesian methods are also mathematically sound. To dedicate nearly HALF of the article to artificial controversies fabricated by Bayesians is, sorry to say, ridiculous. These controversies deserve a separate article where people can have fun discussing philosophical issues in that sandbox. But the Bayesian crowd does not care; they keep trashing article after article. Hell, even the first line in this article is a link to the Bayesian alternative! Really, these guys have no shame.
Also, so you paste 50 citations of, I don't know, more psychologists? And somehow this is proof that most statisticians agree with "Bayesian methods should be used if sufficient prior information is available"? I hope this statement does not come as a result of Bayesian inference, because my p-value is pretty low on this one.
Anyhow, now seriously, this is a problem: Bayesians are turning non-Bayesian articles into a mess. I don't know if they are very young psychologists with a lot of time to mess with Wikipedia or just a few very determined Bayesians, but for me or anyone else trying to fix all these things it is a full-time job. Not to mention the post/revert-post fight that might follow. So I will just ask Bayesians to please focus on your own articles, improve them, make Wikipedia proud of them and non-Bayesian statisticians jealous... But please, just stop trashing anything non-Bayesian. I mean... what can I say. — Preceding unsigned comment added by Viraltux (talkcontribs) 01:30, 29 January 2013 (UTC)
Please help improve the page by contributing references that support the claims you are making. Honestly, I've been trying to avoid any ad hominem attack, but it is you who appears to hold religious beliefs regarding approaches to statistics. I am not saying you do, but it is how you appear due to your frustration. I have found that all the criticisms found in that section have merit, and have run my own simulations to verify this. Have you done this in order to justify your opinion? If so, please share it with the rest of us. Quite frankly, I am horrified at the statistical practices that are often found in journals and think this is a huge problem. From the sources cited here and what I have read elsewhere, it certainly appears that it is widely recognized that null hypothesis testing as commonly done is not what was promoted by Neyman, Pearson, or Fisher, nor by any of the statistics literature that I have ever come across... it is widely recognized that even when used correctly it is not the best tool for the job all the time (e.g., when prior information is available)... it is widely recognized that there are widespread misconceptions about the meaning of a p-value amongst researchers, students, the public, and possibly even statisticians... it is widely recognized that the Neyman-Pearson strategy is a great tool if you can determine all the parameters of the study from the outset and the various assumptions of the tests used are met to a reasonable degree. I say again, there are dozens of sources cited in this article and thousands more available that support the first claim (although they may use different terms, the message is that what Gigerenzer called the "null ritual" shouldn't be used). Can you provide some references that disagree so that the page can be improved? 207.229.179.97 (talk) 02:04, 29 January 2013 (UTC)
About providing references: check chapter eight in Statistical Inference. You will find a detailed mathematical description of these tests and also of Bayesian tests. Interestingly, no mathematical flaws of any kind are described. Maybe you need Freudian psychologists for that.
I meant no offense by laughing, but a psychologist making Freud comparisons and coining terms like "Null Ritual" as a source for a mathematical complaint... well, you have to admit it is a bit hilarious. Nonetheless, I would accept you saying I hold religious beliefs about statistics if, for example, I took one of my personal opinions, like "a prior distribution is a philosophical flaw for a scientific study", and then went to trash every Bayesian article with it. Yes, it is my opinion, but not for a second would I agree with anyone peppering Bayesian entries with it and with links to non-Bayesian articles; it's a matter of respect. Also, I don't know what kind of simulation you could possibly have run to prove "most statisticians agree...", but I am interested to know what exactly you think you've proven; could you share your simulations with us?
May I also know your background? I only see your IP and it'd be nice to know where you're coming from. I myself am a statistician (not frequentist nor Bayesian; so much for my fanaticism), also a computer scientist with a Master's in Operations Research. So firstly, you cannot dump a list of statements and then ask someone else to prove them wrong when you yourself don't prove them right. We should agree on this, okay? Secondly, some people misusing a tool does not make it the tool's fault. Would you edit the article Differential equations with a "Controversies" section occupying half of the article and then complain about how the general public is confused by these mathematics? Would you mention how difficult it is for students? These articles are supposed to be mainly mathematical ones; instead they've become a theater for Bayesians rendering their, again, opinions.
When you say it is widely recognized that it is not the best tool when prior information is available, well, maybe it comes as a surprise to you, but you do not need Bayesian statistics to make use of prior information. This is just a Bayesian mantra. Maybe it is widely recognized among psychologists? I'm just realizing these guys might have some Freudian problems with anything non-Bayesian. I find this interesting. I mean, physicists at CERN, arguably some of the most intelligent people in the world, have no problem with p-values but, hey, maybe it turns out that psychologists know more math than they do.
When you say it is widely recognized that there are "misconceptions about the meaning of a p value amongst researchers, students, the public, and possibly even statisticians...": I care about the public knowing what the p-value is about as much as I care about them knowing what differential equations are for. How this can possibly be a critique of p-values as a tool for science is beyond me. But you say "even statisticians"; well, maybe for Bayesians. I've never met a statistician confused about what a p-value is; on the contrary, it is a great concept that beautifully summarizes an experiment.
Students have every right to be confused about anything; they are students, they are supposed to have doubts and ask questions about p-values. And as for researchers ignoring what a p-value is: well, if journalists at the BBC know what a p-value is, then researchers have no excuse not to know. I mean, we should expect researchers to know better than journalists at the BBC, don't you think?
What's the deal with psychologists anyhow? Is it possible that all this trashing in Wikipedia is coming from them instead of from professional Bayesian statisticians?
One final comment: Wikipedia is not supposed to be used to promote personal theories. This article talks about the "Null Ritual" as if it were a standard procedure that all statisticians know about when, actually, it turns out it is simply a term coined by a psychologist to describe... well... I don't fully know what, because the paper was not free to see. So: a personal theory, with a personally coined term, to promote a personal paper from a non-statistician (a psychologist) that, on top of that, you need to pay for to see what he really complains about with this "Null Ritual". Hello? — Preceding unsigned comment added by Viraltux (talkcontribs) 30 January 2013 (UTC)
Look, I think I may have been a little aggressive there. Really, we are all on the same team. We all want to find a cure for cancer, find the truth, etc. What I have learned has me convinced that the current methods being used to assess the viability of research are totally inadequate and, when combined with other factors (publication bias, etc.), may have even resulted in us wasting decades needlessly chasing down false positives. I would love to be proved wrong. The brunt of it does not even have anything to do with the Bayesian vs frequentist debate, and I think that is reflected in the majority of what has been added to this section. 207.229.179.97 (talk) 09:39, 29 January 2013 (UTC)
You are right, we are on the same team. That is why it is important to realize that all those flaws you talk about are human, not mathematical. If for some crazy reason only Bayesian statistics were allowed (and I think some psychology [oh, surprise] journal has done something like that), researchers would find other ways to cheat. They would keep not publishing negative results and cancelling experiments when they begin to show little hope. They would keep doing data dredging. They are humans, and science comes second after prestige, recognition or money... But to blame the tool is naive, and to promote another tool implying that all problems will be solved and the cure for cancer will come sooner is, at the very least, ill informed. — Preceding unsigned comment added by Viraltux (talkcontribs) 10:34, 29 January 2013 (UTC)
I'm not sure you have even read the section... The issues discussed are not mathematical, they are logical. The math always follows from a set of assumptions; the problem is that many assumptions made very commonly do not hold in practice. And this belongs on Wikipedia because it is not just a few people misusing a tool. Misuse of the tool is ingrained in the entire modern culture of science. It is encouraged by those who write statistics textbooks, and enforced by journal editors and funding agencies. The reference you provided is not available online (at least I couldn't find it), so it is not great for a Wikipedia page. Anyway, it is an introductory textbook, so I doubt it contains any in-depth analysis.
Please read some of the citations included here and offer up accessible references that counter each point made. Just saying "these (non-statistician) guys (whom I disagree with) aren't credible (psychologists) while these other (non-statistician) guys over here (whom I agree with) are super smart" is not enough. Stop thinking the "bayesians" are persecuting you. This is just strange anyway, since the "bayesians" are the current minority. I mean, we don't even seem to be able to agree that null hypothesis testing as it is commonly (almost everywhere) performed is a hybrid of Neyman-Pearson and Fisher. Can we agree on that? 207.229.179.97 (talk) 21:23, 29 January 2013 (UTC)

A discussion richer in content and poorer in opinion would be more welcome. The frequentist/Bayesian divide is not the only issue. It may not even be a major issue.159.83.196.1 (talk) 22:43, 29 January 2013 (UTC)

Agreed, what I was taught and what I observe throughout the biomed literature barely has anything to do with "frequentist" philosophy, and it goes beyond just human fallibility, as these practices (e.g., a nil null hypothesis that is always false) are actually cultivated amongst burgeoning researchers. I can even remember being confused when I was first taught it; then I was so busy I just accepted it, assuming it must make sense somehow since everyone did it. But it is just nonsense. The goal of the section is to highlight this, not argue "bayesian vs frequentist". That is a separate, actual philosophical disagreement that has been conflated with the more important issue. If there are actual references supporting this practice I would like to see them added to the page. 207.229.179.97 (talk) 00:21, 30 January 2013 (UTC)


You said, and I quote, "I have found that all the criticisms found in that section have merit, and run my own simulations to verify this." Now you say, and I quote again, "I'm not sure you have even read the section... The issues discussed are not mathematical, they are logical." So, an honest question: what exactly have you been simulating to verify something that you now say is not a mathematical issue? Can you show us, please?
That introductory book I mentioned is recommended all over the world for courses in inference, and the prestige of its authors is huge among statisticians. It was not a reference for Wikipedia, though, but a reference to support my arguments. On the other hand, the reference for the Null Ritual is not free. Do you agree with me that it is not a good reference for Wikipedia either? Also, it is that particular psychologist's personal description of an alleged problem that, up to this point, since I don't feel like paying to find out, I am not even sure what it is.
I didn't mean to diminish the intelligence of any psychologist; I am sure there are many geniuses among them. I mentioned physicists at CERN not having problems with p-values because, naturally, the background in mathematics of physicists should be far superior to that of psychologists. So you might agree with me that attacks on the p-value concept coming from a community not directly involved in mathematics carry less weight than the acceptance of this concept by scientific communities heavily involved with mathematics (physicists, chemists, engineers, etc.).
The word Bayesian appears 20 times in an article that has nothing to do with Bayesian statistics. Not to mention all the times it recommends not using this or that non-Bayesian procedure. Is this true, or is this me being paranoid? But I am going to give you something: the feeling that those trashing the article are not Bayesian statisticians but psychologists is growing stronger.
The mathematics of hypothesis testing are clear. The discussions between Fisher and Neyman-Pearson were philosophical, not mathematical. Fisher accepted the mathematics of Neyman-Pearson as a way to evaluate the power of the test and establish adequate sample sizes, but there was no disagreement over the math. You say that researchers use this test in an erroneous way, and that might very well be so, but the reference about the "Null Ritual" is not free and I am not going to comment on something that I have to pay to access, I am sorry.
Anyhow, if your complaint about the "Null Ritual" is that researchers use a 5% significance level all the time without considering the nature of the experiment at hand, well, how is this the test's fault at all? This only shows that these researchers are incompetent in what they are doing or, most likely, have ill intentions. Fisher chose 5% because he was working on agricultural experiments and for his studies in biology that 5% made sense. Physicists at CERN chose a hugely smaller significance of 5 sigma before announcing a discovery (a small numerical comparison of the two thresholds is sketched right after this comment). So what are you suggesting? That if incompetent researchers stop using hypothesis tests they become competent all of a sudden? That somehow this is the tool's fault? That researchers manipulating experiments in order to publish won't manipulate them anymore if they go Bayesian? That we are delaying the cure for cancer if we keep using hypothesis testing? All of this has nothing to do with the tool. Do you want to write an article on how scientists misuse statistics to their advantage? Great! But let's stop blaming the tool.
The Bayesians (or psychologists, I don't know anymore) have been tampering with non-Bayesian articles without mercy; maybe other guys have time to get into this battle, but I just don't feel like it. Anyway, if anyone makes it to this talk page, please get a good book on statistical inference, understand the process of using hypothesis testing efficiently, and everything will be all right; just ignore the Bayesian/psychologist fear-mongering about non-Bayesian methods. — Preceding unsigned comment added by Viraltux (talkcontribs) 00:37, 30 January 2013 (UTC)
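For readers following this thread, here is a minimal R sketch (purely illustrative, not from either editor's material) comparing the two conventions mentioned above, Fisher's 5% and the particle-physics 5-sigma rule, assuming a one-sided normal approximation for the latter:
 alpha_fisher <- 0.05                          # Fisher's conventional 5% level
 alpha_5sigma <- pnorm(5, lower.tail = FALSE)  # tail area beyond 5 standard deviations
 alpha_5sigma                                  # about 2.9e-7
 alpha_fisher / alpha_5sigma                   # about 1.7e5: the 5-sigma threshold is vastly stricter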
I can't really continue this discussion because you keep going on rants without reading the page. The null ritual and the problems people have with it are described very clearly on the page, and it is not "the default 5% level". The fact that you said that clearly indicates that you have not read the page, and you may be unaware of what is being taught to students and what is going on in real life. Thank you for adding the responses, though, if that was you. 207.229.179.97 (talk) 04:21, 30 January 2013 (UTC)
Firstly: according to the article, the Null Ritual goes "2. Use 5% as a convention for rejecting the null... 3. Always perform this procedure." So yes, I read the page.
Secondly: this is the third time that I have asked you to show us the simulations you have done that prove the list of statements "most statisticians agree with...". I am asking because I really am interested; if there is a flaw in the procedure unknown to me and it turns out that you can prove it with a simulation, I'll be very grateful to you for showing me. I mean it.
Thirdly: my "ranting" is not even close in magnitude to the confusion that the "Controversy" section is adding to the article. Any student (those you seem to care so much about) reading the article will leave with the impression that the only sane thing is not to use these tests but to follow your Bayesian advice (checking the history of the article, it seems you are the editor).
Finally: since it seems you've taken possession of the controversy section in this article, and since you imply you know "what is being taught to students and what is going on in real life", could you share your background with us? I'd be surprised if you are related to any field in mathematics, but I might be wrong. So, are you? Anyway, just let me tell you that I believe what you are doing is done with good intentions, Mr. 207.229.179.97 (if this is you), but you are giving a wrong impression of the validity of these tests by claiming that "most statisticians" agree with you. Here is one who does not.
So, I don't know exactly how the process goes, but I am going to try to flag the article as "possibly biased" so that people can at least check the complaints on the talk page and, hopefully, reach an agreement in the future for better editing of the article. Viraltux (talk) 11:37, 30 January 2013 (UTC)
I am more than willing to share the script that runs the simulations I mentioned, and would actually love for someone looking for problems with it to take a look. Tell me the best way to do so for you (posting a bunch of code here would be bad). Also, I can't say I am really a "bayesian", but I have gravitated towards that way of thinking after observing the complete failure of what I was taught and what I see being used day-to-day, which once again is not "frequentist"; it is something else, and the majority of criticisms I have found come from the bayesian camp. I hesitate to give you my credentials because 1) I have none that matter, and 2) you appear to place far too much emphasis on them, and I believe attitudes like this may be at the root of the problem. How should I post the code? 207.229.179.97 (talk) 12:11, 30 January 2013 (UTC)
Ok, I will continue the page discussion a bit. This is the logical problem: "3. Always perform this procedure.", along with #1 "Use a nil null hypothesis", and then "if the null is rejected, accept the research hypothesis as true", which are really the major problems. Please contribute to the page in a way that stops people from doing this. That is really all I wish to accomplish here. 207.229.179.97 (talk) 12:20, 30 January 2013 (UTC)
You can paste the script on your talk page, but please first describe what the simulation does and what you think it proves.
And talking about the root of problems: when you ask me to "contribute to the page in a way that stops people from doing this. That is really all I wish to accomplish here", I think you miss the purpose of this or any article, which is to explain what the subject is about. This should be a mathematical article; instead it seems that "all you wish to accomplish here" is to enforce good practices which, oh well, since according to you these tests are a "complete failure", then somehow most statisticians agree with you and go Bayesian.
So anyhow, show me the script with an explanation and maybe I can convince you, or you can convince me. Who knows, maybe I'll convert to Bayesianism now... that'd be funny. Viraltux (talk) 14:36, 30 January 2013 (UTC)
I'm not going to put 1000 lines of code on my talk page; what about pastebin? 207.229.179.97 (talk) 20:41, 30 January 2013 (UTC)
Here you go, it's written in R. http://pastebin.com/xxAvzwJ3. Here is an example of the results: http://s2.postimage.org/tnj4rvqnd/worst_case.png. I called it worst case, but the situation simulated here is actually quite common. Please report back bugs. 207.229.179.97 (talk) 20:55, 30 January 2013 (UTC)
Summary: It is a Monte Carlo simulation that performs t tests on samples from two groups. The settings at the top allow the user to adjust the various properties of the population and the ways in which the data can be assessed. It allows you to simulate common scenarios found in practice, such as sampling from a bimodal distribution, a bimodal and skewed distribution, the presence of small "practically insignificant" differences, etc. It simulates researcher behaviour such as also performing a non-parametric test if the parametric one is "not significant", taking another set of samples if the first is found to be not significant, performing multiple experiments and only reporting the results of "significant" outcomes, and "dropping outliers". The results of running it with multiple scenarios show that these common practices often lead to false positive rates in the range of 40-90%. It is not meant to turn anyone "bayesian", and is not a criticism of "frequentists". It is a criticism of the "null ritual" that researchers are performing all the time in real life. It is meant to highlight that violating the assumptions of the tests that are commonly used makes it very difficult to draw any accurate inference from the outcome. 207.229.179.97 (talk) 22:55, 30 January 2013 (UTC)
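As a side note for readers: a minimal R sketch (an assumed illustration, not the pastebin script above) of just one of the behaviours in that summary, taking a second set of samples whenever the first t test is "not significant", already pushes the false positive rate above the nominal 5%:
 set.seed(1)
 one_run <- function(n = 20) {
   # both groups are drawn from the same population, so any rejection is a false positive
   p <- t.test(rnorm(n), rnorm(n))$p.value
   if (p >= 0.05) p <- t.test(rnorm(n), rnorm(n))$p.value  # "try again" on a null result
   p < 0.05
 }
 mean(replicate(10000, one_run()))  # roughly 0.10 rather than the nominal 0.05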
Okay, I think now we are getting somewhere. You say "by violating the assumptions of the tests that are commonly used, it makes it very difficult to draw any accurate inference from the outcome", and that is absolutely true. But you just simulated some of the many ways researchers commit fraud. Some other ways are multiple testing, publication bias, using unvalidated models, data dredging, chasing statistical significance while ignoring practical significance, hiding uncertainties, extrapolation... You can simulate them all if you feel like it (a small sketch of the multiple-testing case follows below this comment). But this is not a ritual, this is fraud.
Those engaging in these practices (some people call it p-hacking) are basically fraudsters going after money, recognition or both. How on Earth can anyone think that human weakness is caused by these tests? How on Earth is using Bayesian methods going to make these fraudsters stop tampering with their experiments for their own benefit? You really have to see the difference between a human flaw and a methodological flaw. These researchers know very well they are cheating.
Listen, how about this: how about if we create an article named p-hacking and we list there all these fraudulent practices you talk about? Then we mention in the articles "p-value" and "hypothesis testing" that researchers abuse these tests for their own benefit in several ways, and we link that comment to the p-hacking article. Maybe even a more general article, Fraud in Statistics Methodologies, with "p-hacking" as a main section.
If we do this we can focus on the mathematical merits of the p-value and hypothesis testing in their articles and improve them in that direction. We remove all the "Bayesians good, frequentists bad" comments and we simply mention that Bayesians also have their own methodology for hypothesis testing, and we link that comment to those methods. How about this? Viraltux (talk) 23:54, 30 January 2013 (UTC)
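The multiple-testing item mentioned above can be illustrated in a couple of lines of R (an illustrative sketch assuming k independent comparisons, each tested at the 5% level):
 k <- c(1, 5, 10, 20)
 round(1 - (1 - 0.05)^k, 2)  # chance of at least one false positive: 0.05 0.23 0.40 0.64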
Where I disagree is that these researchers know it is cheating/fraud. I think in many fields most are not even aware that there is a problem. They think they are performing statistical hypothesis testing as advocated by statisticians, because it is what they were taught to do (e.g., always use a nil null hypothesis). Secondly, it is common to hear things repeated like "the t-test is robust to violations of the normality assumption". Yes, someone simulated a small set of violations and it was robust to them. This does not mean that the test is robust to their particular violation, which I suspect is more often than not (at least in biomed fields) a multimodal distribution, which the t test is not robust to, especially when also dropping what are assumed to be outliers. This type of violation is not fraudulent. Once again, in my experience most researchers believe what they are doing is called "Statistical Hypothesis Testing" and is recommended by statisticians, because it is what they were taught in class (using near-perfect data) and everyone does it... it is not; it is something else, which I think is aptly called the "null ritual". I would be in favor of making a separate page about these issues and linking to it from here. 207.229.179.97 (talk) 00:35, 31 January 2013 (UTC)
Okay, I looked around on Wikipedia and I found this article: Scientific_misconduct. How about we create a section there for statistics and a subsection for p-hacking? We can also mention that some researchers might engage in p-hacking without knowing it is wrong (though, to be honest, I don't believe for a second that a researcher, someone who makes a living out of science, does not know that checking the premises of a test is mandatory). Viraltux (talk) 09:26, 31 January 2013 (UTC)
You aren't aware of what is going on; all those things I mentioned are rampant and fill the highest-tier journals. Go follow up on a paper you hear a news article about sometime. 207.229.179.97 (talk) 11:26, 31 January 2013 (UTC)
And what is the solution? Oh yeah, Bayesian statistics. Okay, game over. Viraltux (talk) 11:56, 31 January 2013 (UTC)
I think no statistics at all would be better than it is now. 207.229.179.97 (talk) 12:45, 31 January 2013 (UTC)

Biostatistics has special problems. Good statistical practices and ethical constraints are often in conflict. As a consequence, few studies have large samples. A casual reading of the newspaper medical section shows a lot of conflicting statistical results. A Bayesian (or meta-analytic) attitude is a rational response to the current mess. (This is the view of a patient rather than a statistician.) 159.83.196.1 (talk) 21:24, 31 January 2013 (UTC)

Primer of Biostatistics (Glantz, 1993): In one collection of 71 randomized clinical trials, the number of patients was low enough that there was only an even chance of detecting a treatment that reduced mortality rates by 25% (p. 184). "[Because of the woeful statistical practices of the field, you] can rarely take what is published or presented at clinical and scientific meetings at face value." (p. 392) 159.83.196.1 (talk) 20:53, 1 February 2013 (UTC)
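For a concrete sense of that Glantz figure, here is a rough R power calculation; the baseline mortality of 40% and the 100 patients per arm are illustrative assumptions, not numbers taken from the book:
 # power to detect a drop from 40% to 30% mortality (a 25% relative reduction)
 power.prop.test(n = 100, p1 = 0.40, p2 = 0.30, sig.level = 0.05)$power
 # roughly 0.3, i.e. such a trial is more likely to miss a real effect than to find it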
Yes, pretty much every study published in the biomed field is woefully underpowered, even using relatively lenient significance levels and testing the easiest possible null hypothesis to get evidence against (no difference at all between groups, which is almost always false). However, statistical significance is often necessary to get others in the field to take your results as interesting and worth publishing. Since the number of publications determines future funding and job security, the number of "significant" results you get is a proxy for your merit as a researcher. Thus, researchers who do not exhibit "significance-seeking behaviour" (this includes using unrealistic null hypotheses) are less likely to attain success and prestige in academia and become less and less common. The next generation is then taught the methods of the earlier one as if this is the way things should be done, without even realizing there is a problem. Most are kept too busy to ever take an in-depth look at aspects of the research that are outside their area of expertise (statistics). It is a system set up to spiral towards catastrophic failure to achieve its goals.
Under these circumstances, the end result (which may have already occurred years ago for many fields) of striving for an objective way to assess experimental results is that, perversely, all researchers do is measure how strongly each other's opinions are held (e.g., prior probabilities). 207.229.179.97 (talk) 23:29, 1 February 2013 (UTC)

John P. A. Ioannidis has written a few papers (available on the net?) that illustrate how bad the situation can be. If NHST were applied to the ancient Journal of Alchemy, every last published result would be wrong. This result does NOT assume fraud.159.83.196.1 (talk) 21:36, 31 January 2013 (UTC)
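A back-of-the-envelope version of the Ioannidis-style argument can be sketched in R (the numbers below are illustrative assumptions, not his): if only a small fraction of tested hypotheses are true and studies are underpowered, a large share of "significant" results are false even without any fraud.
 prior <- 0.1; pow <- 0.5; alpha <- 0.05        # assumed: 1 in 10 hypotheses true, 50% power
 ppv <- pow * prior / (pow * prior + alpha * (1 - prior))
 ppv                                            # about 0.53: nearly half of positive findings are wrong
 # with prior <- 0 (the "Journal of Alchemy" case) every positive result is a false positive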


I'd like to quote Mr/Ms 207.229.179.97
"I think no statistics at all would be better than it is now." 12:45, 31 January 2013 (UTC)
Just in case someone thought that naming this section "Bayesian Fanaticism in non Bayesian Methodologies Articles" was an exaggeration.--Viraltux (talk) 12:19, 4 February 2013 (UTC)
Much of science has progressed without much use for statistics, e.g.: "If your experiment needs statistics, you ought to have done a better experiment." If >90% of people misuse the tool, then it may be better for everyone if we returned to the days when logic was relied upon more heavily. You may find this extreme since your job depends upon it, but it is the truth. 207.229.179.97 (talk) 14:30, 4 February 2013 (UTC)
A fantastic quote from Ernest Rutherford; you should rush to tell the people at CERN there is no more need for p-values if they design a better experiment than the Large Hadron Collider. On the other hand, Rutherford also said things like "All science is either physics or stamp collecting." I wonder if you have anything better than jokes to support claims such as "...your job depends upon it but it is the truth". I don't think so.
Just a note: even if that quotation about statistics was not a joke (it was), he could not possibly be criticizing the so-called "frequentist statistics", since it turns out that the methodology of Fisher et al. appeared much later than Rutherford's Nobel Prize in 1908. But why bother to check, right? And by the way, the "Bayesian statistics" actually was around at the time Rutherford made this comment. Oh... well, irony.
You are literally using jokes to support your claims and, being generous, such claims can only be described as hugely misguided. I think at this point it is quite obvious you have no appropriate mathematical background to edit this article the way you do.
I would love for professional statisticians and mathematicians to participate in the editing, with an objective, non-fanatical approach and personal opinions labelled for what they are. At the same time, I'd like to beg editors with no mathematical background to refrain from heavy editing of mathematical articles based on their ability to Google papers of questionable prestige and recognition.
Since, unfortunately, all the effort discussing this issue has led us nowhere, I'll go ahead and use my remaining energy to first fix blatant violations of Wikipedia rules, like promoting personal theories/definitions (e.g. the "Null Ritual"). If you revert the changes without offering prestigious references in the field of statistics/mathematics to support your claims, I'll try to contact an admin to have the article semi-protected. That would be unfortunate. --Viraltux (talk) 11:32, 5 February 2013 (UTC)
I won't stop you... I said at the beginning I had little formal background and it needed expert opinion. I suppose you should also contribute better examples of how to use NHST under real-life circumstances. For example, add a section on how the CERN physicists specified a substantive null hypothesis. An example of "the probabilities involved in specifying a significance test can be conditional probabilities (conditional on previous experiments)" would also be nice. 207.229.179.97 (talk) 15:17, 5 February 2013 (UTC)
I just tried to find the context of the Rutherford quote. Where did you find out it was a joke? Also, with regard to that, I think you missed the point... but we are way off track now. 207.229.179.97 (talk) 07:04, 6 February 2013 (UTC)
Another thing I thought of: prestige is a very poor heuristic for correctness. Look at Nature, the most prestigious journal; if you actually try to replicate anything published there (once again, I can only speak for biomed), you will find out it's impossible. And it is just as filled with poor statistical practices as anywhere else. Going down the route of relying on prestige to determine which information flows is a dangerous one, but a reality of life. I see no reason why peer-reviewed work should not be acceptable as references for a Wikipedia page. 207.229.179.97 (talk) 07:23, 6 February 2013 (UTC)
If the criticism is removed from the discussion of the null ritual, then two obvious alternatives are deletion or moving the history to the Origins section. 159.83.196.99 (talk) 18:55, 5 February 2013 (UTC)

Assessment comment

The comment(s) below were originally left at Talk:Statistical hypothesis test/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

I'm adding this rating in the hope it may help attract knowledgeable editors. There is some reasonable material here but it needs substantial editing, both to make it clearer to beginners and to make sure it's technically correct. Perhaps could move the criticisms to another article titled something like "Comparison of statistical paradigms" with a brief summary here? --Qwfp (talk) 17:39, 29 January 2008 (UTC)

Last edited at 17:39, 29 January 2008 (UTC). Substituted at 22:06, 3 May 2016 (UTC)