Wikipedia talk:Wikipedia Signpost/2019-08-30/Recent research

Discuss this story

  • (later note – my initial comments here overlook table 2 in the Adams et al. article, see later comments) The first article reviewed has this sentence: "Examining only sociologists with Wikipedia pages, men’s median H-index (27) is higher than women’s (22)".[1] This straightforward method doesn't give the result they were expecting – and indeed it might even suggest that more articles on male sociologists are what's needed if gender-blind notability fairness is the ideal. But fear not, once the authors use a "logged version of the H-index to adjust for a strong right skew in the H-index distribution" (I don't get it, but okay) and throw a bunch of other factors into a regression analysis, they get the result that women are being cheated out of articles after all. I don't know. Seems like a lot of degrees of freedom here and discarding a simple test with what's probably the most objective merit-based measure available (the h-index) in favor of an opaque regression analysis isn't necessarily convincing. Haukur (talk) 01:09, 31 August 2019 (UTC)Reply
    I don't get it, but okay – If you don't understand the reasoning of the authors, how can you be so confident that they are wrong? — Bilorv (talk) 08:32, 31 August 2019 (UTC)Reply
    The important sentence comes after that one: "Women’s estimated odds of having a Wikipedia page after taking into account differences in rank, length of career, and notability measured with H-index and departmental reputation are still 25 points lower than men’s." But "length of career" and "departmental reputation" are awful ways to estimate whether someone merits a Wikipedia article. The best measure (though it is of course, like everything, imperfect) is the h-index. But instead of sticking to that, the authors decided to go with a complicated composite measurement instead – one that I'd say is a lot less accurate. In the end this is not a persuasive analysis. Haukur (talk) 09:29, 31 August 2019 (UTC)Reply
    Also note how our current article says "Female and nonwhite US sociologists less likely to have Wikipedia articles than scholars of similar citation impact" which is not at all the conclusion reached in that research paper. We'd better correct this. Haukur (talk) 09:39, 31 August 2019 (UTC)Reply
    Having now read the Adams et al. article more carefully I must note that my comments above make too much of the sentence "Examining only sociologists with Wikipedia pages, men’s median H-index (27) is higher than women’s (22)" and don't take their Table 2 into account. I think there's a Simpson's paradox in the data. I'm going to think more about this and maybe see if I can get the data from the authors. Haukur (talk) 16:26, 31 August 2019 (UTC)Reply
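To illustrate the kind of reversal being alluded to here, the following is a minimal sketch in Python with entirely hypothetical numbers (not the Adams et al. data): within each academic rank the women with articles can have the higher median H-index, yet the pooled medians can still favour the men, simply because the two groups are distributed differently across ranks.

```python
import statistics

# Entirely made-up numbers (not Adams et al.'s data): H-indices of hypothetical
# article subjects, split by academic rank and gender.
groups = {
    ("assistant", "men"):   [10, 12, 14],
    ("assistant", "women"): [12, 14, 16, 18, 20, 22],
    ("full", "men"):        [30, 32, 34, 36, 38, 40],
    ("full", "women"):      [32, 36, 40],
}

# Pooled medians: the men come out ahead (32 vs. 20) ...
for gender in ("men", "women"):
    pooled = [h for (_rank, g), hs in groups.items() if g == gender for h in hs]
    print(gender, "pooled median:", statistics.median(pooled))

# ... yet within every rank the women's median is the higher one.
for rank in ("assistant", "full"):
    for gender in ("men", "women"):
        print(rank, gender, "median:", statistics.median(groups[(rank, gender)]))
```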
    Any data about h-index is highly skewed by the difference in fields and other factors. Even within sociology, there may be vast differences: an h-index of 27 may be exceptional in some areas and ordinary elsewhere. Nemo 07:21, 1 September 2019 (UTC)Reply
    Good point. Conceivably, women might tend to work in subfields with lower citation averages. I've been thinking of possible explanations for the results here and there are so many possibilities. I've requested the data mentioned in footnotes 20-22 from the corresponding author. Haukur (talk) 08:36, 1 September 2019 (UTC)Reply
It seems inaccurate to say there's a "composite measurement" being used in the regression or to suggest that including multiple measures in regression analysis is less valid than just considering one of the measures (H-index) alone. H-index is very much part of the regression analysis and neither including the other measures in the model nor transforming the H-index values (by calculating the natural logarithm of each value) undermines it in any way. Indeed, the regression results in Table 3 indicate that H-index is (as you seem to expect!) the measure most closely associated with being the subject of an article (the odds-ratio is quite large and statistically significant). The analysis supports the idea that U.S. sociologists with higher H-indices are more likely to be the subject of EN:WP articles. It also suggests that female and nonwhite U.S. sociologists are less likely to be the subject of EN:WP articles. These interpretations are mutually compatible. Aaron (talk) 16:33, 3 September 2019 (UTC)Reply
I regret making fun of the log operation, which really is fine. And regression analysis is fine too - though I stand by my criticism that "length of career" is not a suitable indicator of notability. I'm developing a more nuanced take on this and the corresponding author has kindly promised to send me some data. Haukur (talk) 16:43, 3 September 2019 (UTC)Reply
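For readers unfamiliar with the modelling being discussed, here is a minimal sketch, on synthetic data, of the general kind of logistic regression described above: the outcome is whether a scholar has an article, the H-index enters after a log transform (which compresses the long right tail of a skewed, positive-valued measure), and exponentiated coefficients are read as odds ratios. The variable names, numbers, and data-generating assumptions are illustrative only, not the authors' specification or data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
h_index = rng.lognormal(mean=2.5, sigma=0.8, size=n)  # right-skewed, positive values
female = rng.integers(0, 2, size=n)

# Hypothetical data-generating process: article odds rise with log(H-index).
logit = -6 + 1.8 * np.log(h_index) - 0.3 * female
has_article = rng.random(n) < 1 / (1 + np.exp(-logit))

# Logistic regression of article existence on log(H-index) and a gender indicator.
X = sm.add_constant(np.column_stack([np.log(h_index), female]))
model = sm.Logit(has_article.astype(int), X).fit(disp=False)
print(np.exp(model.params))  # odds ratios for: constant, log(H-index), female
```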
Thanks all for the great discussion! Haukur: Please do let us know once you have received the data and would like to comment more on it. In the meantime, I have reverted the "correction" because 1) citation impact is actually a general umbrella term that includes the h-index and has the advantage of being easier to understand (also, it was still being used in the body of the review after your edit, so changing it only in the title seemed a bit pointless anyway), 2) "similar careers" seems to overstate the results quite a bit, because career encompasses a person's entire professional trajectory whereas the measures used here only pertain to a specific moment in that career (plus its length), and 3) it seems from the above discussion that the concerns about the interpretation of the result regarding the h-index have been resolved. (I do agree it's an interesting observation that the differences in the median go the other way, if I understood that correctly, but as Aaron points out, this might not undermine the overall result. In any case, log transforms are frequently used and even recommended by some as standard practice for data that only takes on positive values.)
I think a more interesting question might be whether WP:GNG or WP:AUTHOR could be a confounding factor here - there may well be many sociologists on the authors' list whose Wikipedia notability did not rest on WP:PROF but on media coverage or their authorship of (popular or academic) books. Regards, HaeB (talk) 19:00, 21 September 2019 (UTC)Reply
I thought it would be best to reflect the wording of the paper, which never uses the word "citation impact", though I take your point that it can be used to refer to h-index. The paper says "academic rank, length of career, and notability measured with both H-index and departmental reputation" which I think is reasonably summarized as "career" and I don't see what is gained by a switch to "seniority, institutional status, publication count, and H-index".
But whatever, all of this is a side issue since you're quite right that there must be other factors. Indeed, the very data in the paper shows that if you went by H-index alone and used that 100% fairly to pick out sociologists to write articles about then (assuming the same number of articles) you would get a higher ratio of white men than Wikipedia actually had. So the idea that Wikipedia has a bias against writing articles on female sociologists is really not, in my view, supported by this dataset. Haukur (talk) 20:08, 21 September 2019 (UTC)Reply
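A hedged sketch of the counterfactual check described just above: hold the number of articles fixed, select subjects purely by H-index, and compare the demographic makeup of that hypothetical selection with the actual one. The file name and column names below are placeholders, not the paper's code or data.

```python
import pandas as pd

# Hypothetical dataset with columns: h_index, gender, has_article (0/1).
df = pd.read_csv("sociologists.csv")
n_articles = int(df["has_article"].sum())

# Counterfactual: the same number of subjects, chosen by H-index alone.
top_by_h = df.nlargest(n_articles, "h_index")
actual = df[df["has_article"] == 1]

print("actual article subjects:\n", actual["gender"].value_counts(normalize=True))
print("top-H-index counterfactual:\n", top_by_h["gender"].value_counts(normalize=True))
```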
Related discussion: Wikipedia_talk:Notability_(academics)#Recent_study:_"Who_Counts_as_a_Notable_Sociologist_on_Wikipedia?". Regards, HaeB (talk) 19:00, 21 September 2019 (UTC)Reply
  • Gosh, this is a boring subject in the context of Wikipedia because much of it actually has little to do with Wikipedia per se, despite the (often extremely small) samples using WP as a means to "prove" their hypotheses. This is particularly evident in the Indian survey mentioned, which notes that the cultural issues mostly lie outside WP's ambit. One of the problems with being a major site on the internet is that WP becomes a mechanism for pursuit of agendas, regardless of its actual relevance to the overall issue. To paraphrase a BBC saying, "Other websites are available". And as I keep saying, it isn't our job to change the world but rather to reflect it in all its contrasting beauty and ugliness. - Sitush (talk) 03:22, 31 August 2019 (UTC)Reply
    True for some of the studies, but the first study might make us question our views of NPROF, the third can lead us to embracing a more careful writing style with regard to how we talk about men and women and the fourth is a damning indictment of the toxicity of our behaviour around contentious topics. And there are plenty more interesting conclusions which are relevant to an editor's regular editing patterns. — Bilorv (talk) 08:32, 31 August 2019 (UTC)Reply
    I'll quote from comments by Terri Apter, psychologist and Fellow Emerita of Newnham College, in her notes on The Human Stain by Philip Roth: "[The book is] also about the dangerous pleasures of outrage, what Roth called 'the ecstasy of sanctimony'. Of course, we have to be aware of hate speech and embedded bias, they are problematic. But using your need to feel virtuous to tear others apart is also problematic". Too many of these studies start from a virtuous/sanctimonious premise. Good studies draw conclusions after the study, not before undertaking it. - Sitush (talk) 19:21, 12 September 2019 (UTC)Reply
  • Can someone with access to the 4th one (Safety and women editors on Wikipedia) give some more detail on what they're suggesting with "internal safe spaces"? As in, parts of the encyclopedia with very high civility requirements? Nosebagbear (talk) 15:32, 2 September 2019 (UTC)Reply
    @Nosebagbear: The file may be located over here. Regards, WBGconverse 11:19, 3 September 2019 (UTC)Reply
    Thanks for the above, WBG. So in terms of current safe spaces it refers to off-wiki online (Facebook, mainly) and offline (women-only edit-a-thons) areas. They moot the creation of an on-wiki women-only space, though they don't consider the fairly substantial issues with that (verification, reporting of misbehaviour, canvassing risks, as well as any disagreements on the fundamental nature of Wikipedia). Nosebagbear (talk) 11:29, 3 September 2019 (UTC)Reply
  • Sitush I reviewed the breastfeeding study and it really surprised me to find such a poorly done piece. My take on it was so similar to your thoughts that you might like to read User:WhatamIdoing's page where we discuss it. Gandydancer (talk) 15:21, 12 September 2019 (UTC)Reply
  • Wikipedia is dynamic; citing a learned article from 2009 in a current work is not useful for describing the current state of Wikipedia or the community. All the best: Rich Farmbrough, 21:57, 20 September 2019 (UTC).Reply