User:Johnbod/Content and coverage; where are we 10 years into the project?

Content and coverage; where are we 10 years into the project? (This is a version of my talk given at the Wikimedia UK 2011 Conference in Bristol on April 16, 2011; sadly it has dated very little by 2020, except that editor numbers have stopped falling!)

(Note: This talk is a general review of the English Wikipedia, with some thoughts about what the project as a whole, and not especially Wikimedia UK, can do about it. Comments welcome on the talkpage.)

After ten years in existence, and about five as a major presence, Wikipedia has established itself as the most successful online reference source, with readership continuing to rise.

  1. most popular pages, per hour
  2. October 2001 most visited pages. WP is 10 years old in January 2011, so this is a blast of nostalgia, with tiny numbers for views per month and a massive US bias (bear in mind this was the month after the 9/11 attacks). Now the English WP gets about 10 million views per hour.
  3. Alexa page on Wikipedia (set the display setting to "max" for recent figures)

How has it done this? What are our “killer apps”?

  • Very wide coverage indeed
  • Consistent & easy to use appearance & functionality with links, categories etc.
  • Huge media coverage – really there’s no such thing as bad publicity
  • Some of the content is pretty good, and most of it more accurate than inaccurate

As a result of these:

  • Placed very high in search engine results
  • It’s all free


There was a key decision made in the very early days to avoid expert mediation or control of the content. This was clearly a crucial and correct decision, as the fate of Citizendium and others has shown. It is at the core of Wikipedia.

It gives us a totally different way of producing content from a typical print reference work, where, at least in theory, an editor or editorial board makes a broad overall plan and then parcels out topics to specialists, telling them what to cover, and roughly how much to write on each subject. In fact the reality is often very different even, or especially, in the most prestigious print projects.

Wikipedia is rightly famous, for good or bad, for the breadth of our coverage of “popular culture”. Here we have no single competitor that is anywhere close across the whole field. This may not be an area that many of us spend much time reading or editing, but we should remember that it generates the truly colossal viewing figures, millions of views per month for a single article, that presumably play a large part in keeping WP articles on all subjects so high in Google searches. That editors know their work will actually be read is a huge motivating force.

We still have general weaknesses in whole subject areas. Some of the ones I’m conscious of:

  • Third world, above all Africa. Latin America too.
  • Female-oriented subjects, perhaps
  • Engineering, and technology that is not electronic/computing (or military), seem weak
  • Business and economics seem weak
  • In my own area of art history, apart from the first two points above, the decorative arts and antiques are very thinly covered.

I’m sure there are plenty of others.

Turning to coverage within the typical subject area, I would make the following wild generalizations:

  • Our “top level” topic articles are generally weak, especially on abstract concepts.
  • Our articles on specific things, whether people, films, ships or species are often very good, and our coverage is usually very wide, even if many articles are pretty short.

So the more general and abstract the article subject, the worse our articles tend to be. This is the reverse of the pattern generally found in published reference works. It also means that some of our weakest work within a subject area gets the highest number of viewers. This is not good.

There’s no doubt that it is typically harder to write a quality article on a big topic, and that work on these articles is far more likely to get reverted, especially if it is not fully referenced. Many editors are certainly right to largely avoid the big subjects; a lot more research or expertise is needed to tackle them.

Editor Shortage

That the number of editors is in decline is now well-known, and much discussed. Various initiatives are being undertaken to try and improve matters, but I think we all agree it will be a long and difficult process.

Recent coverage:

  1. Signpost/2011-04-11
  2. Signpost/2011-04-04/Editor retention
  3. Editor Trends Study/Results

Analysis of editor decline concentrates on editor numbers, edit counts and new articles. It is much harder to assess quantitative, let alone qualitative, trends in the addition of article text. As we know, by no means all editors do much of this, which is fine in itself, but makes assessing the picture here much harder.

  • Signpost/2011-04-11 "Classifying newbies and veterans as experts, gnomes, vandal fighters or social networkers"
Edits >= | Wikipedians | % of editors | Edits total | % of content
1 | 3,215,701 | 100.0% | 153,400,434 | 100.0%
3 | 1,305,870 | 40.6% | 150,331,427 | 98.0%
10 | 630,016 | 19.6% | 146,450,864 | 95.5%
32 | 246,988 | 7.7% | 140,017,568 | 91.3%
100 | 101,721 | 3.2% | 132,163,057 | 86.2%
316 | 44,638 | 1.4% | 122,334,838 | 79.7%
1,000 | 19,906 | 0.6% | 108,638,514 | 70.8%
3,162 | 7,979 | 0.2% | 87,739,409 | 57.2%
10,000 | 2,328 | 0.1% | 56,854,644 | 37.1%
31,623 | 421 | 0.0% | 25,126,480 | 16.4%
100,000 | 35 | 0.0% | 6,323,434 | 4.1%
316,228 | 4 | 0.0% | 1,771,555 | 1.2%

(late 2010 figures; for latest see here)

  1. Graph: over 25 edits per week, 2001 to now; a different measure showing a similar decline.

My own strong impression, compared to when I joined in 2006 at just about the peak moment for edit figures, is that the decline in addition of article text is at least as strong as the general decline in editing, although the average quality of the edits made has improved considerably.

When I joined there was a general feeling that making a quick stab at a subject would leave something that others would, within some reasonable time-frame, come along and improve. This was not an unreasonable assumption, given that editing figures had risen consistently and rapidly ever since the project had begun five years earlier.

Move on another five years and the picture looks very different. Of course some articles do still get expanded, but a much larger group have shown little real development of their text over a period of several years. This is often disguised in the statistics and edit histories, because there have been a large number of edits; but these are nearly all small language and spelling corrections, vandalism and spam and their reversion, and what I’ll call peripheral edits to interwiki links, categories, infoboxes, templates, and so on.

  • Histories are bunk!
  • Lawrence Durrell Collection - history. The article has been around since 2005, and from the history a lot appears to have been done. But comparing the current version with the one after the first few edits in 2005 tells a different story.
  • Royal Observatory, Greenwich - diff Aug 08-Nov 10. The effects of a vast list of edits over more than two years on a popular article. In fact, when you go into them, the only significant text added is "The scientific work of the observatory was relocated elsewhere in stages in the first half of the 20th century, and the Greenwich site is now maintained as a tourist attraction." and "Indeed prior to this, the observatory had to insist that all the electric trams in the vicinity could not use an earth return for the traction current." (A few sentences have in fact been added since November 2010, when the diff ends.)

These essentially abandoned starts are WP’s biggest problem to my mind. What approaches have been taken to address the problem?

There have been various drives to assess and improve “vital” articles across the project, but I’m not aware of any that are currently very active. Wikipedia:Vital articles tracks progress on one fairly sensible list; see the bottom of the page for links to others. I won't go through it in detail, but the assessments bear out the analysis here: FAs are heavily concentrated on "specific thing" articles, and the only section to be all-FA is the Solar System. But the real problem is perhaps concentrated a level or two below this group of very top-level articles. Some WikiProjects have made similar efforts to improve the most important articles in their own areas, often with good results. But most projects are much less active than they used to be, and many have only a handful of regular commenters on their talk pages.

What about the DYK, GA & FA processes? Most DYKs are new articles on small, if not minuscule, topics. But DYK also incentivises 5x expansions, which is good. I would say that typically these expansions are of higher quality than the new articles submitted.

GA addresses the problem squarely, and attracts a fair number of “big subject” submissions. Personally I’ve found the GA process too erratic to justify the effort, but I support the process from afar. FA is more problematic, and although I still review some, I have begun to feel that it encourages some of our best editors to concentrate on very small topics, which are much the easiest to get passed. There are exceptions of course, but I worry that the FA process now has a detrimental net effect on the project.

What can we do about it?

Personally, I have increasingly concentrated my editing on addressing this issue. I have over 160 DYKs, but recently rather more of them are expansions than new articles. When thinking about articles to edit – and I admit my editing is not always a very planned process – I look at the number of views, and favour those with high figures. Since the British Library event in late January, when I did a new article on a Persian manuscript, I’ve done a lot on Islamic art, which remains a very poorly covered area.

There’s a lot of talk about encouraging academics and other “experts” to edit – see recent Signposts. It’s unclear how successful this will be. I wonder if it might be easier to get academics interested in an “area review” assessing the gaps and weaknesses in our coverage, in some sort of organized programme covering most parts of the English WP (but perhaps not popular culture). This might avoid several of the pitfalls that seem to alarm potential academic editors. With luck some will go on to get involved in improving the weaknesses revealed. This process should be followed by drives to get the regular editors to concentrate on the weak articles and areas. I’m not really suggesting, by the way, that this should be something driven by the UK or any other chapter, except maybe for UK-only subject areas (which in general are already better than average I think).

Down the road, once such a review has been carried out, it might even be worth considering simply hiring academics or other outsiders to improve selected articles. This is Wiki-heresy I know, but sometimes it’s just easier to outsource. Probably such content should not be privileged in any way, but should take its chances with the rest. Personally my experience has been that good additions and rewrites, even of high-traffic articles, are actually pretty stable, with the exception of certain controversial areas. One can always revert.

At the moment I think that most of our readers are aware that our content is in the process of being improved, even if they are vague about how this happens. Clearly poor articles are therefore to some extent excused as temporary stopgaps. The danger is that the mass readership will come to realize how unfortunately stable much of our content in fact is, and that this excuse will then cease to work.