Wikipedia:Wikipedia Signpost/2013-02-25/WikiProject report

WikiProject report

How to measure a WikiProject's workload

Relative WikiWork for WikiProject U.S. Roads over time

As the editor of the WikiProject Report, one of the most common questions I receive from readers (and a frequent inquiry at Database Reports and the WikiProject Council) is how WikiProjects can be measured. The most visible way of measuring WikiProjects is the article assessment system, which provides the total number of articles tagged with a project's banner as well as breakdowns of how many articles have received various class and importance ratings. Other metrics include rankings of WikiProjects by their total number of articles, the number of edits to the project's pages, and how many editors are watching the project's talk page. But how can we measure the challenges facing a project or determine a WikiProject's productivity? Luckily, several prominent projects have been doing that for years. Their answer: WikiWork.

WikiWork is a concept originally developed in April 2007 that approximates how many classes or "steps" a project's articles need to ascend for all of the project's articles to reach Featured status. WikiWork encompasses several formulas that can provide different views of a project's workload, but the most-used formula, and the best way to compare WikiProjects of different sizes, is called "relative WikiWork", or the average workload per article. This involves adding up the number of steps all of the project's articles must pass to reach Featured status (with A-class articles considered one step, Good Articles two steps, B-class three steps, etc.) and then dividing this number by the total number of articles under the project's scope, including articles that have already reached Featured status. The resulting number will be between zero and six, with lower numbers considered more desirable. For example, the relative WikiWork for WikiProject Aircraft is 4.693, meaning that the average article about aircraft is between C-class and Start-class. By comparison, WikiProject Elements has a less daunting workload, with a relative WikiWork of 3.600, while WikiProject Olympics is bogged down with a relative WikiWork of 5.829.
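The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's own calculator: the names `STEPS_TO_FA`, `total_wikiwork`, and `relative_wikiwork` are hypothetical, and the article counts in `example` are invented for demonstration.

```python
# Steps each class must climb to reach Featured status (FA = 0 ... Stub = 6).
STEPS_TO_FA = {"FA": 0, "A": 1, "GA": 2, "B": 3, "C": 4, "Start": 5, "Stub": 6}


def total_wikiwork(counts):
    """Sum of the steps needed for every rated article to reach FA.

    Classes outside the scale (e.g. lists) are ignored.
    """
    return sum(STEPS_TO_FA[cls] * n for cls, n in counts.items() if cls in STEPS_TO_FA)


def relative_wikiwork(counts):
    """Average steps per rated article; FA articles count in the denominator."""
    articles = sum(n for cls, n in counts.items() if cls in STEPS_TO_FA)
    if articles == 0:
        raise ValueError("no rated articles to average over")
    return total_wikiwork(counts) / articles


# Hypothetical project: these counts are invented for illustration only.
example = {"FA": 2, "A": 1, "GA": 5, "B": 10, "C": 20, "Start": 40, "Stub": 22}
print(total_wikiwork(example))               # 453
print(round(relative_wikiwork(example), 3))  # 4.53
```

With these made-up counts, the average article sits between C-class (4) and Start-class (5), which is how the figures for real projects quoted above are read.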

In addition to comparing WikiProjects, WikiWork can be tracked over time to gauge a project's productivity. For example, the relative WikiWork for WikiProject U.S. Roads has decreased from around 5.6 in 2008 to about 4.5 today (see the graph), showing that the project has had some success in improving articles to higher classes and/or trimming unnecessary stubs. A helpful calculator is available to determine the various WikiWork statistics for your WikiProject.

As with all statistical data, there are some caveats that should be taken into account. The project must be large enough for any statistical analysis to be significant, and its article assessments should be checked for accuracy before they are used to determine WikiWork. Furthermore, WikiWork focuses on articles, excluding lists and portals, which typically do not receive comparable ratings other than being Featured or not. To learn the full details about how WikiProjects use WikiWork and the limits of WikiWork data, we interviewed Scott5114, who originated the concept of WikiWork; Hurricanehink, who uses WikiWork at WikiProject Tropical Cyclones; and Fredddie and Rschen7754 from WikiProject U.S. Roads.

What inspired the concept of WikiWork? How was this metric originally intended to be used? Have the formulas been altered over the years to take into account any changes in Wikipedia's assessment system?
Scott5114: The U.S. Roads project was feeling somewhat inadequate at the time that the WP:1.0 assessment scheme was first introduced. Our stats looked worse than the tropical cyclone project, but we had no way of easily comparing them. I originally created WikiWork as a five-point scale; since FA status is considered the "end goal" of sorts for each article, that was assigned a score of zero, and then each class below that was assigned a score that represented the number of classes to get to FA, so for A-class it was one, GA-class, it was two, and so on, down to stubs being five (there was no C-class yet). This gives you the total WikiWork score (ω), which is the number of classes that a project must improve by in order for all articles to be FA. Divide that by the number of articles in the project, and you get the relative WikiWork score (Ω), which represents the project's average article. This latter number is the one that is used most frequently, as you can use it to easily compare one project to another, and also compare the state of your project to how it stood in the past. The scale has been altered once, to address the addition of C-Class to the WP:1.0 scheme. C-Class was assigned a value of 4, start was bumped to 5, and stub went to 6. Other specialty classes that have been added since then, like the list, template, future, and project classes, have been left out of the WikiWork formula, since it is intended to only represent the state of a project's assessed articles.
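Written out, the two scores Scott5114 describes might look like the following. This is a sketch in notation of our own choosing: n_c (the number of articles in class c) and s_c (the steps from class c to FA) are symbols introduced here, not taken from the original formulas.

```latex
\omega = \sum_{c} s_c \, n_c ,
\qquad
\Omega = \frac{\omega}{\sum_{c} n_c}

% Original five-point scale (before C-class existed):
%   s_FA = 0, s_A = 1, s_GA = 2, s_B = 3, s_Start = 4, s_Stub = 5
% Current six-point scale:
%   s_FA = 0, s_A = 1, s_GA = 2, s_B = 3, s_C = 4, s_Start = 5, s_Stub = 6
```

Both sums run only over the rated article classes, so list, template, future, and project classes contribute to neither the numerator nor the denominator.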
The relative WikiWork by state for the U.S. Roads project
How can the WikiWork metrics be applied to the everyday operation of a WikiProject? What kinds of goals or initiatives can use WikiWork as a measure of success?
Hurricanehink: For the tropical cyclone WikiProject, we mainly use WikiWork in terms of our storms and season articles, since those are easy to quantify and compare from year to year. Using a Google Docs file that various project members routinely edit, we compare the various tropical cyclone basins' overall quality. I use it to emphasize Atlantic seasons from 1950 to present, since those are, in general, the most commonly viewed articles in the project. There, we can see which seasons need the most work, and where we should focus.
Scott5114: In the U.S. Roads project, we often use WikiWork to compare the progress of different state task forces relative to one another. We have a "leaderboard" that lists out the WikiWork statistics for each state, and there is a lot of friendly competition surrounding this leaderboard that motivates people to improve articles. For 2013, we are working on a goal of getting the entire project to a WikiWork of 4.400.
Fredddie: One way to look at relative WikiWork is that it's a snapshot of what the project's average article looks like. This works great for the U.S. Roads project because we assess our articles based on the presence and quality of content. I can look at the project's relative WikiWork (currently 4.545, between a C and a Start) and know that the average article has X or Y and parts of Z. If they haven't already, other projects can do the same by defining what an article within their scope should look like at each class and then conduct an assessment audit to make sure articles are falling in line.
Rschen7754: The U.S. Roads project also produces many regularly updated charts and graphs that are based largely on these numbers. They can be found at WP:USRD/A/VA.
Does the size of a WikiProject impact the usefulness of WikiWork? How can editors use the WikiWork metrics to compare the workload and productivity of WikiProjects?
Hurricanehink: I think WikiWork is better in a larger project with subdivisions, since you can see the quality levels among various topics. If it's too small, then I don't think the comparisons would work.
Scott5114: WikiWork can be used by both large and small projects, but in different ways. In large projects with many task forces and subdivisions like U.S. Roads, you can create competition between the task forces that serves as a motivation tool. You can't do that so much in smaller projects, but you can compare your stats to Wikipedia as a whole or other WikiProjects, and for any project it is a great way to track your progress over time. It is a lot easier to keep slugging away at the seemingly never-ending article expansion treadmill when you can see progress being made by the relative WikiWork number going down.
Fredddie: I agree with Scott. One thing I like about the U.S. Roads WikiProject's use of WikiWork is that we break down stats by state task force. Like Scott said, we can then compare the different task forces against each other or we can compare a task force against the entire project. If you were to look at the project's "leaderboard", you can see where most of our editors are working as those state task forces are at or near the top of the page. Areas that get low editing traffic tend to be towards the bottom of the page. Then it is up to the project to incentivize working on articles in those lower reaches of the project.
What assumptions does the WikiWork formula make about the state of a project's articles? How does this limit or caveat the use of WikiWork data?
Hurricanehink: Projects have to actively use A-class and B-class for it to be effective. In the hurricanes project, we don't often use them, so every so often (as we did last night, incidentally), we go through articles and upgrade them to either of the two. Since WikiWork evenly distributes points based on each class, it can be unrepresentative if a project doesn't actively use those classes.
Scott5114: Projects have to keep on top of their article assessments in order for the statistic to reflect reality—editors need to adjust the assessment when they expand the article, or else you will end up with inaccurate assessments, and thus an inaccurate WikiWork. It helps to have a more specialized guideline for what puts each article in each class so that you can reduce subjectivity in the assessments.
Fredddie: WikiWork assumes that every article is assessed. If a project has 50 assessed articles and a relative WikiWork of 2.0 (let's assume they're all Good Articles), but has 30 articles awaiting assessment, you are not getting the full picture. If 29 of those unassessed articles are Stubs and the other is a List, the project's real relative WikiWork is 3.468 (between B and C). On the other hand, knowing that there are 116 class improvements, that is, moving from Stub to Start to C, etc., before those 29 articles become Good Articles gives editors a clear goal with a magic number.
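Fredddie's arithmetic can be checked with a short Python sketch. The function name `relative_wikiwork` and the dictionary layout are hypothetical choices made here for illustration; the numbers are the ones from his example.

```python
# Steps each class must climb to reach Featured status (FA = 0 ... Stub = 6).
STEPS_TO_FA = {"FA": 0, "A": 1, "GA": 2, "B": 3, "C": 4, "Start": 5, "Stub": 6}


def relative_wikiwork(counts):
    """Average steps to FA per rated article; unrated classes (e.g. lists) are skipped."""
    rated = {cls: n for cls, n in counts.items() if cls in STEPS_TO_FA}
    total_steps = sum(STEPS_TO_FA[cls] * n for cls, n in rated.items())
    return total_steps / sum(rated.values())


# The 50 assessed articles, all assumed to be Good Articles.
assessed_only = {"GA": 50}
# The same project once the 30 unassessed pages are rated: 29 Stubs and 1 List.
with_backlog = {"GA": 50, "Stub": 29, "List": 1}

print(relative_wikiwork(assessed_only))           # 2.0
print(round(relative_wikiwork(with_backlog), 3))  # 3.468

# Class improvements needed to bring the 29 Stubs up to Good Article level.
steps_to_ga = 29 * (STEPS_TO_FA["Stub"] - STEPS_TO_FA["GA"])
print(steps_to_ga)                                # 116
```

The list is left out of both the step total and the article count, which is why the denominator is 79 rather than 80.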
To date, WikiWork is actively incorporated into the initiatives of only three WikiProjects. What has prevented this concept's spread? Is it adaptable to the unique needs of other WikiProjects? Are there any tools available to simplify the calculation of WikiWork metrics for mathematically challenged editors?
Hurricanehink: I personally love it, since it's the sabermetrics of Wikipedia articles (or, more lamely put, a statistical geek's delight). People not as into statistics might be apprehensive at using it, particularly with how detailed the numbers can get. However, like ERA or RBI, it's a pretty basic stat that I think only needs a little education to be spread, provided it's used for a sufficiently large project.
Scott5114: I think it's mostly a publicity thing—most people never happen across it unless they're in a project already using it, since the U.S. Roads Project's WikiWork stuff is buried a couple levels down in our project page structure. A lack of calculation tools probably limited it initially—it was a chore to keep up with because we had to manually calculate everything with a spreadsheet and update the tracking pages by hand. Now that we can have the server calculate it with use of expr functions, and we have the WP:1.0 bot generating a WikiWork table for us whenever we get assessment updates, it is a lot easier. The formula is simple enough that anyone can make a basic calculator in their favorite spreadsheet program. There is also an online calculator available.
In what ways does the information gleaned from WikiWork complement or supplement Wikipedia's other metrics of WikiProject performance? What additional metrics are needed to give editors a clearer picture of the activity, productivity, and anticipated needs of a WikiProject?
Fredddie: This may be an assumption, but I think most people who would be interested in implementing WikiWork in their project are familiar with their regular editors and know in what areas they edit. No one tool is definitive, but WikiWork can show if Wikipedia is improving as a result of an editor or editors in a certain task force.
  • For instance, if the Useractivity tool shows an editor has 70,000 edits and WikiWork shows that editor's area is around 4.850 (just above Start class), maybe that's a sign to give that editor some help. But then again, that editor may be more involved elsewhere.
  • Popular pages, if your project is lucky enough to have it currently, is great because it shows where our readers are going. Ultimately, what we're doing here is for our readers, so it's good to know where our energy should be spent. Then it's up to the project to coordinate working on the articles at the top of the list.
  • The HotArticles tool shows where the project's energy is being spent. It probably isn't the best tool to compare with WikiWork for that reason. Looking at HotArticles for the U.S. Roads project, out of the 20 articles listed, 6 of them were at the project's A-Class Review and WP:FAC at some point in the last year. Three other articles were contentious for various reasons, so it shows the good and the bad.
The best metric of WikiProject performance, I think, is healthy discussion on the project's talk page. It doesn't have to be edited continuously, but only if a talk page is fairly active will WikiWork and the other metrics, used together, give a clear picture of what is needed and where.
Anything else you'd like to add?
Scott5114: Projects looking into adopting WikiWork should be warned that it is pretty easy to abuse the system. You can manipulate it easily by failing to tag poor articles, having them deleted, doing bad merges, improperly tagging articles as lists, etc. Editors have to remember that the statistic exists to improve the encyclopedia, and not just for its own sake, and avoid the perverse incentives that statistics create. Anything that makes the stats look better but hurts the project shouldn't be done. We have had a few critics of the WikiWork concept who say that it has turned our project into a "role-playing game" concerned only with putting up a good score and resulted in a reluctance for editors to add new articles or to edit outside of their favorite task force. I feel that these concerns are off-base in our particular project, but they could certainly become an issue elsewhere.
Rschen7754: I think that with the current attitude of many editors that a lot of our content is not of high quality over 12 years into the project, WikiWork can be a useful tool if adopted more widely. This will give people clear direction as to where to go with an article and in a subject area, concentrating their efforts. It doesn't work for everyone or every project, but for some places it would be helpful.


Next week, we'll change the channel. Until then, grab your remote and tune in to our previous reports in the archive.