Wikipedia:WikiProject Tropical cyclones/Data migration

This page details a planned solution to the ever-growing number of data-related edits on tropical cyclone season pages. These edits only update statistics, and essentially inflate the edit counts of articles without significantly affecting their content. This essay is built upon the idea that "data edits go on Wikidata, content edits go on Wikipedia", which should provide a clear distinction between the two kinds of change.

Consensus for this implementation plan has not yet been gathered; it will be sought once the plan's first draft is complete. Participants from both Wikidata and Wikipedia are encouraged to join the discussion.

Preface

As found in this analysis, a significant number of edits to the 2021 Pacific typhoon season article (nearly half the article's edit count) are edits with a difference of less than 10 bytes. Most of these are changes to numbers and other statistics. Since Wikipedia is not an indiscriminate collection of information, and because these edits are essentially nullified at the end of a tropical cyclone's lifespan, when all current-storm data is removed from the article, the recommended course of action is to move statistical edits off Wikipedia and onto Wikidata.

Goals

This plan has the following goals:

  • Create a systematic method of updating current storm data on Wikidata and Wikipedia.
  • Move all "current storm information" statistics (and possibly information in {{infobox tropical cyclone current}} infoboxes) to Wikidata.
  • Ensure compatibility across all storm basins and meteorological agencies when implementing the changes.
  • (Optional) Increase WikiProject Tropical cyclones project activity on Wikidata.

Participating

If you have any comments or concerns, please leave a message on the talk page.

If you'd like to participate in developing templates, bots, modules, or any other technical details of the plan, please include your name below. A modicum of technical proficiency is required, as the tasks involved in this project are highly dependent on code and other software work.

  1. Chlod (say hi!) 21:42, 11 March 2022 (UTC)

Phase 1: Infrastructure

In order to facilitate this change, the proper infrastructure must be created on both wikis.

Adjust Wikidata properties and identifiers

Wikidata currently has the following properties related to storm information:

Wikidata does not have properties related to:

Wikidata also does not have identifiers for agency tropical cyclone details. These should be supplied in order to automatically generate the "for the latest official information" part of the current storm information.
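
As a rough illustration of how an agency identifier could produce the "for the latest official information" link, the sketch below fills an identifier into a URL pattern. It assumes the `$1` placeholder convention used by Wikidata's formatter URL (P1630) property; the function name, the URL pattern, and the identifier are hypothetical and do not refer to any real agency page.

```python
from urllib.parse import quote


def official_link(formatter_url, identifier):
    """Build an official-information URL from a formatter URL and an identifier.

    The "$1" placeholder follows the formatter URL (P1630) convention.
    """
    if "$1" not in formatter_url:
        raise ValueError("formatter URL must contain the $1 placeholder")
    # Percent-encode the identifier so it is safe inside a URL path.
    return formatter_url.replace("$1", quote(identifier, safe=""))


# Hypothetical pattern and identifier, for illustration only.
link = official_link("https://agency.example/storms/$1", "WP022021")
print(link)  # prints "https://agency.example/storms/WP022021"
```

A template or module on Wikipedia would do the equivalent substitution when rendering the section, so editors would only ever maintain the identifier on Wikidata.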

Create items for all storm classifications

Classifications will be on a per-agency basis, to be used for instance of (P31). This list outlines classifications with (linked) and without (unlinked) items:

Adjust future Wikidata items

This change primarily targets current storm information. Thus, there is no need for retroactive change on existing tropical cyclone items (although it is highly suggested, especially since Wikidata items for storms are highly under-maintained). For future Wikidata items, the following must be observed:

Create templates on Wikipedia

In order to display the data on Wikipedia, there must be a template that specifically generates the content for a "current storm information" section.

Since transcluded Wikidata statistics are automatically given a pencil icon, it should not be hard for existing editors to make changes to the storm data. This also makes the data easy for vandals to edit; protection details are outlined in the following sections. This template will display the latest value (the one with the most recent point in time (P585)) of the maximum sustained wind speed and lowest atmospheric pressure. Depending on the basin, either 10-minute winds or 5-minute winds will be displayed, but not both; 1-minute winds are always shown if available. Official sources will be provided through identifiers. The closest reference point will always be shown, along with gusts and movement if available.
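
The "latest value" rule can be sketched in a few lines. The example below is only an illustration: it assumes each statement has already been flattened into a simple record with a `point_in_time` string in ISO 8601 UTC form, which is a hypothetical shape and not the raw Wikidata JSON. The wind values are made up.

```python
from datetime import datetime, timezone


def parse_time(stamp):
    """Parse a timestamp such as "2021-04-17T06:00:00Z" into an aware datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_statement(statements):
    """Return the statement with the most recent point in time, or None."""
    # Statements without a point-in-time qualifier cannot be ordered, so skip them.
    dated = [s for s in statements if s.get("point_in_time")]
    if not dated:
        return None
    return max(dated, key=lambda s: parse_time(s["point_in_time"]))


# Made-up maximum sustained wind statements (hypothetical record shape).
winds = [
    {"value": 65, "unit": "kn", "point_in_time": "2021-04-16T18:00:00Z"},
    {"value": 80, "unit": "kn", "point_in_time": "2021-04-17T06:00:00Z"},
    {"value": 75, "unit": "kn", "point_in_time": "2021-04-17T00:00:00Z"},
]

current = latest_statement(winds)
print(current["value"], current["unit"])  # prints "80 kn"
```

On Wikipedia itself this selection would live in the template's backing module; the Python version only shows the logic.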

Phase 2: Optimization

Data importation

Most of the data can be generated automatically. Best track data from the JTWC and the RSMCs is freely available and can be scraped for inclusion on Wikidata. This data can be imported by purpose-built bots.
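
As a hedged sketch of what the first step of such a bot might look like, the code below parses one comma-separated best track record into a structured observation. It assumes an ATCF-style column layout (basin, storm number, timestamp as YYYYMMDDHH, two technique fields, forecast hour, latitude and longitude in tenths of a degree with a hemisphere letter, wind in knots, pressure in hPa); other agencies publish different formats, and the sample line is invented for illustration.

```python
from datetime import datetime, timezone


def parse_coord(text):
    """Convert a value such as "125N" or "1290E" (tenths of a degree) to signed degrees."""
    text = text.strip()
    tenths, hemisphere = int(text[:-1]), text[-1].upper()
    degrees = tenths / 10
    # South and west are negative by convention.
    return -degrees if hemisphere in ("S", "W") else degrees


def parse_best_track_line(line):
    """Parse one record, assuming the ATCF-style column order described above."""
    fields = [part.strip() for part in line.split(",")]
    return {
        "basin": fields[0],
        "number": int(fields[1]),
        "time": datetime.strptime(fields[2], "%Y%m%d%H").replace(tzinfo=timezone.utc),
        "lat": parse_coord(fields[6]),
        "lon": parse_coord(fields[7]),
        "wind_kn": int(fields[8]),
        "pressure_hpa": int(fields[9]),
    }


# Invented sample record, not real best track data.
sample = "WP, 02, 2021041712,   , BEST,   0, 125N, 1290E,  95,  950, TY"
observation = parse_best_track_line(sample)
print(observation["lat"], observation["lon"], observation["wind_kn"])
```

Each parsed observation would then map onto Wikidata statements, with the record's timestamp becoming the point in time (P585) qualifier.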

Cewbot by Kanashimi is already responsible for uploading images from meteorological agencies to Commons (see BRfA). Because of this, Wikidata items for storms can automatically be updated with the uploaded track maps. This, however, does not include automatic updates involving best track data. For this, another bot should be used instead, or an existing bot can be used as long as it can accurately import data at proper intervals.

{{Infobox tropical cyclone current}}

Template:Infobox tropical cyclone current (ITCC) is an extremely complicated and messy infobox written entirely in wikitext. Given that this infobox is used for all basins, it should be standardized to ensure that (a) editors are no longer confused by the available types and classifications, and (b) Wikipedia can pull data from Wikidata without errors.

The following changes are suggested for standardization:

  • Replace switches on category, AUScategory, JMAtype, JMAcategory, IMDtype, IMDcategory, MFRtype, and MFRcategory with one centralized category database (i.e. Module:Storm categories).
    • (On the module) Prevent specific categories from being used in basins where they do not apply.
  • Fix parsing problems with lat and lon.
  • Stop relying on multiple if statements for gusts position.
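
The centralized category database and its basin check can be illustrated with a small lookup table. This is a sketch of the idea only, written in Python for readability; a real Module:Storm categories would be a Lua module, and the keys, basin codes, and table layout below are hypothetical.

```python
# Hypothetical category table: one entry per agency classification,
# each listing the basins in which it may be used.
STORM_CATEGORIES = {
    "jma_typhoon": {"label": "Typhoon", "agency": "JMA", "basins": {"WPac"}},
    "imd_vscs": {"label": "Very severe cyclonic storm", "agency": "IMD", "basins": {"NIO"}},
    "mfr_itc": {"label": "Intense tropical cyclone", "agency": "MFR", "basins": {"SWIO"}},
}


def lookup_category(key, basin):
    """Return the category entry, rejecting categories not valid for the basin."""
    try:
        entry = STORM_CATEGORIES[key]
    except KeyError:
        raise ValueError(f"unknown storm category: {key}") from None
    if basin not in entry["basins"]:
        raise ValueError(f"category {key} is not used in basin {basin}")
    return entry


print(lookup_category("jma_typhoon", "WPac")["label"])  # prints "Typhoon"
```

With a single table like this, the infobox needs one lookup instead of a separate switch per agency parameter, and an invalid combination fails loudly instead of rendering the wrong classification.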

NOTE: Many of the issues here have been fixed in an in-development modular infobox, {{Infobox weather event}}.

Timelines

Timelines can also be automatically generated by grabbing entity data of the current storm season, finding those cyclones that are part of it, and creating the graph automatically using a module. Such a graph no longer needs updating on Wikipedia, and instead delegates updates to Wikidata.
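
The steps described above (collect the season's storms, then build the graph from them) might look roughly like the following. The record shape, the season label, and the storm names and dates are all hypothetical placeholders; on Wikipedia this logic would sit in a module that reads the entity data directly.

```python
from datetime import date


def build_timeline(storms, season):
    """Return timeline rows for storms in the given season, ordered by start date."""
    members = [s for s in storms if s["season"] == season]
    members.sort(key=lambda s: date.fromisoformat(s["start"]))
    rows = []
    for storm in members:
        start = date.fromisoformat(storm["start"])
        end = date.fromisoformat(storm["end"])
        # Count both the first and the last day of the storm.
        rows.append({"name": storm["name"], "start": start, "days": (end - start).days + 1})
    return rows


# Placeholder storms, not real data.
storms = [
    {"name": "Storm A", "season": "example season", "start": "2021-04-12", "end": "2021-04-25"},
    {"name": "Storm B", "season": "example season", "start": "2021-02-16", "end": "2021-02-23"},
    {"name": "Storm C", "season": "another season", "start": "2020-05-10", "end": "2020-05-18"},
]

for row in build_timeline(storms, "example season"):
    print(row["name"], row["start"].isoformat(), row["days"])
```

Because the rows are derived entirely from item data, adding or correcting a storm on Wikidata would change the rendered timeline without any edit to the season article.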

Phase 3: Conclusion

All related changes should be reflected in documentation pages, and current WikiProject members should be notified of them. This does have the side effect of changing several established norms, but the new system should lead to a much more standardized and improved workflow. Otherwise, WikiProject Tropical cyclones will be stuck with old code and inflated season article edit counts, a hurdle in administration and organization.

In order to maintain this system, WikiProject members are highly encouraged to participate in Wikidata. Wikidata items can be watchlisted much like Wikipedia pages, so it would not be difficult to patrol the items of current storms (which rely on Wikidata). This new system also supports recording the source of a data point (for example, maximum sustained winds (P2895)). This deters users from editing the extremely vague "winds" parameter, a practice that leads to edit wars over which agency to prioritize.
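
To illustrate why sourced data points avoid the single "winds" value problem, the sketch below keeps the latest wind value per agency instead of picking one. The record shape and the `agency` field are hypothetical simplifications of a sourced Wikidata statement, and the values are made up.

```python
def latest_by_agency(statements):
    """Return a mapping of agency name to its most recent statement."""
    latest = {}
    for statement in statements:
        agency = statement["agency"]
        # ISO 8601 UTC strings in one fixed format sort chronologically as text.
        if agency not in latest or statement["point_in_time"] > latest[agency]["point_in_time"]:
            latest[agency] = statement
    return latest


# Made-up statements for one storm, each attributed to an agency.
statements = [
    {"agency": "JMA", "value": 85, "point_in_time": "2021-04-17T00:00:00Z"},
    {"agency": "JTWC", "value": 100, "point_in_time": "2021-04-17T00:00:00Z"},
    {"agency": "JMA", "value": 90, "point_in_time": "2021-04-17T06:00:00Z"},
]

for agency, statement in sorted(latest_by_agency(statements).items()):
    print(agency, statement["value"])
```

Every agency's figure stays available with its attribution, so the display template can decide which ones to show per basin without editors overwriting one another's numbers.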

Backporting this system to old storms, so that values in {{Infobox tropical cyclone}} are also supplied automatically, is possible. Access to best track data is not required in that case, as only a single value is needed: the bare minimum that the infobox requires. This, however, is less important, since edits to previous storm infoboxes do not significantly inflate the edit count of an article.

Implementation

Implementation should begin as soon as consensus has been achieved. For changes to existing templates, the WikiProject will be given a 15-day notice with a list of changes and usage examples. New templates should likewise be announced on the WikiProject talk page to encourage their use.

Expected outcomes

  • Begin adding in storm data on Wikidata
  • Massively reduce non-prose (or data-only) edits to cyclone season articles
  • Automatically update current storm information
  • Automatically write the "current storm information" section of articles
  • Use a bot to automatically add data to Wikidata items
  • Automatically generate cyclone season article timelines
  • Invite patrollers to Wikidata items of cyclones to prevent disruption (or automatically flag such edits)