User:GreenC/software/tabular

Tabular data in Lua templates, an alternative to Wikidata and/or local data pages

Commons Tabular ("Table") Data is an alternative way to store and access data for Wikipedia templates. An example tabular file is c:Data:Wikipedia_statistics/data.tab, which is rendered at List of Wikipedias#Detailed_list via the templates {{WP7}} and {{NUMBEROF}}, which use Module:NUMBEROF.

As of May 2020, Tabular Data is an underused resource; only a handful of modules use it.

Advantages:

  • Data hosted on Commons makes it immediately accessible to all 300+ wikis.
  • Templates can be rolled out quickly to other language wikis because the data is hosted and updated in a single central location. For example, after {{NUMBEROF}} was reprogrammed to use Tabular data, it was rolled out to over 60 wikis within days, without needing bot permissions on each wiki to maintain local data pages.
  • Data can be updated more frequently on Commons than would make sense if separate data files were maintained on each wiki.
  • Commons can host data types that may not be appropriate for Wikidata, such as daily temperature changes, stock prices or Wikipedia statistics.
  • JSON files are much easier to develop and work with than Wikidata.

Disadvantages:

  • The data files cannot be too large, or Lua will exceed its memory and CPU limits during page rendering. This can be mitigated by splitting the data across multiple files, but the approach does not scale well to very large data sets.

Technical

High-performance use of tabular data has not been documented anywhere else. This section shows a best-practice example based on Template:NUMBEROF.

Lua can access .tab pages through mw.ext.data.get(), for example local statistics = mw.ext.data.get('Wikipedia_statistics/data.tab'), as seen in Module:NUMBEROF/data. The JSON is returned as a Lua associative array (table) and can then be used within the module like any other table.
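A minimal sketch of what a data sub-module might do with the table returned by mw.ext.data.get(). It assumes the usual .tab layout, where schema.fields lists the columns (each with a name) and data holds each row as a positional array, and it re-keys the rows by their first column so later lookups are direct. The helper name index_rows, the column names site and articles, and the sample values are hypothetical dummies; this is not the actual code of Module:NUMBEROF/data.

```lua
-- Hypothetical helper: turn the table returned by mw.ext.data.get()
-- ({ schema = { fields = {...} }, data = { row, row, ... } }) into
-- { [first column value] = { column name = cell value, ... } }.
local function index_rows(tab)
  -- Collect column names in order from the schema.
  local names = {}
  for i, field in ipairs(tab.schema.fields) do
    names[i] = field.name
  end

  -- Convert each positional row into a record keyed by column name.
  local index = {}
  for _, row in ipairs(tab.data) do
    local record = {}
    for i, name in ipairs(names) do
      record[name] = row[i]
    end
    index[row[1]] = record  -- assumes the first column is a unique key
  end
  return index
end

-- Inside Scribunto, the data sub-module would end with something like:
--   return index_rows(mw.ext.data.get('Wikipedia_statistics/data.tab'))

-- Offline check with a hand-made table shaped like a .tab page (dummy values):
local sample = {
  schema = { fields = {
    { name = "site", type = "string" },
    { name = "articles", type = "number" },
  } },
  data = {
    { "en", 100 },
    { "de", 50 },
  },
}
local stats = index_rows(sample)
print(stats.de.articles)  --> 50
```

Doing the re-keying once in the data module means each template call performs a direct table lookup instead of scanning every row. The result contains only tables, strings and numbers, which is what mw.loadData() requires of the module it loads.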

One problem that arises is performance. A template backed by the Lua module might be used many times on a page, and each invocation would retrieve the JSON from Commons again, which is very inefficient. In practice, Template:NUMBEROF can be used as many as 4,000 times on a single page (List of Wikipedias), which causes red timeout errors due to excessive I/O with Commons.

To solve this, mw.ext.data.get() is called in a sub-module, such as Module:NUMBEROF/data, and the main module loads that sub-module with mw.loadData('Module:NUMBEROF/data'). This works because loadData() is a special function that loads a sub-module only once per page, so mw.ext.data.get() itself is invoked only once.
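The effect of mw.loadData() can be illustrated with a plain-Lua simulation. The cache table, load_data_once, fetch_from_commons and invoke below are hypothetical stand-ins written for this sketch, not Scribunto internals; in a real module the call would simply be mw.loadData('Module:NUMBEROF/data'). The counter shows that 4,000 template invocations on one page trigger only one fetch.

```lua
-- Simulation only: these names are hypothetical, not Scribunto APIs.
local fetch_count = 0

-- Stands in for the expensive mw.ext.data.get() call against Commons.
local function fetch_from_commons()
  fetch_count = fetch_count + 1
  return { en = { articles = 100 } }  -- dummy data
end

-- Per-page cache, mimicking mw.loadData()'s "load once per page" behaviour.
local cache = {}
local function load_data_once(name, loader)
  if cache[name] == nil then
    cache[name] = loader()
  end
  return cache[name]
end

-- One template invocation, e.g. a hypothetical {{NUMBEROF|articles|en}}.
local function invoke(site, field)
  local data = load_data_once('Module:NUMBEROF/data', fetch_from_commons)
  local record = data[site]
  return record and record[field]
end

for _ = 1, 4000 do
  invoke('en', 'articles')
end
print(fetch_count)  --> 1
```

The real mw.loadData() also returns a read-only view of the loaded table, so the data sub-module must return only plain values (tables, strings, numbers, booleans) and no functions.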

See also

  • Module:Tabular data - a recently created module that provides access to tabular data via #invoke; not suitable for high-performance use