Wikipedia:Bots/Requests for approval/YTStatsBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Request Expired.
Operator: Petewarrior
Time filed: 08:37, Friday, February 19, 2021 (UTC)
Function overview: Update the stats (subscriber count, views, last update date) in YouTube personality infoboxes with data fetched from the YouTube API.
Automatic, Supervised, or Manual: Supervised
Programming language(s): Python 3.7
Source code available: https://github.com/petewarrior/pywikibot-youtube-subscribers/
Links to relevant discussions (where appropriate):
Edit period(s): Manually (every few days)
Estimated number of pages affected: 10-20; requests for additional pages may be accepted as long as the infobox is in the supported format
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details:
This script works in three steps:
1. Parse a YouTube personality infobox on a Wikipedia page to obtain the channel ID (except for infoboxes using channel_direct_url)
2. Query the YouTube API for the statistics of that channel ID
3. Update the stats in the infobox
Requires Pywikibot, Python 3, and a YouTube API key.
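The first two steps above can be sketched roughly as follows. This is a minimal illustration and not the code in the linked repository: the helper names (`extract_channel_id`, `build_stats_url`, `parse_stats`) are hypothetical, the regular expression assumes the channel ID appears in `channel_url` in the usual 24-character `UC...` form, and the response is assumed to follow the `channels` resource of the YouTube Data API v3 (`items[0].statistics`). No network request is made here.

```python
import re
from urllib.parse import urlencode

# Assumption: channel_url holds "channel/UC..." or a bare "UC..." ID.
CHANNEL_ID_RE = re.compile(
    r"\|\s*channel_url\s*=[ \t]*(?:channel/)?(UC[0-9A-Za-z_-]{22})(?![0-9A-Za-z_-])"
)


def extract_channel_id(wikitext):
    """Step 1: return the channel ID found in channel_url, or None."""
    match = CHANNEL_ID_RE.search(wikitext)
    return match.group(1) if match else None


def build_stats_url(channel_id, api_key):
    """Step 2: build the request URL for the channel's statistics."""
    query = urlencode({"part": "statistics", "id": channel_id, "key": api_key})
    return "https://www.googleapis.com/youtube/v3/channels?" + query


def parse_stats(response):
    """Pull subscriber and view counts out of a decoded JSON response."""
    items = response.get("items") or []
    if not items:
        return None
    stats = items[0]["statistics"]
    # The API returns counts as strings; subscriberCount can be absent
    # when a channel hides its subscriber count.
    subscribers = stats.get("subscriberCount")
    views = stats.get("viewCount")
    return {
        "subscribers": int(subscribers) if subscribers is not None else None,
        "views": int(views) if views is not None else None,
    }
```

An infobox that only uses channel_direct_url yields `None` from `extract_channel_id`, which matches the stated limitation that such channels have to be hardcoded.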
Page requirements
- The page must have a Wikipedia YouTube personality infobox, either on its own or as a module in another infobox.
- The script can only parse the YouTube channel ID from the channel_url field. If the infobox uses channel_direct_url, both channel_id and channel_direct_url must be hardcoded in the script (example in source code). Only one channel per infobox is supported.
- The last update time is written to the stats_update field. The field must already exist and contain a date in YYYY-MM-DD format wrapped in the Wikipedia date template.
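The update step under these requirements might look like the sketch below. It is an illustration only: `set_param` and `set_stats_update` are hypothetical helpers, the parameter name `subscribers` is an assumption (the request itself only names channel_url, channel_direct_url and stats_update), and because the exact date template is not named in the request, the pattern accepts any single-argument template around a YYYY-MM-DD date.

```python
import re


def set_param(wikitext, name, value):
    """Replace the value of an existing `| name = ...` parameter.

    The text is returned unchanged if the parameter is absent.
    """
    pattern = re.compile(r"(\|\s*" + re.escape(name) + r"\s*=[ \t]*)[^\n|]*")
    return pattern.sub(lambda m: m.group(1) + value, wikitext, count=1)


# Matches e.g. "| stats_update = {{sometemplate|2021-01-01}}".
DATE_TEMPLATE_RE = re.compile(
    r"(\|\s*stats_update\s*=[ \t]*\{\{[^|{}]+\|)\d{4}-\d{2}-\d{2}(\}\})"
)


def set_stats_update(wikitext, iso_date):
    """Swap the date inside the existing stats_update template."""
    new_text, count = DATE_TEMPLATE_RE.subn(
        lambda m: m.group(1) + iso_date + m.group(2), wikitext, count=1
    )
    if count == 0:
        # The page does not meet the stated requirement; skip it
        # rather than guess at a format.
        raise ValueError("stats_update with a YYYY-MM-DD date template not found")
    return new_text
```

Raising on a missing or differently formatted stats_update field reflects why the task was proposed as supervised: pages outside the supported format are left alone.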
Discussion
Initial thoughts:
- Do you plan to only run this on some specified articles? Wouldn't it be better to have it run on all pages using {{Infobox YouTube personality}}?
- Suggest advertising this discussion to the template talk page.
- A similar task was suggested in Wikipedia:Bots/Requests for approval/YoutubeSubscriberBot. Similar questions apply to this task. For example, will it round the stats to 2 or 3 sig figs (so that it's not spamming the article history with daily updates)?
ProcrastinatingReader (talk) 08:52, 19 February 2021 (UTC)[reply]
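The rounding question raised in the third point can be made concrete with a short sketch. It assumes integer counts and rounds half up to 3 significant figures by default; `round_sig` and `needs_edit` are hypothetical names, not functions from the bot's source.

```python
def round_sig(n, sig=3):
    """Round a non-negative integer to `sig` significant figures (half up)."""
    if n == 0:
        return 0
    digits = len(str(n))
    factor = 10 ** max(digits - sig, 0)
    return (n + factor // 2) // factor * factor


def needs_edit(old_count, new_count, sig=3):
    """Edit only when the rounded figure shown in the infobox would change."""
    return round_sig(old_count, sig) != round_sig(new_count, sig)
```

With this check, a change from 3,441,234 to 3,436,000 subscribers produces no edit, because both round to 3,440,000, while a change to 3,456,000 does.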
- Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT⚡ 09:16, 19 February 2021 (UTC)[reply]
- Petewarrior you shouldn't edit using the bot account until the BRFA is approved. Though, since you already have I've looked at some of the edits, e.g. at Rina Ikoma (diff), and concern #3 is even more prominent I think. The figures must be rounded to 2 or 3 significant figures before editing, to ensure it doesn't edit much and not until substantive changes are needed, otherwise it's going to spam the page history. ProcrastinatingReader (talk) 09:25, 19 February 2021 (UTC)[reply]
- Alternatively, I would support {{Infobox YouTube personality}} being edited to try to fetch the data from either Wikidata or a userspace/sub-template page, and the bot updating that page continuously. That way, there will be no edits made to the article itself. I think this is probably the best approach to take with data-updating tasks like this. ProcrastinatingReader (talk) 09:27, 19 February 2021 (UTC)
- Diff added to page link for ease of use later. Primefac (talk) 11:37, 19 February 2021 (UTC)[reply]
- Agreed on both counts; not only does it make more sense to pull pages from the template, if we really are going to look into automating this the template should have a subpage with storage for all of the subscriber counts, meaning that only one page needs to be updated. At just under 2k uses we'd need to be a little creative with our switch statements, but really the initial setup will be the main hurdle. Primefac (talk) 11:37, 19 February 2021 (UTC)[reply]
- I feel like structured data like this is best suited for Wikidata. However, that would mean this BRFA has to be opened there (as you know, but to be clear for the operator: enwiki BAG has no authority on Wikidata bot operations). A switch should also be okay I'd imagine (not sure tho, never personally tried a 2k entry switch, have you?). Technically however, a for loop in the code can easily produce the output though. I personally think the best option is to send this off to Wikidata, though. ProcrastinatingReader (talk) 11:46, 19 February 2021 (UTC)[reply]
- I also agree that Wikidata would be the best place for this sort of thing - it would also make adding new channels easier. With the subpage-and-switch-statement method it would require manually tweaking it every time an additional channel needs to be added (I have no idea whether that's a real concern however, do we foresee the use of this infobox growing significantly over time?). This means the data can be updated as often as we like without spamming pages with "changed 3.44m to 3.43m". ƒirefly ( t · c ) 11:53, 19 February 2021 (UTC)[reply]
- A lua module querying a .json page is probably more efficient than a massive wikicode switch statement (I'm not sure, but I think the latter might trigger the WP:PEIS limit). --Ahecht (TALK PAGE) 14:38, 8 March 2021 (UTC)[reply]
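For comparison, the two single-page storage options discussed in this thread (a switch statement generated by a loop, or a .json page read by a module) could both be produced from the same data on the bot side. The sketch below is only an illustration: the function names, the choice of channel IDs as keys, and the layout of the data are assumptions, and nothing here has been checked against the template or against WP:PEIS.

```python
import json


def build_data_page(stats):
    """Serialise all channel stats as one JSON page, keyed by channel ID."""
    return json.dumps(stats, indent=2, sort_keys=True)


def build_switch(stats, field):
    """Generate a wikitext #switch with one branch per channel ID."""
    lines = ["{{#switch: {{{1|}}}"]
    for channel_id in sorted(stats):
        lines.append(f" | {channel_id} = {stats[channel_id][field]}")
    lines.append(" | #default = ")
    lines.append("}}")
    return "\n".join(lines)
```

Either way the bot edits a single page per run, and adding a channel means adding one key to the data rather than hand-editing the switch.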
- Apart from the subpage and Wikidata methods mentioned above, Commons structured data is also useful. See Template:NUMBEROF#How it works. A new template/module in the infobox would retrieve data from a table at Commons (available at all projects). A bot would update the Commons table. Johnuniq (talk) 00:23, 20 February 2021 (UTC)[reply]
- Hello, thank you for all your feedback. Following that, I've created a rough script to update the relevant stats in Wikidata, which is in the same GitHub repo if anybody would like to check it out. I'm still figuring out how to get the WikidataIB module to fetch the data (it seems to dislike string target values). I haven't tried the Commons API.
As to why it can't update all YouTube infoboxes in its current iteration, the boxes need to already contain data in a very specific format, which is also why it's proposed to make it a supervised bot to be manually run and verified every week or so. PetéWarrior (talk) 13:16, 20 February 2021 (UTC)[reply]
- In regards to WikidataIB, if you need help it could be worth getting in touch with RexxS who maintains the module; he may be able to assist.
- In regards to all YouTube infoboxes, I'm not overly familiar with this template so question: shouldn't all transclusions have some kind of unique identifier (likely the URL/channel ID), and if so can't it automatically update wherever that is present? For the purposes of approval it's fine if you want to manually run and supervise its updates, so that won't affect this approval proceeding, but I do imagine it would be easier if it could safely run as automated. In particular, why is it not possible to parse channel_direct_url? ProcrastinatingReader (talk) 12:30, 25 February 2021 (UTC)[reply]
- Ahecht, are you the one who set up the module and bot running {{high use}}? Would this be a similar situation where a single module can call a specific value out of a /data module for ease of updating? Primefac (talk) 13:50, 8 March 2021 (UTC)[reply]
- Yes, I am. That approach would work, but I agree with other editors that putting this on Wikidata or Commons would be better (assuming Wikidata has something like the Bot flag to prevent it from clogging up watchlists). Unlike template counts on enwiki, YouTube subscriber counts are useful to multiple languages' wikis. If it can't pass a Wikidata or Commons BRFA (and I'll admit I'm not at all familiar with those processes), having the infoboxes query a local .json file would be a better approach than updating each infobox individually, since it won't be filling up the edit history of each and every YouTuber article on a regular basis. Either approach would mean that there would be a ~1 week lag between adding the infobox and the subscriber numbers showing up, but I don't see that as a major hurdle. --Ahecht (TALK PAGE) 14:35, 8 March 2021 (UTC)[reply]
- Worth noting that the transclusions can be purged using User:ProcBot/PurgeList right after the scheduled data update, for immediate update of the articles. ProcrastinatingReader (talk) 16:14, 8 March 2021 (UTC)[reply]
- The lag I was referring to was that between the infobox being added and the next bot run. --Ahecht (TALK PAGE) 18:44, 8 March 2021 (UTC)[reply]
- It's been a few weeks since the last update, so I'm wondering where we stand. Petewarrior, are you still thinking of doing this through Wikidata or should we explore Commons structured data? Do you need help getting either of these working? — The Earwig (talk) 05:33, 29 March 2021 (UTC)[reply]
- Request Expired. No response from user. For what it's worth there's no issue with re-opening this req if/when requested, but I think a discussion about how to keep these pages updated is in order first (e.g. update each page, update a central template, update/call from WD). Primefac (talk) 13:14, 9 April 2021 (UTC)[reply]