Wikipedia:Bots/Requests for approval/H3llBot 11
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Hellknowz (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 11:03, Friday August 23, 2013 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): C#, custom API
Source code available: No
Function overview: below
Links to relevant discussions (where appropriate): --
Edit period(s): Continuous
Estimated number of pages affected: <500 per Category:Pages with archiveurl citation errors then as they come up
Exclusion compliant (Yes/No): Y
Already has a bot flag (Yes/No): Y
Function details:
Appending H3llBot 4 (User:H3llBot/U2A):
In citations, when the |archiveurl= or |url= are set to an archive service link, but the corresponding |url= is not set or |archiveurl= isn't used, set the missing fields and fill in the date if needed. H3llBot 4 already covers this for urls and dates I can parse out of the citations. However, the majority of Category:Pages with archiveurl citation errors are using shorthand archive urls, so I need to actually browse the pages and retrieve the url/data.
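To illustrate the distinction between archive urls that can be parsed directly and shorthand ones that need browsing, here is a minimal Python sketch. It is not the bot's code (the source is not published); the function name `parse_archive_url` and the `ArchiveInfo` type are made up for this example, and it only assumes the common Wayback layout of a numeric timestamp followed by the original url.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class ArchiveInfo(NamedTuple):
    original_url: str
    archive_date: str  # ISO date such as "2012-01-26"


# Assumed Wayback layout: .../web/<8 to 14 digit timestamp>/<original url>
_WAYBACK = re.compile(
    r"^https?://(?:web\.)?archive\.org/web/(\d{8,14})/(.+)$",
    re.IGNORECASE,
)


def parse_archive_url(archive_url: str) -> Optional[ArchiveInfo]:
    """Return the original url and date if they are readable from the
    archive url itself; return None when the page must be fetched."""
    match = _WAYBACK.match(archive_url.strip())
    if match is None:
        # Shorthand links (e.g. webcitation.org/<id>) carry no url or date.
        return None
    stamp, original = match.groups()
    try:
        date = datetime.strptime(stamp[:8], "%Y%m%d").date().isoformat()
    except ValueError:
        return None  # digits that do not form a valid calendar date
    return ArchiveInfo(original, date)
```

A Wayback-style link yields both fields, while a shorthand link such as http://www.webcitation.org/64zXFfeH5 yields `None`, which is the case that requires actually loading the archive page.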
For example, Van Cleave has 2 errors. Citations have http://www.webcitation.org/64zXFfeH5 and http://www.webcitation.org/6B2tdaqFt links, which need browsing to get the actual values -- http://www.isuresults.com/bios/isufs00012936.htm at 2012-01-26 and http://www.isuresults.com/bios/isufs00012936.htm at 2012-09-29.
I feel this is different enough in technology (actually reliably browsing the websites, editors can't tell the url/date from markup, and I need to implement each site-specific check) that this warrants a BRFA.
I'll try and add all the major/accepted archive providers I come across, including Wayback (Internet Archive), Webcite, Archive.is, Google Cache, etc.
Here is a sandbox edit with common providers converted/filled in (webcitation is down atm, but that one can be seen in previous edits).
For the record, I have also upgraded the original task with a few other parameter misuse cases. A popular one is setting |archiveurl= but not |url=. The logic is exactly the same, except the archive url itself was already in the correct location. You can see this in recent contribs.
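The two misuse cases described above can be sketched as a single repair step. This is an illustration only, not the bot's implementation: `repair_citation`, `is_archive_link`, the host list and the `resolve` callback are hypothetical names, and writing the date to |archivedate= is an assumption about which field holds it. The callback stands in for whatever parsing or browsing retrieves the original url and date.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlparse

# Illustrative host list only; real coverage would be broader.
ARCHIVE_HOSTS = {"web.archive.org", "webcitation.org", "archive.is"}

# Maps an archive url to (original url, ISO date), or None if unknown.
Resolver = Callable[[str], Optional[Tuple[str, str]]]


def is_archive_link(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in ARCHIVE_HOSTS


def repair_citation(params: Dict[str, str], resolve: Resolver) -> Dict[str, str]:
    """Return a repaired copy of the citation parameters, or an
    unchanged copy when nothing can be fixed safely."""
    fixed = dict(params)
    url = fixed.get("url", "").strip()
    archive = fixed.get("archiveurl", "").strip()

    # Case 1: the archive link was placed in |url= and |archiveurl= is unused.
    if not archive and is_archive_link(url):
        archive, url = url, ""

    # Case 2: |archiveurl= is known but the original |url= is missing.
    if archive and not url:
        resolved = resolve(archive)
        if resolved is None:
            return dict(params)  # cannot recover the original; leave as is
        original, date = resolved
        fixed["url"] = original
        fixed["archiveurl"] = archive
        if not fixed.get("archivedate", "").strip():
            fixed["archivedate"] = date
    return fixed
```

Passing the resolver in as a parameter keeps the field logic identical for both cases, which matches the remark that only the starting location of the archive url differs.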
Discussion
As an outsider, this looks good to me Hasteur (talk) 14:40, 20 September 2013 (UTC)
Approved for trial (30 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Anomie⚔ 00:04, 17 October 2013 (UTC)
{{OperatorAssistanceNeeded}} Has this trial taken place? Josh Parris 11:13, 5 November 2013 (UTC)
Trial complete. I made a batch of edits, they are last in contributions (along with earlier trial and previous incremental task upgrades before I decided this should be a full BRFA). Here's a good example of massive url misuse and lost original urls. Also found a blacklisted url. — HELLKNOWZ ▎TALK 11:32, 5 November 2013 (UTC)
- For posterity, a permalink to the edits Josh Parris 12:08, 5 November 2013 (UTC)
- The URL injected in this edit broke the wikitext, you'll need to do some additional escaping for certain characters that might be used in URLs but that MediaWiki doesn't recognize (/[][<>"\x00-\x20\x7F\p{Zs}]/ is what MediaWiki doesn't recognize). I also see in a few of the earlier edits (e.g. [1], [2], [3]) the URL was present but in a misnamed parameter.
- Anyway, since all that seems rare and easy to fix and I have confidence you will fix them, Approved. Anomie⚔ 20:23, 5 November 2013 (UTC)
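The character class quoted above can be turned into an escaping step roughly as follows. This is a hedged Python sketch, not the fix that was actually deployed: the function name `escape_url_for_wikitext` is invented here, and percent-encoding is an assumed choice of escaping, since the comment only asks for "additional escaping". Python's `re` module has no `\p{Zs}`, so the space-separator category is checked with `unicodedata` instead.

```python
import unicodedata
from urllib.parse import quote


def _breaks_wikitext(ch: str) -> bool:
    """True for characters in the class [][<>"\\x00-\\x20\\x7F\\p{Zs}]."""
    return (
        ch in '[]<>"'
        or ord(ch) <= 0x20          # control characters and plain space
        or ord(ch) == 0x7F          # DEL
        or unicodedata.category(ch) == "Zs"  # Unicode space separators
    )


def escape_url_for_wikitext(url: str) -> str:
    """Percent-encode only the characters that would end the link early,
    leaving the rest of the url untouched."""
    return "".join(
        quote(ch, safe="") if _breaks_wikitext(ch) else ch
        for ch in url
    )
```

For example, a url containing brackets, quotes and a space, such as `http://example.com/a[1]?q="x y"`, would come out with `%5B`, `%5D`, `%22` and `%20` in place of those characters. Existing percent signs are deliberately left alone so that already-encoded urls are not double-encoded.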
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.