Wikipedia:Bots/Requests for approval/BHGbot 9
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Denied.
Operator: BrownHairedGirl (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 00:21, Thursday, August 19, 2021 (UTC)
Function overview: Remove the banner tag {{Cleanup bare URLs}} from articles which no longer have any WP:Bare URLs.
Automatic, Supervised, or Manual: Automatic
Programming language(s): AWB module (C#)
Source code available: will be published once written, and before any trial run. I don't want to spend time coding it unless there is support in principle for this task. Wikipedia:Bots/Requests for approval/BHGbot 9/AWB module
Links to relevant discussions (where appropriate):
Edit period(s): Initial run to clear the backlog, then about weekly
Estimated number of pages affected: Initial run ~1,650 pages. Thereafter a rough guesstimate of ~50 pages per week. updated estimate 20:20, 18 October 2021 (UTC): 1424 pages
Namespace(s): Article, Draft.
Exclusion compliant (Yes/No): Yes
Function details: initial article list to consist of all main- and draft-space transclusions of {{Cleanup bare URLs}}. With each page:
1. check that the page contains the banner template {{Cleanup bare URLs}}, or one of its many aliases. If not, skip the page.
2. count the number of {{Bare URL inline}} tags in the page, including aliases
3. count the number of untagged bare URL refs in the page, i.e. those which match the regex
<ref[^>]*?>\s*\[?\s*https?:[^>< \|\[\]]+\s*\]?\s*<\s*/\s*ref
4. if the total matches of step 2 + step 3 is greater than zero, then skip the page
5. Optional check for bare URLs not in ref tags:
  - Test for existence on the page of any other URLs which are not:
    - wrapped in a {{cite}} tag, or
    - wrapped in {{URL}}, or
    - formatted as [http://www.example.com/foo some-non-space-characters], or
    - the value of a |website=http://www.example.com/foo parameter in any infobox
  - if any such URLs exist, then skip the page
6. remove the banner {{Cleanup bare URLs}}, and save the page with AWB genfixes, using an edit summary of the form
WP:BHGbot 9: removed {{Cleanup bare URLs}}. This page currently has no bare URLs
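The decision logic in the steps above (excluding the optional step 5) can be sketched as follows. This is a minimal Python illustration, not the operator's C# AWB module: the function name `remove_banner_if_clean` is hypothetical, only the canonical template names are matched (the aliases mentioned in the proposal are not listed in this discussion, so they are left out), and case-insensitive matching of template names is an assumption.

```python
import re

# Regex quoted in the proposal for untagged bare URL refs (step 3).
BARE_REF = re.compile(r"<ref[^>]*?>\s*\[?\s*https?:[^>< \|\[\]]+\s*\]?\s*<\s*/\s*ref")

# Canonical template names only; alias handling is omitted (assumption).
BANNER = re.compile(r"\{\{\s*Cleanup bare URLs\s*(\|[^{}]*)?\}\}\n?", re.IGNORECASE)
INLINE_TAG = re.compile(r"\{\{\s*Bare URL inline\s*(\|[^{}]*)?\}\}", re.IGNORECASE)


def remove_banner_if_clean(wikitext):
    """Return (new_text, reason); new_text is None when the page is skipped."""
    # Step 1: the banner must be present.
    if not BANNER.search(wikitext):
        return None, "no banner"
    # Steps 2 and 3: count inline tags and untagged bare URL refs.
    tagged = len(INLINE_TAG.findall(wikitext))
    untagged = len(BARE_REF.findall(wikitext))
    # Step 4: any remaining bare URL ref means the banner stays.
    if tagged + untagged > 0:
        return None, f"{tagged} tagged + {untagged} untagged bare URL refs"
    # Step 6: remove the banner (step 5 is not part of this sketch).
    return BANNER.sub("", wikitext), "banner removed"
```

A page whose only ref is a filled {{cite web}} template would have its banner removed, while a page that still contains `<ref>http://example.com/a</ref>` would be skipped at step 4.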
- Note 1
- Step 5 (check for bare URLs not in ref tags) is based on the discussion at User talk:Citation bot/Archive 26#Cleanup tag not removed after problem fixed, where both @AManWithNoPlan and @Headbomb advocated retaining the banner tag if there are any bare URLs anywhere on the page.
I think that this approach is overly cautious, because in practice the {{Cleanup bare URLs}} tag seems to be used overwhelmingly for bare URLs within ref tags. However, I am happy to include this step unless there is consensus to omit it.
- Note 2
- My estimate of ~1,650 pages in the initial run is based on comparing the 7,362 pages currently transcluding {{Cleanup bare URLs}} with a scan of the 17 August database dump which found 459,013 pages with 1 or more bare URLs in ref tags. That comparison found 1,665 pages transcluding {{Cleanup bare URLs}} but without bare URLs.
If the bot is set to skip pages with bare URLs not in ref tags, the initial run will be significantly less than 1,650 pages, but until I run the bot in pre-parse mode I won't know how much less.
- Note 3
- Coding the AWB module is not complicated, but testing it and debugging it without a proper development environment is very slow. So I don't want to put in a few hours work without having first checked that the task has approval in principle.
Discussion
Regarding step 2 & step 3, what if you've got a bare URL tag but the URL is no longer bare, and elsewhere you've got an untagged bare URL? Your bot would skip this if it's only relying on counts? Ditto if there's an inline tag for a URL that's no longer bare? ProcrastinatingReader (talk) 00:34, 19 August 2021 (UTC)[reply]
- @ProcrastinatingReader: thanks for that observation. I hadn't factored in the case of a ref which has been fixed, but the {{Bare URL inline}} tag has not been removed. I think that such cases will be rare, and that it will be even more rare to have that oddity and a banner tag {{Cleanup bare URLs}} (without which this bot will reject the page in step 1).
- If you like, I can add an extra check for such misplaced {{Bare URL inline}} tags, but I would prefer not to do so, simply to avoid adding extra complexity to accommodate a very rare case whose consequence would be a mistaken skip rather than the more serious matter of a mistaken removal. --BrownHairedGirl (talk) • (contribs) 00:57, 19 August 2021 (UTC)[reply]
- PS I just ran https://petscan.wmflabs.org/?psid=19858257 to check for main- and draft-space pages which transclude both {{Cleanup bare URLs}} and {{Bare URL inline}}: total 12 pages.
- I checked them all for the case you described, and found only one kind-of match, on List of gangs in New Zealand. An IP had wrongly added[1] {{Bare URL inline}} after </ref>, instead of the correct placement before it. Then reFill filled a bunch of refs,[2] but didn't remove {{Bare URL inline}} because it was not inside the ref tags. I have now fixed[3] that page. --BrownHairedGirl (talk) • (contribs) 01:27, 19 August 2021 (UTC)[reply]
Regarding step 5 and note 1, I don't see why step 5 is necessary. Is there an example of a page with such a URL (of the 'non-ref bare URL' variety) so I can see a valid use case? ProcrastinatingReader (talk) 00:42, 19 August 2021 (UTC)[reply]
- @ProcrastinatingReader: Thanks again. I have identified no such cases. I added Step 5 solely out of respect for the objections already made by the two highly experienced and technically skilled editors who raised the issue at User talk:Citation bot/Archive 26#Cleanup tag not removed after problem fixed. I can't see the use cases myself, but I have high regard for their judgement, which is why I am willing to accommodate their concerns unless there is consensus to proceed without Step 5.
- Maybe @AManWithNoPlan and/or @Headbomb could comment here? --BrownHairedGirl (talk) • (contribs) 01:04, 19 August 2021 (UTC)[reply]
For bare URLs without ref tags, here are some basic examples:
According to a report published at http://www.example.com, 63% of statistics are made up.
==References==
* http://www.example.com
==External links==
* http://www.example.com
Headbomb {t · c · p · b} 01:19, 19 August 2021 (UTC)[reply]
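The optional step 5 check can be sketched against examples like the ones above. This is a rough Python illustration under stated assumptions, not the published AWB module: the helper name `non_ref_bare_urls` and all patterns except the bracketed-link matcher are hypothetical, templates containing nested `{{...}}` are not handled, and the `|website=` exemption is applied anywhere on the page rather than only inside infoboxes.

```python
import re

# Constructs that the proposal's step 5 treats as "not bare".
NOT_BARE = [
    re.compile(r"\{\{\s*cite[^{}]*\}\}", re.IGNORECASE),      # {{cite ...}}
    re.compile(r"\{\{\s*URL\s*\|[^{}]*\}\}", re.IGNORECASE),  # {{URL|...}}
    re.compile(r"\[\s*https?://[^>< \|\[\]]+\s+[^\]]+\]"),    # [url label]
    re.compile(r"\|\s*website\s*=\s*https?://[^\s|}]+"),      # |website=url
]
# Anything that still looks like a URL after the exemptions are removed.
ANY_URL = re.compile(r"https?://[^\s<>\[\]|}]+")


def non_ref_bare_urls(wikitext):
    """Return the URLs left over once the exempt constructs are stripped."""
    for pattern in NOT_BARE:
        wikitext = pattern.sub("", wikitext)
    return ANY_URL.findall(wikitext)
```

Under these assumptions, a page containing the three example lines above would return three leftover URLs and so be skipped at step 5, whereas a page whose URLs all sit in cite templates, {{URL}}, labelled bracket links or `|website=` parameters would return an empty list.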
- @Headbomb: I get the situation, which is more common in older articles (before inline cites became strongly preferred in ~2007), but is there any evidence that {{Cleanup bare URLs}} is actually used to tag such issues? --BrownHairedGirl (talk) • (contribs) 01:30, 19 August 2021 (UTC)[reply]
- Pretty sure that in the several thousand of articles with such bare urls, at least one was tagged with {{Cleanup bare URLs}}. Headbomb {t · c · p · b} 02:02, 19 August 2021 (UTC)[reply]
- For example Ciudad del Carmen or Duncan Sandy, which you've yourself tagged with {{Cleanup bare URLs}}. Headbomb {t · c · p · b} 02:09, 19 August 2021 (UTC)[reply]
- @Headbomb: in each case I applied the tags because the page had bare inline refs. That was my sole selection criterion. I didn't even glance at the external links.
- Are you telling me that having applied the tags for that reason, I can't remove them when that problem is resolved? --BrownHairedGirl (talk) • (contribs) 02:23, 19 August 2021 (UTC)[reply]
- When the problem is resolved, yes. But you've asked for cases where step 5 would be necessary, and those are two examples with {{Cleanup bare URLs}} and non-ref bare URLs. Headbomb {t · c · p · b} 02:35, 19 August 2021 (UTC)[reply]
- @Headbomb: I fear that we may be talking past each other.
- So, just to clarify, my AWB job added the cleanup banner only to pages with bare URLs inside <ref></ref> tags. Same with the hundreds which I have since added manually as I follow around after Citation bot's processing of the lists which I feed it. AIUI, User:GreenC bot/Job 16 also selects only pages with bare URLs inside <ref></ref> tags.
- It seems to me that you are saying that the tags should not be removed after the resolution of the problem which caused their addition, because there is another unresolved issue to which the tag might have been addressed if it was applied by someone else using different criteria, even tho you have not identified any instance of such usage. Is that what you intend? --BrownHairedGirl (talk) • (contribs) 02:58, 19 August 2021 (UTC)[reply]
- You may have added {{Cleanup bare URLs}} to pages with bare URLs in refs tags, but the criteria for the removal of {{Cleanup bare URLs}} is the cleanup of all bare urls, not just those in ref tags. Headbomb {t · c · p · b} 06:10, 19 August 2021 (UTC)[reply]
- @Headbomb: I can see the logic in that approach, but I think it's too rigid. It will leave a lot of pages inappropriately stuck with the tag because of some external links, which are much less significant than refs.
- Let's see what others think. --BrownHairedGirl (talk) • (contribs) 06:47, 19 August 2021 (UTC)[reply]
- I dunno... the text of {{Cleanup bare URLs}} and its documentation look like the template is just for bare URLs in references. I wouldn't think we should care too much about other URLs, so I agree with BHG & proc that step 5 would be unnecessary. Enterprisey (talk!) 07:17, 19 August 2021 (UTC)[reply]
- Disagree there. The template isn't just for bare URL in ref tags. For example, a reference section with a non-ref tag'd bare external link. Or further reading sections. Those too should be converted to full citations. Or inline external link used as a reference. Likewise, for external links, it's a very high probability that templates like {{Official}} need to be used. It covers all bare urls. Headbomb {t · c · p · b} 07:24, 19 August 2021 (UTC)[reply]
- It seems to me that @Enterprisey's view is better supported by the documentation at {{Cleanup bare URLs}}. --BrownHairedGirl (talk) • (contribs) 15:35, 19 August 2021 (UTC)[reply]
- Since bare URLs seem to be about link rot with regards to citations (WP:BAREURLS), I'm not sure the links in the "External links" section, which often just describe the page or site name, are really covered. But there may be better venues to have this discussion if we can't come to a consensus here. ProcrastinatingReader (talk) 16:20, 19 August 2021 (UTC)[reply]
- Barring that, we can just proceed with the automated task with step 5 and see where that gets us. ProcrastinatingReader (talk) 16:22, 19 August 2021 (UTC)[reply]
- @ProcrastinatingReader: as I noted in the proposal, I am happy to proceed with step 5 included. It's not my first choice, but better than no cleanup.
- If there is some consensus elsewhere to omit step 5, then it will be trivial matter to disable step 5, subject to BRFA approval.
- @Headbomb and Enterprisey: are you happy to proceed on that basis? --BrownHairedGirl (talk) • (contribs) 16:32, 19 August 2021 (UTC)[reply]
- Possible trial. I may be getting ahead of things here, but if BAG is minded to consider authorising this task with Step 5 included, please can I ask that we start with a trial and go through a few iterations? If step 5 is involved, it would be very helpful to have multiple sets of eyes scrutinising test cases for false positives and false negatives in the check for bare links elsewhere on the page. --BrownHairedGirl (talk) • (contribs) 20:44, 19 August 2021 (UTC)[reply]
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please make sure at least 20 of these 50 edits include a 'step 5' skip. ProcrastinatingReader (talk) 21:06, 19 August 2021 (UTC)[reply]
- @BrownHairedGirl: Can I gently follow up on this BRFA? Do you still plan to go ahead with it? ProcrastinatingReader (talk) 10:10, 18 October 2021 (UTC)[reply]
- @ProcrastinatingReader: Thanks for the nudge ... and for being gentle about it after such a long delay.
I have been noticing in the last week or so that there is a lot of work for this bot job to do, so I need to get back to work on it. --BrownHairedGirl (talk) • (contribs) 10:56, 18 October 2021 (UTC)[reply]
- Trial complete.
- @ProcrastinatingReader: I have completed the trial run of 50 edits: see contribs list. Note that there are 51 edits, because #44 is a revert.
- The source code is published at Wikipedia:Bots/Requests for approval/BHGbot 9/AWB module.
- To test the bot, I made a list of articles transcluding {{Cleanup bare URLs}} which had been edited by Citation bot in one of its last 25k edits, because that concentrates pages likely to have had bare URLs fixed.
- For an annotated list of the pages scanned, see Wikipedia:Bots/Requests for approval/BHGbot 9/Article list for trial run 01.
- Note that there was one false positive: this edit[4] to Treasurer of the Household. (#46 in the contribs list, #8 in the annotated list of pages scanned). It should have failed the Step5 check, but didn't.
- I tracked that problem down to an error in line 47 of the code: I had omitted the "\s+" in this regex:
string nonBareURLMatcher = @"\[\s*https?://[^>< \|\[\]]+\s+[^\]]+\]";
- After that bug was fixed, there were 44 further edits, with no further false positives.
- I have not yet checked the skipped pages to look for false negatives. --BrownHairedGirl (talk) • (contribs) 14:48, 18 October 2021 (UTC)[reply]
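The effect of the missing `\s+` can be illustrated with a short Python sketch. The fixed pattern is transcribed from the C# string quoted above; the buggy pattern is an assumed reconstruction (the same pattern with `\s+` deleted), and the sample links are hypothetical rather than taken from the Treasurer of the Household page.

```python
import re

# Fixed matcher, transcribed from the C# string quoted above.
FIXED = re.compile(r"\[\s*https?://[^>< \|\[\]]+\s+[^\]]+\]")
# Assumed reconstruction of the buggy version: the same pattern minus "\s+".
BUGGY = re.compile(r"\[\s*https?://[^>< \|\[\]]+[^\]]+\]")

# Hypothetical sample links.
labelled = "[http://www.example.com/foo Example site]"
bracketed_bare = "[http://www.example.com/foo]"


def is_labelled_link(pattern, text):
    """True if the whole text is accepted as a labelled (non-bare) link."""
    return pattern.fullmatch(text) is not None
```

Without the `\s+`, the final `[^\]]+` can borrow the last character of the URL itself, so a bracketed URL with no label is accepted as if it were labelled. That is one way such an omission could let a page pass the step 5 check when it should have been skipped.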
- @Headbomb and Enterprisey: I would value your scrutiny of the trial, if you have time. --BrownHairedGirl (talk) • (contribs) 14:51, 18 October 2021 (UTC)[reply]
- PS @ProcrastinatingReader asked me to "please make sure at least 20 of these 50 edits include a 'step 5' skip".
- Maybe I have misunderstood that request, but it seems to me to be self-contradictory: if a page is skipped at step5 (or any other step), it will not be edited.
- I assume that the spirit of PR's request was that we should be able to check that Step5 was skipping where needed, so I devised another way of checking that. I hacked the module so that it saves a page which failed step5, but skips everything else: see Wikipedia:Bots/Requests for approval/BHGbot 9/Step5 checker.
- I ran that checker in pre-parse mode on the entire set of 235 pages in WP:Bots/Requests for approval/BHGbot 9/Article list for trial run 01.
- That found the following four pages with no bare URL inline refs, but which failed Step5:
- PR, does that satisfy your concerns? --BrownHairedGirl (talk) • (contribs) 17:17, 18 October 2021 (UTC)[reply]
- It was in August so I can't remember exactly what I was thinking, but I think the spirit of that part was to test to make sure step 5 is working. What you've done works.
- I'd prefer to review the BRFA all at once, so (since they're pinged) I'll wait a bit for Enterprisey and Headbomb to comment, if they want, before reviewing. ProcrastinatingReader (talk) 16:08, 19 October 2021 (UTC)[reply]
- I have left a note[5] for Headbomb on their talk. BrownHairedGirl (talk) • (contribs) 12:15, 21 October 2021 (UTC)[reply]
- Updated estimate of number of pages affected I just ran the module in pre-parse mode on all the 7,920 article- and draft-space pages which transclude {{Cleanup bare URLs}}. That produced a total of 1,424 pages from which {{Cleanup bare URLs}} should be removed, listed at WP:Bots/Requests for approval/BHGbot 9/Pre-parsed list for first run. --BrownHairedGirl (talk) • (contribs) 20:17, 18 October 2021 (UTC)[reply]
{{BAGAssistanceNeeded}} @ProcrastinatingReader: the trial was completed 8 days ago. It would be great to have this reviewed, because I would like to get on with removing {{Cleanup bare URLs}} from the near-20% of pages where it is now superfluous. --BrownHairedGirl (talk) • (contribs) 15:09, 26 October 2021 (UTC)[reply]
- For [6] isn't the IMDb one technically a bare URL? Seems bot was confused due to the {{better source needed}} being within the ref tags. Similar at [7], where FN23 is malformed, although arguably GIGO. At [8] the ref is kinda a bare URL? (but it would be difficult for a bot to account for) ProcrastinatingReader (talk) 14:35, 29 October 2021 (UTC)[reply]
- Thanks for the review, @ProcrastinatingReader. I'll take those points in order, but first please note the first line of WP:Bare URLs: "A bare URL is a URL cited as a reference for some information in an article without any accompanying information about the linked page". As noted in the initial proposal, I coded that as: those which match the regex
<ref[^>]*?>\s*\[?\s*https?:[^>< \|\[\]]+\s*\]?\s*<\s*/\s*ref
That regex has worked successfully in all three cases:
- the IMDb ref in [9] does not fit the technical definition at WP:Bare URLs: a ref which displays no other info about the linked page. The most common situation where a tag makes the ref "not bare" is a tagged dead link (e.g. <ref>http://example.com/foobar {{dead link}}</ref>), and in that case I think it is right to treat it as "not bare", because the only available extra info is that it is dead. In this case, the extra info is that this source should not be used, so again I think it is right to treat it as "not bare", because the fix needed is to find a better source, not to fill this ref.
- [10] FN 23 <ref>{{Cite web|url=https://www.hattrick.co.uk/Show/Small_Potatoes|title = //www.hattrick.co.uk/Show/Small_Potatoes}}</ref> is not in any sense a bare URL ref. It is a filled cite template, albeit filled wrongly.
- [11] <ref>[http://www.cambridge.gov.uk/public/councillors/agenda/2005/0119plan_files/4_1.pdf cambridge.gov.uk] {{webarchive |url=https://web.archive.org/web/20070927171940/http://www.cambridge.gov.uk/public/councillors/agenda/2005/0119plan_files/4_1.pdf |date=27 September 2007 }}</ref> also does not in any way fit the definition at WP:Bare URLs. It is filled with lots of stuff, albeit crudely.
- It is now two weeks since the trial was completed, and I would very much like to get the bot running. In the last ten days, someone has chased down and manually removed a few hundred superfluous {{Cleanup bare URLs}} tags. I think it is a great pity that someone is putting hours of their time to do a task for which a bot is coded and tested, and I doubt that any manual process is doing it with as high accuracy. --BrownHairedGirl (talk) • (contribs) 23:43, 1 November 2021 (UTC)[reply]
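How the proposal's regex classifies the refs quoted above can be checked with a small Python sketch. The regex and the ref strings are copied from this discussion; the helper name `is_bare_ref` and the plain `<ref>http://example.com/foobar</ref>` control case are assumptions added for illustration.

```python
import re

# Regex quoted in the proposal for untagged bare URL refs.
BARE_REF = re.compile(r"<ref[^>]*?>\s*\[?\s*https?:[^>< \|\[\]]+\s*\]?\s*<\s*/\s*ref")

REFS = {
    # Control case (assumed): a ref containing nothing but a URL.
    "plain": "<ref>http://example.com/foobar</ref>",
    # Tagged dead link, as in the example quoted above.
    "dead_link": "<ref>http://example.com/foobar {{dead link}}</ref>",
    # FN 23: a cite template whose title is just the URL.
    "cite_web": "<ref>{{Cite web|url=https://www.hattrick.co.uk/Show/Small_Potatoes"
                "|title = //www.hattrick.co.uk/Show/Small_Potatoes}}</ref>",
    # Labelled bracket link followed by a {{webarchive}} template.
    "webarchive": "<ref>[http://www.cambridge.gov.uk/public/councillors/agenda/2005/"
                  "0119plan_files/4_1.pdf cambridge.gov.uk] {{webarchive |url=https://"
                  "web.archive.org/web/20070927171940/http://www.cambridge.gov.uk/public/"
                  "councillors/agenda/2005/0119plan_files/4_1.pdf |date=27 September 2007 }}</ref>",
}


def is_bare_ref(ref):
    """True if the proposal's regex counts this ref as an untagged bare URL."""
    return BARE_REF.search(ref) is not None
```

Only the control case is matched. In the other three, something other than optional whitespace or a closing bracket sits between the URL and the closing ref tag (or the ref does not start with a URL at all), so the regex treats them as not bare.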
- I might be wrong, but it seems like these three examples fit the definition of bare URL 'in spirit' to me. #2 might be a filled cite template in wikitext, but to a reader it's identical to just a URL reference (what I understood to be the meaning of "bare URL"). Similar for #1 and #3. Course, I don't expect a bot to account for cases like these, but it makes me wonder whether there's a CONTEXTBOT issue, as my understanding was that those three pages were correctly tagged.
- Another BAG's input would be appreciated; @Primefac, Headbomb, and SD0001: any of you able to take a look and provide a second opinion? ProcrastinatingReader (talk) 21:53, 2 November 2021 (UTC)[reply]
- @ProcrastinatingReader: thanks for taking the time to reply.
- However, I am disappointed in the reply. #2 and #3 are not bare URLs; they are badly-filled URLs which in some respects look like a bare URL. That is of course a problem, but it is a different sort of problem.
- I am feeling disillusioned about this. My preference was for a very simple task, but to accommodate objections from Headbomb, I made it much more complex even tho nobody else supported Headbomb's view.
- I coded that promptly on the day after trial was authorised, but couldn't get it to compile, and the extra layers made it hard to find the bug. So I left it aside and eventually completed it two months later after a helpful nudge prompted me to do another few hours of work.
- So far as I can see, the bot is doing all that was asked of it, without error. But Headbomb, who asked for the extra complexity, has not responded to either a ping 15 days ago or a msg on their talk 10 days ago -- so we have no feedback about whether their objections are satisfied.
- And now the definition of bare URL is being radically expanded in ways which would require several extra layers of complex analysis. That would inevitably be very fuzzy and thus open to ongoing challenge as the bot runs and new examples are found of non-bare refs being badly filled in ways I hadn't foreseen. In a nutshell, this fuzzy "spirit of bare URL" approach is so wide open that any bot trying to satisfy it would be repeatedly accused of malfunctioning.
- So I'm sorry, ProcrastinatingReader, but I am sticking with the simple and narrow definition of bare URL. If that definition is unacceptable, then it would have been nice to have heard that in August, before I wasted time coding on the basis of a clearly stated definition which was unchallenged and unquestioned until now.
- I have had enough of chasing moving goalposts here. This also comes when I have been feeling fed up after a shitstorm created elsewhere on wiki by a serial snark-thrower's latest pops at me.
- I could have saved myself a huge bundle of work and bureaucracy by simply running a quick-and-dirty AWB job months ago. That would have been much much easier for me, and it would also have saved hours of work for the editors who have been manually removing the tags, and it would have had more accurate results than the manual work.
- If the bot as tested doesn't fit whatever new criteria someone wants to apply, then please just decline it so I can stop wasting time on it.
- Best wishes from a very disillusioned BrownHairedGirl (talk) • (contribs) 00:15, 3 November 2021 (UTC)[reply]
- In fairness, while you did provide the regex, it's hard for me to be familiar with every type of weird syntax introduced across the encyclopaedia (GIGO or otherwise), which is why trials are helpful. But those three diffs still come across to me as reasonable assessments of being bare URLs by whichever editor tagged them as such, and there being a valid problem of references-as-URL on those articles, which is why I have pause in approving this. I'm also not sure what to suggest as a task amendment to handle these cases, because as you mention there are just too many cases.
- As I say, I'd appreciate a second opinion from another BAG member on it, and also if another BAG feels there is no problem then I'm happy for them to just approve this BRFA.
- (tag for the list: {{BAGAssistanceNeeded}}) ProcrastinatingReader (talk) 00:28, 3 November 2021 (UTC)[reply]
- @ProcrastinatingReader: the proposal didn't just include the regex. It also clearly stated it would ignore (i.e. treat as "not bare") a URL "wrapped in a {{cite}} template" ... and your examples are in a {{cite}} template.
- Furthermore, you are completely wrong to say that "those three diffs still come across to me as reasonable assessments of being bare URLs by whichever editor tagged them as such". I can assert that wrongness with absolute certainty, because in each case I was the editor who added those {{Cleanup bare URLs}}: [12], [13], [14]. I did so using AWB, having selected pages with a regex similar to that listed here, so those refs you noted formed no part whatsoever of the reason for tagging. In each of these three examples, the tag was added because at that time the page contained refs which matched that regex ... and in each case the bare URL ref which caused me to add the tag was later expanded by Citation bot: [15], [16], [17].
- Before making assumptions about why the tags were added, you really should have looked at the diff of when they were added.
- That cycle of tagging and subsequent cleanup is what led me to want to do this tag removal job. I am very disappointed that little regard is shown for my expertise of months of work on this, so that I repeatedly find myself having to write lengthy explanations of what seems to me to be very simple points which are overlooked by those with less experience. WP:NOTBURO is core policy, but the amount of time I have had to devote to this bureaucracy is frustrating and depressing. BrownHairedGirl (talk) • (contribs) 01:49, 3 November 2021 (UTC)[reply]
- @Headbomb, thanks for confirming that #2 & #3 are not bare URLs. As to #1, I will repeat as a bullet point the reply above which I gave to ProcrastinatingReader:
- the IMDb ref in [21] does not fit the technical definition at WP:Bare URLs: a ref which displays no other info about the linked page. The most common situation where a tag makes the ref "not bare" is a tagged dead link (e.g. <ref>http://example.com/foobar {{dead link}}</ref>), and in that case I think it is right to treat it as "not bare", because the only available extra info is that it is dead. In this case, the extra info is that this source should not be used, so again I think it is right to treat it as "not bare", because the fix needed is to find a better source, not to fill this ref.
- The tag was NOT added because the page contained <ref>https://www.imdb.com/title/tt0108956/technical?ref_=tt_dt_spec {{better source needed|date=October 2017}}</ref>. The AWB job which I used to add it did not treat the tagged URL as a bare URL, so that was no part of the reason for tagging.
- The tag was added by me[22] because the page contained the bare URL <ref>https://www.amazon.co.uk/Then-There-Were-Giants-DVD/dp/B00007JGED/ref=sr_1_7?s=dvd&ie=UTF8&qid=1415637415&sr=1-7&keywords=bob+hoskins</ref>.
- That bare URL was filled by Citation bot 4 months later[23], so the reason for tagging has been resolved. And since it has been resolved, the tag should be removed.
- In this case, you want me to add a load of extra complexity to the bot, in order to ensure that the page remains tagged and categorised as having bare URLs which need cleaning up, even though the actual fix needed is NOT to fill that bare URL, but to replace it with a ref to a reliable source.
- Please explain how editors are helped by having the page misleadingly tagged in that way? BrownHairedGirl (talk) • (contribs) 04:37, 3 November 2021 (UTC)[reply]
- In the interest of getting this moving, would you be happy to modify your regex to skip the first one (e.g. strip templates from ref tags pre-check)? That way [24] will be skipped. Hopefully that will mean this can be approved, and the automated task can deal with the bulk of the cleanup, and the rest can be done through semi-automated means. ProcrastinatingReader (talk) 13:16, 11 November 2021 (UTC)[reply]
- @ProcrastinatingReader: I checked this page for a week after my last comment, and gave up when there was no reply after 7 days. I saw your comment only just now, when I dropped in to see if there was any progress. A ping would have avoided a month's delay.
- The issue is a little more complex than it may appear, because the {{dead link}} should be inside the ref tags, but the example above <ref>https://www.imdb.com/title/tt0108956/technical?ref_=tt_dt_spec {{better source needed|date=October 2017}}</ref> is an error: {{better source needed}} should be placed after the </ref> tag. There are about a dozen similar tags which should be placed after the </ref>, but may erroneously be placed inside the tag (e.g. {{Failed verification}}, {{Unreliable source?}}, {{Promotional source}}, {{COI source}}, {{Obsolete source}}, {{Irrelevant citation}}, {{Self-published inline}}, {{Unreliable fringe source}}). In each the problem is not that the ref is bare; the problem is that the ref should not be there.
- As above, I think that when counting tags inside the <ref></ref>, it is by far the best to ignore a bare ref marked as a {{dead link}}, because this is all about filling bare links, and a dead link cannot be filled.
- So the bot would need to check which tag was there. That requires a specific check for each variant of each of dozens of misplaced inline tags.
- Doing those checks for umpteen variants of each of a dozen misplaced tags seems to me to add far too much complexity to what is after all a very simple task, which does not in any way alter the encyclopedic content or references or metadata. This is solely about removing a redundant cleanup banner.
- As the code currently stands, the bot works fine except in the edge case of some misplaced cleanup tags, involving about 0.5% of the pages to be edited. And in those edge cases, the significant problem with the ref concerned is not that the ref is bare: the core issue is the tagged ref is a bad source. A bad source will still be a bad source even if the ref is filled, so the fact that it is bare seems to be of little relevance.
- So, I'm sorry, but no. I won't add yet more layers of complexity to cope with misplaced cleanup tags because in my view those refs with misplaced tags are a) very rare, and b) already adequately tagged. I am wholly unpersuaded that the requested change is actually helpful, and I think that adding a whole further layer of complexity to the regexes risks causing problems of its own.
- I ran the bot code in pre-parse mode before saving this post, and found that there are currently 1,444 articles with a superfluous {{Cleanup bare URLs}} tag. It would be great to just be able to get on with removing them. BrownHairedGirl (talk) • (contribs) 03:01, 12 December 2021 (UTC)[reply]
- In the interest of getting this moving, would you be happy to modify your regex to skip the first one (e.g. strip templates from ref tags pre-check)? That way [24] will be skipped. Hopefully that will mean this can be approved, and the automated task can deal with the bulk of the cleanup, and the rest can be done through semi-automated means. ProcrastinatingReader (talk) 13:16, 11 November 2021 (UTC)[reply]
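The pre-check suggested above (strip templates from inside ref tags before testing for bareness) can be illustrated with a minimal sketch. This is Python, not the operator's AWB C# module, and it is not taken from the bot's source. The `BARE_REF` pattern is the regex quoted in the function details. `TEMPLATE`, `REF_BODY`, `strip_templates_in_refs` and `count_bare_refs` are hypothetical names introduced here for illustration only.

```python
import re

# Regex quoted in the BRFA function details for untagged bare URL refs.
BARE_REF = re.compile(
    r"<ref[^>]*?>\s*\[?\s*https?:[^>< \|\[\]]+\s*\]?\s*<\s*/\s*ref"
)

# Assumption: a template is {{...}} with no braces inside; nested
# templates are peeled from the inside out by repeating the substitution.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")

# Assumption: a simplified view of <ref ...>body</ref>; self-closing
# <ref ... /> and <references> tags are deliberately not matched.
REF_BODY = re.compile(
    r"(<ref\b[^>/]*>)(.*?)(<\s*/\s*ref\s*>)", re.DOTALL | re.IGNORECASE
)


def strip_templates_in_refs(wikitext):
    """Remove templates found inside ref bodies, leaving other text as-is."""
    def clean(match):
        body = match.group(2)
        previous = None
        while previous != body:  # repeat until nothing more is removed
            previous = body
            body = TEMPLATE.sub("", body)
        return match.group(1) + body + match.group(3)

    return REF_BODY.sub(clean, wikitext)


def count_bare_refs(wikitext, strip_templates=False):
    """Count refs matching BARE_REF, optionally after the pre-strip step."""
    if strip_templates:
        wikitext = strip_templates_in_refs(wikitext)
    return len(BARE_REF.findall(wikitext))


imdb_ref = (
    "<ref>https://www.imdb.com/title/tt0108956/technical?ref_=tt_dt_spec "
    "{{better source needed|date=October 2017}}</ref>"
)
print(count_bare_refs(imdb_ref))                        # without pre-strip
print(count_bare_refs(imdb_ref, strip_templates=True))  # with pre-strip
```

Under these assumptions the IMDB ref from the trial is not counted by the quoted regex alone, but is counted once templates are stripped, so a page containing it would be skipped. The sketch strips every template without looking at which one it is, so a bare ref tagged {{dead link}} would also be counted; it does not address the operator's point about telling the tags apart.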
Other views sought
This BRFA seems to have run into the sands, and I would appreciate some feedback from other BAG members.
The disagreement comes down to the bot trial's handling of this edit[25] to World War II: When Lions Roared.
The page had been tagged[26] in May with {{Cleanup bare URLs}} because of a bare URL ref to Amazon. In October, that ref was filled in by this edit[27] by Citation bot.
By the time of the bot's trial run, there was no remaining completely bare URL ref, so the bot removed[28] the {{Cleanup bare URLs}} tag.
However, there was one ref which @ProcrastinatingReader and @Headbomb argue is still bare, so the {{Cleanup bare URLs}} tag should not have been removed: <ref>https://www.imdb.com/title/tt0108956/technical?ref_=tt_dt_spec {{better source needed|date=October 2017}}</ref>
This is in part a GIGO issue: {{better source needed}} should be placed after the </ref> tag. This placing of it inside the <ref>...</ref> tag is an input error.
The view of ProcrastinatingReader & Headbomb seems to be that the bot should ignore the existence of the misplaced tag, so it should count <ref>https://www.imdb.com/title/tt0108956/technical?ref_=tt_dt_spec {{better source needed|date=October 2017}}</ref> as a bare URL, and therefore not remove the {{Cleanup bare URLs}} tag.
I disagree, for several reasons:
- that IMDB ref is already adequately tagged to note the core problem, viz. that it is a bad source. The remedy for that is to use a better source ... and it would therefore be unhelpful to tag it as bare. It is even more inappropriate to retain a big "bare URL" banner at the top of the page, for only one URL whose bareness is at best only a secondary problem. We should not be inviting editors to "please fill this ref before it is removed as inappropriate".
- The same applies to the dozen or so similar cleanup tags which might be misplaced inside <ref>...</ref>, e.g. {{Failed verification}}, {{Unreliable source?}}, {{Promotional source}}, {{COI source}}, {{Obsolete source}}, {{Irrelevant citation}}, {{Self-published inline}}, {{Unreliable fringe source}}. In each the problem is not that the ref is bare; the problem is that the ref should not be there.
- Checking for misplaced tags in that bad-ref family would hugely complicate the regex, increasing the risk of error. A regex to accommodate all these templates and their many aliases would amount to several lines of regex soup.
- Even if others are not fully persuaded that the tag removal was appropriate in this case, I hope that they will agree that it is at worst a marginal issue, one where there is a reasonable case for removing it.
- This issue arose in only one of the 50 pages in the trial, so it is rare.
- This bot is not altering the encyclopedic content of the article, nor the refs or metadata. All it is doing is removing a cleanup notice, and if it removes a tag from an occasional article where another editor might perhaps have kept the tag, that will in no way degrade the content of the articles.
- Meanwhile, over 1,400 articles still have this tag when it should have been removed. That actively impedes cleanup, by leading editors to pages which don't need refs filled. For example I used https://petscan.wmflabs.org/?psid=20904751 to find Ireland-related articles with bare URLs to fill, but I gave up after only 4 pages because 3 of the first 4 pages still had tags after the refs had been filled by a bot. The encyclopedia will be improved by removing these tags, allowing editors to get on with the cleanup.
Please can we just get on with this? In a task such as this, excessive attention to rare and marginal cases of tag removal is a real enemy of improving the 'pedia. --BrownHairedGirl (talk) • (contribs) 01:33, 16 December 2021 (UTC)[reply]
- URLs don't cease to be bare because you disagree they should be there. <ref>https://www.imdb.com/title/tt0108956/technical?ref_=tt_dt_spec {{better source needed|date=October 2017}}</ref> IS a bare url. I will not approve a bot, i.e. dumb-as-a-brick-no-context-mindless-automaton, to remove valid cleanup tags because you do not personally agree the source should be present in the first place. I do not think you'll find any BAG member that will approve such a task either. The scope is to remove no-longer relevant bare URL tags, not remove unreliable sources. Headbomb {t · c · p · b} 01:52, 16 December 2021 (UTC)[reply]
- @Headbomb: on the contrary, the "dumb-as-a-brick-no-context-mindless-automaton" (your phrase) is Headbomb's insistence that a ref which shouldn't be there at all needs a big banner at the top of the page to say that it should be filled in before deletion. That banner is a completely inappropriate response to the issues on that page.
- Your statement that this is about my view ("you do not personally agree the source should be present in the first place") is demonstrably false. I did not add the "better source needed" tag; it was added in this October 2017 edit[29] by User:Rfl0216.
- Do you really want to argue that a WP:USERGENERATED website should not have been tagged as "better source needed"?
- Or do you really truly believe that a ref to an unreliable source which has been inline-tagged as such also needs a big top-of-the page banner saying that it is bare? Really really really? --BrownHairedGirl (talk) • (contribs) 02:13, 16 December 2021 (UTC)[reply]
- I'm not saying it shouldn't have been tagged with 'better source needed', I'm saying it's still a bare url making the removal of the 'this article has bare urls' tag inappropriate. Headbomb {t · c · p · b} 02:45, 16 December 2021 (UTC)[reply]
- @Headbomb: thank you for dropping that absurd claim that an IMDB ref being correctly tagged by someone else as unsuitable was some sort of weird personal quirk of mine. However, it seems to me that you are still taking a robotic approach which wholly misses the purpose of this exercise.
- Per the nutshell of WP:CLEANUPTAG, tags are used "to inform readers and editors of specific problems with articles or sections". So this is about how best to solve problems. These tags are not some sort of attempt at perfect scientific classification of all the flaws on a page.
- The guidance at WP:CLEANUPTAG is very helpful:
- "Don't insert tags that are similar or redundant". The bare URLs tag is redundant when the ref should be removed.
- "If an article has many problems, tag only the highest priority issues". The fact that this IMDB URL is bare is wholly secondary to the fact that it shouldn't be there at all. The priority is to remove the ref ... and its bareness doesn't deserve a mention at all, let alone being given top billing in a banner at the top of the page.
- So if we follow the guidance, that {{Cleanup bare URLs}} was removed correctly.
- Do you really want to argue that the guidance would support its retention?
- The purpose of {{Cleanup bare URLs}} is very simple: to inform readers and editors that a bare URL ref needs to be filled. But on that page there is no bare URL which needs to be filled; there is a bare URL which needs to be removed.
- Why do you want to waste the time and energy of editors who clean up bare URLs by drawing their attention to a page which does NOT have a bare URL to be filled? BrownHairedGirl (talk) • (contribs) 03:14, 16 December 2021 (UTC)[reply]
- I'll flip the question around: why do you insist on including this tiny minority of articles in the scope of your bot, when two BAG members independently told you they were problematic? I will not approve this task as is, and I doubt any other BAG member will approve it either, short of having an RFC where the community deems it acceptable for bots to remove bare url tags when there are still bare urls in the article. Headbomb {t · c · p · b} 03:20, 16 December 2021 (UTC)[reply]
- @Headbomb: that question is very clearly answered above. But I will repeat:
- because although the removal of those tags is a GIGO quirk, it is a quirk which will always be appropriate, because per WP:CLEANUPTAG the {{Cleanup bare URLs}} banner gives undue priority to a secondary issue. Or in simple language, because it is deeply absurd to invite editors to fill a ref which should be removed, and which has already been tagged for removal.
- because programming the bot to accommodate the guideline-denying demands of two BAG members in respect of this minority issue would add a lot of complexity. That would waste my time, reduce transparency, increase the risk of error ... and all to retain a tag which should not be there.
- As to your demand for an RFC, that is also absurd. Why on earth do you want an RFC to determine whether to follow existing guidance?
- I'm sorry to say this, Headbomb, but at this stage your stance is starting to look like perverse obstructionism. Demanding an RFC on whether it is appropriate to remove a banner "fill this bare URL" tag for a ref which should be removed? Really really really?
- I am trying to fill bare URLs, and through various methods I have in the last five months filled all the bare URLs in well over 100,000 articles, and filled some of the URLs in many tens of thousands more articles. Removing redundant cleanup tags will assist my work and that of other editors.
- So what on earth are you trying to achieve by making a stand in favour of inviting editors to fill a ref which should be removed? Is this about something other than the issue at hand? BrownHairedGirl (talk) • (contribs) 03:49, 16 December 2021 (UTC)[reply]
Denied. There is lots of potential for good bot work to be done here, but this task cannot be approved as is. This comes from the lack of demonstrated consensus for a bot to remove valid {{Cleanup bare URLs}} tags, the lack of willingness of the operator to limit the scope of the bot to obviously non-controversial edits (over several months of the BRFA being open), and the general WP:BATTLEGROUND mentality on display here. The task can be resubmitted in a new BRFA when and if these concerns have been addressed, either through an RFC establishing that the community supports the bot-removal of valid {{Cleanup bare URLs}} tags when the bare urls are potentially problematic, or a modification of the bot task's scope to avoid removal of {{Cleanup bare URLs}} tags when bare urls remain in the article. Headbomb {t · c · p · b} 08:10, 16 December 2021 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.