User talk:Uziel302/typo.js

Date format errors

@Uziel302: This script is tagging typos with {{Typo help inline|date=2019}}. This is incorrect date formatting, and is causing an edit by AnomieBOT every time the script tags a template. Would it be possible to change the script to output {{Typo help inline|date=October 2019}} instead to fix this problem? * Pppery * it has begun... 22:40, 24 October 2019 (UTC)

* Pppery * fixed. Uziel302 (talk) 06:20, 25 October 2019 (UTC)
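For illustration, the month-and-year form requested above could be produced along these lines. This is a minimal sketch, not the code that was actually changed in the script; the names `MONTHS`, `maintenanceDate` and `typoTag` are hypothetical.

```javascript
// Hypothetical helper names; a sketch of building a "Month YYYY" date parameter.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
];

// Format a Date as "Month YYYY" using UTC, so the result does not depend on locale or time zone.
function maintenanceDate(date = new Date()) {
  return MONTHS[date.getUTCMonth()] + " " + date.getUTCFullYear();
}

// Build the tag with a full month-and-year date rather than a bare year.
function typoTag(date = new Date()) {
  return "{{Typo help inline|date=" + maintenanceDate(date) + "}}";
}

console.log(typoTag(new Date(Date.UTC(2019, 9, 24))));
```

Using the UTC getters keeps the output stable regardless of the editor's browser settings.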

Passage unavailable

@Uziel302: I think I've got rid of 90% of the "Passage unavailable" messages by setting articleName to the h2's first anchor's text rather than its title, which can be blank. This makes things a lot faster, as the page isn't constantly reloading! Diff (between two of my own efforts, because I don't have a version with the "Passage unavailable" fix but without the \b and $ fixes). Certes (talk) 01:58, 5 August 2021 (UTC)

Certes, I implemented your fix, but still saw "passage unavailable" when going top down. I suggested going bottom up to prevent this issue. Can you check if the issue is fixed on the main version? What was the scenario you fixed? Uziel302 (talk) 10:48, 5 August 2021 (UTC)
Yes, you still need to go bottom up; you will still see "passage unavailable" when going top down. The scenario I've fixed is "passage unavailable" when going bottom up, which almost never occurs with the new version. Certes (talk) 11:44, 5 August 2021 (UTC)

Further $ and \b edits

I've extended the $ and \b checks to all buttons, and added a negative lookahead for an existing typo tag in "check template". Diff, including the above "Passage unavailable" change, here. The only theoretical downside I can foresee is that text which contains $$, $&, $`, $' or $< may get mangled as these combinations are special when replacing a regexp. The more likely $1 etc. should be OK as there are no capture groups.[1] We could double up $ signs in the replacement text first if the extra complication is worthwhile for these rare strings. The "aartists" bug remains but should not trigger often: it only occurs when an omission of the first letter of a line (not just a word) has already been fixed. It could be solved by prefixing the context with "\b" when it consists of a single word, but again is probably too rare to justify the complication. Certes (talk) 08:50, 5 August 2021 (UTC)
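A minimal sketch of the two ideas in this comment, doubling up $ signs in the replacement text and anchoring the search with \b, might look like the following. It is not the script's actual code; `escapeRegExp`, `escapeReplacement` and `replaceWholeWord` are hypothetical names, and the sample strings are made up.

```javascript
// Escape regexp metacharacters so the search text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Double every "$" so that String.prototype.replace inserts a literal dollar sign
// instead of interpreting $$, $&, $`, $' or $<name>.
function escapeReplacement(text) {
  return text.replace(/\$/g, "$$$$");
}

// Replace the first whole-word occurrence of `typo` with `correction`.
// Assumes `typo` starts and ends with a word character, so \b behaves as intended.
function replaceWholeWord(source, typo, correction) {
  const pattern = new RegExp("\\b" + escapeRegExp(typo) + "\\b");
  return source.replace(pattern, escapeReplacement(correction));
}

// Without escaping, "$&" in the replacement re-inserts the matched text.
const mangled = "pay teh bill".replace(/\bteh\b/, "the $& fee");
// With escaping, the replacement text is inserted literally.
const literal = replaceWholeWord("pay teh bill", "teh", "the $& fee");
// A line whose missing first letter was already fixed is left alone,
// because "rtists" no longer starts at a word boundary.
const alreadyFixed = replaceWholeWord("artists paint", "rtists", "artists");

console.log(mangled, "|", literal, "|", alreadyFixed);
```

The trade-off mentioned in the comment still applies: the extra escaping step only matters for the rare strings that contain these $ combinations.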

Certes, I just read the other discussion and I understand the problem better now. My original approach was to rely on giving enough context so it won't replace other words, but you are right that we should refrain from replacing parts of words. Can you expand the solution to all required functions and test? I will copy the code when it's ready. I don't have enough free time to work on it myself. Thanks, Uziel302 (talk) 10:51, 5 August 2021 (UTC)
I've expanded it to all required functions, including checking that there is no existing typo tag for "check template". Certes (talk) 11:46, 5 August 2021 (UTC)
Certes, I copied your version to my script; let me know if there are additional changes you want to make. Uziel302 (talk) 07:47, 20 August 2021 (UTC)
Thanks – I'm now using the standard version which works well. The only other change I'd make is cosmetic: reduce the spacing between the article title and the Replace etc. buttons, so it's more obvious that they belong together and we can fit more on a screen. I'm not sure whether everyone wants that, but I've gone ahead and changed it in User:Certes/typo.js. Certes (talk) 13:19, 20 August 2021 (UTC)

Multiple spaces

Typos aren't fixed where the article has multiple spaces shortly before: example. The typo page has a single space, which doesn't match. This also happens for tabs and other creative spacing. We could fix this by changing single spaces in the search pattern to \s+. However, that would require a capture group to preserve the spacing, which in turn means defending against $1, $123, etc. used as monetary amounts in the replacement text, making this a bigger job than it might seem. I'm confident about getting this working, but I'm also concerned about making a script I didn't create less understandable, so I'll pause for feedback before implementing. Certes (talk) 09:21, 5 August 2021 (UTC)
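One way the \s+ idea could be sketched is below. It is not what the script does; the function names and sample text are hypothetical. It assumes the context and its corrected form contain the same number of single-space-separated words, and it uses a replacer function, whose return value is inserted literally, so that $1 and similar amounts in the text need no separate escaping.

```javascript
// Escape regexp metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace `context` with `fixedContext` in `source`, matching any run of
// whitespace where the context has one space and keeping the original spacing.
function replaceKeepingSpacing(source, context, fixedContext) {
  const words = context.split(" ");
  const fixedWords = fixedContext.split(" ");
  // Sketch limitation: give up if the word counts differ.
  if (words.length !== fixedWords.length) return source;

  // Each gap between words becomes a capture group: word1(\s+)word2(\s+)word3
  const pattern = new RegExp(words.map(escapeRegExp).join("(\\s+)"));

  return source.replace(pattern, (match, ...rest) => {
    // The captured whitespace runs come first in `rest`, before offset and input.
    const gaps = rest.slice(0, words.length - 1);
    return fixedWords.map((word, i) => word + (gaps[i] || "")).join("");
  });
}

const fixed = replaceKeepingSpacing(
  "it costs  $1\tmilion today",
  "costs $1 milion",
  "costs $1 million"
);
console.log(JSON.stringify(fixed));
```

This keeps the double space and the tab from the article text while changing only the misspelled word.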

Certes, the right solution is to keep original spaces in the original script that creates the lists. It can be improved in other ways too, maybe by making arrays of the flags etc. If you want, I can share with you the project in toolforge where I run the script. I just don't have enough time to work on it. Thanks a lot. Uziel302 (talk) 11:05, 5 August 2021 (UTC)
Yes, that's a good solution. I have a ToolForge login but am not a member of GitHub or any other off-wiki repository. Certes (talk) 11:47, 5 August 2021 (UTC)
I'm not a Python coder but, looking at the GitHub listing, we may want something like:
  • line 2½: new import re
  • line 54: replace objects = line.split() by splits = re.split(r"(\s+)", line); objects = splits[0::2]; spacing = splits[1::2] + [" "]
  • lines 134 and 139: replace f.write(cont+' ') by f.write(cont+spacing[index+x])
I don't have the environment to test that, and someone who knows the language can probably find better ways to write it. Certes (talk) 00:03, 7 August 2021 (UTC)
Certes, I understand how it fixes the issue where the double space is at the end of the context string, but if it's in the middle, the spacing won't be added. I think I need to just cut the line by the chars of the word and get some chars from either side. Uziel302 (talk) 08:29, 7 August 2021 (UTC)
If I've understood correctly, line 134 runs in a loop once per word leading up to the typo. The suggested change will write the spacing that re.split() parsed from after that word into splits[1, 3, 5...], rather than replacing it by a single space. 139 works similarly for words after the typo. As you say, cutting the line by the chars of the word would be another way to achieve the same result, as long as it finds the correct instance of the word. Certes (talk) 10:23, 7 August 2021 (UTC)
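The index mapping described here can be illustrated with a small JavaScript analogue. The list generator itself is Python and is not reproduced; this sketch only relies on the fact that JavaScript's split, like Python's re.split, keeps captured separators in its result. The names `splitKeepingSpacing` and `rebuild` are hypothetical.

```javascript
// Split a line into words and the whitespace run that follows each word.
function splitKeepingSpacing(line) {
  // With a capture group, the separators are kept: word, gap, word, gap, ...
  const parts = line.split(/(\s+)/);
  const words = parts.filter((_, i) => i % 2 === 0);   // even positions: words
  const spacing = parts.filter((_, i) => i % 2 === 1); // odd positions: whitespace
  spacing.push(" "); // the last word has no following gap, so default to one space
  return { words, spacing };
}

// Write words[start..end) back out, each followed by its original spacing.
function rebuild(words, spacing, start, end) {
  let out = "";
  for (let i = start; i < end; i++) {
    out += words[i] + spacing[i];
  }
  return out;
}

const { words, spacing } = splitKeepingSpacing("one  two\tthree four");
console.log(JSON.stringify(rebuild(words, spacing, 0, 3)));
```

Because spacing[i] is the whitespace that followed word i in the original line, spacing in the middle of the context is preserved as well as spacing at its end.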
Certes, I uploaded a fix based on your solution and am running it now on the latest dump. Uziel302 (talk) 12:35, 7 August 2021 (UTC)
Certes, I uploaded lists after the change; can you check if everything is OK? Uziel302 (talk) 07:31, 20 August 2021 (UTC)
That seems to work. In list 1, examples include multiple spaces preserved in Jessie Duarte (old version, before the typo was fixed on 7 August) and tabs preserved in Drexel 4041. Certes (talk) 12:04, 20 August 2021 (UTC)