User talk:Opencooper/showKanji.js

Bug: don't show on jawiki

@Opencooper: I'm using this userscript in my global.js, since I'm active in other language Wikipedias. Would it be possible to set an exception in the code so that it doesn't run on jawiki? It's kinda redundant. ~nmaia d 13:25, 6 May 2019 (UTC)

@NMaia: Done. Thanks for bringing it to my attention. I didn't even consider crosswiki usage when I wrote this (or people using other skins for that matter). Let me know if there are any other pain points I can address. Opencooper (talk) 02:59, 7 May 2019 (UTC)
Awesome, thank you! Seems to be working fine :) ~nmaia d 05:22, 7 May 2019 (UTC)
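
For illustration, an exception like the one requested above could look roughly like the sketch below. This is not the script's actual code: `shouldShowKanji` and `skippedWikis` are hypothetical names. In a real userscript the database name would typically come from `mw.config.get("wgDBname")`; here it is passed in as an argument so the sketch runs outside MediaWiki.

```javascript
// Hypothetical helper (not taken from showKanji.js): decide whether the
// script should run on the wiki identified by its database name.
function shouldShowKanji(dbName) {
    // Assumption: on jawiki the Japanese title is already visible,
    // so showing it again would be redundant.
    const skippedWikis = ["jawiki"];
    return !skippedWikis.includes(dbName);
}

console.log(shouldShowKanji("enwiki")); // true
console.log(shouldShowKanji("jawiki")); // false
```

Keeping the skipped wikis in a list makes it easy to add further exceptions for global.js users.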

Furigana doesn't work well with numbers

The furigana acts a little weird when numbers are in the title. See 2019–20 Wuhan coronavirus outbreak ~nmaia d 23:51, 30 January 2020 (UTC)

@NMaia: Funny, I was just on that article yesterday and noticed the issue as well. The specific problem in this case seems to be that the title uses a hyphen (「2019年-2020」) while the hiragana uses a horizontal bar (「きゅうねんにせん」). If they both used the same dash, the script would have been smart enough to include that when it tries to extract the reading.
You're not wrong that the script struggles with numbers. It doesn't cope well when there are long titles or they contain many different symbols (I track a bunch of edge cases in the dev version's comments).
My philosophy with the script has been to be conservative, that is, to undercapture readings rather than overcapture them. Too much special handling of specific cases makes the logic very complicated, and the special cases become difficult to reconcile with each other.
That isn't to say I don't try to be a little clever: I match interpuncts (「・」) when the title contains Latin; I try to match the title even with spaces between any of its characters; and I add any non-kanji characters in the title to the hiragana search (the "smarts" I was referring to earlier).
At the end of the day, there's just so much variation in how the jawiki editors will type yomigana, that there will be plenty of places where the script just breaks down. The best bet in that case is to insert the reading in Wikidata, which this script prioritizes. One could also edit the Japanese article, but I don't know their norms or MoS.
Anyway, thanks for reporting. Always feel free to reach out regarding any issues with the script. Not sure if anyone noticed, but I added a bunch of improvements a while back like removing redundant kana characters from the furigana. I don't have as much time these days but I'll keep maintaining and improving the script as I can. Opencooper (talk) 04:12, 31 January 2020 (UTC)
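
The idea of adding the title's non-kanji characters to the hiragana search, and why mismatched dashes defeat it, can be sketched as follows. This is an assumption-laden illustration, not the script's real code: `buildReadingPattern` is a hypothetical name, the kanji test only covers the basic CJK ideograph block, the reading strings are made up for the example, and U+2015 is assumed as the "horizontal bar".

```javascript
// Hypothetical helper: build a regex matching a run of hiragana plus any
// non-kanji, non-space characters that appear in the title.
function buildReadingPattern(title) {
    // Keep title characters that are neither CJK ideographs nor whitespace.
    const extras = [...title].filter((ch) => !/[\u4E00-\u9FFF\s]/.test(ch));
    // De-duplicate, then escape characters that are special in a character class.
    const escaped = [...new Set(extras)]
        .map((ch) => ch.replace(/[\\\]^-]/g, "\\$&"))
        .join("");
    // U+3041 to U+3096 covers the hiragana letters.
    return new RegExp("[\\u3041-\\u3096" + escaped + "]+");
}

const pattern = buildReadingPattern("2019年-2020");

// Illustrative reading that uses a horizontal bar (U+2015) instead of the
// title's hyphen: the match stops at the bar.
const mismatched = pattern.exec("にせんじゅうきゅうねん\u2015にせん")[0];
// Same reading with the title's own hyphen: the whole string matches.
const matched = pattern.exec("にせんじゅうきゅうねん-にせん")[0];

console.log(mismatched); // にせんじゅうきゅうねん
console.log(matched);    // にせんじゅうきゅうねん-にせん
```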

Dealing with hyphens

@Opencooper: Take a look at Hong Kong national security law and 中華人民共和国香港特別行政区国家安全維持法. Because of the hyphens, the furigana is not displayed correctly. One idea would be to explicitly ignore it. I've seen this mistake other times too, often in articles about Chinese cities, where the し reading of 市 is separated by a hyphen as well. ~nmaia d 02:12, 1 July 2020 (UTC)

@NMaia: Done. Before, I was only looking for a hyphen in the reading if the title itself included it, but I've changed it so hyphens are always searched for. That fixes the case here, as well as the -shi case such as at Benxi. Thanks for bringing it up! Opencooper (talk) 07:09, 1 July 2020 (UTC)
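
A rough before/after sketch of the change described above, under stated assumptions: `extractReading` and its `alwaysAllowHyphen` flag are hypothetical, the hyphen is assumed to be the ASCII hyphen-minus, and the reading string is an illustrative stand-in for the -shi case rather than text copied from jawiki.

```javascript
// Hypothetical sketch: pull the first hiragana run out of some text.
// With alwaysAllowHyphen set to false, a hyphen is only accepted when the
// title contains one (the old behaviour as described); with true, hyphens
// are always accepted (the new behaviour as described).
function extractReading(text, title, alwaysAllowHyphen) {
    const allowHyphen = alwaysAllowHyphen || title.includes("-");
    const pattern = new RegExp(
        "[\\u3041-\\u3096" + (allowHyphen ? "\\-" : "") + "]+"
    );
    const match = pattern.exec(text);
    return match ? match[0] : null;
}

const before = extractReading("ほんけい-し", "本渓市", false);
const after = extractReading("ほんけい-し", "本渓市", true);

console.log(before); // ほんけい
console.log(after);  // ほんけい-し
```

A later display step could then strip the hyphen from the captured reading if it is unwanted.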

Shogun

The page Shogun shows an interesting bug in this script. The source of the confusion is over at d:Q131767, but I'm not entirely sure what the best way to fix it would be. ~nmaia d 04:37, 15 October 2020 (UTC)

@NMaia: Sorry, just saw this. I was grabbing any first "name in kana", when I should have been checking that the kanji matched. Thanks for reporting, it's been fixed now. Opencooper (talk) 00:45, 16 November 2020 (UTC)
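
The difference between taking the first "name in kana" and taking the one whose kanji matches can be sketched like this. The record shape and the sample values are hypothetical simplifications for illustration; they are neither the real Wikidata JSON structure nor the actual contents of d:Q131767, and `pickReading` is an invented name.

```javascript
// Hypothetical, simplified records pairing a kanji name with its kana reading.
const entries = [
    { kanji: "征夷大将軍", kana: "せいいたいしょうぐん" },
    { kanji: "将軍", kana: "しょうぐん" },
];

// Return the reading whose kanji equals the title we are displaying,
// or null when no entry matches (undercapture rather than guess).
function pickReading(records, kanjiTitle) {
    const found = records.find((entry) => entry.kanji === kanjiTitle);
    return found ? found.kana : null;
}

// Taking the first entry regardless of its kanji (the described bug):
console.log(entries[0].kana);              // せいいたいしょうぐん
// Checking that the kanji matches (the described fix):
console.log(pickReading(entries, "将軍")); // しょうぐん
```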

Strange bug

Check out The Party to Protect the People from NHK. ~nmaia d 14:14, 4 July 2021 (UTC)

@NMaia: Sorry, just saw this. The reading isn't in the lead anymore, but I was able to reproduce it. The script fetched the kanji 古い政党から国民を守る党 and the reading ふるいせいとうからこくみんをまもるとう, and then attempted to simplify the furigana by removing redundant parts. This works backwards by looking for kana bits in the kanji. So first it saw the る in 守る, then を, then から, and finally the い, where it choked. Since it's working backwards, it associated that い with せいとう rather than ふるい. It's a pretty dumb algorithm that doesn't know anything about the actual readings of kanji unfortunately… There isn't much that can be done about this class of error. The script tries to abort if it really messes up simplification but not in this kind of case. If you'd rather not have the cleaning up at all, I can tell you the user CSS you can add. Thanks for reporting. Opencooper (talk) 20:33, 19 July 2021 (UTC)
@Opencooper: Sorry, I should've pinged you the first time. By the way, is there a particular reason why the script works backwards? ~nmaia d 22:50, 19 July 2021 (UTC)
@NMaia: I wrote it that way because I noticed that particles, verb conjugations, and other grammar bits in Japanese come after nouns/kanji. It's just the most natural way to process it, and works well most of the time. Opencooper (talk) 23:25, 19 July 2021 (UTC)
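
The backwards simplification described in this thread can be reconstructed roughly as below. This is a minimal sketch written from the description alone, not the script's actual implementation: `alignBackwards` and `render` are hypothetical names, only hiragana is treated as kana, and the abort logic is reduced to returning null. It reproduces the misalignment discussed above, where い is matched inside せいとう instead of ふるい.

```javascript
const HIRAGANA_RUN = /^[\u3041-\u3096]+$/;

// Walk the title's runs from right to left. Each kana run is located in the
// reading with lastIndexOf, and the reading between two located kana runs is
// assigned to the kanji run that sits between them.
function alignBackwards(title, reading) {
    const runs = title.match(/[\u3041-\u3096]+|[^\u3041-\u3096]+/g) || [];
    const pairs = [];
    let end = reading.length; // exclusive end of the still-unassigned reading
    let pendingKanji = null;
    for (let i = runs.length - 1; i >= 0; i--) {
        const run = runs[i];
        if (!HIRAGANA_RUN.test(run)) {
            pendingKanji = run;
            continue;
        }
        if (end < run.length) return null; // not enough reading left: give up
        const at = reading.lastIndexOf(run, end - run.length);
        if (at === -1) return null; // kana run not found: give up
        if (pendingKanji !== null) {
            pairs.unshift({ base: pendingKanji, ruby: reading.slice(at + run.length, end) });
            pendingKanji = null;
        }
        pairs.unshift({ base: run, ruby: null }); // kana needs no furigana
        end = at;
    }
    if (pendingKanji !== null) {
        pairs.unshift({ base: pendingKanji, ruby: reading.slice(0, end) });
    }
    return pairs;
}

// Show each kanji run followed by its assigned reading in parentheses.
function render(pairs) {
    return pairs.map((p) => (p.ruby ? `${p.base}(${p.ruby})` : p.base)).join("");
}

console.log(render(alignBackwards("守る", "まもる")));
// 守(まも)る
console.log(render(alignBackwards("古い政党から国民を守る党", "ふるいせいとうからこくみんをまもるとう")));
// 古(ふるいせ)い政党(とう)から国民(こくみん)を守(まも)る党(とう)
```

Because the search only compares characters, the last い before から wins, so 古 absorbs ふるいせ and 政党 is left with とう. Nothing was left unmatched, so this simple sketch has no reason to return null, which fits the remark above that this kind of case is not caught.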