User talk:Cmglee/extract lang.py

Some comments can break the script

Hello Cmglee! I've stumbled upon a bug in this script. Say, the below SVG code (simplified testcase) is in a file named test.svg:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100%" height="100%" viewBox="0 0 100 100">
  <switch>
    <g id="text_en" systemLanguage="en">
      <text y="1em">English</text>
      <text y="2em">text</text>
    </g><!-- some comment -->
    <g id="text_de" systemLanguage="de">
      <text y="1em">German</text>
      <text y="2em">text</text>
    </g><!-- another comment -->
    <g id="text_def">
      <text y="1em">Default</text>
      <text y="2em">text</text>
    </g><!-- one more comment -->
  </switch>
</svg>

Then, when I run extract_lang.py test.svg de, I get:

Traceback (most recent call last):
  File "/usr/local/bin/extract_lang.py", line 26, in <module>
    svg_out = re.sub(r'(<\s*switch[^>]*>)(.*?)(\s*<\s*/\s*switch[^>]*>)',
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/bin/extract_lang.py", line 27, in <lambda>
    lambda matchs:extract_lang(matchs.group(2), lang), svg_in, flags=re.I | re.DOTALL)
  File "/usr/local/bin/extract_lang.py", line 18, in extract_lang
    return re_lang.sub('', svg_langs[lang] if lang in svg_langs else svg_langs[None])
KeyError: None

(I'm running macOS, if it matters)

If I delete <!-- some comment --> from the file, the script works fine.

In general, a comment right before the tag that contains systemLanguage="lang_to_export" breaks the script. Hope this can be fixed. DmitTrix (talk) 20:01, 13 November 2020 (UTC)

@Cmglee: pinging, just in case you don't monitor this page… DmitTrix (talk) 20:42, 16 November 2020 (UTC)

  • @DmitTrix: Thanks very much for your interest in my script and your feedback.
I hadn't considered SVG comments, so have modified my script on the parent page to strip out comments. Hope that's OK.
Can you please try again? Thanks, cmɢʟeeτaʟκ 22:12, 16 November 2020 (UTC)
  • Hi! Thanks a lot for looking into this. Sorry, I'm a bit busy IRL these days; will test as soon as I manage to get back to it. DmitTrix (talk) 11:19, 18 November 2020 (UTC)
  • @Cmglee: Sorry for not getting back to you on this for so long. I've re-tested it on a few files that were making problems, and the latest version of the script worked fine. Thanks for fixing! DmitTrix (talk) 11:06, 30 November 2020 (UTC)
  • My pleasure, DmitTrix: glad it works. By the way, I'd have preferred if the relevant comment was retained, but it's hard to tell whether the relevant comment comes before or after the extracted section. Easiest was to just remove all comments. Cheers, cmɢʟeeτaʟκ 18:23, 30 November 2020 (UTC)
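
For anyone following this thread, here is a minimal sketch of the "remove all comments" approach described above. It assumes a plain regular expression applied to the SVG text before any language extraction. The names COMMENT_RE and strip_comments are hypothetical and are not taken from extract_lang.py, whose actual fix on the parent page may differ.

```python
import re

# Hypothetical helper, not copied from extract_lang.py.
# re.DOTALL lets "." match newlines so multi-line comments are removed too;
# the non-greedy ".*?" stops at the first "-->" so separate comments
# are removed one by one rather than everything between the first and last.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)

def strip_comments(svg_text):
    """Return svg_text with every <!-- ... --> comment removed."""
    return COMMENT_RE.sub('', svg_text)

# Reduced version of the test case from this thread.
sample = '''<switch>
  <g systemLanguage="en"><text>English</text></g><!-- some comment -->
  <g systemLanguage="de"><text>German</text></g><!-- another
  multi-line comment -->
  <g><text>Default</text></g>
</switch>'''

cleaned = strip_comments(sample)
print(cleaned)
```

Note that a regular expression like this would also remove comment-like text inside CDATA sections or embedded scripts, so it is probably only suitable for simple SVG files such as the test case here.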