Talk:Z-variant

no more simple "grep 不"

Oh no. Now I will have to grep for two characters instead of one. Say where they keep the master list of them.

Mention whether color/colour is like X-, Y-, or Z-variants.

--Preceding unsigned comment added by 210.201.31.246 (talk · contribs) 16:34, 15 March 2006

It's fundamentally impossible to unify Chinese so everyone will be happy. Like everything else in real-life text processing, code is going to have to be sensitive to the issues, just as code is sensitive to casing issues. The Unihan file from the Unicode FTP site has some of this information.
There have been enough bad analogies already. Making superficial comparisons to English isn't going to clear anything up; to understand the issue, you'll have to understand some of the concepts behind Chinese writing.--Prosfilaes 22:15, 15 March 2006 (UTC)
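
As a rough illustration of code being "sensitive to the issues", here is a minimal Python sketch of a search that normalizes both the text and the search term before comparing. The helper name `normalized_contains` and the sample string are made up for this example. It assumes the text may contain CJK compatibility ideographs such as U+F967, which Unicode normalization folds into the unified ideograph. It does not cover variants that are separate unified ideographs; those would need the variant fields of the Unihan database.

```python
import unicodedata


def normalized_contains(text: str, needle: str) -> bool:
    """Return True if needle occurs in text once both are NFC-normalized."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)

    return nfc(needle) in nfc(text)


# Hypothetical sample: starts with the compatibility ideograph U+F967.
sample = "\uF967\u53EF"

print("\u4E0D" in sample)                     # plain substring search: False
print(normalized_contains(sample, "\u4E0D"))  # normalized search: True
```

The design choice is to normalize at comparison time rather than rewrite the stored text, so the original code points are preserved.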

x-variant?

We have

The X-axis represents differences in semantics; for example, the Latin capital A (U+0041 A) and the Greek capital alpha (U+0391 Α) are represented by two distinct codepoints in Unicode, and might be termed “X-variants”

But I think that's a bad example. In my opinion, Latin A versus Greek A are Y-variants or Z-variants or no variants at all, distinct only because of the source separation rule. I'm still not sure I understand the term "x-variant" correctly, but I think a better example might be U+00C5 Latin Capital Letter A with Ring Above Å versus U+212B Angstrom Sign Å, or U+03BC Greek Small Letter Mu μ versus U+00B5 Micro Sign µ, or maybe even U+0041 Latin Capital Letter A versus U+0042 Latin Capital Letter B. --Steve Summit (talk) 21:01, 17 May 2006 (UTC)

The fact that every Latin/Greek character set that wasn't trying to cram both into 7 bits has encoded Latin and Greek separately shows that Greek users feel it's not merely source separation. Conflating the two brings up too many problems. U+00C5 and U+212B are the same thing in Unicode; they aren't variants of any kind.--Prosfilaes 01:47, 18 May 2006 (UTC)
It's still a bad example. I find it confusing, and I already understand the topic from reading many other sources. This example will certainly confuse anybody who does not already understand the topic and is searching for clarification here. Unicode itself does not use a confusing example in its definition.
Here are some purely Latin alphabet examples: for the x-axis "a" vs "b", for the y-axis "a" vs "A" or "a" vs "ɑ", and for the z-axis "ş" vs "ș". Not that I would include these as the primary examples in the article, since there is no Latin unification in Unicode and many people will not be familiar with these characters anyway. --Hippietrail (talk) 00:12, 28 January 2009 (UTC)
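
The pairs raised in this thread can be compared with Python's standard `unicodedata` module. This is only a sketch of what normalization reports, not a definition of x-, y-, or z-variants. The constant and helper names are chosen for this example. It treats "the same thing in Unicode" as canonical equivalence, which is an interpretation of the comment above.

```python
import unicodedata

ANGSTROM, A_RING = "\u212B", "\u00C5"   # Angstrom Sign, A with Ring Above
MICRO, MU = "\u00B5", "\u03BC"          # Micro Sign, Greek Small Letter Mu
LATIN_A, GREEK_ALPHA = "\u0041", "\u0391"


def nfc(s: str) -> str:
    """Canonical composition: folds canonically equivalent sequences."""
    return unicodedata.normalize("NFC", s)


def nfkc(s: str) -> str:
    """Compatibility composition: also folds compatibility characters."""
    return unicodedata.normalize("NFKC", s)


print(nfc(ANGSTROM) == A_RING)             # True: canonically equivalent
print(nfc(MICRO) == MU)                    # False: not canonically equivalent
print(nfkc(MICRO) == MU)                   # True: compatibility equivalent
print(nfkc(LATIN_A) == nfkc(GREEK_ALPHA))  # False: never folded together
```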

Confusion?

There might be some confusion, as it's not always obvious how to apply Unicode's definitions precisely; looser definitions, like the one at wikt:z-variant, are sometimes used; and Unicode itself isn't perfectly rigorous. But whatever the confusion is about, the section § Confusion only adds to it. It shows two pairs of Unicode characters, neither of which are z-variants of each other by Unicode's definitions, and nothing in the referenced Unihan database suggests otherwise. Yet it erroneously claims that both are examples of z-variants.

Code points U+4E0D CJK UNIFIED IDEOGRAPH-4E0D and U+F967 CJK COMPATIBILITY IDEOGRAPH-F967 are compatibility variants, which means they do not differ on the Z-axis at all; they are identical on all of the X, Y, and Z axes. They are essentially duplicate encodings of the same variant that exist only for round-trip compatibility with KS X 1001, where K0-5D55 (61-53) has one reading and maps to F967 while K0-5C74 (60-84) has another reading and maps to 4E0D, and both have exactly the same reference glyph. But 4E0D has both readings, and F967 canonically decomposes to it. In many places it's not even possible to use F967 without it getting converted to 4E0D. On the other hand, because the distinction exists only in Korean, the compatibility character F967 unambiguously refers to the Korean variant, while 4E0D unifies all the other variants in addition to this one, so it might be displayed slightly differently from F967 when a font or language setting other than Korean is used. There is a standardized variation sequence, 4E0D FE00, that can be used to select the F967 variant using 4E0D as the base character (your fontage may vary). Calling them font variants is certainly imprecise (and not a Unicode term anyway), but you can see where it comes from.

Code points U+5154 CJK UNIFIED IDEOGRAPH-5154 and U+514E CJK UNIFIED IDEOGRAPH-514E are semantic variants that differ on the Y-axis: their abstract shapes are not unifiable in the first place, so they cannot be z-variants. This pair is even explicitly mentioned in UAX #38 Unihan as an example of y-variants. The fact that the referenced draft mentioned these two as examples of "zVariant" [sic] shows how confusing it might be, but does it matter, given that it never made it to the final version?

Can we get rid of the section? It makes no sense whatsoever in its current shape and I don't see how it could be improved. – MwGamera (talk) 14:45, 18 August 2022 (UTC)
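
The statements above about decomposition can be spot-checked with Python's standard `unicodedata` module. This sketch only shows what the character database and normalization report. It says nothing about glyph rendering, and the variable names are chosen for this example.

```python
import unicodedata

compat, unified = "\uF967", "\u4E0D"
y_pair = ("\u5154", "\u514E")

# Decomposition mapping recorded for the compatibility ideograph.
print(unicodedata.decomposition(compat))

# Both canonical normalization forms replace U+F967 with U+4E0D.
for form in ("NFD", "NFC"):
    print(form, unicodedata.normalize(form, compat) == unified)

# The U+5154 / U+514E pair is two unified ideographs; normalization
# leaves them distinct.
print(unicodedata.normalize("NFKC", y_pair[0]) == y_pair[1])

# The variation sequence 4E0D FE00 is a two-code-point string that
# normalization leaves unchanged.
svs = unified + "\uFE00"
print(unicodedata.normalize("NFC", svs) == svs)
```

Because normalization rewrites U+F967 itself, the variation sequence is the form that survives a normalizing pipeline.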