Wikipedia talk:Manual of Style/Archive 29



Ð and Þ

I think there's a pretty clear consensus for using Ð and Þ among people editing articles on Icelandic people, places and things. Here are just a few examples:

Þjórsá Skeiðará Sigurður Eggerz Tryggvi Þórhallsson Magnús Guðmundsson Davíð Oddsson Þorsteinn Pálsson Steingrímur Steinþórsson Björn Þórðarson Jón Þorláksson Guðbrandur Vigfússon Þingvellir Hafnarfjörður Garðabær Hveragerði Siglufjörður Ólafsfjörður Seyðisfjörður Lóðurr Þorsteinn Erlingsson Völundarkviða Sigurður Nordal Fóstbroeðra saga Heiðrún Auðumbla Eikþyrnir

Please do not start moving these around. Such an action would be fought tooth and nail by Icelandic Wikipedians, me included. Why forbid ð but allow æ? Is it really completely clear that æ is a variant of a letter in the English alphabet? How about œ? If those are obviously 'ae' and 'oe' (and I don't think they are) then why isn't ß obviously 'sz' (and I don't think it is)? I'm removing the ß/Þ/Ð dictum from the page. I think more time is needed to discuss the matter before a consensus can be declared. And even if a consensus was reached for doing away with Ð (and I don't think that will happen) we would still have to discuss how to transliterate it. TH? DH? D? There hasn't even been any discussion on that so it certainly doesn't belong in the MOS. And even if a consensus to do away with ß is reached (which might happen) it will not immediately follow that Ð and Þ should be dropped too. - Haukurth 20:12, 1 September 2005 (UTC)

I agree, it doesn't make sense not to give actual spellings and doesn't serve our readers well to only give anglicised spellings. This also applies to Faroese names, eg Suðuroy, Eiði, Borðoy. Worldtraveller 10:35, 7 October 2005 (UTC)
Here's a complete list
Albert Guðmundsson, Alþingishúsið, Arnar Viðarsson, Arnavatnsheiði, Arnór Guðjohnsen, Atlakviða, Auð, Auðumbla, Áramótaskaupið, Árnafjørður, Ásatrúarmaður, Barðastrandarsýsla, Barðaströnd, Barðsneshorn, Bessastaðir, Bítið Fast í Vítið, Björk Guðmundsdóttir & Tríó Guðmundar Ingólfssonar, Björn Þórðarson, Borðoy, Borgarfjörður, Brattahlíð, Breiðafjörður, Breiðá, Breiðárlón, Breiðdalsvík, Búðardalur, Darraðarljóð, Dauði Baldrs, Davíð Oddsson, Davíð Stefánsson, Diocese of Ðà Lat, Drög að Upprisu, Ecgþeow, Egilsstaðir, Eiði, Eikþyrnir, Eilífr Goðrúnarson, Ellíðavatn, Eyjafjörður, Far… Þinn Veg, Félag íslenskra þjóðernissinna, Fimleikafélag Hafnarfjarðar, Flóki Vilgerðarson, Fóstbroeðra saga, Fréttablaðið, Friðrik Þór Friðriksson, Friðþjófs saga ins frœkna, Froðba, Frostastaðavatn, Fuglafjørður, Fugloyarfjørður, Funningsfjørður, Garðabær, Garðaríki, Garðarr Svavarsson, Gleðibankinn, Goðafoss, Gríms saga loðinkinna, Grundarfjörður, Guðbrandur Vigfússon, Guðbrandur Þorláksson, Guðjón Arnar Kristjánsson, Guðlaugur Kristinn Óttarsson, Guðmundur G. 
Hagalín, Guðni Bergsson, Guðni Þór Sigurjónsson, Guðríður Þorbjarnardóttir, Guðrúnarhvöt, Guðrúnarkviða, Guðrún Katrín Þorbergsdóttir, Gøtueiði, Hafnarfjörður, Hamðismál, Hannes Sigurðsson, Hárbarðsljóð, Heaðolaf, Heiðar Helguson, Heiðrún, Helgi Sigurðsson, Herðubreið, Hermann Hreiðarsson, Hermóðr, Hlaðgunnr, Hliðskjálf, Hlöðskviða, Hreðavatn, Hreðel, Hroðgar, Hróa þáttr heimska, Húsakórið, Hvalfjörður, Hveragerði, Hymiskviða, Hættuleg hljómsveit & glæpakvendið Stella, Hæþcyn, Höðr, Höfuðlausnir, Iðunn, Iður til Fóta, Illuga saga Gríðarfóstra, Indriði Sigurðsson, Ísafjarðarbær, Íslenska Ásatrúarfélagið, Íþróttabandalag Akraness, Íþróttabandalag Vestmannaeyja, Jóhannes Harðarson, Jón Eðvald Vignisson, Jón Hnefill Aðalsteinsson, Jón Sigurðsson, Jón Þorláksson, Jón Þór Birgisson, Kerfissíða:Contributions/Dabbidj, Kirkjubøargarður, Kristján Þór Júlíusson, Kristján Örn Sigurðsson, List of people in the Dictionary of Canadian Biography - Þ, Líðarnøva, Lóðurr, Maður eins og ég - 2002, Magnús Guðmundsson (politician), Magnús Þór Hafsteinsson, Menntaskólinn hraðbraut, Menntaskólinn við Hamrahlíð, Menntaskólinn við Sund, Miðvágur, Mjötviður Mær, Mjötviður til Fóta, Morgunblaðið, Nafnaþulur, Neskaupstaður, Norðragøta, Norðurljós, Norðurmýri, Norður-Þingeyjarsýsla, Norna-Gests þáttr, Nyrðra-Vatnalautavatn, Ongenþeow, Ólafr Þórðarson, Ólafsfjörður, Páll Guðmundson, Reykjahlíð, Reykjavik Excursions Kynnisferðir, Rymskviða, Seðlabanki Íslands, Seyðisfjörður, Siglufjörður, Sigurður Eggerz, Sigurður Kári Kristjánsson, Sigurður Nordal, Sigvatr Þorðarson, Símun av Skarði, Skaði, Skagafjörður Municipality, Skeiðará, Skeiðin, Skiðarima, Skíðblaðnir, Skjaldbreiður, Sparisjóðabanki Íslands, Steingrímur Steinþórsson, Steinunn Sigurðardóttir, Styrbjarnar þáttr Svíakappa, Stöð 2, Suður-Múlasýsla, Suðuroy, Sæmundr fróði, Sörla þáttr, Sørvágsfjørður, Sørvágs Róðrarfelag, Thích Qung Ðc, Torfhildur Þorsteinsdóttir, Tómas Guðmundsson, Tríó Guðmundar Ingólfssonar, Tryggvi Þórhallsson, Tunguliðsá, 
Tvíhöfði, Vafþrúðnismál, Varmahlíð, Veðrfölnir, Viðoy, Viðrar vel til loftárása, Vínbúð, Von brigði, Völsa þáttr, Völundarkviða, Vörðr, Vørðufelli, Wealhþeow, Wið færstice, Þingvallavatn, Þingvellir, Þjóðólfr of Hvinir, Þjórsá, Þorgerður Katrín Gunnarsdóttir, Þorgnýr the Lawspeaker, Þorlákshöfn, Þorsteinn Erlingsson, Þorsteinn Gylfason, Þorsteinn Pálsson, Þorsteins saga Víkingssonar, Þórbergur Þórðarson, Þórður Óskarsson, Þórisvatn, Þórólfur Árnason, Þórsdrápa, Þórsmörk, Þrír blóðdropar, Þrúðr, Þrymskviða
Ævar Arnfjörð Bjarmason 10:40, 7 October 2005 (UTC)
It's fine to use those non-English characters in "defining instances". They should be used in the introduction or in other situations where it would be appropriate to use the Greek alphabet, the Arabic alphabet, or Chinese characters. Thereafter, the normal English version is appropriate in this, the English Wikipedia. The English alphabet has 26 letters; occasional adding of a diacritical mark to those is a less significant problem, though failure to also write them without the diacritical mark might well and foolishly hide that information from people using various search engines.
Your list, of course, deals primarily with a separate problem; which version to use as the title of the article, as there is only one of them, though there are also an unlimited number of redirects that can be made, so it really doesn't matter that much which one shows up at the top of the page when you read it. But not including, and not using, the standard English spellings is a major problem, a major disrespect for the fact that this is an English Wikipedia. Gene Nygaard 11:15, 7 October 2005 (UTC)
I agree in principle. The problematic thing is that there often isn't any "standard English spelling" and then it seems natural to use the native spelling because everyone usually agrees on what that is. The paragraph I removed dictated that Ð should be replaced with 'th'. Is that the "standard English spelling"? Hardly - Ð is often replaced with 'dh' or 'd' in an ascii context. I'd actually venture a guess that 'th' is not the most common replacement. So, if Þorgerður Katrín Gunnarsdóttir should be moved to a "standard English spelling" then what would it be? It just doesn't exist. And surely Ð can be seen as a character of the "English alphabet" with a diacritical mark, just like ł in Stanisław Lem. Most readers unfamiliar with ł are likely to pronounce it like an 'l' rather than like a 'w' but the only way to solve that problem is to provide pronunciation information. - Haukur Þorgeirsson 11:52, 7 October 2005 (UTC)
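To make the ambiguity concrete, here is a minimal shell sketch that produces the competing ascii-ized forms of one name. The function name asciiize and its mapping table are illustrative assumptions, not an established transliteration standard; the table covers only lowercase ð, Þ/þ and a few accented vowels, and it assumes UTF-8 input.

```shell
# Hypothetical helper: ascii-ize an Icelandic name.
# $1 = replacement to use for lowercase ð (for example d, dh or th)
# $2 = the name to convert (UTF-8)
asciiize() {
  printf '%s\n' "$2" | sed \
    -e 's/Þ/Th/g' -e 's/þ/th/g' \
    -e "s/ð/$1/g" \
    -e 's/á/a/g' -e 's/é/e/g' -e 's/í/i/g' \
    -e 's/ó/o/g' -e 's/ú/u/g' -e 's/ý/y/g' -e 's/ö/o/g'
}

# The same name under three different conventions for ð:
asciiize d  'Þórbergur Þórðarson'   # Thorbergur Thordarson
asciiize dh 'Þórbergur Þórðarson'   # Thorbergur Thordharson
asciiize th 'Þórbergur Þórðarson'   # Thorbergur Thortharson
```

Each convention yields a different string, so a reader searching for one form will not find text that uses another unless the page lists them or a redirect exists.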

There are two different, both important, factors to consider:

  • Visual representation: what the reader sees and understands
  • Electronic representation: not only what a search engine sees and reports and what the browser's page "Find" function finds, but also things such as the automatic indexing of Wikipedia categories (and the limit of 200 of them displayed on one page is also a factor)

That's in addition to the distinctions between titles and the body of the article, and between first reference in the body and subsequent references. Gene Nygaard 12:08, 7 October 2005 (UTC)

Good points. I personally think it's important that the reader can find the article starting from different forms using Google or other search engines. That's why I always try to list all possible (or at least all reasonably common) anglicizations of Old Norse character names I write articles about. See Lóðurr for an example. Your note about the indexing is also important. We need to follow ISO standards for alphabetical order of post-ascii characters. We don't want Édith Piaf or Árni Magnússon to appear after 'z' or something like that. - Haukur Þorgeirsson 12:23, 7 October 2005 (UTC)
As a particular example of this silliness, go try to find Thorbergur Thordarson in the Category:European writer stubs. What happens when you get to the "T" section and don't find him? What if you use that ever-so-handy navigation tool at the top, to get to a particular letter of the English alphabet? Gene Nygaard 13:00, 7 October 2005 (UTC)
That's a good example of the problems that can arise. I'd look for Þórbergur Þórðarson under Þ of course but it's true that someone might have an ascii-ized version and look under 'th'. It would be nice if that person would find the man. Books typically solve this with "redirects" in the index. Maybe we could think about something like that - there might be a special flag to put in a redirect page so it would show up in the index. Many redirects are basically misspellings and don't belong in the index but some of them should. Do I make sense? As an aside it's interesting that the 'transliteration' of Þórbergur Þórðarson's name offered above replaces 'ð' with 'd' rather than 'th' as the paragraph I removed proposed. On a final note some books alphabetize 'þ' as if it were 'th' but I don't think that's a good solution for us. It's not in accordance with the relevant ISO standard (ask User:Everype) and it would cause problems for those (like me) who expect 'þ' to be alphabetized as a separate letter. Maybe it could appear both separately and as if it were 'th' in the index. Do you think that would be helpful? - Haukur Þorgeirsson 16:03, 7 October 2005 (UTC)
Well, Thorbergur Thortharson doesn't have a redirect as Thorbergur Thordarson does, it is not the spelling used in the article, and a Google search for "Thorbergur Thordarson" gets 879 hits compared to a big fat goose egg for "Thorbergur Thortharson". So it would have been pretty silly for me to use that spelling. Just goes to show that we'd be ill-advised to let User:Haukurth handle the transliterations for us. I don't doubt that there are times when the edh should be transliterated as "th", but there's no good reason to do so here.
You completely missed the point. I am not advocating transliterating Ð as TH - the paragraph I removed is! And according to you this was the consensus you reached here. Look at your own edit and edit summary: [1]. Please read your own edits before saying things like "Just goes to show that we'd be ill-advised to let User:Haukurth handle the transliterations for us." It's your own transliteration method that yields a big fat goose egg. I had nothing to do with it. - Haukur Þorgeirsson 22:33, 7 October 2005 (UTC)
There is no ISO standard relevant to indexing of general encyclopedias in English (or Icelandic, for that matter).
Nor did I mean to imply that. There are, however, international standards for alphabetizations. - Haukur Þorgeirsson 22:33, 7 October 2005 (UTC)
Note also that the Icelandic encyclopedia has Österreich at is:Austurríki, and unlike this Wikipedia, not even a redirect at is:Österreich. Let's show that we have as much sense as the Icelanders show in their own Wikipedia, and use the English spellings in our English Wikipedia. Gene Nygaard 21:55, 7 October 2005 (UTC)
Of course. Which is why we have the English article at Austria. How this is relevant to the discussion eludes me. - Haukur Þorgeirsson 22:33, 7 October 2005 (UTC)
Seydisfjordur is not 'the English spelling' of Seyðisfjörður, it's just a crudely simplified spelling. A redirect from crudely anglicised spellings to proper spellings allows full searchability and gives our readers the actual names of places. Lodz redirects to Łódź, and that Ł is similar to ð in that it looks like an English letter but is a different letter with a different pronunciation. Austria is not comparable - no-one would suggest we have our article at Österreich because all English-speaking peoples know the country as Austria. You could hardly argue, though, that everyone would somehow know Seyðisfjörður as Seydisfjordur. Worldtraveller 22:15, 7 October 2005 (UTC)
As a point of curiosity I chanced upon a small earth globe in a shop here in London yesterday. I glanced at Iceland there and saw that they had Ísafjörður marked. It was written Isafjördhur. - Haukur Þorgeirsson 09:29, 9 October 2005 (UTC)
Of course not. Seydisfjord is the English spelling, sometimes written as two words: Seydis Fjord. We throw out those "-ur" endings, too. Where did this example come from, anyway--was it new in your posting?
Who is this 'we'? Google gives 700 results for your anglicised versions, 256,000 results for the proper spelling. I just chose Seyðisfjörður as a representative example.
No, the mere existence of a redirect does not "allow full searchability". Not even for those appearances in titles. However, the most obvious thing you have failed to consider is that this doesn't deal only with titles, and involves words which do not even have articles of their own, which might appear either unlinked or as redlinks.
If I search for Lodz, I get the article at Łódź. I just don't see any real argument to pretend to our readers that it's called Lodz, or that Seyðisfjörður is called Seydisfjord.
And you totally missed the point about Austria, which is where it should be in English (with a redirect from Österreich here, something the Icelanders arguing so adamantly here about "correct" spelling don't even have enough sense to put in their own Wikipedia). Gene Nygaard 22:42, 7 October 2005 (UTC)
I don't think you can attack Icelandic users of the English Wikipedia for failings you perceive among Icelandic users of the Icelandic Wikipedia, and accusing them of not having 'enough sense' could be construed as quite offensive. Worldtraveller 23:02, 7 October 2005 (UTC)
Let's expand on that "full searchability" nonsense with the example I gave in an edit summary. Go to the Wikipedia search box (location on your page probably depends on the skin you are using) and enter "Thingvellir" and hit "Search" (not "Go", which serves a different purpose; that's why there are two buttons). Do you see Geysir in the results list? There is in fact an English Wikipedia redirect from Thingvellir to the article now at Þingvellir, but that is of no utility whatsoever if you are using the Wikipedia search function to find articles which mention "Thingvellir" by entering that in the box. That is most definitely not full searchability. Gene Nygaard 22:55, 7 October 2005 (UTC)
Yes, the Wikipedia search engine is crap. I would, nevertheless, welcome the anglicized/ascii-ized version "Thingvellir" somewhere on the Þingvellir page, solving this problem. - Haukur Þorgeirsson 23:00, 7 October 2005 (UTC)
Please don't pick on the Icelandic Wikipedia. We're a small country. We only have about 3500 articles at the moment. Our article on Austria is just a template/stub. We don't have a lot of redirects yet. And I actually beat this "enough sense" comment of yours by one minute in actually putting a redirect at is:Österreich.
But this thing about "Seydis Fjord" is news to me. It's also news to Google which goes 9350 to 44 for English pages on Seyðisfjörður and "Seydis Fjord" respectively. But I'm not one to go by Google tests. Are you proposing a move to Seydis Fjord? That's a quixotic idea if I ever heard one. And how far does this "throw out those -ur endings" rule of yours go anyhow? Should Þórbergur Þórðarson be moved to Thorberg Thordarson? Come to think of it is there an "English spelling" for my name? Do tell me what it is. "Hauk Thorgeirsson", perhaps? :) - Haukur Þorgeirsson 22:51, 7 October 2005 (UTC)
If you are going to leave it up to me, I hereby dub you "Hawk Thompson". That wouldn't be all out of line with the choices real Scandinavians have made upon emigrating to New Zealand or Canada or the United States. But you actually know English well enough so that you can make your own choice, and most of us will probably defer to you if we write a Wikipedia article (other than your user page) about you. Gene Nygaard 14:50, 9 October 2005 (UTC)
Thank you :) The question actually isn't just academic since I already have emigrated to the United Kingdom since last month for an indetermined period of time. I still go by the name "Haukur Þorgeirsson" - my contract with my employer even has a nice thorn in it. I'll admit that it's somewhat grating that no-one seems to be able to pronounce Haukur and I've thought about adopting some convenient nickname. Is Hawk a normal English name? That might be a good option, it's even the descendant of the same Proto-Germanic word as Haukur (something like *habukaz). I don't know about Thompson, though... - Haukur Þorgeirsson 15:57, 9 October 2005 (UTC)
You forgot the one-word English version. This example, of course, is due in part to the fact that fjord is a word fully assimilated into English; we talk about the fjords of Alaska and New Zealand, as well as those of Iceland and Norway.
Actually fjord is spelt fiord in New Zealand. --Mark from Oz 05:39, 9 October 2005 (UTC)
I wasn't picking on Iceland Wikipedia, merely trying to show you how your perspective changes when the spelling is foreign to you, as the Icelandic spelling is to most English speakers. Gene Nygaard
Throwing out the "-ur" endings in English is no different from the Icelanders slapping on an "-ur" ending to Noreg for is:Noregur, is it? And why shouldn't we use the English Norway, or the Icelandic Noregur for that matter, for a country spelled differently in its own two official Norwegian languages: no:Norge and nn:Noreg? Gene Nygaard 15:21, 9 October 2005 (UTC)
The Icelandic Noregur, the Norwegian Noreg and the Dano-Norwegian Norge are ultimately all descendants of the Old Norse Nórvegr. We haven't slapped the nominative ending onto anything, the mainland Scandinavians have dropped it along the way. But however that may be throwing out the nominative endings of Icelandic names is not standard practice in English anymore. It's true that it used to be done a lot in the 19th century and survives in some fossilized forms. As for the Icelandic Wikipedia it is quite small and hasn't developed its naming conventions very far yet. We don't even have an article on Stalingrad Zürich/Zurich yet so I can't tell you how it will come out :) I do notice that the Icelandic article on Switzerland lists the name as Zürich. The character ü is not used in Icelandic but I personally see no problem in using it for foreign proper names. - Haukur Þorgeirsson 15:47, 9 October 2005 (UTC)
"Someone might have an ascii-ized version and look under 'th'..." The vast majority of our readers will look under th. Most of them, if they don't find it there, will stop at that point. Let's serve the great majority of our readers first, and please let's not require or expect them to know non-English alphabets. CDThieme 20:26, 7 October 2005 (UTC)
What on Earth gives you the idea that "the vast majority" of our readers will do that? Most people who want to read up on a relatively obscure Icelandic person are probably Icelandic themselves, or at least familiar with Icelandic names and Icelandic orthography. Thus they are likely to expect a Þ - and they should because that's what we're using. It's also worth mentioning that Icelanders and Scandinavians frequently transliterate 'Þ' as 'T' as well as 'Th'. - Haukur Þorgeirsson 21:42, 7 October 2005 (UTC)

Wikipedia is written for a broad general audience. The articles may contain difficult or specialized material, but article titles should be as accessible as possible. We can't assume readers are Icelandic or know Icelandic or any language but English in the English Wikipedia. Putting a "relatively obscure Icelandic person" in characters most readers don't know is going to ensure that person stays obscure. Let's not do that. Jonathunder 22:50, 7 October 2005 (UTC)

You make it sound like more people will find the article on Þórbergur Þórðarson if it's moved to an ascii-ized spelling. I just don't see how that's supposed to work. People find articles on Wikipedia mostly through links. Or through searching. And people are more likely to search for the actual name, "Þórbergur Þórðarson", than any of several possible ascii-ized versions. And we have redirects. And we already have one ascii-ized version at the page. I just don't see any "obscurity" problem with keeping the article at its current place. - Haukur Þorgeirsson 22:56, 7 October 2005 (UTC)

I think it's important to note that major revisions to the MoS shouldn't be done until after a consensus is made.

This page is a style guide for Wikipedia. The consensus of many editors formed the conventions described here. Wikipedia articles should heed these rules. Feel free to update this page as needed, but please use the discussion page to propose major changes.

I understand why someone would hold a hard stance on this, but please act tactfully before removing entire paragraphs dealing with many languages. glocks out 20:37, 7 October 2005 (UTC)

  • Please quit removing the paragraph until this dispute is settled. Follow the official 3RR policy. glocks out 22:08, 7 October 2005 (UTC)
This paragraph was inserted only last week in this edit: [2]. I don't see any consensus for it on this talk page or even much discussion of it. - Haukur Þorgeirsson 22:14, 7 October 2005 (UTC)
Sorry, I didn't realize that was why it was being deleted. There has been much discussion about this topic, not Icelandic specifically, but along this debate. German eszet discussion is in the archives now. There is also a talk on this here. glocks out 23:07, 7 October 2005 (UTC)
Thank you. Your advice is sound. - Haukur Þorgeirsson 23:16, 7 October 2005 (UTC)
The German eszet discussion should not have been archived, because it is still part of an ongoing discussion. -- Mark from Oz 05:39, 9 October 2005 (UTC)
  • Yes. The eszet discussion was archived prematurely. Bring it back, please. --Tysto 21:51, 12 October 2005 (UTC)
User:Ævar Arnfjörð Bjarmason: Thank you for the long list of articles demonstrating that English Wikipedia has been invaded by Vikings. 99% of English speakers would have no idea how to pronounce those words because they are not written in English. These characters belong in brackets after the English transliteration, like other foreign spelling guides. I'm especially shocked to find that not only are non-English characters used regularly in Iceland-related article titles, but the authors don't even bother to provide pronunciation guides! --Tysto 21:51, 12 October 2005 (UTC)
I would support the inclusion of pronunciation information (IPA + sound files) for every one of those articles. I would also support the inclusion of ascii versions in the lead for them. But please don't miss the point that it's largely the <1% of English speakers which does have some clue on how to pronounce these names which will be interested in the articles to begin with. I will not support transliteration of proper names from languages written in the Latin alphabet. - Haukur Þorgeirsson 22:17, 12 October 2005 (UTC)
The list demonstrates that there is interest on the English Wikipedia in providing articles about Iceland. Most of these articles could be expanded and one of the things many of them need is a pronunciation guide or something similar. This should make us strive to do better, provide more information, not to distort the information we already have. And anyway, if we took out the Latin alphabet characters (Þ and Ð) you don't like, even fewer people would be able to pronounce the words. Compare Ó Siochfhradha (there is a pronunciation guide in the article (so don't look :) and I think that's commendable). However if we forget about the Ó would you say that this is "written in English"? The second word is a string of letters from the 26-letter alphabet you are so fond of. Can you seriously say that this is pronounceable to "99% of English speakers"? The aim of getting rid of Ð and Þ cannot be to make titles more pronounceable unless you also want to change Siochfhradha into something quite funny looking. Edinborgarstefan 22:37, 12 October 2005 (UTC)

Unicode collation charts

It seems that things would be better if the category lists sorted using the Unicode collation algorithm. [click "Latin"] Á is sorted with A, Ð comes after D, and Þ comes after Z. Anyone know if this is a planned feature? Michael Z. 2005-10-8 15:39 Z

Forget the differences between foreign languages: as these things are decided by national committees, there are likely to be differences between the UK (or GB as ISO calls it!) and US collation algorithms!

Mm-hm. I guess "likely to be differences" means we don't know whether there are any, so the point is moot. Implementing the Unicode collation algorithm:
  • Avoids having to pick a national standard
  • Is hopefully already implemented in software
  • Accommodates all writing systems
Michael Z. 2005-10-9 16:36 Z
What you are referring to is something that is called a default multilanguage European sorting, or something along those lines, isn't it? This isn't a multilanguage encyclopedia, however; it is an English-language encyclopedia. So English-language sorting is appropriate; and as the commentator pointed out above, there may be several different such standards in various countries using the English language, even if they share the same 26-letter alphabet as a basis for that sorting. Gene Nygaard 19:01, 9 October 2005 (UTC)
It's just called the Unicode Collation Algorithm, and it is intended to be applied to multi-lingual text. Is there an international English-language collation algorithm that takes into account letters such as Á, Ð, and Þ? If not, then using the Unicode algorithm would certainly be better than the current state where Á comes after Z in category listings.
The Unicode people also seem to be documenting the differences between national and platform-specific collation systems in their Common Locale Data Repository, but it seems to be quite preliminary. Michael Z. 2005-10-9 21:53 Z
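As a rough illustration of how Á can end up after Z, here is a shell sketch that sorts a few titles by raw byte value, which is what the C locale does. It assumes UTF-8 input and does not claim to be how MediaWiki actually sorts categories; the helper name byte_order_sort is made up for the example.

```shell
# Hypothetical helper: sort lines by raw byte value (C locale).
# In UTF-8, Á, É and Þ all start with byte 0xC3, which is greater than
# the byte for 'Z' (0x5A), so they land after every plain A-Z title.
byte_order_sort() {
  LC_ALL=C sort
}

printf '%s\n' 'Þingvellir' 'Zurich' 'Árni Magnússon' 'Austria' 'Édith Piaf' |
  byte_order_sort
# Output:
# Austria
# Zurich
# Árni Magnússon
# Édith Piaf
# Þingvellir
```

Running the same pipeline through plain sort under a UTF-8 locale such as en_US.UTF-8 (if installed) may instead group Á with A, but that result depends on the system's locale data, so it is not shown here.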
The whole answer to the sort question seems to have been a camel instead of the horse the committees wanted. If, on a POSIX-compliant system or language (with the locale set to GB or US), one specifies the range "a-z" for a sort, then the range of letters returned depends on whether one sorts forwards or backwards: in a forward sort it is defined as "Aa-Zz", but in a backward sort it is defined as "aA-zZ". As one does not normally have control over the algorithm used, "Z" will be included if the sort is forwards but not if it is backwards! Philip Baird Shearer 11:25, 9 October 2005 (UTC)
Why does this matter to us? On my Mac, the UNIX shell's reverse sort for file names (ls -lr) is exactly the reverse of the regular sort, even when file names include "Z", "Zz" and "Þ". The Finder appears to sort using the Unicode collation algorithm, which appears much more sensible. Apple has implemented a more sensible system in their interface, and so should Wikipedia. Michael Z. 2005-10-9 16:36 Z

You misunderstood what I wrote. It is not about the way "ls" works; it is about the way the POSIX standard implemented collation algorithms. Each country has a committee which decides the issues, so potentially there can be differences between the US and other English-speaking countries. As these POSIX standards reflect ISO standards, it is possible that in future collation algorithms in different English-speaking countries will not be the same when implementing the collation in other character types. But to give you a practical example on your UNIX box: create a directory, cd to that dir, and run:

  • touch C D ; LC_COLLATE=en_US sh -c 'echo [a-z]' ; rm -f C D
  • touch C D ; LC_COLLATE=C sh -c 'echo [a-z]' ; rm -f C D

and these

  • touch a b c d z; LC_COLLATE=en_GB sh -c 'echo [A-Z]'; rm -f a b c d z
  • touch A B C D Z; LC_COLLATE=en_GB sh -c 'echo [a-z]'; rm -f A B C D Z

It may be that your UNIX uses a forward or a backward sort; the results of the last two tests will tell you which :-( --Philip Baird Shearer 17:12, 9 October 2005 (UTC)
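The tests above can be wrapped so that they run in a scratch directory and clean up after themselves. This is only a sketch: range_test is a hypothetical helper name, mktemp -d is assumed to be available, and only the C-locale results shown in the comments are predictable. What en_US or en_GB print depends on the locale data installed and on the shell (some shells treat bracket ranges as plain ASCII regardless of locale).

```shell
# Hypothetical helper: expand a glob range under a given locale.
# $1 = locale name, $2 = glob pattern, remaining args = files to create
range_test() {
  locale_name=$1
  pattern=$2
  shift 2
  dir=$(mktemp -d) || return 1
  # The glob must be expanded by a child shell that already has the
  # locale set, hence sh -c rather than a bare "LC_ALL=... echo".
  ( cd "$dir" && touch "$@" && LC_ALL=$locale_name sh -c "echo $pattern" )
  rm -rf "$dir"
}

range_test C '[a-z]' C D   # prints the pattern itself: [a-z] (no match)
range_test C '[A-Z]' C D   # prints: C D
```

Calling range_test en_US '[a-z]' C D on a system with that locale installed shows whether its ordering pulls uppercase letters into the lowercase range.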

Those examples just echo the string "[a-z]", etc., on my machine, but I think I get the idea. Still, I don't see why we would be mandated to use any particular national variety of POSIX sorting algorithm (although I suppose a user's preference setting could choose their preferred variety). Why not implement the Unicode collation charts to sort category listings on Wikipedia? Michael Z. 2005-10-9 18:36 Z