I am proposing that Wikipedia have a set of guidelines for alphabetization and collation. Here is a permanent link to a preliminary discussion of the topic: User talk:Noetica [section 4: "Alphabetization (given names, surnames, domestic name order, thorn)"].
Initial discussion with Noetica
I would like to alphabetize the entries in each of the subsections of Esperantist#Lists of famous Esperantists, but I am unsure whether to order them by given names or by surnames. Also, Kálmán Kalocsay is Kalocsay Kálmán in Hungarian name order, and Þórbergur Þórðarson begins with the letter thorn. I am unsure how to alphabetize those two Esperantists' names. I consulted the following pages but did not find an answer to any of my questions.
What do you advise me to do, and which page(s) (if any) has/have the answers?
-- Wavelength (talk) 00:49, 10 January 2009 (UTC)
- You have raised good questions, Wavelength. I find these matters hardly addressed at all in WP's guidelines, even though there is a great deal of attention paid to lists of various sorts, and we even have featured lists (like featured article). That omission needs to be addressed systematically.
- The best article for alphabetical order is Collation. See the whole, but especially the section Collation#Alphabetical_order, where some of your specific concerns are dealt with. I have also checked New Hart's Rules, and after some reflection I would answer your questions, and some other possible ones, specifically like this (bearing in mind that your list keeps the conventional English order of elements within each name):
- Order by surname, regardless of where the surname occurs among the elements of a name.
- Use the conventional English adaptation in the order of elements, which sometimes matches the original language's order (Mao Zedong) and sometimes alters it (Béla Bartók).
- Use the most common standard English transliteration or variant where foreign characters occur. (I have just now made redirects from Thorbergur Thortharson and Thortharson to Þórbergur Þórðarson, by the way. And I advise a move to Thorbergur Thortharson.)
- Generally ignore de, von, van and the like in determining alphabetical order, unless they are fixed to the name without spaces (as in Degas, Vanderbilt, d'Alembert, l'Anglais) or are conventionally treated as an essential part of the surname (as in McDonnell, O'Connor). French le and la, often capitalised in French names, are considered in alphabetising (so Delacroix precedes La Croix). When a prefix is naturalised in English (as in De Quincey, inconsistently spelt with de or De at our article; and Walter de la Mare, name of an English poet), alphabetisation should begin at that prefix.
- Treat Mc as if it were spelt Mac.
- Use other conventions that might be laid out at Collation.
- Allow for conventional exceptions (such as Charles de Gaulle, alphabetised on de; mentioned specifically at CMOS).
- So:
Karen Attwood
Étienne d'Angers
Annette Davidson
Charles de Gaulle
Walter John de la Mare
Thomas De Quincey
Ernő Dohnányi
Antoine de Gascogne
Julien Offray de La Mettrie
Yves La Roche
Jean de La Rochelle
Jean Le Maingre
Jean-Marie Le Pen
Craig McCulloch
Avril MacIntyre
Mao Zedong
John Mountford
Thor Rasmussen
Thorbergur Thortharson
- I hope that helps. If you want more, let me know.
- –⊥¡ɐɔıʇǝoNoetica!T– 02:47, 10 January 2009 (UTC)
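The rules above are mostly judgments about which string to file under; once that choice is made, the remaining steps are mechanical. A minimal Python sketch, assuming the filing surname for each person has already been chosen by hand according to those rules (the helper `fold` and the `ENTRIES` table are illustrative, not an existing Wikipedia tool). It strips diacritics, reads a leading Mc as Mac, and ignores spaces and apostrophes, which reproduces the example order. For this particular list, word-by-word filing would give the same result.

```python
import re
import unicodedata

def fold(text: str) -> str:
    """Normalize a filing string: strip diacritics, casefold, read Mc as Mac,
    and drop spaces and punctuation such as apostrophes."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    base = base.casefold()
    base = re.sub(r"^mc", "mac", base)  # "Treat Mc as if it were spelt Mac"
    # Anything still outside a-z/0-9 is dropped; letters such as þ or ð
    # would vanish here, so they should be transliterated beforehand.
    return re.sub(r"[^0-9a-z]", "", base)

# (display name, filing surname chosen by hand according to the rules above)
ENTRIES = [
    ("Thor Rasmussen", "Rasmussen"),
    ("Charles de Gaulle", "de Gaulle"),           # conventional exception: filed on "de"
    ("Avril MacIntyre", "MacIntyre"),
    ("Ernő Dohnányi", "Dohnányi"),
    ("Jean-Marie Le Pen", "Le Pen"),              # French "Le" is considered
    ("Karen Attwood", "Attwood"),
    ("Antoine de Gascogne", "Gascogne"),          # "de" ignored
    ("Craig McCulloch", "McCulloch"),
    ("Thomas De Quincey", "De Quincey"),          # prefix naturalised in English
    ("Yves La Roche", "La Roche"),
    ("Mao Zedong", "Mao"),
    ("Étienne d'Angers", "d'Angers"),             # particle fixed to the name
    ("Julien Offray de La Mettrie", "La Mettrie"),
    ("John Mountford", "Mountford"),
    ("Walter John de la Mare", "de la Mare"),     # prefix naturalised in English
    ("Jean Le Maingre", "Le Maingre"),
    ("Annette Davidson", "Davidson"),
    ("Jean de La Rochelle", "La Rochelle"),
    ("Thorbergur Thortharson", "Thortharson"),    # transliterated form
]

# Primary key: folded surname; secondary key: folded full name.
ordered = [name for name, _ in sorted(ENTRIES, key=lambda e: (fold(e[1]), fold(e[0])))]
for name in ordered:
    print(name)
```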
- Thank you very much for your answers. I suppose that I am ready to do the alphabetization now, with the edit summary containing a permanent link to this talk page, and with the text "section 4". (I consider it to be likely that some editor in the future will see the new orderings, and will change some of them to more "correct" orderings.) I would like to introduce at WT:MOS essentially the same message that I have here (my first message in this section), possibly with a link to this discussion. However, I want to respect your wishes not to participate at WP:MOS or WT:MOS. Also, I am unsure about what might constitute participation by proxy, and what your thoughts are about that. Therefore, I am awaiting your comments on those matters before I proceed with the alphabetization, or with a possible discussion at WT:MOS.
- -- Wavelength (talk) 17:31, 10 January 2009 (UTC)
- That's all fine, Wavelength. If you want to link this discussion at WT:MOS go right ahead. Or you could put the text of it in a navbox and paste it directly onto the page there:
Initial discussion with Noetica
|
RAW TEXT OF THE DISCUSSION HERE
|
- Something like that. I think navboxes should be used a lot more. They certainly can keep things orderly. Don't hesitate to come back here for more technical discussion as needed. I have a few resources to consult, and the topic interests me.
- –⊥¡ɐɔıʇǝoNoetica!T– 20:26, 10 January 2009 (UTC)
- I've refined and corrected things a little in my post above.–⊥¡ɐɔıʇǝoNoetica!T– 22:53, 10 January 2009 (UTC)
- I have done the alphabetization of Esperantist#Lists of famous Esperantists. Ba Jin (listed at Esperantist#Writers) is a pseudonym, which I alphabetized at Ba. Pope John Paul II (listed at Esperantist#Others) is a titled name, which I alphabetized at John. This reminds me of Cardinal, which is used as a middle name/title. It also reminds me of Esquire, which is mentioned last in a name (or maybe I should say "mentioned after a name").
- Some telephone directories put all Mc and Mac (and maybe M') names in a section between the L section and the M section. Also, Mackenzie (with a lowercase k) could be analyzed as belonging to the M section rather than to the section for Mc and Mac. Several Mac names have two forms that differ only in the capitalization of the next letter.
- In my previous work on Wikipedia, I have listed items in ASCII-code order, with numerals before letters. If numerals are ordered as the words they represent, then there is ambiguity with 1492, which could be read as "one thousand four hundred ninety-two" or as "fourteen (hundred) ninety-two", and likewise with 2009. See User:Wavelength/Articles started, sections 2 to 7.
- Recently, when I added M.C. Mehta v. Union of India (Oleum Gas Leak Case) to List of environmental lawsuits, I left the order as I had arranged it before, but I noticed another problem: the new entry differed from another one (M. C. Mehta v. Kamal Nath) in the spacing of the initials. Perhaps one is right and one is wrong, according to a guideline somewhere on Wikipedia.
- (All of this is giving me images of crazy quilting.)
- -- Wavelength (talk) 07:38, 13 January 2009 (UTC)
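The telephone-directory convention mentioned above can be sketched as a sort key. This is a hypothetical reconstruction, not any directory's published rule: it assumes a name belongs to the Mac section only when Mac, Mc, or M' is followed by a capital letter (so Mackenzie stays in M), and it interfiles that section by whatever follows the prefix. `directory_key` and `MAC_PREFIX` are illustrative names.

```python
import re

# Assumed rule: the prefix counts only when a capital letter follows it.
MAC_PREFIX = re.compile(r"^(?:Mac|Mc|M')(?=[A-Z])")

def directory_key(surname: str) -> tuple:
    """Sort key that places Mac/Mc/M' names in their own section between L and M."""
    match = MAC_PREFIX.match(surname)
    if match:
        # ("l", 1, ...) sorts after every ordinary L name ("l", 0, ...)
        # and before every M name ("m", 0, ...).
        return ("l", 1, surname[match.end():].casefold())
    return (surname[0].casefold(), 0, surname.casefold())

names = ["Mackenzie", "McCulloch", "Lyons", "M'Gregor", "Mabon", "MacIntyre", "Kelly"]
print(sorted(names, key=directory_key))
# ['Kelly', 'Lyons', 'McCulloch', "M'Gregor", 'MacIntyre', 'Mabon', 'Mackenzie']
```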
- Yes, I can understand your experiencing the crazy-quilting effect. I have edited the lists on the page myself. I do urge a move of Þórbergur Þórðarson to Thorbergur Thortharson; and even without that move, Thorbergur Thortharson would be much better for standard English usage, as in these lists. Such an adaptation is quite normal. We don't refer to Thor Heyerdahl as "Þór", or whatever the original form would be! I have also fixed some punctuation, capitalisation, and the like. The Esperanto word Internacio is best translated as International (SOED, "international": [B. n'] 3 (I-.) Any of various socialist organizations founded for the worldwide promotion of socialism or Communism; spec. = First International, Second International, Third International, Fourth International below. Also, a member of any of these organizations. L19.).
- One entry was an error, due to confusion with an almost exact namesake. I removed it (see edit summaries). There are articles for several Russians with that same surname, as opposed to first given name and also surname; and while there is a disambiguation page there is not, so far, a DAB tag at the top of every affected page.
- Language and languages were not designed for strictly rational collation such as alphabetising. We do the best we can, in an imperfect universe. I think we have it sorted out well enough this time. The larger matter of making WP guidelines to deal adequately with alphabetising is separate and more problematic.
- –⊥¡ɐɔıʇǝoNoetica!T– 00:56, 14 January 2009 (UTC)
Here is a permanent link to a subsequent discussion of the topic: Wikipedia talk:Manual of Style [section 42: "Alphabetization and collation"].
Second discussion on Manual of Style talk page
Recently, I made some editing decisions involving alphabetization in the article Esperantist, and I discovered that Wikipedia has few or no guidelines for this process, which is probably important enough and comprehensive enough to deserve its own subpage. Here is a permanent link to a discussion of the topic: User talk:Noetica [section 4: "Alphabetization (given names, surnames, domestic name order, thorn)"].
-- Wavelength (talk) 00:09, 25 January 2009 (UTC)
- As for Icelandic names, my impression from looking at Iceland, Icelandic name, Icelandic language, etc. is that most articles about people with Þ or Ð in their names are titled that way. And the redirect Thortharson is quite pointless: Icelanders aren't usually referred to by their patronymic alone (e.g. Björk, not "Guðmundsdóttir"), so a redirect from Thorbergur might be more useful. I won't insist on this, as I'm not an Icelander, but if you want to do a large-scale renaming of articles with þorns and eðs in their names, I suggest you ask the opinion of people at WP:ICELAND first. (More generally, articles should be at the name by which the person is most commonly referred to in English, even if this leads to inconsistencies such as Strauss rather than Strauß but Schrödinger rather than Schroedinger; as a rule of thumb, modern names tend to be transliterated less often than old ones.)
- As for surnames beginning with prepositions, at least in Italy they are considered part of the surname (and more than 90% of them start with a capital, mine being one of the few exceptions); i.e. nobody would alphabetize De Felice under F, or even de André under A, nor would anyone ever refer to them as Felice and André. So "conventionally treated as an essential part of the surname" is good advice, but it should give at least one example containing a space, such as De Felice. The point is to alphabetize under the surname by which the person is most commonly referred to, e.g. Ludwig van Beethoven under B because he's usually referred to as Beethoven, but Vincent Van Gogh under V because he's usually referred to as Van Gogh; spacing and capitals should be irrelevant to this.
- As for numbers, I think they should be sorted by value (e.g. 10 BC before AD 10 before 100 before 1492 before 2009); it would be awkward to have to spell out each number mentally in order to locate 1492 in a list of years. -- Army1987 – Deeds, not words. 02:40, 25 January 2009 (UTC)
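A minimal sketch of sorting by value, set beside plain character-code (ASCII) order. The helper `year_value` is hypothetical and assumes labels shaped like "1492", "AD 10", or "10 BC" (BCE is also accepted).

```python
def year_value(label: str) -> int:
    """Map '1492', 'AD 10', or '10 BC' to a signed number for sorting."""
    tokens = label.split()
    number = int(next(t for t in tokens if t.isdigit()))
    era = {t.upper() for t in tokens if not t.isdigit()}
    return -number if era & {"BC", "BCE"} else number

labels = ["2009", "AD 10", "1492", "10 BC", "100"]
print(sorted(labels))                  # ['10 BC', '100', '1492', '2009', 'AD 10']
print(sorted(labels, key=year_value))  # ['10 BC', 'AD 10', '100', '1492', '2009']
```

In character-code order "AD 10" lands after every purely numeric label, whereas the by-value key reproduces the order proposed above.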
To add to Noetica's nice list above, St. and Mt. should be alphabetized as Saint and Mount, right? Reywas92Talk 19:08, 25 January 2009 (UTC)
From the archives, here are 23 related discussions, of which seven of the most relevant are in boldface italics.
-- Wavelength (talk) 23:10, 25 January 2009 (UTC)
Wikipedia has the article Collation, which discusses alphabetization.
One definition of alphabetize is "to express by or furnish with an alphabet" [2], hence, there is Alphabetization.org - Easily fighting illiteracy!
Besides those two examples, five of the leading results of my Google search for "alphabetization" are as follows.
-- Wavelength (talk) 04:19, 26 January 2009 (UTC)
From the five external links listed above, I have learned that there are two major methods of alphabetization: word-by-word alphabetization and letter-by-letter alphabetization. In the former method (my preference of the two), word in print precedes wording, whereas, in the latter method, wording precedes word in print.
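The two methods can be expressed as two sort keys. A minimal Python sketch (the function names are illustrative): word-by-word compares lists of words, so a word that is a prefix of a longer word files first ("nothing before something"); letter-by-letter drops the spaces and compares one continuous string.

```python
def word_by_word(heading: str) -> list:
    # Compare whole words in turn; a shorter word sorts before any longer
    # word it is a prefix of.
    return heading.casefold().split()

def letter_by_letter(heading: str) -> str:
    # Ignore the spaces and compare the letters as one continuous string.
    return heading.casefold().replace(" ", "")

headings = ["wording", "word in print", "word"]
print(sorted(headings, key=word_by_word))      # ['word', 'word in print', 'wording']
print(sorted(headings, key=letter_by_letter))  # ['word', 'wording', 'word in print']
```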
Besides that "major" distinction, methods of alphabetization differ in various other respects, involving: abbreviations (such as St. and Mt.), surnames (with Mc, Mac, and M'; and with da, de, di, do, du, van, von, and so forth); name order (of given name and surname), capitalization (A to Z before a to z, or A before a to Z before z), punctuation, numerals, and other features.
At this moment, I am inclined to favor a system of collation based on the very thorough outline at Martin Tulic, Book indexing - About indexing (with its list of linked pages), and supplemented by the following.
-- Wavelength (talk) 23:50, 27 January 2009 (UTC)
I would favor the word-by-word method also, as humans tend to think in word units rather than letter units. The Martin Tulic resource looks like a good reference. As for the diacritics, as a mostly-English cataloger, I have no opinion, although the use of redirects on WP is a life-saving procedure for instances with spelling variants. My 2¢. Pegship (talk) 20:05, 28 January 2009 (UTC)
- I agree with what has been said so far, namely that word by word is best, as is using the commonly found form. I also think that some names (Walter de la Mare) might be better off alphabetized in natural language order instead of an inverted form that is difficult for users to remember. If you wanted to have the inverted form, you could use redirects. This is especially true of medieval names (e.g. William of Occam).--FeanorStar7 (talk) 00:32, 29 January 2009 (UTC)
- Word by word; each abbreviation should be spelled out when alphabetizing; no opinion on diacritics. Nowimnthing (talk) 13:44, 29 January 2009 (UTC)
The standard alphabetization methodology in the Anglo-American library industry is 'word by word', otherwise known as 'nothing before something'. I favour this on the basis of my own professional experience in the information industry. The American Library Association Code (Rule 6) follows this methodology, and further notes 'abbreviated words should be filed as if they were spelled out in full, with one exception, that is, the abbreviation Mrs. St. is therefore filed as if it were spelled Saint, and Mc... as Mac'. --Taiwan boi (talk) 14:11, 29 January 2009 (UTC)
- I adhere to the ALA's word by word rule (Rule 1) in ALA Rules for Filing Catalog Cards. Rule 6 covers abbreviations. Here's a link to a preview of the book: http://books.google.com/books?id=mCg-ZNG2llUC&dq=american+library+association+alphabetization+rules&printsec=frontcover&source=bl&ots=4_QnGRcepx&sig=VVGIQmIRFvK6uL2xzYKRXRP131c&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPP1,M1
Kmzundel (talk) 17:08, 29 January 2009 (UTC)
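A minimal sketch of filing abbreviations as if spelled out, combined with word-by-word order. The `ABBREVIATIONS` table is a hypothetical stand-in that lists only St. and Mt. from this discussion; a real table would need many more entries. Mrs. is deliberately left out, matching the exception in the ALA quotation.

```python
# Hypothetical expansion table: only the abbreviations raised in this
# discussion. "Mrs." is absent on purpose (the ALA exception).
ABBREVIATIONS = {"St.": "Saint", "Mt.": "Mount"}

def filing_key(heading: str) -> tuple:
    """Word-by-word key in which listed abbreviations are filed as spelled out."""
    words = [ABBREVIATIONS.get(word, word) for word in heading.split()]
    return tuple(word.casefold() for word in words)

places = ["Saint Paul", "St. Louis", "Salt Lake City", "Mt. Vernon", "Mount Airy", "Mountain View"]
print(sorted(places, key=filing_key))
# ['Mount Airy', 'Mt. Vernon', 'Mountain View', 'St. Louis', 'Saint Paul', 'Salt Lake City']
```

Because the key is word by word, "Mt. Vernon" (filed as Mount Vernon) precedes "Mountain View".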
- I also vote for word by word, for NOT putting numbers alphabetically as if they were spelled out (e.g. 14 under "f") but rather in numeric or chronological (BCE before AD) order as appropriate, and for treating abbreviations as though they were the full word (e.g. "St." would be alpha'ed as "Saint"). As for the perplexing questions of Mac and Mc (do we put them at the front of the M's? do we pretend Mc is actually spelt Mac?), I vote that we not do either as it does nothing but confuse the average person. I mean really, who's going to think to look for "McDonald" between "MacDaniel" and "MacElhenie" ?? The original purpose of this was to simplify locating a name when the reader couldn't be sure whether it was spelt Mac or Mc, but in these days of ctrl-f and search, it seems an unnecessary and artificial refinement. I vote we simply alpha those exactly as spelt. So for example: ... Mabon... MacDuff... Maddox... Mbutu... McDuff... Merrill... and so on. --Bookgrrl holler/lookee here 00:50, 16 February 2009 (UTC)
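The two Mc/Mac policies differ only in the sort key. A minimal sketch using Bookgrrl's example names (the function names are illustrative): `as_spelt` files exactly as written, while `mc_as_mac` reads a leading Mc as Mac and breaks the resulting ties on the actual spelling, which is an assumption of this sketch.

```python
import re

def as_spelt(name: str) -> str:
    # File exactly as spelt.
    return name.casefold()

def mc_as_mac(name: str) -> tuple:
    # Traditional rule: read a leading Mc as Mac; break ties on the spelling.
    folded = name.casefold()
    return (re.sub(r"^mc", "mac", folded), folded)

names = ["Merrill", "McDuff", "Mbutu", "Maddox", "MacDuff", "Mabon"]
print(sorted(names, key=as_spelt))   # ['Mabon', 'MacDuff', 'Maddox', 'Mbutu', 'McDuff', 'Merrill']
print(sorted(names, key=mc_as_mac))  # ['Mabon', 'MacDuff', 'McDuff', 'Maddox', 'Mbutu', 'Merrill']
```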
Noetica’s suggested protocol, and other editors, may have alluded to the following issue, but the examples don’t address it explicitly. I’m concerned about (usually French) names starting with Le and La. In some of them the Le or La is separated from the next part of the name (Le Boucher, La Roche, etc.); in others it is not (Leboucher, Laroche, etc.). Currently, all the names that have a space after La/Le appear earlier in a category, and the names without spaces appear later. Hence, Maurice Leroux could be quite some distance from Maurice Le Roux. In this particular case, these turned out to be the same person, so the articles have to be merged (I have no idea which is the correct spelling). This would have been picked up much more quickly if they’d appeared together in the first place. My intuitive way of ordering names in the real world does not focus on just the first element of a multi-part surname, but ignores spaces utterly and considers all the letters as if they were contiguous. I know that WP has decided to use only the first element for sorting purposes, but that has always seemed an odd decision to me. It feels like a decision made by a computer whiz kid who's all tied up with primary and secondary sorting keys. That's good in theory, except that that's just not the way names work. We don’t think that the principal part of Maurice Le Roux’s surname is merely "Le", with some less important letters appended merely to distinguish him from Pierre Le Roy. No, Roux and Roy are, if anything, more important than Le, so they deserve to be fully taken into account as integral parts of the name. There should be no "primary and secondary sorting key" idea with surnames. The whole surname is the primary, and sole, sorting key. The only real issue is deciding what exactly the surname is, and cases like Charles De Gaulle and Vincent van Gogh are classic examples. Is it "De Gaulle" and "van Gogh", or merely "Gaulle" and "Gogh"? But that's a completely separate issue.
Once we decide that Vincent was Herr "van Gogh" and not Herr "Gogh", then he would be listed as a V name, after a putative Vangoff and before a Van Goit. See the discussion here that brought me to this page. -- JackofOz (talk) 21:46, 8 February 2009 (UTC)
- Most indexing systems I use regularly ignore spaces, so Le Roux and Leroux would be found together under LER. I find this easy to remember (no need to remember an exhaustive list of strings to ignore or spacing conventions), and it leads to a fairly logical ordering. I've also noticed a growing convention not to spell out abbreviations, which I find also improves accessibility. Knepflerle (talk) 11:03, 9 February 2009 (UTC)
- This is a very odd discussion, since in France "de Gaulle" would be under "g" and "le Roux" under "r". −Woodstone (talk) 11:13, 9 February 2009 (UTC)
- Of course that is true - in French. It's asking rather a lot of our readers to know which particles are ignored in languages they may not have any acquaintance with. Our readers also shouldn't have to know what language a name stems from in order to find it in an index. It's a lot easier to remember the simple rule that we alphabetise every letter, rather than try and remember a list of exceptions on top of that. Knepflerle (talk) 15:33, 9 February 2009 (UTC)
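The difference between treating the space as significant and ignoring it can be shown with two keys. In this sketch (function names illustrative), `first_element_key` lets the space sort before every letter, which roughly reproduces the separation JackofOz describes, while `spaceless_key` ignores spaces as in the indexing systems Knepflerle mentions. Python's sort is stable, so names that fold to the same key (Leroux and Le Roux) keep their input order.

```python
def first_element_key(name: str) -> str:
    # A space compares before any letter, so "Le Roux" files with the other
    # "Le ..." names, ahead of every "Leb..." or "Ler..." name.
    return name.casefold()

def spaceless_key(name: str) -> str:
    # Ignore spaces entirely, so "Le Roux" and "Leroux" get the same key.
    return name.casefold().replace(" ", "")

names = ["Le Roy", "Leroux", "Van Goit", "Le Boucher", "Vangoff", "Le Roux", "Leboucher", "Van Gogh"]
print(sorted(names, key=first_element_key))
# ['Le Boucher', 'Le Roux', 'Le Roy', 'Leboucher', 'Leroux', 'Van Gogh', 'Van Goit', 'Vangoff']
print(sorted(names, key=spaceless_key))
# ['Le Boucher', 'Leboucher', 'Leroux', 'Le Roux', 'Le Roy', 'Vangoff', 'Van Gogh', 'Van Goit']
```

With spaces ignored, Van Gogh falls after Vangoff and before Van Goit, as described above.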
-- Wavelength (talk) 22:16, 31 January 2009 (UTC)
[I have updated the permanent link and the archived discussion. -- Wavelength (talk) 02:58, 20 February 2009 (UTC)]