
Category sort key: surnames

I don't know if it's mentioned on some other guideline page, but this page should make clear that a people article "Firstname Surname" added to a category should use a sort key of surname, as in [[Category:Foo people|Surname, Firstname]]. On which note, and again redirect me if this is discussed elsewhere, some related issues:

  • What about prefixes like de la, van der, etc?
  • Should McDonald collate with Macdonald?
  • What about Prince X of Y, or John Smith, fourth duke of Y?

I'm sure Chicago Manual of Style has something to say about all these. Maybe add a link to an explanation of our policy in Template:Fooian people. Perhaps it could vary by category: conceivably Category:Dutch people might demand exclusion of "van" where other categories would not; in which case a notice to that effect is required at the top of the category. jnestorius(talk) 13:46, 12 July 2006 (UTC)
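A minimal Python sketch of the basic convention: turn "Firstname Surname" into the sort key "Surname, Firstname" and attach it to the category link. It assumes the surname is simply the last word, so it deliberately ignores the harder cases listed above (prefixes such as de la or van der, McDonald vs Macdonald, titles like Prince X of Y). The names `sort_key` and `category_tag` are illustrative only, not MediaWiki functions.

```python
def sort_key(full_name: str) -> str:
    """Turn 'Firstname Surname' into 'Surname, Firstname'.

    Simplifying assumption: the surname is the last whitespace-separated
    word. Prefixes ('de la', 'van der') and titles are not handled.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # mononym: nothing to reorder
    given, surname = " ".join(parts[:-1]), parts[-1]
    return f"{surname}, {given}"


def category_tag(category: str, full_name: str) -> str:
    """Build a wikitext category link carrying the sort key."""
    return f"[[Category:{category}|{sort_key(full_name)}]]"


print(category_tag("Foo people", "John Smith"))
```

Multi-part given names stay together ("Mary Ann Evans" becomes "Evans, Mary Ann"), which is the usual case; anything with a particle needs a human decision.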

  • I added a link to the guideline section that currently describes how this is done, i.e. Wikipedia:Categorization#Category sorting. Note that that guideline section gives considerable detail on people names. Maybe part of that general guideline could be moved/copied to the "Categorization of people" guideline. Feel free to proceed.
  • Some prior discussion can be found on wikipedia talk:naming conventions (people)#Naming convention for Dutchmen. Wrong place for such discussion, but anyhow, that's where you can find some discussion about it. Regarding the "Dutch" rules: as has been remarked in that discussion, more than half of Belgium speaks Dutch. But in Belgium the "Dutch" rules (which only apply in the Netherlands) are not followed in telephone directories and so forth. So, currently, such foreign-language rules are not imported in English Wikipedia: no idea what CMS says on the issue, but I don't think one should expect that for a Dutch-language name, English-speaking Wikipedians are supposed to know pre-emptively whether a person is coming from the Dutch-speaking part of Belgium or from Holland. Should "Vanden Plas" be sorted differently than "Van Diemen" because one of these names derives from a Belgian name and the other (probably) from the name of a person that lived in Holland before moving to Tasmania? ...I'm supposing that such sophistications elude most English speakers: both are sorted under "V" in Category:Motor vehicle manufacturers of the United Kingdom - and IMHO that should not be different in specific "people" categories, even if the people with such names live currently in Flanders or the Netherlands. --Francis Schonken 15:08, 12 July 2006 (UTC)

Thanks for bringing this up. In Alain de Cadenet, I reordered "Cadenet, Alain de" to "De Cadenet, Alain", after noting similar instances in e.g. Jacqueline du Pré, since it agreed with "my experience", but I had difficulty finding anything to corroborate my inclination. I think you're saying it is different in different languages. Alain de Cadenet is actually pretty much English, even though his name is French. People should be aware the collator will create a "d" heading if you use "de Cadenet, Alain", so it seems important to capitalize the collation information. Also it might be nice if we could arrange to put the one name in multiple places. I.e. put "de Cadenet" under both "D" and "C", but I don't believe that can be done with the current implementation. And I won't vigorously defend it as being a good idea either!--SportWagon 18:10, 12 July 2006 (UTC)

Thanks muchly. After following my watchlist to later read the article, Wikipedia:Categorization#Category_sorting appears to answer all the questions I had when making the above re-ordering. It does seem that, in English at least, the software should collate upper and lower case letters together. I.e. we shouldn't need to specify "De Cadenet" instead of "de Cadenet". But I won't vigorously defend that idea either; I merely present it as a suggestion.--SportWagon 18:34, 12 July 2006 (UTC)
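To illustrate the case-collation point: a plain code-point sort places every uppercase letter before every lowercase one, so a key starting with "de" ends up after all the "D" keys, while a case-insensitive key files it among them. This is a Python sketch under that simplified model, not MediaWiki's actual collation; "Doe, Jane" is a hypothetical entry added for contrast.

```python
names = ["de Cadenet, Alain", "Doe, Jane", "Du Pré, Jacqueline"]

# Plain sorted() compares code points: 'A'-'Z' (65-90) come before
# 'a'-'z' (97-122), so the lowercase "de ..." key lands last.
by_code_point = sorted(names)

# str.casefold gives a case-insensitive key, so "de Cadenet" files
# among the other D names, as suggested above.
case_insensitive = sorted(names, key=str.casefold)


def heading(key: str) -> str:
    """Letter heading for a key, folded to upper case so 'd' and 'D' merge."""
    return key[:1].upper()


print(by_code_point)
print(case_insensitive)
```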

Regarding one of the points you mention: it is possible to make a name appear two (or more) times in a same category, with a different collation. But the technique is rather something of a hack, and that's probably why you won't find much about it in "how-to" guidelines. The trick is to categorise a redirect page, applying a different collation to the category on that page. For instance Leonardo of Pisa is currently a redirect page to Fibonacci. It is possible to add [[Category:Italian mathematicians|Pisa, Leonardo of]] to the redirect page. Another disadvantage is that this "trick" can only be applied if the name of the redirect page is not a misspelling but a generally recognisable name for the topic, and it also shouldn't make the collation senseless or enigmatic to the average user (e.g. [[Category:Dutch stadtholders|Orange, William I of]] wouldn't make much sense on the Willem de Zwijger redirect page).

Regarding collation of upper and lower case letters together... I no longer know whether this is just a low-priority "bug" still needing to be solved or whether there was a particular reason not to do this. Anyway, maybe there is still a Bugzilla report about this - if not, and if you feel like it, you could always start one. --Francis Schonken 19:18, 12 July 2006 (UTC)

In order to make the hack work for Alain de Cadenet, I guess I'd need to create an Alain De Cadenet redirect page, then. But, nope, that's not a correct spelling. So, true, it seems only sensible to do for true alternate names. It is interesting that you can put a redirect page into a category.--SportWagon 22:27, 12 July 2006 (UTC)
I believe that it won't work unless you put the category tag on the same line as the redirect tag. If you're going to submit a bug perhaps you could also request that Á, À, Â, Ä, Ã, Ǎ, Ā, Ă, Ą, and Å all collate to A as well, etc., as that *is* how they should be sorted on category pages. --JeffW 03:06, 13 July 2006 (UTC)
But, of course, people can also force that with their sort key. (I.e. "use the same work-around"). But, true, it might be fodder to change it from an annoyance into a bug.--SportWagon 21:18, 13 July 2006 (UTC)
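The "Á, À, Â, ... all collate to A" request can be approximated with Unicode canonical decomposition: split each letter into a base letter plus combining marks, then drop the marks. This is a Python sketch of that one technique, not how MediaWiki collates; note that letters such as Ø or Æ do not decompose and are left unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Fold accented letters to their base letter (e.g. 'Å' -> 'A').

    NFD splits precomposed characters into base + combining marks;
    unicodedata.combining() is non-zero for those marks, so we drop them.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Without folding, 'Å' (U+00C5) sorts after 'Z'; with folding it files under A.
print(sorted(["Zola", "Åberg"]))
print(sorted(["Zola", "Åberg"], key=strip_diacritics))
```

This matches the advice further down to "preferably only omit the diacritic" in a manual sort key such as "Aberg, Lasse".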

Thanks all. I've moved the people-specific points to this page; further debate about them should take place here. Some points raised are more general and belong at Wikipedia talk:Categorization or even m:Help talk:Category. jnestorius(talk) 18:21, 13 July 2006 (UTC)

The notes on Lasse Åberg and Ötzi the Iceman conclude "preferably only omit the diacritic". I'd like to change the sample sortkey to reflect that. BTW peers are generally sorted as listed in all categories. -- User:Docu

By residence: more demonyms

This follows on from Wikipedia:Categories_for_deletion/Log/2006_July_14#More_demonyms where discussion seems to be winding down. I think there's probably going to be an opportunity to build on a consensus in making the demonym guidelines clearer. If the July 14 big batch of renaming demonym categories is agreed, then we ought to update "Exceptionally, where the commonly used English name for residents of a city is known globally, the category "Foocityers" may also be used to redirect towards People from Foocity. For an example of this, see Category:New Yorkers." I'd suggest replacing it with -

"The category page of People from Foo should normally mention the most commonly used names for residents ("Fooians", or "Fooers") and such names can helpfully be used to redirect towards People from Foo. For an example of this, see Category:New Yorkers. " --Mereda 11:32, 20 July 2006 (UTC)
I disagree with the proposed change to the "exceptionally, ..." wording. There are hundreds, if not thousands, of cities and towns that we categorize by on Wikipedia, and one of the main reasons there is a trend to the "People from X" wording is because some cities/towns don't have demonyms, or there are multiple ones, or we can't verify ones suggested, etc. I do not believe that guidelines should say to "normally" mention problematic terms like demonyms. "Exceptionally" mentioning them is appropriate in my view. Kurieeto 12:52, 20 July 2006 (UTC)
Discussion on that big batch of July 14 changes has now been closed, with agreement to rename all from demonyms to "People from X". This leaves us with just the rest of the world to go (eg Category:People by Russian city, Category:People by British city etc etc)! I take the point that my guideline suggestion of "normally" mentioning demonyms goes too far since they aren't universal at all. The feeling I have from the July 14 discussion is that where a demonym is popularly used then it's fair to expect it to be mentioned (and maybe redirected too). So how about saying -
"The category page of People from Foo should mention any commonly used names for residents ("Fooians", or "Fooers"), assuming that common usage is verifiable (eg by Google), and such names can also helpfully be used to redirect towards People from Foo. For an example of this, see Category:New Yorkers." Is that better?? --Mereda 17:11, 24 July 2006 (UTC)
Consensus on CFD seems to be leaning towards having redirects in all cases, which isn't my preference, but that's the breaks for me! If we're already supporting demonyms by redirecting them all, I don't think the term "should mention" is required - Instead I would suggest: "The category page of People from Foo may mention the most commonly used names for residents ("Fooians", or "Fooers"), assuming that common usage is verifiable (eg by Google). Such names may also helpfully be used to redirect towards People from Foo. For an example of this, see Category:New Yorkers." The wording "Such names may also ..." could instead be "Such names should also ...", I'm just wary of encouraging the creation of demonym redirect categories for every obscure city that Wikipedia will categorize by. Kurieeto 14:57, 30 July 2006 (UTC)
OK, I've made that change to the guidelines now. I see the Canadian demonym discussion has also closed today with redirecting all pre-existing versions. --Mereda 16:10, 1 August 2006 (UTC)

People by group-hating

There are a few categories like this, such as Category:Anti-Semitic people, Category:Anti-French people, Category:Anti-Arab people and so forth. There are also people categorised directly in the "-ism" categories, such as in Category:Anti-Arabism.

I propose that such categorisation of people be removed, and all "Category:Anti-whatever people" categories be deleted (though "Category:Anti-whateverism" be left). —Ashley Y 01:43, 5 August 2006 (UTC)

Best to treat such categories on a case-by-case basis I suppose. Category:Anti-reformation wouldn't be a good idea I suppose, but only because the established term for that movement is Category:Counter Reformation. Whether they need "people" categories (like Category:Spirituali is a category of Counter Reformation people): also best to treat on a case by case basis I suppose.
Here's the full list of categories starting with "anti": [1]
For instance, Category:Anti-war activists makes perfect sense to me. More sense than the three examples you name above. --Francis Schonken 08:27, 5 August 2006 (UTC)
The problem is not the people who are well-documented with historical facts. The problem is with everyone in the grey areas. Rather than remove the categories, they could be made into lists. The lists could have different ways of listing people (self-identified, consensus of historians, identified as such in media reports, labeled as such by opponents, etc...) with citations and discussion. This would make the lists verifiable and much more useful and encyclopedic than these categories. The list versions could be linked to every person in their article where appropriate. -- Samuel Wantman 09:51, 5 August 2006 (UTC)
Category:Anti-war activists etc. are different. I only mean to consider "Anti-(group of people) people" where (group of people) might be Jews or Arabs or Muslims or French or whatever. —Ashley Y 20:03, 5 August 2006 (UTC)

articles about deceased persons

moved from Cfd

An unusual listing this one, as I'm proposing the creation of a category. This is Categories for Discussion so I guess here is as good a place as any. I'm not being bold and going ahead with it without discussion as it would affect probably 100,000 articles or more.

We currently have a flat Category:Living people, which contains all known articles on living persons with no subcategories. This is a tremendously useful category for automated/bot work. We don't, however, have a flat category for dead persons (but we do have the multi-level Category:Dead people). So:

  1. Would a flat, contains-all category for dead people (as an additional category) be useful? (I think it would; and the matter has already been raised at Category talk:Dead people).
  2. Would it be worth the trouble of a very long bot run on the existing articles (which I can do) and maintaining it future/ensuring editors use it? (I'm less convinced on this point)
  3. What would such a new category be called?

Your comments please. --kingboyk 15:06, 6 August 2006 (UTC)

  1. No, especially as that has already been discussed and rejected on its Talk page. By year of death seems more reasonable.
--William Allen Simpson 15:29, 6 August 2006 (UTC)
Where was discussed and rejected? Not at the link I provided. Note also that I'm talking about an additional category, not a replacement. --kingboyk 15:52, 6 August 2006 (UTC)
Category talk:Living people#Dead People; Category talk:Living people#Dead people?; and some other parts of that talk page --Francis Schonken 16:02, 6 August 2006 (UTC)
OK, thanks. I again contend that those discussions don't really address my suggestion but rather the status quo. We're also not bound by past decisions. So, whilst I certainly respect opposition and am happy to forget the idea if it's not popular, I'd prefer a fresh debate. Good idea or not? --kingboyk 16:10, 6 August 2006 (UTC)
The idea is not popular. What part of the reactions of your fellow-Wikipedians was not clear on that point? As for me, personally, NO, I DON'T THINK THIS A GOOD IDEA. --Francis Schonken 16:18, 6 August 2006 (UTC)
Chill out man! I've had two responses so far, that's not a community reaction. I was simply, in my previous post, asking for responses to my proposition other than pointing me to old, scrappy, mostly irrelevant discussions on - or around - the issue. As before, if it's an unpopular idea that's absolutely fine. If I didn't want to get community approval first I would simply have gone ahead and done it, wouldn't I? :) --kingboyk 17:42, 6 August 2006 (UTC)

Team sports people - employees, members, officers

Quoting two people above
We have categories for hockey players by team, by nationality, plus other categories. It is quite normal for a hockey player to have played for ten or more teams in his career. e.g. Drake Berehowsky is in ten team categories plus three other hockey related categories. If you tried to come up with rules on how to sort these by relevance, it would be a nightmare to maintain. -- JamesTeterenko 18:28, 19 June 2006 (UTC)

I think that would be a mess if other categories end up being sorted in between the team categories. I would think that you'd want to make a rule amongst the hockey editors, or even more generally among editors that work on sports teams if that can be managed, that team categories should come after birth and death date categories and before political affiliation categories and that the team categories be sorted alphabetically (for example). Although I could see an argument for sorting the team categories chronologically by when the player first played for that team. --JeffW 18:56, 19 June 2006 (UTC) end quotation
The categories derived from sports team membership happen to be the worst offenders that I know, because (a) team names are long, perhaps including beloved nicknames, and even then some need disambiguation; (b) sports biographers list every affiliation or every major one defined by the industry (eg, every "major league" affiliation) rather than by an author; (c) baseball authors, at least, prefer to confer special honor on players even in the categories, using not only the team name-nick-disambig (maybe itself long) but also the word "players" in the category names. For example, maybe not yet populated, Philadelphia Athletics (American Association) players & Philadelphia Athletics (American Association) managers --one person may be in both categories.
Team sports organization varies, over time and at once, so categories for players, managers, coaches --not to mention officers-- are not simply employee categories (Google employees) but it is that kind of classification effort.
Categories derived from government offices must be just as bad. I have stayed away. --P64 17:50, 31 August 2006 (UTC)

Classes of categories

Maybe the software can be modified to respect blank lines in the wikitext, displaying in one long line only those groups of category names whose tags appear together in the wikitext, and, say, alphabetizing only the last group of category names. For example, suppose the wikitext includes

[[Category:Irish Tees]]
[[Category:Gees]]
[[Category:Bees]]

[[Category: 1812 deaths]]
[[Category:1811 births]]

and the wikitext also somewhere includes the all-too-familiar tags

{{really unlikely}}
{{bio-stub}}

The display would be something like this, where the last of two groups of categories includes the ones generated from editorial tags and is ordered alphabetically.

Irish Tees : Gees : Bees

1811 births : 1812 deaths : Articles so unreliable that no one should rely on them 
until reliable editors do lots of work : Biography stubs

If the software respects any number of category separators (in my example, blank lines in the wikitext), then the hockey biographers and their baseball colleagues would be able to effect something like this using three groups, two separators.

Ice hockey goalies : Olympic gold medalists : People from Toronto
Boston Bruins : Buffalo Sabres : Colorado Avalanche : Dallas Stars : Detroit Red Wings : 
Edmonton Oilers : Hartford Whalers : Minnesota North Stars : Philadelphia Flyers : 
Quebec Nordiques : St. Louis Blues : Tampa Bay Lightning
1966 births : Articles needing blah blah : Articles needing even more than that and
needing it worse : Articles needing something else too : Hockey biography stubs

Here four of the categories displayed in the last group of five are generated from editorial tags located anywhere in the wikitext, and they are displayed together with the last group derived from category tags (in the example, evidently, a single vital date tag). The display order of other groups of category names, and use of separation into groups, would be managed differently in different articles. The hockey project lists all major team affiliations alphabetically in one group of categories (the second of three groups, in the example); the baseball project might do it chronologically without the restriction to major teams; others might select only the most important affiliations without separation from other meaningful categories. --P64 18:34, 31 August 2006 (UTC)
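The proposed display rule can be sketched in Python as a hypothetical model: category tags are split into groups at blank lines, each group keeps the author's order, and only the last group is merged with template-generated categories and sorted. The function names are invented for illustration, and the template-to-category mapping (e.g. bio-stub to "Biography stubs") is assumed to be supplied by the caller rather than worked out here.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sort key]]; captures Name.
CATEGORY_RE = re.compile(r"\[\[Category:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")


def category_groups(wikitext: str, tag_categories: list[str]) -> list[list[str]]:
    """Group category tags by blank-line separators (hypothetical rule).

    Groups keep wikitext order, except the last, which is merged with
    template-generated categories and sorted case-insensitively.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    for line in wikitext.splitlines():
        if not line.strip():  # a blank line closes the current group
            if current:
                groups.append(current)
                current = []
            continue
        current.extend(m.group(1) for m in CATEGORY_RE.finditer(line))
    if current:
        groups.append(current)
    if not groups:
        groups.append([])
    groups[-1] = sorted(groups[-1] + tag_categories, key=str.casefold)
    return groups


def render(groups: list[list[str]]) -> str:
    """One display line per group, names joined with ' : '."""
    return "\n".join(" : ".join(g) for g in groups)


text = (
    "[[Category:Irish Tees]]\n[[Category:Gees]]\n[[Category:Bees]]\n"
    "\n"
    "[[Category: 1812 deaths]]\n[[Category:1811 births]]\n"
)
print(render(category_groups(text, ["Biography stubs"])))
```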

People whose existence is questionable.

Is there a good category for such people? The closest seems to be Category:Mysterious people, but even that still implies that the person actually exists. Category:Fictional characters would be better in a normal encyclopedia, but since WP is anyone-editable, that won't fly for people who think the person existed. There should probably be a category that implies "doubtful existence, but some believe so." Does this exist already? If not, should someone create it, and what would it be called?

(For clarification, if I write a novel about a 4th century Roman Pompous Maximus, the article goes into Fictional characters. But if I bald-facedly insist that Mr. Maximus actually exists based on my own "personal research" or "poetic inspiration" or whatever... what then?) SnowFire 00:25, 17 September 2006 (UTC)

People with disabilities

I would suggest requiring inclusive language for people with disabilities, such as:

  • People with disabilities, not disabled people
  • People with diabetes, not diabetics

This is standard practice. See, eg:

Not to start a style war, but it's also a ridiculous practice that is fairly new. The idea that "putting the person first" creates more respect is a guess, at best. In the English language, a diabetic is a person with diabetes. That's what it means, nothing more, nothing less. Since the PDF from the University of Nottingham stresses the fact that this applies to a disability that is "negative," would that mean that we should refer to "Nobel Prize winners" over "people who have won a Nobel Prize" so as to make them feel better? Should we not say "Catholics" but rather "people who believe in Catholicism" so as not to pigeonhole someone only by their religion? This can just lead to silliness. Of course, sometimes "people who blah" is in fact the better choice, but it should be made on a case-by-case basis by usage patterns, not over hypothetical sensitivity worries.
I will add that I am not some crusty old "the language must never change!" person. I attempt to avoid needlessly gendering language when possible, for instance- usage of "He" as "Someone" clearly implies a male by default, and there are good reasons to avoid that. But if I say "amnesiacs" over "people with amnesia," I am not being insensitive, merely efficient. SnowFire 00:02, 22 October 2006 (UTC)
Ridiculous or not, the term "people with diabetes" is now common practice in all relevant publications written for the general public in North America and elsewhere. For example, it is used throughout on the website of the Canadian Diabetes Association. (The WP article talks about "people living with diabetes"). I suspect that this is the case for diabetes but not necessarily for other kinds of disabilities.
The controversy is discussed in the WP article on Person-first terminology.
I believe that WP should follow this practice at least in the case of diabetes.

  Andreas   (T) 02:37, 22 October 2006 (UTC)

Here is another citation:

I claim no special expertise over the specific case of diabetes. If the usage has shifted away from "diabetic," then so be it, though I'm somewhat distrustful of a few position papers- I'm sure that there are going to be people advocating for person-first terminology in quite a few fields, but that doesn't mean that it's common usage.

Perhaps I misread you, but I took your post as proposing that a style guideline should be laid down favoring the usage of "person with" in general. That I disagree with. Even if I believed that using "person with" actually was more polite, WP is under no particular obligation to be sensitive anyway, just like how honorifics for nobles, saints, and prophets are generally dispensed with. Going down that path lies madness, because once you start trying to please some with a wording favorable to them, you annoy others who wanted a different wording or none at all. SnowFire 06:06, 22 October 2006 (UTC)

Universities

A lot of the subcategories for universities involve people (academics, alumni, Chancellors etc...) but others do not - please see Wikipedia talk:Naming conventions (categories)#University categories for discussion on how to arrange these. Timrollpickering 15:47, 22 October 2006 (UTC)

Schemes

So, we have no consensus on schemes? Let's see if we can build one.

Alphabetical order

  • This makes the most sense to me. "Relevance" is subjective at best, and one person's "relevance" fighting with another person's "relevance" can only lead to edit wars; alphabetical order is, as far as I can determine, the only way to avoid fights. RadioKirk (u|t|c) 00:44, 9 November 2006 (UTC)
    Solution looking for a problem (examples of category-order fights, please?), and instruction creep, too. IMO, it's a stretch to support otherwise-pointless alphabetism on the grounds that sorting by relevance is "subjective." When it comes to biographies (the articles I add cats to most often), I try to start with statistics (years, place they're from, etc...), proceed to education, then to occupations, and wrap up with anything else - and I hope I'm not compelled to change this. Picaroon9288 01:02, 9 November 2006 (UTC)
One example; I've seen others and I'd have to search... RadioKirk (u|t|c) 01:08, 9 November 2006 (UTC)
Your order makes no sense at all to me. What is more important about George W. Bush, the fact that he was born in 19XX, the fact he attended Yale or the fact that he became President of the United States? Obviously the last. Merchbow 15:22, 9 November 2006 (UTC)
The problem with your argument is that it starts and ends with importance, a subjective issue. The only way to remove subjectivity is an established norm, such as Unicode order. As for W., how about "First second-term Republican president to lose Congress mid-term"? ;) RadioKirk (u|t|c) 18:55, 9 November 2006 (UTC)
That is a non-problem. Articles are subjective all the way through, but if one acts in good faith and with neutrality no problems will arise - that's certainly my experience. And if on the off-chance someone does decide to flatter Hitler by listing him as an artist first and a Nazi last, that can be changed just like any other bad faith edit. It really isn't hard to decide what the key characteristics are in most cases, and in any case no-one is suggesting that there is one "correct" version, it is just a matter of coming up with a good helpful order. If someone wants to tweak it a bit afterwards, that isn't a problem. Merchbow 23:30, 9 November 2006 (UTC)
"What is more important about George W. Bush" "Obviously the last." That's obviously your POV, plain and simple. Who defines importance, you? --Kbdank71 22:01, 13 November 2006 (UTC)
That really is ludicrous. You are making the sort of intellectual point that actually looks very silly. Sumahoy 23:10, 21 November 2006 (UTC)

Order of importance

  • Reading the previous comments this issue has been argued to death, and the idea that alphabeticisation is a good idea has been totally demolished in my opinion. It looks messy, it is not user friendly and very few major articles currently have their categories in strict alphabetical order. Merchbow 15:22, 9 November 2006 (UTC)
How is alphabetical not user friendly? --Kbdank71 22:03, 13 November 2006 (UTC)
Because if you don't know what is there you may have to read the whole list to see if there is anything worth navigating to, and then you might be disappointed. The category system is a navigation tool, and people are more likely to want to navigate in some directions than others. We can't be sure what any particular reader will want of course, but we can make reasonable assumptions about prevalent tendencies. Sumahoy 23:17, 21 November 2006 (UTC)
This isn't art, it's an encyclopedia. --Kbdank71 22:03, 13 November 2006 (UTC)
Edited by people, not machines. Some of us trust people to do a better job. Sumahoy 23:27, 21 November 2006 (UTC)
  • Forcing things into alphabetical order just removes the possibility of trying to arrange information in a more logical and informative manner. I'm not sure why User:Kbdank71 feels so strongly that Wikipedia's editors are incapable of doing this - if we can't even arrange a list of categories logically, I'm not sure how we can possibly feel qualified to write encyclopedia articles!! :-) --Stormie 03:01, 14 November 2006 (UTC)
  • I agree with Stormie. Believing in Wikipedia requires one to have some faith that most of the time the volunteer editors will make reasonable judgements. Can Kbdank71 or anyone else provide any examples of articles where the categories have been sorted into an order that is less helpful than alphabetical, and it is a reasonable supposition that there was an ideological motivation in the selection? I have never seen such a case. There are some problems in the way that people categorize, for example people promoting articles to the top of categories that patently have no business being there, but abuse of category order just is not one of the problems that exist in the real world of Wikipedia. If it becomes a problem it will be relevant to this discussion at the time, but right now it is just a red herring. Sumahoy 23:17, 21 November 2006 (UTC)

Other

None

  • Well-intentioned, but seems like this will add a lot of unproductive bot edits, and we already have too many of those. Besides, there's no way to make it consistent. A physicist who did a little math would have one order; a mathematician who did a little physics would have another. And that's a good thing, isn't it? Chick Bowen 02:48, 9 November 2006 (UTC)
    • Also, I too have not run across fights about this. The example cited above is a page that's been fought over about many things, and so an atmosphere of bellicosity has developed. Chick Bowen 02:50, 9 November 2006 (UTC)
There is one established order: Unicode. RadioKirk (u|t|c) 03:27, 9 November 2006 (UTC)
Sorry, I don't see the connection. That's a technical issue--obviously categories and allpages have to be ordered artificially, so they may as well be ordered alphabetically, and it makes articles easy to find in categories that have many of them. But relatively few articles have more than a dozen categories, and some of those probably shouldn't. Chick Bowen 04:12, 9 November 2006 (UTC)
Forgive me but, speaking of not seeing the connection, you seem to be making a different argument. Unicode isn't just the characters, it's the order thereof. Did I miss something? RadioKirk (u|t|c) 18:49, 9 November 2006 (UTC)
Huh? You and I seem to have lost each other somewhere. In any case, I think the point is made moot by those cited below. Chick Bowen 00:20, 10 November 2006 (UTC)

What is the issue here?

MediaWiki defaults to alphabetical order, unless people add their own sort keys for whatever reason, which may make sense to e.g. sort by a person's last name rather than their first. I wonder what the perceived problem is here, are there categories that are sorted in some other fashion? How and why? >Radiant< 08:57, 9 November 2006 (UTC)
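A simplified Python model of the default described here: members are ordered by their explicit sort key when one is given, and by page title otherwise. This is an assumption-laden sketch (plain code-point comparison, no first-letter uppercasing or locale collation), not MediaWiki's actual code; the sample members are taken from pages mentioned earlier on this page.

```python
from typing import Optional

# (page title, explicit sort key or None when the tag has no "|key")
Member = tuple[str, Optional[str]]


def member_order(members: list[Member]) -> list[str]:
    """Order titles by explicit sort key, falling back to the title."""
    return [title for title, _ in sorted(members, key=lambda m: m[1] or m[0])]


members: list[Member] = [
    ("Fibonacci", None),
    ("Alain de Cadenet", "De Cadenet, Alain"),
    ("Jacqueline du Pré", "Du Pré, Jacqueline"),
    ("Lasse Åberg", "Aberg, Lasse"),
]
print(member_order(members))                       # sorted by surname keys
print(member_order([(t, None) for t, _ in members]))  # titles only: first names
```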

  • Oh, wait a second. By a scheme, you mean the order category names appear at the bottom of an article, no? If this issue bothers you, your best bet would be to contact the devs; it would be almost trivial to add a sort order to this in the MediaWiki software. I do believe alphabetical is best in most cases, and cannot now think of an exception, but I don't know if it's worthwhile to change all the many articles that have the categories in chronological order of addition. >Radiant< 09:01, 9 November 2006 (UTC)

FYI, the issue has been discussed previously (I'd be tempted to add "ad nauseam"), leading, as far as I can remember, to a "we agree to disagree" situation. Anyway, the "sort categories alphabetically" feature was removed from AWB, ask the AWB devs. --Francis Schonken 09:07, 9 November 2006 (UTC)

Here's a ref for the removal of the alphabetising option from AWB, after previous discussion: http://en.wikipedia.org/w/index.php?title=Wikipedia_talk%3ASemi-bots&diff=60806373&oldid=60755434 --Francis Schonken 11:40, 9 November 2006 (UTC)

living people

Something missing from the article is a note that classification of living people is especially problematic and there is a strict policy about it. --Zerotalk 13:52, 23 November 2006 (UTC)

Sorting of surnames with independent prefixes

The sorting guidelines for surnames with independent prefixes appear ill-conceived. Such names, though treated as uniquely Dutch, occur in many Western-European languages. When you look at their alphabetization in other-language wikipedias you’ll notice that there is only one rule: a surname is alphabetized on the first word that is capitalized.
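The one rule proposed here ("a surname is alphabetized on the first word that is capitalized") is mechanical enough to sketch in Python. `prefix_sort_key` is a hypothetical helper: it takes the given name and surname separately (splitting a full name is itself ambiguous), moves lower-case particles behind the given name, and keeps capitalized prefixes such as "De" or "Van" in the key. Elided forms like d'Hondt are not split by this simple word-based sketch.

```python
def prefix_sort_key(given: str, surname: str) -> str:
    """Sort key under the 'first capitalized word' rule (hypothetical).

    'van Beethoven' -> 'Beethoven, Ludwig van'; 'De Coster' keeps 'De'.
    """
    words = surname.split()
    for i, word in enumerate(words):
        if word[:1].isupper():  # first capitalized word starts the key
            particles, core = " ".join(words[:i]), " ".join(words[i:])
            break
    else:  # no capitalized word: keep the surname as written
        particles, core = "", surname
    tail = f"{given} {particles}".strip()
    return f"{core}, {tail}" if tail else core


print(prefix_sort_key("Ludwig", "van Beethoven"))
print(prefix_sort_key("Charles", "De Coster"))
```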

Here's a summary of a little survey:

In German (von, zu, zum, zur), Spanish (de, de la, de los), and Portuguese (da, de, do, dos), the prefixes are never used to alphabetize. In German and Spanish they are often dropped when referring to a person by the last name alone (but see De Soto, Von Braun, Von Däniken, De la Parra, etc.). In Portuguese that happens rarely (Da Gama etc.). For these languages, names are already pretty consistently sorted without the prefix in the English Wikipedia.

In French, some prefixes (de, du, d’) are also never used to alphabetize, although they are usually included when referring to the person by surname only. When “de la”, “de l’”, “l’” and “le” are not capitalized, they are treated similarly, but more often they are written “de La”, “de L’”, “L’” and “Le”, in which case they are sorted under the letter L. In the English Wikipedia, alphabetization is inconsistent.

In Italian, the prefixes (De, Di, Del, Della) are usually capitalized and names are sorted on them. When they are not capitalized, what follows is a qualifier rather than a surname, and people are sorted by their first name.

In Dutch, with its multitude of tussenvoegsels, the apparent conflict between the alphabetizations in Belgium and the Netherlands disappears when one considers that Belgian surnames have capitalized (or fused) prefixes.

pardon? No idea what you're trying to say here. --Francis Schonken 10:17, 27 December 2006 (UTC)
I'm trying to say that "Belgian surnames have capitalized (or fused) prefixes", but I should have added "usually". I believe you are Belgian, so you know this better than I do, but looking at the List of Belgians I see Henry Van de Velde, Charles De Coster and Jean Claude Van Damme as examples of capitalized prefixes and Raoul Vaneigem, Wim Vandekeybus, Philippe Lafontaine, Fud Leclerc and Filip Dewinter as examples of fused prefixes. Note that in that list, presumably mostly made by Belgians, names with uncapitalized prefixes are usually placed where I expect them (e.g. Charles de Broqueville, Jozef-Ernest van Roey, Andrée "Dédée" de Jongh, Charles Jean de la Vallée-Poussin, Adrien de Gerlache, Jan Baptist van Helmont, even Victor d'Hondt), though a few (Herman de Coninck, Hendrik de Man, Jacob van Artevelde (should probably be under the J), Eric van de Poele) appear misplaced. Then again, errors are expected. — Afasmit 21:49, 27 December 2006 (UTC)
Seems a flawed analysis to me. The "where I expect them" is even more subjective, and not really usable as a principle in Wikipedia. I've given a more comprehensive description of the de facto "lack of rule" below (although not even half as comprehensive as it could be). There's really nothing to go by or to conclude from here, as far as collation of names in categories in English Wikipedia is concerned. --Francis Schonken 15:00, 28 December 2006 (UTC)

In South Africa there does seem to be inconsistency if not confusion (though perhaps not in Afrikaans): the List of South Africans sorts uZibhebhu kaMaphitha and Cetshwayo kaMpande under the 'M', while "de Klerk" is under the “D”.

Sorting by the first capitalized letter seems a very straightforward rule that also follows the "principle of least surprise". It is adhered to by (all?) traditional English-language encyclopedias (like the Britannica) but differs from the current wikipedia “policy” (I believe a single person’s edit on Feb 12 2006) of sorting on the prefixes. Although many English-language sources online now sort that way, this is probably due to widespread programmers’ laziness and/or ignorance, e.g. offering only one field for entry of the surname, akin to the U.S. custom of allowing only two given names on any official form.
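To make the proposed rule concrete, here is a minimal Python sketch, assuming the surname and the given name(s) are already available as separate strings and that elided prefixes use a straight apostrophe ('). The function name first_capital_sort_key is hypothetical; it is not part of MediaWiki or any Wikipedia tool.

```python
def first_capital_sort_key(given: str, surname: str) -> str:
    """Hypothetical helper: index a surname from its first capitalized word.

    Lower-case prefixes before that word are moved after the given name(s).
    Assumes elided prefixes use a straight apostrophe, e.g. "d'Alembert".
    """
    tokens = surname.split()
    head, prefixes = tokens, []  # fallback: no capitalized word found
    for i, token in enumerate(tokens):
        if token[:1].isupper():
            head, prefixes = tokens[i:], tokens[:i]
            break
        # An elided prefix glued to a capitalized word, e.g. "d'Alembert".
        prefix, apostrophe, rest = token.partition("'")
        if apostrophe and rest[:1].isupper():
            head = [rest] + tokens[i + 1:]
            prefixes = tokens[:i] + [prefix + apostrophe]
            break
    return " ".join(head) + ", " + " ".join([given] + prefixes)


print(first_capital_sort_key("Marco", "van Basten"))            # Basten, Marco van
print(first_capital_sort_key("Charles", "De Coster"))           # De Coster, Charles
print(first_capital_sort_key("Jean le Rond", "d'Alembert"))     # Alembert, Jean le Rond d'
print(first_capital_sort_key("Christine", "van den Wyngaert"))  # Wyngaert, Christine van den
```

Under this rule the same surname written "Van den Wyngaert" instead gives "Van den Wyngaert, Christine", so the key depends entirely on how the prefix is capitalized in the article title, which is the crux of the Belgian/Dutch disagreement in this thread.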

Previous arguments have been made that English-speaking wikipedia users will be lost if they don't find “Marco van Basten” under the V. Are they similarly confused when they don't find Vasco da Gama, Robson da Silva, Guy de Maupassant, Victoria de los Ángeles, or Hernando de Soto under the letter D? Do they expect Wernher von Braun or Gustav zu Putlitz under V and Z? And even if they do, wouldn’t the capitalized letter be the second place to look? For the uninitiated there are many surprises, like finding that many East Asian surnames come first or finding pterodactyl under the letter P, but the "principle of least surprise" cannot mean “adjusting to the lowest common denominator”.

This issue comes up in other places than categories as well, e.g. in lists of people, and this page may not even be the right one for a guideline. Is there a general alphabetization guideline page? I couldn’t find it. At any rate, I would like to change the guideline unless some serious counterarguments can be made. -- Afasmit 02:17, 27 December 2006 (UTC)

Re. Dutch, I was browsing the Taalunie website to see whether I could find anything. Thus far not really, though I found these two examples on a page explaining Dutch grammar:
  • "(9) Dit museum heeft maar twee Van Goghs." ("This museum has only two Van Goghs"; and not: "Dit museum heeft maar twee Goghs")
  • "(14) Wie leest er nu nog Vondel?" ("Who still reads Vondel nowadays?"; and not: "Wie leest er nu nog Van den Vondel?")
I'm sure there must be a logical explanation for this difference somewhere (please provide such logical explanation, because I can't) - the only thing I know is that in English I'd sort Vondel as "Vondel, Joost van den", and Van Gogh as "Van Gogh, Vincent". --Francis Schonken 10:52, 27 December 2006 (UTC)
I don't think there is, though dropping of the prefixes in Dutch is probably limited to historical figures. Your examples are similar to Wernher von Braun being "von Braun" and Otto von Bismarck being "Bismarck", Hernando de Soto being "de Soto" and Miguel de Cervantes being "Cervantes". Moreover, prefixes are often inconsistently dropped for a single person, e.g. Anton van Leeuwenhoek variably is called "Van Leeuwenhoek" and "Leeuwenhoek". Since there is no rule nor an automatic way of deciding from the article's title if the prefix is kept when using the surname only, it will be hard to implement your suggestion. — Afasmit 21:49, 27 December 2006 (UTC)
Another example, this time involving French names (note that this time I'm not saying this is the way I think it should be; I only note that this is the way it is currently done in the category sort keys in English Wikipedia):
I'd really like to know the logic behind this... --Francis Schonken 12:30, 27 December 2006 (UTC)
These are just decisions by different editors.
More examples. These are US people with Dutch- or French-sounding names:
--Francis Schonken 13:20, 27 December 2006 (UTC)
From the UK:
  • Zoete, Beryl de - note that in the index of a UK-published book I have in front of me (ISBN 0701134097) this person is under "d"
  • Valois, Ninette de (ditto)
  • Wint, Peter de (ditto)
  • Then, surprise, D'Abernon, Edgar Vincent, 1st Viscount
  • and, D'Offay, Anthony
  • Cole, Horace de Vere
Could someone please explain the logic?
"de Vere" apparently was his middle name, which can be anything in English (e.g. in the US, often a maiden surname). Deciding where a surname starts, as in Henry Dudley Gresham Leveson-Gower and your example Ludwig Mies van der Rohe, can be really tricky, but the biographer hopefully knows. — Afasmit 21:49, 27 December 2006 (UTC)
Irish/international:
Du Plessis:
The remaining "du Plessis" families living in South-Africa, Australia and the US probably will decide in the future to capitalize Du in their name to just get it over with, following the surrenders of countless other emigrants.
Belgium:
Tricky one:
--Francis Schonken 15:31, 27 December 2006 (UTC)
I'm pleased to see that in your examples most wikipedians also wanted to sort by the first capital in the surname. This supports its use as a wikipedia guideline, though its ease and its use in paper encyclopedias may have been enough support (note that even Vincent van Gogh, perhaps the ultimate "van" name, is under the letter G in the encyclopedia britannica). If a name is mis-capitalized in the title of an article this will eventually be fixed by an alert editor. — Afasmit 21:49, 27 December 2006 (UTC)
Re. "I'm pleased to see that in your examples most wikipedians also wanted to sort by the first capital in the surname." - that remark seems completely clueless, as to what the examples show. The examples don't show that "most wikipedians [want] to sort by the first capital in the surname" (e.g. [[David du Plessis|Du Plessis, David]] and [[Corne Du Plessis|Du Plessis, Corne]] are both included in the list above, as are [[Jean le Rond d'Alembert|Alembert, Jean le Rond d']] and [[Jean-Henri d'Anglebert|D'Anglebert, Jean-Henri]] etc). Maybe, but I'm not even sure about that, the examples show that wikipedians put a capital where they think the last name should start (which you're forced to do by the current "category sort key" mechanism, the way it is programmed in the MediaWiki software); only Géry van Outryve d'Ydewalle was in error on that software issue? Anyway, there are certainly a lot of errors contained in these examples. You're just jumping to conclusions, and trying to make a cocktail of orthography and collation rules in English and widely different rules in other languages (which aren't even always correctly applied in Wikipedia where this was intended): I don't think this cocktail of rules would work in Wikipedia. I simply have no idea what the rules usually are in English. Could someone explain?
Francis, you're taking this too personally. There's no reason to call me clueless either, as such a remark can all too easily backfire upon re-reading the text. First off, you're listing individual decisions (and typos) by wikipedians. One would expect a wide variety of them, even if there were easy-to-find rules and people were interested in looking them up. You gave 45 examples, 33 of which follow the rule of indexing on the first capitalized letter, 1 is neutral (no prefix in "Cole"), 1 makes no sense (Morne du Plessis), 2 overshoot on dropping the capitalized prefix (Cardinal de La R., Peter De Winter), and 8 index on lower-case prefixes. I thought the latter would be more common, as such sorting is done in automatically created indices and websites (I've suggested some reasons already) and because the current guidelines suggest doing so. I was pleased, considering that you could easily have picked 45 cases that contradicted the encyclopedia order and I had expected a higher discrepancy in a random pick.
Nobody suggested a "cocktail of rules". I found there to be one very simple rule that seems to be followed by standard English-language reference works as well as the other language wikipedias that I've checked. Wouldn't it be nice to keep it simple and consistent? — Afasmit 04:13, 28 December 2006 (UTC)
Sorry about the "clueless", shouldn't have said that. I didn't say it about you though, but about the rule you think to be apparent, which I can't see to be apparent. But still, shouldn't have used the word "clueless".
Nonetheless: I don't think you understand: there is no fixed capitalisation of the prefix in Dutch. "Christine van den Wyngaert" is correct in Dutch. The expression "Judge Van den Wyngaert" (without first name) should however always have at least the "V" and the "W" capitalised. In Belgium there is however a more dominant habit of writing "Christine Van den Wyngaert" (I'm not even sure the "Taalunie" endorses this, I couldn't find it on their website, but that's what happens in Belgium), whereas in the Netherlands the more dominant habit is to write "Christine van den Wyngaert", which is the orthography I was taught in school (in Belgium, but that's some years ago). When names are anglicized, there isn't much of a problem for Belgian names (although English speakers sometimes get a bit overzealous and write "Christine Van Den Wyngaert", also capitalising the "D"). There is a problem with nobility names though: the orthography I was taught in school was "Géry van Outryve d'Ydewalle"/"Van Outryve d'Ydewalle, Géry", suppressing the former rule (in Belgium, but more or less a "French" rule) that nobility last names were always written with a decapitalised prefix (a decapitalised "d'" was a sign of nobility, allowing one to see, just by the way it was written, that "Victor D'Hondt" was not a member of the nobility, while "Géry van Outryve d'Ydewalle" was). Now that old rule has somewhat revived, but not completely: when "Frank De Winne" got his noble title, in Belgium his name was not "changed" to "Frank de Winne" (although that would have been a correct way to write his name, both before and after he got a noble title). In sum, the argument that capitalisation in English can be "derived" from orthography in Dutch is moot; the rule doesn't exist in Dutch.
More importantly, as such, this doesn't have anything to do with collation habits either. These are (for telephone directories for instance) simply different in the Netherlands and in Belgium. Taalunie says even less about it, I suppose, and I didn't learn anything about it in school. In Belgian telephone directories all letters of a last name are always capitalised ("VAN OUTRYVE D'YDEWALLE, Géry" doesn't show whether or not the last name theoretically starts with a decapitalised "v") so again, there is no "rule" regarding capitalisation in Dutch from which collation rules in English could be derived.
The only rule I see is that when a person moves to an English-speaking country his/her name is very often written with a capitalised prefix, at least for Belgian people, and not so often for people from the Netherlands, which confuses English speakers, although there is no way that "van Basten" (not preceded by a first name) could be correct in English; it isn't even correct in Dutch. But again, that has no relation to collation rules, not even in Dutch: if Van Basten moved to Belgium, his name would appear as "VAN BASTEN, Marco" in a Belgian telephone directory. Probably, as with many language-related rules in English, there is no uniform fixed rule in English for collation (e.g. the differences between Britannica and the index of ISBN 0701134097 I quoted above, both UK publications). If there is no rule, and if the rule you seem to think there is certainly doesn't exist, Wikipedia can choose the rule that is most appropriate for its purposes, taking account of its technical environment (MediaWiki software factors). And that is, imho, the rule as it is currently in Wikipedia:Categorization of people#Ordering names in a category. Leaving aside obvious anomalies (van Outryve, Cardinal de La Rochefoucauld) and cases I'm not sure about (Horace de Vere Cole, de Launoit), this rule is currently followed in Wikipedia as satisfactorily as yours. --Francis Schonken 12:06, 28 December 2006 (UTC)
Failing such explanation, I'd keep to the rules as they are currently in the Wikipedia:Categorization of people guideline. That is, always use the last name in full before the given name(s), except where a person is widely referred to by a part of the last name that doesn't include one or more of the prefixes (examples: Beethoven, Vondel, Bismarck, Cervantes,... and nobility where this is usual). Also, always capitalise the first letter of the sequence and de-diacriticise all letters (for software reasons). Maybe the only point that could be clarified is what should happen when a person is both referred to with a full last name and a de-prefixed one (your "Van Leeuwenhoek"/"Leeuwenhoek" example): in that case I'd play it safe and start with the full last name, including the prefix(es).
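For comparison, the guideline rule described here (full last name first, first letter capitalised, diacritics removed, with an exception for people widely known by a de-prefixed form) can be sketched the same way. This is an illustrative Python sketch only: full_surname_sort_key and its known_as parameter are hypothetical, and deciding when a person is "widely referred to" by the shorter form remains an editorial judgement that code cannot make.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # Decompose accented letters (NFKD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def full_surname_sort_key(given, surname, known_as=None):
    """Hypothetical helper for the guideline rule.

    known_as: the de-prefixed form a person is widely known by
    (e.g. "Vondel"); the dropped prefixes then follow the given name.
    """
    if known_as and surname.endswith(known_as):
        dropped = surname[: -len(known_as)].strip()
        head, tail = known_as, f"{given} {dropped}".strip()
    else:
        head, tail = surname, given
    # Capitalise the first letter of the key, then remove diacritics.
    head = head[:1].upper() + head[1:]
    return strip_diacritics(f"{head}, {tail}")


print(full_surname_sort_key("Marco", "van Basten"))                # Van Basten, Marco
print(full_surname_sort_key("Joost", "van den Vondel", "Vondel"))  # Vondel, Joost van den
print(full_surname_sort_key("Uldis", "Bērziņš"))                   # Berzins, Uldis
```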
As for the paper Britannica's rules:
  • It has "Van Buren, Martin" (which was an American with a Dutch sounding name);
  • It apparently uses the Hollandish/French rules also for the Belgian persons, which I don't think an acceptable solution;
  • The rules seem somewhat untenable given current internationalisation: Christine Van den Wyngaert is a Belgian person, appointed by the UN (in New York, where they sometimes write her name "Christine Van Den Wyngaert" - [2]) and working in The Hague (Netherlands), where they would write her name "Christine van den Wyngaert" (which would not be an error). If you want to make a worldwide encyclopedia like Wikipedia, then for the English-language version you should arrange the entries listed in categories so that they are easiest to find for the largest possible group of English speakers who might visit the encyclopedia. I think the current rules optimise that, whereas rules that would sort Dutch-sounding names from Belgium differently from Dutch-sounding names from the Netherlands and from French-sounding names from wherever would not. What average English speaker would know that "de" followed by "Ghelderode" is French-sounding although "Ghelderode" is a Dutch-like name? I don't even know myself whether "De Duve" is supposed to be Dutch-sounding or French-sounding... etc. If someone reads a book or a newspaper and encounters the name of a Belgian author given as "De Ghelderode" (without first name), then he should be able to find that name in a Wikipedia category of Belgian authors under that name ("De Ghelderode,..."), without needing to fuss about anything. And similarly for the Dutch football player "Van Basten", in the respective category. And similarly for the Dutch author "Vondel", etc. --Francis Schonken 01:50, 28 December 2006 (UTC)
I agree with you, Francis. In Afasmit's listing above, he forgot:
  • In English, "of" is never used to alphabetize.
It's a lot like the indexing of other things, where in English we ignore the articles (a, an, the) in indexing, but often do index the articles in other languages. For example, Das Boot is more often indexed under D than under B.
There is no reason for it to depend on capitalization; that is never used as a criterion in the sorting of American names, for example. Like Francis said, "not rules that would sort Dutch-sounding names from Belgium different from Dutch-sounding names from the Netherlands and from French-sounding names from wherever". Gene Nygaard 16:29, 30 December 2006 (UTC)

Regarding "of" (English), there's an interesting issue I should perhaps mention, to make clear the distinction between "de" in French and "de" in Dutch. I'll use Fibonacci's name as an example (Fibonacci is a nickname; more formally his name is Leonardo Pisano):

  • Original: Leonardo Pisano (where "Pisano" is a Medieval Latin genitive case of "Pisa", the Italian city);
  • Contemporary Italian: Leonardo da Pisa;
  • German translation: Leonardo von Pisa;
  • English translation: Leonardo of Pisa;
  • French translation: Leonardo de Pise ("Pise" being the French form of "Pisa");
  • Dutch translation: Leonardo van Pisa.

In other words English "of" translates to French "de", translates to Dutch "van".

The last name "De Ghelderode" thus means something like "of Gelderode" (where Gelderode is a location)

Similarly "d'Udekem", means "of Udekem" (where Udekem is or was a location name)

But names like "De Winter" are a different issue: here "de" (or in its abbreviated form "d'") is the Dutch definite article "de" (which in English translates as "the")

So "De Winter" is something like "The Winter".

If there is a "van" preceding "de" (or "der", "den" which are remnants of cases applied to the Dutch definite article), the expression could be (again liberally) translated to "of the" (and implies there was no French root to the way the name is written), e.g. "van de(n) Wyngaert" is something like "of the Vineyard".

This Dutch "de" would become "le" or "la" (depending on the gender of the word that follows) in French.

"Van de(n/r)" becomes, depending on gender, "du" (contraction of "de"+"le" - can also be abbreviated to "d'" before a vowel) or "de la" in French.

So,

  • (fr) "Le Plessis" → (en) "The Plessis" ("The" somewhat like in "The Hague", in Dutch: "Den Haag")
  • (fr) "du Plessis" → (en) "of the Plessis" (more accurately: "of The Plessis")

I just wrote this down to clarify how unreasonable it would be to expect average English speakers to differentiate between these "de" - "d'" - etc... before starting the search for a name in a list. --Francis Schonken 18:29, 30 December 2006 (UTC)

  • Revived It is a pity this discussion got bogged down so quickly & has ground to a halt. The guideline as it is now just does not give a reasonable statement of the position in English, and is rightly ignored by the vast majority of Wikipedians. It certainly is a complex field, but the key principle to minimize confusion has to be to index under the first capitalized word in the surname. There is a difference between the descendants of emigrants to English-speaking countries, from Van Buren to modern sportsmen, and historical figures who stayed in the countries their names came from - Anthony van Dyck and Steve Van Dyck are indexed differently. Someone tried to categorise Jan van Eyck under V, which is just WRONG, as the index to any history of painting will show. The present text just isn't adequate. Johnbod 14:43, 21 June 2007 (UTC)

Put it another way, is anybody other than User:Francis Schonken opposed to a rewrite of the present paragraphs to clarify that the first capitalized word in the surname is usually the one to index on? Johnbod 15:41, 22 June 2007 (UTC)

That suggestion looks good to me. It's simple to apply, and thus transparent. We may need to make exceptions sooner or later, but at least this way the basic rule will be clarified. I had thought about this before, and sorting on the first capitalised word in the surname is the easiest method which incorporates a reasonable amount of the complexity. --Stemonitis 16:11, 22 June 2007 (UTC)
Continued in new section below [3] Johnbod 23:00, 24 June 2007 (UTC)

"Activists" and "movement"

There's a CFD for Category:Animal rights activists; the members of an animal rights project don't want the subcat for "activists"; a couple of us who have been categorizing and diffusing various people categories (Category:People by occupation and Category:Activists) think it's important to be consistent. Thoughts from other people-categorizers are welcome, since this could set a precedent for other movement pages. --lquilter 18:59, 5 January 2007 (UTC)

Ordering and sort-keys

I would like to add several new statements to the guidelines, but thought I should lay them open to debate first:

  1. Capital letters which are not at the beginning of a word should be converted to lower case, so that James BeauSeigneur should sort as "Beauseigneur, James", not "BeauSeigneur, James".
  2. Punctuation (including hyphens and apostrophes but not spaces) should be stripped out of names for sorting purposes, such that Maurie Fa'asavalu sorts as "Faasavalu, Maurie", not "Fa'asavalu, Maurie", and D:Fuse sorts as "Dfuse".
  3. Accented characters should be replaced with their unaccented counterparts: Uldis Bērziņš sorts as "Berzins, Uldis", not "Bērziņš, Uldis".
  4. Disambiguating terms should be omitted: Will Smith (comedian) sorts as "Smith, Will", not "Smith, Will (comedian)" or "Smith (comedian), Will".
  5. All parts of the title should be included except disambiguating terms; no names should be abbreviated or omitted, including middle names, where they form part of the title.
  6. Similarly, the sort key should not include any term which does not appear in the title. For instance, where an article's title is a nickname or stage name, the article should not be sorted by the person's real name. Also, if the title is an abbreviated name such as "Don" or "Chris", the article should be sorted by that name, and not the long version ("Donald", "Christopher").
  7. Suffixes should be placed at the end of the sort key, such that Robert J. Smith II sorts as "Smith, Robert J., II", not "Smith II, Robert J."
  8. Second and subsequent capital letters in a series of capital letters should be converted to lower case, such that RJD2 sorts as "Rjd2".
  9. The prefix "Mc" is generally sorted as "Mac", such that Anna McCurley sorts as "Maccurley, Anna", not "Mccurley, Anna" or "McCurley, Anna".
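Several of these proposals (1, 2, 3, 4, 7 and 8) are mechanical enough to sketch in code. The Python below is a hypothetical illustration, not an existing tool: proposed_sort_key and the SUFFIXES set are assumptions, and it naively treats the last word of the title (after removing any suffix) as the surname, which is wrong for names such as "Marco van Basten". Rule 9 (Mc as Mac) is deliberately left out, and the generational suffix is kept verbatim. Applied literally, rule 2 also removes the period in "J.", so this sketch yields "Smith, Robert J, II" rather than the example's "Smith, Robert J., II".

```python
import re
import unicodedata

# Hypothetical, non-exhaustive list of generational suffixes.
SUFFIXES = {"Jr.", "Sr.", "II", "III", "IV"}


def normalize_word(word: str) -> str:
    # Proposal 3: replace accented letters with unaccented ones.
    word = unicodedata.normalize("NFKD", word)
    word = "".join(c for c in word if not unicodedata.combining(c))
    # Proposal 2: drop punctuation (keep only letters and digits).
    word = "".join(c for c in word if c.isalnum())
    # Proposals 1 and 8: keep the first letter, lower-case the rest.
    return word[:1] + word[1:].lower()


def proposed_sort_key(title: str) -> str:
    # Proposal 4: drop a trailing disambiguator such as "(comedian)".
    title = re.sub(r"\s*\([^)]*\)$", "", title)
    words = title.split()
    # Proposal 7: move a generational suffix to the end, kept verbatim.
    suffix = words.pop() if len(words) > 1 and words[-1] in SUFFIXES else None
    # Naive assumption: the last remaining word is the surname.
    surname, given = words[-1], words[:-1]
    parts = [normalize_word(surname), " ".join(normalize_word(w) for w in given)]
    if suffix:
        parts.append(suffix)
    return ", ".join(p for p in parts if p)


for title in ["James BeauSeigneur", "Maurie Fa'asavalu", "D:Fuse",
              "Uldis Bērziņš", "Will Smith (comedian)", "Robert J. Smith II", "RJD2"]:
    print(title, "->", proposed_sort_key(title))
```

Proposals 5 and 6 hold automatically in this sketch, because the key is built only from the words of the title itself.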

Any comments? --Stemonitis 13:19, 6 January 2007 (UTC)

Re. #1: Not needed imho. Would this make an essential difference? We try to avoid complexity in guidelines where there's no real issue to address.

Re. #2: Seems useful, apart from the "hyphens" which I wouldn't change (in other words: keep them in the sort key as they are in the page name). Note also:

  • This is not a "people names" exclusivity, so should go in Wikipedia:Categorization#Category sorting;
  • Attention should be drawn, however, to the fact that the current "people" sorting rule usually introduces punctuation (a comma) where there wasn't one in the original article name, and also that the comma in, for example, Charles de Secondat, baron de Montesquieu is retained. So, with the added comma, that example uses two commas in the category sort key ([[Category:Enlightenment philosophers|Montesquieu, Charles de Secondat, Baron de]]). Note that this is an example currently in the categorization of people guideline. I also don't think that geographic names using a comma in the page name (e.g. London, Ontario) omit that comma when sorting. Maybe the simplest would be to make an exception for commas to the "no punctuation" rule. Anyway, if commas are a general exception, this rule should move to Wikipedia:Categorization#Category sorting entirely.
  • Also, I see that you kept the period in e.g. "Smith, Robert J., II". This needs some refining if we want to use "remove punctuation" as a more or less general rule.

Re. #3: In fact already included. Since it's not only "people" that may have accents in their name, the issue is discussed at Wikipedia:Categorization#Category sorting. These general rules are linked from Wikipedia:Categorization of people#Ordering names in a category.

Re. #4: Not needed imho. "Smith, Will" and "Smith, Will (comedian)" would generally lead to the same sorting result. Warning against "Smith (comedian), Will" seems a bit like treating our editors in a childish way. WP:BEANS.

Re. #5: Not needed: redundant redundancy.

Re. #6: Not needed imho, self-evident.

Re. #7: Useful. Since you used the generic term "suffix" it also encompasses the "disambiguating terms" of #5 (so that the more complex and in fact contradictory formulation of #5 is certainly not needed separately).

Re. #8: No. This would be an issue for Wikipedia:Categorization#Category sorting anyhow (I mean, consecutive capitals and letters in a name is not a "people" exclusivity). And then I'd oppose it. There's no common sense in it, imho. This would, to name only one of the multiple problems this would create, make the current {{3CC}} template impossible to use, and then after all the instances of where this template is used are manually converted, the sorting order in the Category:Lists of three-character combinations would be exactly the same as it is now. Solution in search of a problem.

Note that for stage names etc using non-standard letters in their name (like for instance DJs but the same could probably be said about wrestlers etc) I do believe sometimes a bit of creativity is needed, but the kind of creativity that is hard to catch in rules:

  • Use of {{DEFAULTSORT:Rjd2}} in the RJD2 article seems perfectly OK to me (I didn't even know about the existence of {{DEFAULTSORT:...}}...);
  • ?uestlove: [[Category:American hip hop musicians|Questlove]] and the first entry of List of hip hop DJs and producers#0–9 show a different sorting order... I wouldn't know which is more appropriate...
  • [[Category:Hip hop DJs|DJ Quik]] and [[Category:Hip hop DJs|Cam, DJ]] reflect different approaches too (note that both are under "D" in List of hip hop DJs and producers). Here too I wouldn't know what works best in general. Couldn't the DJ-interested people agree on this? Or try to find out which way this is done most often in English? For example, check a few record stores, and see how the CDs are usually sorted?

Re. #9: Don't know about that one for sure: what is most habitual in English? Unless it can be clearly demonstrated that in general in English that is the way names are sorted, I'd reject it as redundant complexity. --Francis Schonken 14:15, 6 January 2007 (UTC)

These examples are all based on real situations that I've come across, so it's too late for WP:BEANS to apply. Of all of them, the internal capitals (1 and 8) is the most important problem, in my opinion. The MediaWiki software sorts all capitals before all lower-case letters, with the result that names like "DeWoskin" would be sorted before "Deac", which is clearly at odds with alphabetical order. The ideal situation would be for sort keys to be in all-capitals, but we've gone too far with the current system to do that. The best approximation is to consistently capitalise the first letter of a word only. Similarly, punctuation tends to come before A-Z/a-z, which also screws up the sorting, though I would perhaps be prepared to relent on the hyphens.
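The ordering problem described here can be reproduced with plain code-point comparison, which sorts every upper-case letter (A-Z, code points 65-90) before every lower-case letter (a-z, 97-122). This Python sketch assumes that category sort keys are compared in that way, as the post describes; it contrasts that order with a case-insensitive one and with the "capitalise only the first letter" normalization of proposal 1.

```python
names = ["Deac", "DeWoskin", "Dean"]

# Code-point order: "W" (87) sorts before "a" (97),
# so "DeWoskin" lands ahead of "Deac".
binary_order = sorted(names)

# Case-insensitive order, i.e. ordinary alphabetical order.
alphabetical_order = sorted(names, key=str.casefold)

# Proposal 1: keep the first letter, lower-case the rest, then sort
# by code point as before.
normalized_order = sorted(n[:1] + n[1:].lower() for n in names)

print(binary_order)        # ['DeWoskin', 'Deac', 'Dean']
print(alphabetical_order)  # ['Deac', 'Dean', 'DeWoskin']
print(normalized_order)    # ['Deac', 'Dean', 'Dewoskin']
```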
Perhaps there is some redundancy here, but I wanted to explain my ideas fully. I'm open to suggestions for better wordings. If you think it would be clear that "suffix" also includes "disambiguating term" (I wouldn't consider the disambig. a part of the name, but would consider the suffix as such, for instance, so would interpret the two differently), then that's fine. I see, however, no harm in reiterating rules held elsewhere. If it's important to strip out accents (and for sorting in categories, it obviously is), it would be appropriate to have a statement (like my 3), and link it to the appropriate guideline (to prevent real or apparent conflict between the two guidelines).
In response to your comments to 4, 5 and 6, I can only re-iterate that all these mistakes have been made in the past (and in good faith) and will almost certainly be made again. If that could be prevented by the inclusion of a short sentence here, then I see no reason not to. Providing too much information is usually better than providing too little. For the "Mac" question, we probably need to check how other encyclopædias do it; I haven't any to hand. --Stemonitis 14:56, 6 January 2007 (UTC)
Re "real situations", yes, I kinda noticed. For that reason I elaborated my answer to #8 a bit, but ended up in an edit conflict. Anyway, you can see my elaborations above, which (partly) boil down to: it's not possible to catch everything in rules. Rules are no substitute for common sense.
Also, if "mistakes" have been made, some of these are so evident, that it would be unnecessary to write guidelines about them. Typos are corrected by the dozen every second, yet there is no guideline listing all possible typos, warning against each of them separately. Indeed summing up "most popular" typos in a guideline would be somewhat of a WP:BEANS approach, while even for popular typos most people don't make them. --Francis Schonken 15:22, 6 January 2007 (UTC)
English usage on the alphabetization of Mc varies extravagantly; I have even seen it alphabetized as a separate letter, after M. It is probably simplest to let them fall between Mb and Md - because that is what editors who haven't looked at this guideline are most likely to do; and for some names the distinction between McX and MacX represents a difference of families. Septentrionalis PMAnderson 18:40, 10 January 2007 (UTC)
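The three conventions mentioned here can be expressed as alternative sort keys. This Python sketch is illustrative only: the function names and the sample names are hypothetical, and the "separate letter after M" variant is modelled simply as placing all Mc names after the other M names and before N.

```python
names = ["McCurley", "MacDonald", "Mbeki", "Mdlalose", "Moore", "Nash"]


def mc_as_letters(name: str) -> str:
    # "Mc" is just M + c, so McX falls between Mb and Md.
    return name.casefold()


def mc_as_mac(name: str) -> str:
    # "Mc" is filed as if it were spelled "Mac".
    folded = name.casefold()
    return "mac" + folded[2:] if folded.startswith("mc") else folded


def mc_after_m(name: str) -> tuple:
    # "Mc" treated as a separate letter: after all other M names, before N.
    folded = name.casefold()
    return (folded[:1], folded.startswith("mc"), folded)


for key in (mc_as_letters, mc_as_mac, mc_after_m):
    print(key.__name__, sorted(names, key=key))
```

The first variant is what editors get by default if they simply type the name as spelled, which is the argument made in the post for leaving Mc names between Mb and Md.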
Yes, this is what I tend to do at the moment, and I have left that recommendation out from my new draft (see below). --Stemonitis 18:42, 10 January 2007 (UTC)
On the same grounds, I'm not sure about the decapitalization of internal capitals, which will at least sort as letters. Internal š, say, is different; it will sort in a place that is definitely wrong; but there's nothing really wrong with sorting Macbean separately from MacBean; and, again, trying to enforce this will mean an awful lot of corrections of what editors will naturally do. (And for not much profit; most categories won't have more than one Scot, so that, although we will have to correct all the MacBeans, for most of them it will have no effect on the category.) Septentrionalis PMAnderson 19:28, 10 January 2007 (UTC)
Most people will not realise that the software effectively uses two non-overlapping alphabets. B comes between A and C in alphabetical order, regardless of whether it's uppercase or lowercase, so there is definitely something wrong with sorting Macbean after MacWilliam or MacBean before Macalistair. I don't believe that any reputable reference work puts all the people with a capital letter after the "Mac" before all the people with a lower-case letter after the "Mac" (the dictionary I have to hand certainly doesn't), and I can't really see any argument in favour of it. (In fact, the two alphabets do not even abut; the characters "[", "\", "]", "^", "_" and "`" appear between them, although those characters will rarely appear in article titles, and especially rarely straight after "Mac").
The argument that many categories will contain at most one Scot does not hold water. There are plenty of categories with many people whose names begin with "Mc" or "Mac", and it would be helpful if they all sorted in the same way. Any kind of consistency will require a great number of changes, but that doesn't mean that we shouldn't make them, and it certainly doesn't mean that we shouldn't work out what they should be. It is to be hoped that editors who are unsure about how to categorise biographical articles will come to this page to find clarification, and find it, and so the consistency will trickle through over time even if no direct action is taken. And for those cases where it doesn't make any difference, no-one is obliged to make any change, so there's no harm done. --Stemonitis 22:54, 10 January 2007 (UTC)
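The "two non-overlapping alphabets" can be reproduced with a short Python sketch. It assumes, as described above, that sort keys are compared character by character on their code values (Python's default string ordering does exactly that); the case-folded comparison at the end is one possible remedy, not something the software did at the time.

```python
# Python compares strings by Unicode code point, i.e. the same
# character-by-character order described above for plain ASCII names.
names = ["Macbean", "MacWilliam", "MacBean", "Macalistair"]

# Default ordering: every upper-case letter (A-Z) precedes every lower-case one (a-z).
binary_order = sorted(names)
print(binary_order)  # ['MacBean', 'MacWilliam', 'Macalistair', 'Macbean']

# The characters that fall between the two alphabets, i.e. after "Z" and before "a".
gap = [chr(code) for code in range(ord("Z") + 1, ord("a"))]
print(gap)

# One possible remedy: compare case-folded keys, so "B" and "b" share a position.
# sorted() is stable, so names with identical folded keys keep their input order.
folded_order = sorted(names, key=str.casefold)
print(folded_order)  # ['Macalistair', 'Macbean', 'MacBean', 'MacWilliam']
```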
OK, I've tried to put all of the things I was trying to say into as concise a form as possible. My sandbox contains a draft of the relevant section, with my additions mostly confined to the last three bullet points. I think the extra explanation is helpful, because a lot of people seem to be confused as to why these changes are necessary. The short sentence "The sort key should mirror the article's title as closely as possible" covers several of my points above; I wish I'd thought of that wording before. I have also tidied up the remaining text a bit, and tried to make the layout more readily comprehensible. Any comments would be greatly appreciated. --Stemonitis 17:31, 10 January 2007 (UTC)
No-one has made any complaints about the draft, so I'll copy it across (but removing the "In categories dealing with British peers" qualifier which was included there). --Stemonitis 12:32, 12 January 2007 (UTC)
No approval either! I thought it too sketchy to give it serious thought - didn't you see the evident errors? No, Louis IX is not a British peer, etc, etc. --Francis Schonken 13:02, 12 January 2007 (UTC)
It would probably have been better if you'd mentioned this before, but at least I'm getting the feedback now. Other than the Louis issue, are there any other changes you'd like to see made? Little things like that can easily be changed (for instance, by generalising the sub-heading to "Nobility"), and at any time, including after the new text is included. --Stemonitis 13:31, 12 January 2007 (UTC)
Sorry, no, I don't think the rough draft at User:Stemonitis/Testing ground is worth any further consideration at this stage. If you'd be able to get the *obvious* (as in: should be obvious to anyone) errors, typos and internal contradictions out of it, that might be a step forward, and I might give it a second look. But then please also don't run ahead of other discussions regarding the changes you'd like, but which are currently far from approved (see the comments by myself and by others on this page). --Francis Schonken 15:47, 12 January 2007 (UTC)
For goodness' sake, if they're that obvious, then please tell me what they are. I have read through it, and didn't spot any typos, for instance (which doesn't necessarily mean that there aren't any). And I have heeded the comments on this page. It was too wordy and included redundancies; now it is shorter and better. It was thought that capital and lower-case letters both sorted as one alphabet; this is not the case. It was unclear whether there was consensus for forcing "Mc" to sort as "Mac"; that was removed. In the absence of any further or more detailed comments, it is difficult to know where the problems lie. And may I add that taking the attitude that improving it is beneath you is really quite galling. If you don't want to help, then that's fine, but please don't hinder. I can re-order the bullet points quite easily (perhaps you think that the suffixes issue belongs under "Sort by surname" or that "Augustine of Hippo" belongs under "Other exceptions"), so I don't see that as a major stumbling-block, either, even if the order is currently imperfect. --Stemonitis 16:06, 12 January 2007 (UTC)
Since no concrete suggestions have been proffered, I have reinstated my improved text. I hope that any problems that are found with it can be worked out without recourse to large-scale reversion. --Stemonitis 11:55, 16 January 2007 (UTC)
Your changes were an improvement over the current state, especially in organization. The textual changes are minor, so I'm not sure what Schonken's problems are. The existing text is flimsy, but that is another point. Your two additions, eliminating punctuation and lower-casing internal capitals, lead to an alphabetization that, as far as I can tell, conforms to English-language reference works and the MARC standards, so they are recommendable. --Afasmit 13:24, 17 January 2007 (UTC)
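A minimal sketch of the two draft rules discussed here, assuming one reading of "remove punctuation" (drop every character that is not a letter, digit or space, then rejoin surname and forename with the conventional comma). The function name draft_sort_key is hypothetical, and the rules themselves are only a draft, not adopted guidance.

```python
def draft_sort_key(surname: str, forename: str = "") -> str:
    """Hypothetical sort key applying the two draft rules (not adopted guidance)."""

    def strip_punctuation(part: str) -> str:
        # Assumed reading of "remove punctuation": keep letters, digits and spaces,
        # drop apostrophes, full stops, hyphens and similar marks.
        return "".join(ch for ch in part if ch.isalnum() or ch == " ")

    key = strip_punctuation(surname)
    if forename:
        # The comma separating surname from forename is added back afterwards.
        key += ", " + strip_punctuation(forename)
    # "Decapitalisation after first capital": only the first character stays upper-case.
    return key[:1].upper() + key[1:].lower()


names = ["MacWilliam", "Macbean", "MacBean", "Macalistair"]
print(sorted(names, key=draft_sort_key))
print(draft_sort_key("O'Neill", "Robert"))  # Oneill, robert
```

Lower-casing the forename as well keeps the whole key within a single alphabet, so the order no longer depends on how an editor happened to capitalise it.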

@Stemonitis:

  1. Since you didn't stop pressing me, I took a closer look at User:Stemonitis/Testing ground, and improved it to the best of my abilities. You will see that a lot of the improvements you have proposed are no longer there, for the reasons I explained above, mainly that they are not specific to names of people, so they should be discussed at Wikipedia talk:Categorization, and in the case a consensus for them would emerge, they should be implemented at Wikipedia:Categorization#Category sorting.
  2. Note that even the new principles you proposed that I'm mildly positive about *need work*. Among others, the "remove punctuation" rule is vague, and the way you described it is contradictory (while not mentioning the commas etc); the "Decapitalisation after first capital" rule makes no sense in more than 90% of the cases; we *have* categories where everything is sorted in capitals only, etc, etc. (All these are repeats of what I already said, and which was not implemented yet in any version of your proposals)...
  3. Again, take these new proposals to Wikipedia talk:Categorization, they are not specific for articles on *people*, so they don't belong in a guideline that is exclusively on the categorisation of people: first you need consensus on general collation rules for categories, before any of this would be implemented in the guideline on categorisation of people. --Francis Schonken 01:36, 18 January 2007 (UTC)


Francis, forgive me if I seem underwhelmed. A lesser person might have been insulted by your edit to my draft (which replaced the entire text and further improvements with the existing text, without any attempt to incorporate any of my hard work). I will nonetheless try to assume that you were acting in good faith. The issue about removing punctuation is not important enough to warrant full-scale reversion; perhaps a clarification along the lines of "and then" would be enough to indicate the (already implicit) order of events. (Furthermore, the examples would probably clear up any potential confusion.) Your assertion that internal decapitalisation "makes no sense in more than 90% of the cases" seems not to be backed up with any evidence. My own (extensive) experience suggests that internal capitals nearly always cause a problem. Names such as LeXxx, DeXxxx, and so on are routinely mis-sorted. I cannot remember a name with internal capitals that did not need to be manually resorted. But this all leaves the biggest point, namely that the rules could be applied more widely. Well, yes, perhaps they could, but I don't see that that stops them being implemented here, and, more importantly, I'm not sure that they do apply so generally that they would make sense at WP:CAT. Rules that apply to human names need not apply to microchips or automobiles or other types of article. I have seen no rule stating that all guidelines must apply at the highest level possible, and I can't imagine that such a rule would be helpful. If this is the limit of your criticism, then I really can't understand why you insist on reverting. Nobody owns the text, which can be improved as long as the changes represent consensus. I believe that my draft does represent consensus here, since you are the only person opposing it, and you don't seem to want to help improve the text. --Stemonitis 02:28, 18 January 2007 (UTC)
Sorry, no, that is not how it works. You can't force the rules in Wikipedia:Categorization via the back door of Wikipedia:Categorization of people. If you continue to try that, I think it would be better to move the content of this section to Wikipedia talk:Categorization.
Note that many categories contain both articles on people and articles that are not about people. It made me think of the old joke about a dictatorial president-for-life who allegedly visited the UK and was impressed by the cars driving on the left side of the road. So he came home and ordered the cars to drive on the left in his country, adding: "if it works well, in a few weeks we'll order the same for trucks". A bit exaggerated, but I hope this makes clear what I think.
Note also that Wikipedia talk:Categorization is a much more active talk page (I mean, in terms of number of people participating), with a high average experience in categorisation issues. Two people against one, plus some additional criticism by PManderson (as on this page), is not going to establish "consensus" on this any time soon. --Francis Schonken 09:34, 18 January 2007 (UTC)

Peers

The change from "In categories dealing with peerage" to "In general" is a big one, and should be discussed before being adopted. It has far greater implications than the matters dealt with in the paragraphs that follow it. I haven't (yet) seen a consensus for title-based sorting outside peerage categories. I suspect that such a consensus will appear soon, but I haven't seen it yet. In the meanwhile, can we please return to the accepted text? --Stemonitis 12:20, 9 January 2007 (UTC)

It is not a "big change"; it is a clearer description of what Wikipedia actually does, and has always done, for the vast majority of peers. See any article on a peer which uses his title in the article title; (that is to say, not the handful which resemble Eden, North, Bertrand Russell, or Francis Bacon.) Septentrionalis PMAnderson 18:23, 10 January 2007 (UTC)
Isn't the point of a guideline that it says what should happen, rather than what does happen? Actually, I'm coming round to believing it is the best way (or is at least the consensus view), but I'm not aware of it having been specifically discussed for more general categories, and it is a large change to the guidelines, since it's now giving different advice about the sorting of peers in general categories. The fact that they're mostly already sorted that way is merely an advantage to making the change, but doesn't prevent it from being a change in the first place. --Stemonitis 18:39, 10 January 2007 (UTC)
What should happen? No, absolutely not; you have an exaggerated idea of the importance of this page. Please read Wikipedia:Policies and guidelines; and, less formally but more coherently, WP:PRO. Guidelines are not legislation; they're not even policy; they record what most editors agree on so we don't have to discuss it again. The best evidence of consensus is what Wikipedia actually always does; much better than some poll of a dozen editors on some page somewhere. This is why m:Voting is evil. Septentrionalis PMAnderson 18:52, 10 January 2007 (UTC)

examples

I suggest working through some examples, in order to get a clearer view of this issue. I'll start with one example. Feel free to add others, but please also let us know what you think about this particular example:

Example:

  • The article on William Cavendish, 1st Duke of Newcastle-upon-Tyne contains (among others) this categorisation: [[Category:Riding masters|Newcastle, William Cavendish, 1st Duke of]]
  • His wife, Margaret Cavendish, has categorisations in this format: [[Category:English dramatists and playwrights|Cavendish, Margaret]]
  • October-December 2006, in Antwerp, there was an exposition about these persons: Royalist Refugees: William and Margaret Cavendish at the Rubenshouse (1648-1660). As you see the exhibition's title only mentions "Cavendish", the description of the exposition on the Rubenshouse website mentions "Newcastle" twice.
  • I'm not sure whether [[Category:Riding masters|Newcastle, William Cavendish, 1st Duke of]] or [[Category:Riding masters|Cavendish, William, 1st Duke of Newcastle-upon-Tyne]] would be more appropriate in the William Cavendish, 1st Duke of Newcastle-upon-Tyne article. I'd tend to think the second (sorting on last name) though. --Francis Schonken 18:48, 10 January 2007 (UTC)
    • Newcastle would be much better usage. A Dutch site is not the best guide to English idiom, although it is otherwise in perfectly good English. The Oxford DNB is notoriously anti-title; they even call the article "William Cavendish (1593-1676)"; yet they call him Newcastle in every reference after he became a peer. Unless a text specifically refers to him before 1628, "Cavendish" is a solecism; in strict form, it even stopped being his name.
    • His wife is untypical. Largely because Lamb did so, it is semi-conventional to call her "Margaret Cavendish"; and so they together can be called the Cavendishes. I would strongly deprecate indexing Georgiana Cavendish, Duchess of Devonshire anywhere other than Devonshire. Septentrionalis PMAnderson 19:12, 10 January 2007 (UTC)
      • Thanks, the Rubenshouse website is however not "Dutch", but "Belgian". If you mean Dutch as in "Dutch language", the website is mixed Dutch and English, and above I linked to a page exclusively in English. Of course I get your point, just adding some precision to your wording.
      • This American/English on-line catalog ("American" as in US-based; "English" as in English-language) sorts both Margaret and William under Cavendish, not Newcastle: http://www.letrs.indiana.edu/cgi-bin/eprosed/eprosed-idx?type=boolean;layer=2;rgn1=period;q1=Jac&size=100&slice=1
      • But I have no preference. Of course, living in Belgium, I had heard about "Cavendish" before learning he was a duke of "Newcastle". Depends on average expectation, for which mine would be far from indicative in this case. And the Cavendishes are no "average" example for British peerage either I suppose, if basing oneself on their time spent in exile in Belgium. --Francis Schonken 20:15, 10 January 2007 (UTC)

"Authority control"

Some of the current guidelines clash with the Anglo-American Cataloguing Rules and/or Machine-Readable Cataloging standards. These norms appear to be followed by the major English-language reference works and publishing companies. As they probably represent decades if not centuries of expert consideration it seems reasonable to follow these guidelines, rather than create them from scratch and/or base them on a personal preference.

The complexity of the rules may be taken as a disadvantage, but there is a very good website at the Library of Congress that allows easy verification of the proper way to index a person, for which the rather Orwellian term Authority control is used. The site follows the MARC 21 rules, a "harmonization of the US and Canadian standards" that, since 1975, has been followed by the British Library as well.

The advantages of using the MARC 21 standards are numerous. It could function as a basis for spelling, capitalization, proper address of people that still need proper address, sorting in lists and categories, and probably more that I can't think of. It will make wikipedia consistent with other reference sources and libraries and aid consistency within wikipedia. It will greatly simplify the text of the guidelines, as the rules don't all have to be spelled out. Many revert wars (and discussion on this page!) can be prevented by simply following the “authority”. This does not mean that there can’t be wikipedia-specific exceptions (or errors in the database), but these can be resolved via discussions, which should be fewer and more fruitful than the current ones all over wikipedia.

Though the number of persons indexed is far, far larger than the number of people entries in wikipedia, not all names will be there. In a survey of several hundred “problem cases" I found some 3 errors and 5-10% missing, usually involving living people. For some reason I couldn't locate a fair number of (the more obscure) ancient Portuguese explorers. All this means is that for these people there is no "referee" to help, though names of relatives etc. that are in the database can often still be used for a resolution.

Some may find the Anglo-centric approach problematic for an encyclopedia that is used by anybody who can read English, but that is easily defended, as the rules are fairly sensitive to international practice and in many cases less insular than the current situation.

For the “Ordering names in a category” section, I suggest using the following introduction after "Conventions specific to articles and categories relating to people are:":

Check this website of the Library of Congress first. It shows how a name is indexed in libraries, following rules that are also more or less followed by all major English-language reference books and publishing companies. Alternative forms of a name are linked to the "authorized heading" via the "references" button on the left. Not all permutations are listed, though: if you can't find a name under one part of it (e.g. Zedong), try another (e.g. Mao).

Though the website's index is very large, not everyone will appear. Often relatives in the index can be a guide. Otherwise, the following rules of thumb should be followed:

Note again that these rules of thumb sometimes will be different than the current ones.

As this suggestion has more widespread impact than just on the ordering of people in a category it probably ought to be discussed at other places as well (e.g. that of Wikipedia:Naming conventions (people)). Any advice on this is appreciated.

- Afasmit 22:21, 16 January 2007 (UTC)

Tx for the suggestion,
  1. As far as I understand, we follow the Chicago Manual of Style on many style-related issues. I don't know where MARC 21 overlaps (possibly with conflicting rules...) with CMS?
    Can you show a project page where this manual is stated to be the one to follow? The Chicago Manual of Style Online page is by subscription only. A free trial gives me some time to do a comparison, but we can't expect people to take a subscription just to check the proper way of spelling and collating a name. - Afasmit 00:50, 17 January 2007 (UTC)
    E.g., Wikipedia:Footnotes#Where to place ref tags, end of second paragraph. The general principle is in the first section of Wikipedia:Manual of Style:

    Some examples of authoritative style guides are: The Chicago Manual of Style and Fowler's Modern English Usage. Chicago also provides an online guide, the Chicago Manual of Style Online. Style guides available at no cost are the Mayfield Electronic Handbook of Technical & Scientific Writing and the CMS Crib Sheet by Dr. Abel Scribe.

    --Francis Schonken 09:50, 17 January 2007 (UTC)
    I'll check the online version of the CMS, but there may be little overlap with this topic. Authority control is not a style issue, but what the agreed spelling of a person (or subject)'s name is and how it is indexed.
  2. As for collation rules,
    • when it comes to persons there's only one rule: "enter surname first" - yeah, we got that one.
      Hence the endless discussions on this and other pages ;-)
      Pardon? The only rule they have is the one that gives trouble on Wikipedia, is that what you're trying to say? --Francis Schonken 09:50, 17 January 2007 (UTC)
    • It assumes everyone has a surname. Or, that what usually follows a given name is always a surname. Don't see how "Hirohito" could be sorted by surname. Nor "Augustine of Hippo". Nor "Saint Alban". Nor do we sort "Mother Teresa" by surname, etc., etc. So, we tell how to sort all people that don't have a surname, or aren't generally known by their surname. For which the Library of Congress page you linked to is of no help.
      It does though. The authorized headings are rather intuitive:
      Hirohito, Emperor of Japan, 1901-1989
      Augustine, Saint, Bishop of Hippo
      Alban, Saint, d. 304?
      Teresa, Mother, 1910-1997
      Then they don't follow the "surname first" rule: that last one should be: Bojaxhiu, Agnes Gonxha. Recognisability: zero. It's not as if "Mother" is her first name, and "Teresa" her last name, is it? So you'd still need to describe what really happens. --Francis Schonken 09:55, 17 January 2007 (UTC)
    • Neither does it seem to be bothered by capitalisation, nor by diacritics (which are issues with the current state of the MediaWiki software). --Francis Schonken 23:43, 16 January 2007 (UTC)
      I didn't promise it solves all wikipedia's problems, but I think the benefits that I mentioned above are considerable. - Afasmit 00:50, 17 January 2007 (UTC)
      Sorry, this doesn't help. Again, Wikipedia already has the rule that last names go first. And it doesn't solve all the other issues, like you admit. Sorry, void & moot argument. --Francis Schonken 09:50, 17 January 2007 (UTC)
Francis, your comments don't make any sense. I hope this and the above exchanges with me and others are honest misunderstandings, and do not represent a strategy to prevent changes on this site by drowning every well-meant suggestion with tangential or nonsensical remarks.
Please visit the site again. I have no idea what gave you the idea that all it does is to recommend to sort by surname (if I interpret your comments correctly now). It offers you to type in the name of the person about whose name you're uncertain, say "Bojaxhiu, Agnes". It will return an alphabetical list starting with the name you typed in (or immediately following it, if it is not in the database). In this case "Bojaxhiu, Agnes Gonxha, 1910-1997" shows up first. To the left of it is a reference button that leads you to the "authorized heading" of "Teresa, Mother, 1910-1997". Our guideline would be to use that as sorting key for the categories.
This database is very useful for all the reasons I've mentioned above and more (e.g. it tells you where the surname starts for someone you've never heard of, it resolves naming of peers, it tells you that Iwasa Matabei's surname is Iwasa, etc.).
--Afasmit 13:24, 17 January 2007 (UTC) (who won't be online for the next 10 hours or so).
That's the third time you link to that same page of the Library of Congress. *On that page* there is *only one rule specific for person names*, it reads:

- For personal names, enter surname first

Further, the other systems of the Library of Congress website you explained above are, as far as I know the *technical* side of the MediaWiki software (used for the Wikipedia website), of little or no relevance. For example, the first "Search Example" on Searching Name Authority Headings reads

weber carl maria von

...for which Wikipedia uses "Weber, Carl Maria von" as category sort key (capitalisation needed for the current state of the MediaWiki software; comma added per convention we're not likely to change anywhere soon,...). Another example, quoting from the same LoC Name Authority headings page:

For names with connectives (van, von, der), follow the conventions of the person's country or language: gogh vincent van

This rule in particular is nearly impossible to follow because, for example, for a certain language (Dutch) there are two countries (Belgium, Netherlands) that follow different collation rules. See the Christine Van den Wyngaert example above (a Belgian person who is primarily known for working in the Netherlands): the LoC gives her MARC authorised heading as "Wijngaert, Christine van den" - which is not OK for the country where she lives, and besides has a *spelling* error (Wyngaert, not Wijngaert). You can't expect average readers of the encyclopedia to know rules that the LoC's own implementation of MARC does not even interpret correctly.
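To illustrate why the connective rule is hard to apply, the sketch below builds a sort key for the same surname under two conventions: a Netherlands-style one that moves leading particles behind the forename (as in the LoC heading quoted above), and a Belgian-style one that keeps them as part of the surname. The particle list and the function name dutch_sort_key are hypothetical simplifications, not a complete statement of either country's rules.

```python
# Hypothetical, incomplete set of Dutch-language surname particles.
DUTCH_PARTICLES = {"van", "den", "der", "de", "het", "ter", "ten", "te"}


def dutch_sort_key(surname: str, forename: str, convention: str) -> str:
    """Sketch of two collation conventions for Dutch-language surnames."""
    if convention == "belgium":
        # Particles stay in front and are alphabetised with the surname (under "V").
        return f"{surname}, {forename}"
    if convention == "netherlands":
        words = surname.split()
        i = 0
        # Skip leading particles, but always keep at least one word as the core.
        while i < len(words) - 1 and words[i].lower() in DUTCH_PARTICLES:
            i += 1
        core = " ".join(words[i:])
        particles = " ".join(word.lower() for word in words[:i])
        return f"{core}, {forename} {particles}".rstrip()
    raise ValueError(f"unknown convention: {convention!r}")


print(dutch_sort_key("Van den Wyngaert", "Christine", "netherlands"))  # Wyngaert, Christine van den
print(dutch_sort_key("Van den Wyngaert", "Christine", "belgium"))      # Van den Wyngaert, Christine
```

The same input files under "W" in one case and under "V" in the other, so an editor has to know which convention applies before the key can be written at all.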
Then there's Jean de La Fontaine, whose MARC authority record on the LoC website appears to be "La Fontaine, Jean de, 1621-1695":
  • We generally don't do dates in category sort keys, nor in article names (there are a few exceptions to that, but not for Jean de La Fontaine);
  • But I want to draw attention to the confusing set of rules on the Searching Name Authority Headings page (Wikipedia does a lot better than that!):
    • "Omit initial articles (a, an, the, das, el, la, etc.) in any language" - as it happens, "La" in "La Fontaine" is as much an article-that-has-become-part-of-a-name as "The/Den/La" in "The Hague/Den Haag/La Hague" (authority record: "Hague (Netherlands)"). A confusing rule, since it would mean that the authority record should normally start with "Fontaine,..."
    • "For names with connectives (van, von, der), follow the conventions of the person's country or language": actually, that rule is followed correctly here. But the problem is: how is an "average English speaker" supposed to know which words are the connectives (neither "de" nor "la" is listed in the example list), and then know the French rule that in this case one of the two connectives goes before the name and the other after it? Meanwhile, for others with the same last name, there are authority records as diverse as:
      • De La Fontaine, Oliver Roberts, b. 1857.
      • Fontaine, Helena de la, 1939-
      • La Fontaine, Christophe
      • Fontaine, A. de, b. 1798 (some of these people apparently never die)
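Since category sort keys generally omit dates, a MARC-style authorized heading would need its trailing date element removed before use. A minimal sketch, assuming only the date forms quoted in this thread ("1621-1695", "1939-", "b. 1798", "d. 304?"); real authority headings use further forms (such as "fl." or centuries) that this regular expression does not handle. The function name is hypothetical.

```python
import re

# Matches a final ", <dates>" element such as ", 1621-1695", ", 1939-",
# ", b. 1857." or ", d. 304?" (only the forms quoted above are covered).
_TRAILING_DATES = re.compile(r",\s*(?:[bd]\.\s*)?\d{1,4}\??(?:-\d{0,4}\??)?\.?$")


def heading_without_dates(heading: str) -> str:
    """Drop the trailing date element from an authority-style heading, if any."""
    return _TRAILING_DATES.sub("", heading)


for heading in [
    "La Fontaine, Jean de, 1621-1695",
    "De La Fontaine, Oliver Roberts, b. 1857.",
    "Fontaine, Helena de la, 1939-",
    "Alban, Saint, d. 304?",
    "Augustine, Saint, Bishop of Hippo",
]:
    print(heading_without_dates(heading))
```

Headings without a date element, such as the Augustine one, pass through unchanged.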
There have been some old discussions on whether or not Wikipedia should adopt categorisation systems external to Wikipedia: we've had several candidates, including but not limited to the Dewey Decimal System (see the *rejected* guideline proposal Wikipedia:Wikipedia Indexing Scheme), a few that were discussed at Wikipedia talk:Contents/Archive 1#Non-Wikipedia categorization/classification systems, etc. Each time the conclusions were more or less the same (that is, if discussions weren't abandoned before reaching a conclusion): they don't fit. What I see of MARC, and its implementation by the Library of Congress, makes me think its chances of being implemented in Wikipedia are about as big as the chances for Dublin Core to be implemented, and Dublin Core is a standard nobody has even proposed yet. None of these standards fits into Wikipedia (although for each of them, some of their good ideas might be inspirational).
Anyway, if you're interested in a library approach, there's Wikipedia:WikiProject Librarians you might want to get involved in. --Francis Schonken 14:01, 17 January 2007 (UTC)
Sorting rules differ not only by language and by country (the discussion above, for example, mentions difference in English in Canada and English in the United States, and in Dutch in the Netherlands and Dutch in Belgium), but by the particular application for which they are used. Phone books are often different from card catalogues. Card catalogue rules aren't necessarily the best rules for an encyclopedia.
We should ignore English articles (a, an, the) and index articles in any other language just like any other word. Sure, a specialized book or journal which deals with a lot of them from one particular foreign language might do it differently for that language, but we are dealing with hundreds of different foreign languages used in article names on Wikipedia, and there is no reason for Wikipedia readers to have to know the functions of all these foreign words in order to know where to find something in Wikipedia categories. Gene Nygaard 05:04, 18 January 2007 (UTC)
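Gene Nygaard's proposal (ignore only the English articles, and treat articles from every other language as ordinary words) is simple enough to state as code. A minimal sketch; the function name is hypothetical.

```python
ENGLISH_ARTICLES = {"a", "an", "the"}


def article_free_key(title: str) -> str:
    """Drop a leading English article; leave foreign articles such as "La" or "Den" alone."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ENGLISH_ARTICLES:
        return rest
    return title


for title in ["The Hague", "Den Haag", "La Hague", "La Fontaine, Jean de"]:
    print(article_free_key(title))
```

Because the check uses a fixed list of three words, an editor needs no knowledge of other languages; the trade-off is that a title whose first word merely happens to be spelled "A", "An" or "The" would also be shortened.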

Protected

I'm not sure what the dispute is about, but could Francis Schonken and Stemonitis please desist from edit warring and discuss the matter on the talk page? Perhaps requesting a third opinion would help. >Radiant< 09:56, 18 January 2007 (UTC)

people by political belief

In Nov 16 2006 there was a CFD about the Category:People by political orientation which, essentially, punted back here or to Village Pump. I got triggered on this issue because in the last couple of days there have been a slew of Category:American liberals cats placed on wildly inappropriate articles, like Howard Zinn and Ani DiFranco and other folks who would self-define as "radicals"; when I looked at the category I noticed a lot of people who would not self-identify as "liberal"; and the subcats Socialists and Democratic Party -- which are completely inappropriate, because socialists, even as defined in the US, are not "liberal", and the Democratic Party certainly does not include all liberals. This is obviously a particular definition of "liberal", but it's not a universal one. I suspect the same problem exists with "conservatives" category. Unfortunately, these categories, while important if properly administered, are going to be a nightmare to maintain, because everyone has an opinion -- and a different one -- about what these terms mean. Has this discussion happened somewhere? --lquilter 05:04, 19 January 2007 (UTC)

  • Also, is this supposed to reflect current belief? Or belief at death? Or belief most of their life? Or belief during which they were politically active? Or what? If someone switches or changes over time do they get both? --lquilter 05:29, 19 January 2007 (UTC)
  • Note: Users engaged in discussion about the category have been notified of this discussion, including 68.37.166.168, Bduddy, Dkreisst, Domminico; and relevant WikiProjects WP:WPBIO and WP:PLT. --lquilter 21:18, 18 January 2007 (UTC)
  • Maybe a better category name, though equally, or even more, broad, would be "American progressives?" This seems to be what the originators of the category "American liberals" were trying to achieve. Are there other terms besides "progressive" that would also fit? Although the word "activist" does not define political beliefs (with the exception of being apolitical), maybe this category can be tied into the "activist" category that exists already with something like "American progressive activists." Honestly, I don't know whether these suggestions are helpful, I find categories a bit overwhelming. I don't claim to be informed enough to know whether these would be valuable as a category. Dkreisst 13:07, 19 January 2007 (UTC)
    I think that would be better than "liberals" simply because "liberals" is such a contentious and ill-defined term in the US these days. But "progressives" has its own problems; not least of which is the fact that it has a particular historical context (see disambiguation for US Progressive Party) and a modern reinterpretation. --lquilter 14:55, 19 January 2007 (UTC)
    Good point. Thanks for the link, I hadn't bothered to check it before I wrote. Dkreisst 00:45, 20 January 2007 (UTC)
  • See also discussion at Wikipedia:Categories for discussion/Log/2007 January 19#Category:American liberals.
  • I'd like to propose the renaming of these categories. Both categories could be vital if used properly, but I feel a good deal of confusion has erupted over them. My suggestions are Category:Right-Wing Americans and Category:Left-Wing Americans. -- User:68.37.166.168

Ordering of surnames in specific cases: Mc, O' and St.

I think specific mention should be made of surnames where the alphabetical indexing is different to the strict spelling of the surname. The three specific cases are mentioned in the section heading. I have in front of me a late 19th century Whitaker's Almanack listing the members of the House of Commons which includes:

Lyell, Leonard
M'Arthur, Wm. A.
Macartney, Wm. G. E.
M'Calmont, James M.
M'Cartan, Michael
M'Carthy, Justin
McDermott, Patrick
Macdona, John C.
Macdonald, J. A. Murray
MacDonnell, Dr. Mark A.
M'Ewan, William
Macfarlane, Donald H.
McGittigan, Patrick

... and so on. Were this expressed in our category indexing, anyone with a surname beginning with M', Mc, or Mac would be indexed as though the surname began with "Mac". Another case is surnames beginning with O', and here the list goes:

O'Driscoll, Florence
O'Keefe, Francis A.
Oldroyd, Mark
O'Neill, Hn. Robt. T.
Owen, Thomas

This means that surnames beginning with "O'" should be indexed without the apostrophe. Likewise surnames beginning with "St." as an abbreviation for "Saint" should have "Saint" spelled out in the category indexing. I propose to make this addition to the policy, but would be interested to know if anyone wishes to object. As I see it, this is not a controversial issue. Sam Blacketer 09:35, 24 February 2007 (UTC)
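A minimal Python sketch of the rules proposed above, assuming the resulting keys are compared character by character. `traditional_sort_key` and the `whitaker` list are hypothetical illustrations, not anything MediaWiki provides: M' and Mc become "Mac", a leading O' loses its apostrophe, a leading "St." is spelled "Saint", and internal capitals are folded so that MacDonnell and Macdonald compare letter by letter.

```python
import re


def traditional_sort_key(surname: str, forename: str) -> str:
    """Hypothetical helper: build a "Surname, Forename" key under the proposed rules."""
    s = re.sub(r"^(M'|Mc)", "Mac", surname)  # M'Arthur, McDermott -> MacArthur, MacDermott
    s = re.sub(r"^O'", "O", s)               # O'Neill -> ONeill
    s = re.sub(r"^St\.?\s+", "Saint ", s)    # St. John -> Saint John
    s = s.title()                            # MacDonnell -> Macdonnell, ONeill -> Oneill
    return f"{s}, {forename}"


# The Whitaker's Almanack extract quoted above, in its printed order.
whitaker = [
    ("Lyell", "Leonard"),
    ("M'Arthur", "Wm. A."),
    ("Macartney", "Wm. G. E."),
    ("M'Calmont", "James M."),
    ("M'Cartan", "Michael"),
    ("M'Carthy", "Justin"),
    ("McDermott", "Patrick"),
    ("Macdona", "John C."),
    ("Macdonald", "J. A. Murray"),
    ("MacDonnell", "Dr. Mark A."),
    ("M'Ewan", "William"),
    ("Macfarlane", "Donald H."),
    ("McGittigan", "Patrick"),
]

# Sorting a reversed copy by the new key recovers the printed order.
resorted = sorted(reversed(whitaker), key=lambda p: traditional_sort_key(*p))
print([s for s, _ in resorted] == [s for s, _ in whitaker])  # True
```

The same key also reproduces the O' extract (O'Driscoll, O'Keefe, Oldroyd, O'Neill, Owen). It deliberately does not address multi-word surnames such as "Saint John", where the space before the comma raises a separate ordering question.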

I'm with you on the O'Xxx issue, but the Mac variations may be more debatable. Sorting all of Macxxx, MacXxx, McXxx, Mcxxx, M'Xxx and M'xxx as Macxxx is bewildering for people unfamiliar with the practice. It would be more transparent, albeit perhaps less traditional, simply to sort by the letters included in the name, without punctuation, and without any internal capitals. Abbreviations like St. are somewhere in between "O'" and "Mc" in clarity, but are probably sufficiently widely understood to be sorted as "Saint Xxx" even when titled "St. Xxx". However, this does bring up the overlooked problem of spaces in surnames: "Saint John, Forename" sorts before "Saint, Surname", because space sorts before comma in ASCII, but I doubt that any reputable reference work (except Wikipedia) follows such a scheme. --Stemonitis 17:19, 25 February 2007 (UTC)
On "St.", I have certainly seen alphabetical lists produced recently where "St. " was indexed at the beginning of ST, but presumed it was always an error caused by computer sorting not recognising the abbreviation. Certainly, a surname of "Saint" should be before "St. John". On the "Mc" problem, I wonder if this is a case of national variations? I have not seen any alphabetical list which has separated them which could not be attributed to a computer's misrecognition. Sam Blacketer 11:28, 26 February 2007 (UTC)
I wonder if it's possible to distinguish between the naïveté of a computer and a deliberate, conscious simplification. Sorting "McX" after "Mbxxx" could easily be someone's deliberate choice. I guess we can only mirror the practice used by Britannica and other encyclopædias, whatever that might be. The only solution I can see to the multi-word surname issue is to replace the spaces (ASCII 32) in the surname with a character that would sort after the comma (ASCII 44) but before any (lower-case) letters, and my recommendation would be either to use the underscore ("_", ASCII 95), so that Ian St. John would be sorted as "Saint_John, Ian", for example, or to simply run the words together so that Ian St. John would sort as "Saintjohn, Ian". I am not sure which approach I prefer. Again, it probably depends on the usage preferred elsewhere. --Stemonitis 17:36, 26 February 2007 (UTC)
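The ordering problem with spaces can be checked directly. This Python sketch assumes plain code-point comparison (which is what Python's string comparison does); whether MediaWiki itself normalizes underscores or spaces in sort keys is not something it tests. `surname_key` is a hypothetical helper that joins the words of a multi-word surname with a chosen character.

```python
def surname_key(surname: str, forename: str, joiner: str = " ") -> str:
    """Hypothetical helper: join a multi-word surname's words with `joiner`."""
    return f"{joiner.join(surname.split())}, {forename}"


# Code points of the characters that decide the comparison.
for ch in (" ", ",", "_", "J", "j"):
    print(repr(ch), ord(ch))  # 32, 44, 95, 74, 106

plain_saint = surname_key("Saint", "Surname")  # "Saint, Surname"
for joiner in (" ", "_", ""):
    st_john = surname_key("Saint John", "Ian", joiner)
    print(repr(joiner), sorted([plain_saint, st_john]))
# ' '  -> ['Saint John, Ian', 'Saint, Surname']   (space 32 < comma 44)
# '_'  -> ['Saint, Surname', 'Saint_John, Ian']   (comma 44 < underscore 95)
# ''   -> ['Saint, Surname', 'SaintJohn, Ian']    (comma 44 < 'J' 74)
```

Both proposed fixes put "Saint, Surname" first. Note that the underscore (95) sorts after the capital letters (65 to 90), which is why it only reliably precedes lower-case letters, as stated above.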

Mish-mash of firstname, lastname indexing

There is a serious problem developing, largely because of the efforts of a few editors such as User:Blnguyen, with some categories having a large number of people sorted by the normal "Lastname, Firstname" rules here, and another large number in the same category sorted by "Firstname Lastname". See, for example, Category:Indian Test cricketers.

This discussion copied from Incidents illustrates the problem.

Talk:Yuvraj Singh includes a link to a previous discussion at WT:CRIC about this issue, where Gene was the sole voice arguing for mandatory classification by last name, whereas everybody else felt that it was correct to use whatever the main usage of the term was. That archive also notes the examples of Indian Sikhs and Muslims who are indexed by first name. When the switch was made to the Yuvraj entry, there was a reminder on the talk page. Another user who fixed up typos and grammar in late 2006 wasn't aware of the way Yuvraj is categorised, so when I switched it back to Y, I left an invisible comment [4] in late December. Since then, Gene has reverted the article four times, despite the article having a note and the talk page having a note, for a total of six reverts, whereas other articles such as Harbhajan Singh and Maninder Singh, which do not have a reminder notice, have been less frequently targeted. As for Gene's comment that my failure to revert all his edits shows that I have a rationale problem, this is incorrect: I am categorising them by what they are referred to publicly, per the previous discussions. Robin Singh and VRV Singh are not Sikhs and are commonly referred to as Singh, while the others are referred to by first name. The same applies to Shah Nylchand and any others. Blnguyen (bananabucket) 01:25, 1 March 2007 (UTC)
Note carefully: I was the last voice addressing the issue on that talk page, and still am a month and a half later. Neither User:Blnguyen nor any other editor has addressed the points I raised there, in my only comment there.
Note more carefully that Blnguyen misrepresented what the previous link dealt with:
  1. It dealt specifically with cricketers from Pakistan, from Bangladesh, and from the United Arab Emirates—not with cricketers from India.
  2. It dealt with indexing all people in the categories related to cricketers from those countries by first name, not some haphazard mish-mash with some indexed by first name and some indexed by last name as Blnguyen proposes.
  3. It specifically dealt only with the cricket categories related to those countries, not to categories for cricket in other countries for people who may have played in more than one place, not for categories for people also notable as politicians or writers or whatever, not for the birth and death and living categories.
  4. What Blnguyen describes here, in his "I am categorising them by what they are referred to publicly" statement, is a category determination that depends on the establishment of a factual foundation.
    1. Even if that were the rule of our guidelines, it would require the establishment of that fact on an individual, case-by-case basis for each person, by proper citation to reliable sources, and not be based on original research (WP:NOR) by Blnguyen or any other editor.
    2. Blnguyen has not met the burden of establishing this fact in any single case. He has not even attempted to do so.
Discussions on Wikipedia talk:Categorization of people have dealt with the guideline there ("normal order and not (for example) according to the Dutch system") by pointing out that we should not expect readers to know whether a person is of Belgian heritage or Dutch heritage or German heritage or American heritage or whatever, in order to figure out how his or her name will be sorted in categories. It is even more ludicrous to expect that readers should know a person's religion in order to know how his or her name will be sorted in categories.
I am taking this discussion to Wikipedia talk:Categorization of people Gene Nygaard 17:54, 1 March 2007 (UTC)

<end of copied discussion>


Proposed clarifications

  1. First-name sorting rather than last-name sorting is a property of a category, not of a person.
    • Obviously, when our article's name only uses one name by which the person is generally known, such as Björk or Pelé, they should be sorted as "Bjork" and "Pele" respectively (note that a sort key is still needed in these particular examples). In these cases, the redirect from the full name could also be indexed if people might look for that.
    • But Oprah Winfrey should be indexed under "W" and Shaquille O'Neal under "O" in most categories. (In these particular cases, the Oprah and Shaq redirects could be indexed too, but in most cases the single-name version will be a disambiguation page.)
  2. Categories should only be first-name indexed if this is specifically pointed out on the category page, and only as long as there is consensus for it. Discussions should normally take place on the category talk page. If a series of related categories are involved, then the discussion should be consolidated in one place, with links to the discussion from each category. Then editors know where to go if they disagree with the sorting order, and the question of whether any particular category belongs in that group of categories can be raised there.
  3. No category should be a mixture of last-name indexing and first-name indexing.
  4. Knowing where to find a person listed should not depend on whether or not the person is Dutch or Belgian or whatever. Knowing where to find the person listed should not depend on knowing whether the person is a Sikh, a Christian, a Buddhist, or whatever. There is no reason to even try to establish that as a factual basis for ordering the listings.

This is just a starting point for discussion. Gene Nygaard 18:06, 1 March 2007 (UTC)
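Proposal 1 can be illustrated with a small Python sketch: single-name titles keep their one name, "Firstname Lastname" titles become "Lastname, Firstname", and diacritics are stripped so that Björk and Pelé get the keys "Bjork" and "Pele". `person_sort_key` and `strip_diacritics` are hypothetical helpers; only `unicodedata.normalize` and `unicodedata.combining` are real standard-library calls.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (é -> e + U+0301 -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def person_sort_key(title: str) -> str:
    """Hypothetical helper: 'Firstname Lastname' -> 'Lastname, Firstname'.

    A single-name title is kept as is. Only the last word is treated as the
    surname, so prefixes and multi-word surnames still need a hand-written key.
    """
    words = title.split()
    if len(words) == 1:
        key = words[0]
    else:
        key = f"{words[-1]}, {' '.join(words[:-1])}"
    return strip_diacritics(key)


for title in ("Björk", "Pelé", "Oprah Winfrey", "Shaquille O'Neal"):
    print(title, "->", person_sort_key(title))
```

Because the rule depends only on the title's form, not on the person's nationality or religion, it is applied the same way to every member of a category, in line with proposals 3 and 4.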

Animals in the people categories

The child categories of Category:Racehorse deaths by year are all intertwined into the Category:Deaths by year subcategories, categories which are of course meant for people. Example: Category:1789 racehorse deaths is in Category:1789 deaths. The same mess is also present in the Category:Racehorse births by year categories.

Does anyone here have the hour or two it would take to clean this up? Or, is there a bot out there capable of cleaning this up? --kingboyk 22:40, 3 April 2007 (UTC)

I can try this with my bot. Basically it's just going through all the subcategories of Category:Racehorse deaths by year and Category:Racehorse births by year, and removing them from subcategories of Category:Deaths by year and Category:Births by year, right? I'll request approval if this is correct. —METS501 (talk) 03:02, 4 April 2007 (UTC)
Yes, that's it. Perhaps the people and horse categories should get a "see also" too? Then there's still a link, just not an improper categorisation of horses as people :) --kingboyk 11:05, 4 April 2007 (UTC)
OK, I'll work on the request later. I'm about to lose my connection as I'm upgrading to Verizon FiOS, so that will take a few hours. —METS501 (talk) 12:42, 4 April 2007 (UTC)
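The cleanup METS501 describes (take each subcategory of the racehorse roots and remove it from the subcategories of the people roots) can be sketched over an in-memory model of the category graph. The dictionary layout, the function name, and the "1789 racehorse births" entry are hypothetical illustrations; a real bot would read and edit the category tags on the wiki rather than a dict, and the suggested "see also" links are not modelled here.

```python
from __future__ import annotations


def detach_racehorse_categories(
    parents: dict[str, set[str]],
    racehorse_roots: set[str],
    people_roots: set[str],
) -> dict[str, set[str]]:
    """Hypothetical model: category name -> set of its direct parent categories.

    Every direct subcategory of a racehorse root loses any parent that is itself
    a direct subcategory of a people root (e.g. "Category:1789 deaths").
    """
    def direct_children(roots: set[str]) -> set[str]:
        return {cat for cat, ps in parents.items() if ps & roots}

    racehorse_children = direct_children(racehorse_roots)
    people_children = direct_children(people_roots)
    cleaned = {cat: set(ps) for cat, ps in parents.items()}  # leave the input untouched
    for cat in racehorse_children:
        cleaned[cat] -= people_children
    return cleaned


parents = {
    "Category:1789 deaths": {"Category:Deaths by year"},
    "Category:1789 racehorse deaths": {"Category:Racehorse deaths by year", "Category:1789 deaths"},
    "Category:1789 births": {"Category:Births by year"},
    "Category:1789 racehorse births": {"Category:Racehorse births by year", "Category:1789 births"},
}
cleaned = detach_racehorse_categories(
    parents,
    racehorse_roots={"Category:Racehorse deaths by year", "Category:Racehorse births by year"},
    people_roots={"Category:Deaths by year", "Category:Births by year"},
)
print(sorted(cleaned["Category:1789 racehorse deaths"]))
```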

Categorizing people by television show

There is an ongoing revert war at Matt Groening over whether he should be included in categories for The Simpsons and Futurama. One side argues that policy is against this type of categorization; the other argues that, because he is notable as the creator of the two shows, he should be included regardless. I was hoping someone here could give an unbiased opinion on this matter or otherwise point me to an appropriate policy statement. I found two points on this page which could be used to argue either side, so I was wondering if there was a better place to look. Thanks for any help. Stardust8212 16:04, 4 April 2007 (UTC)

nationality -> country

I have a new, bold proposal to reduce controversy in our categorization scheme. I propose we categorize people by "country" rather than "nationality". Nationality does not equal citizenship, even if that was the original intention. If someone changes citizenship or holds multiple citizenships, they can be tagged with every country that applies. Ambiguous names like "American people" would also disappear as a result, since the category would become "People from the United States". -- Cat chi? 01:15, 13 May 2007 (UTC)

Provisional support for this, but what if the person is from a state that has ceased to exist or an area that has changed hands? Not a deal breaker, just curious as to how we'd handle that situation. -Mask? 08:03, 18 May 2007 (UTC)
The person can be tagged with the past state as well. It isn't controversial to suggest someone was from Soviet Union and from Russia assuming they had a Soviet Union citizenship as well as a Russian one. Same rule would apply if the area changed hands. Basically we are using the citizenship of the person.
For instance, Mustafa Kemal Atatürk was born in Thessaloniki. At the time Thessaloniki was Ottoman territory; later the Ottoman Empire became kaput and two countries came into play, Turkey and Greece. Thessaloniki eventually became a part of Greece, but Atatürk was never given citizenship from Greece (save me the historic details please, I do not care). He was, however, given citizenship from Turkey. In sum, it is perfectly fine to categorize Atatürk as being "from Turkey" and "from Ottoman Empire" rather than "Turkish".
-- Cat chi? 08:29, 18 May 2007 (UTC)
Sounds perfectly acceptable to me. -Mask? 19:12, 18 May 2007 (UTC)

The existing system is fine. This massive change would create as many problems as it would solve, and it would be stilted and further removed from normal English usage. Honbicot 19:30, 26 May 2007 (UTC)

Birth by date hierarchy

There has previously been an attempt to create a temporal hierarchy based on date of birth in addition to the existing hierarchy based on year of birth. This initiative was, however, strangled at birth about a year ago. I am now wondering if this is really such a bad idea. I have asked members of a few WikiProjects whether there might be any support for a renewed initiative. On the Norwegian (nynorsk) Wikipedia such a hierarchy exists and seems to be both innocuous and well-functioning. __meco 13:26, 22 May 2007 (UTC)

This is a terrible idea. There is absolutely no educational benefit in connecting people by the completely random coincidence of having been born on the same day. The Norwegian Wikipedia probably hasn't reached such an advanced stage of category clutter as we have here. Honbicot 19:28, 26 May 2007 (UTC)

Proposed guideline for lists of people grouped by ethnicity and other cultural categories

There has been a lot of debate recently around deleting lists of people grouped by ethnicity, race, religion, and other cultural groupings. Unfortunately, Wikipedia:Categorization of people deals with this issue only from the point of view of categories; there is no guideline for the similar but distinct lists of people. As a result, there has been a ton of confusion on this issue. To counter this, I have created Wikipedia:Proposed guideline for lists of people by ethnicity, religion, and other cultural categorizations to clarify this subject. I hope people will comment on the proposed guideline. Best, --Alabamaboy 12:28, 9 June 2007 (UTC)