This is my page for musings about categories on Wikipedia, which are an endless source of pleasure and pain for me. Pleasure because I enjoy organizing articles and having nice clear usable hierarchies to browse through; pain because our category system is so bad, and so terribly implemented, that I despair at the amount of work left to fix it.
On ghettoization, non-diffusing categories, and LGBT heads of government
I will start by walking you through something I do from time to time, which is de-ghettoizing an area of the category tree.
Let's start with a definition of ghettoization[1], which I developed in a previous essay, and then I'll give you a new example I found today which has confounded me somewhat...
What is ghettoization?
To start, note that ghettoization applies only to the categorization of human biographies on Wikipedia.
A biography is ghettoized if the following are true:
- The bio is a member of a gendered, ethnic, sexuality, or religion-based category X, and
- The biography is not in an ancestor or "blood relative" category of X (e.g. sibling, cousin, parent, grandparent, etc.) that is neutral, i.e. non-gendered, non-ethnic, non-sexuality-based, and non-religious, and which retains an equivalent descriptive specificity. Equivalent descriptive specificity means you can't say "Well, Y is in Category:Women novelists, and in Category:American people, which is gender neutral, so she's not ghettoized." An essential aspect of ghettoization is that the biography is in a ghetto, but not in a neutral category that is an analogue of the ghetto.
- If multiple facets are intersected on the bio (e.g. gender + ethnicity + sexuality + religion + ...), as you go up the tree, the bio is ghettoized if it is not a member of each extant iteration that removes a facet while retaining the same noun. For example, to avoid ghettoization, Category:African-American women in politics members should also be, at the very least, in Category:American women in politics (removal of "African-American"), Category:American politicians (removal of "Women" and "African-American")[2], and Category:African-American politicians (removal of "Women").
- The above rules do not apply to any characteristic which has been fully diffused - i.e. if all men and women are fully diffused, there is no ghettoization concern. See for example Category:Actors from Adelaide, which is empty save its sub-categories. In these cases, there is no need for a neutral category: each person can be an actor or an actress, and that is not considered "ghettoization".
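The facet-removal rule above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `extant` lookup table, the facet labels, and the function names `required_categories` and `is_ghettoized` are hypothetical, and the sketch ignores both footnote 2 (a diffusing sub-category also counts) and the full-diffusion exception.

```python
from itertools import combinations


def required_categories(facets, extant):
    """Return the extant categories reached by removing one or more facets.

    facets: frozenset of facet labels on the bio's most specific category.
    extant: dict mapping a frozenset of facets to the name of the category
            that actually exists for that combination (same noun throughout).
            Both structures are assumptions made for this sketch.
    """
    required = set()
    # Enumerate every proper subset of the facets, including the empty set.
    for size in range(len(facets)):
        for kept in combinations(sorted(facets), size):
            name = extant.get(frozenset(kept))
            if name is not None:  # only "extant iterations" are required
                required.add(name)
    return required


def is_ghettoized(bio_categories, facets, extant):
    """Return (flag, missing): which required categories the bio lacks."""
    missing = required_categories(facets, extant) - set(bio_categories)
    return bool(missing), missing


# Illustrative data based on the example in the text.
extant = {
    frozenset({"African-American", "women"}): "African-American women in politics",
    frozenset({"women"}): "American women in politics",
    frozenset({"African-American"}): "African-American politicians",
    frozenset(): "American politicians",
}
facets = frozenset({"African-American", "women"})
bio = {"African-American women in politics", "American women in politics"}

ghettoized, missing = is_ghettoized(bio, facets, extant)
print(ghettoized, sorted(missing))
```

Here the bio is flagged because it lacks the two categories that drop the "women" facet.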
What are non-diffusing categories?
Another definition related to ghettoization is needed, that of a non-diffusing category. Briefly, a non-diffusing category is one that behaves differently from normal categories: normally when you place something in a sub-category, you remove it from the parent, but if you place something in a non-diffusing category, you do not immediately remove it from the parent.
Wrinkle #1: whether a category is non-diffusing or not depends on its parent. A category can be non-diffusing for one parent, and diffusing for another. Here's an example:
- Category:Scottish women novelists is a non-diffusing subcategory of Category:Scottish novelists, because you don't want to ghettoize the women, leaving the men to gloat alone, victorious, in the novelists parent.
- On the other hand, Category:Scottish women novelists is a diffusing subcategory of Category:Scottish women writers. If someone is really known as a novelist, they don't need to remain in the writers category, and can be diffused down. Scottish women novelists is also diffusing on Category:British women novelists and Category:Women novelists by nationality.
So a single category can be diffusing and non-diffusing at the same time.
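One way to picture Wrinkle #1 is to store the diffusing flag on the (child, parent) pair rather than on the category itself. The sketch below is a hypothetical model, not how the category software stores anything; the `DIFFUSING` table and the helper names are assumptions for illustration.

```python
# (child, parent) -> True if the child diffuses that parent.
# The flag belongs to the relationship, not to the child category alone.
DIFFUSING = {
    ("Scottish women novelists", "Scottish novelists"): False,
    ("Scottish women novelists", "Scottish women writers"): True,
    ("Scottish women novelists", "British women novelists"): True,
    ("Scottish women novelists", "Women novelists by nationality"): True,
}


def parents_to_keep(child, diffusing=DIFFUSING):
    """Parents a bio should not simply be removed from when added to `child`."""
    return sorted(p for (c, p), flag in diffusing.items() if c == child and not flag)


def parents_to_leave(child, diffusing=DIFFUSING):
    """Parents a bio can be diffused out of once it is in `child`."""
    return sorted(p for (c, p), flag in diffusing.items() if c == child and flag)


print(parents_to_keep("Scottish women novelists"))
print(parents_to_leave("Scottish women novelists"))
```

The same child shows up in both answers, which is exactly the "diffusing and non-diffusing at the same time" situation.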
Wrinkle #2: a category can be non-diffusing, but members of a non-diffusing category won't always remain in the parent. How does that work? This can happen in cases where the parent category has diffusing categories underneath it, especially if those categories diffuse fully (i.e. if everyone in the parent can be placed in at least one diffusing child cat).
- For example: If you look at Category:American women in politics, you may think all of those women should be in Category:American politicians, right? Wrong. There is nobody in Category:American politicians - it's empty! In fact, it's marked as a container category - it's not supposed to have anyone in it. So, are all of the women in Category:American women in politics ghettoized? Not necessarily - because as long as they are in a gender-neutral category underneath Category:American politicians, then we are ok.
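The check described in Wrinkle #2 can be sketched as a walk down the parent's subtree that skips non-neutral branches. Everything here is a simplified assumption: the `children` map, the `non_neutral` set, the function names, and the subcategory names other than the two from the text are made up for illustration.

```python
def neutral_descendants(root, children, non_neutral):
    """Categories under `root` reachable without entering a non-neutral category."""
    found = set()
    stack = list(children.get(root, []))
    while stack:
        cat = stack.pop()
        if cat in found or cat in non_neutral:
            continue  # skip repeats and gendered/ethnic/etc. branches
        found.add(cat)
        stack.extend(children.get(cat, []))
    return found


def is_covered(bio_categories, root, children, non_neutral):
    """True if the bio sits in `root` itself or in any neutral category below it."""
    neutral = neutral_descendants(root, children, non_neutral) | {root}
    return bool(neutral & set(bio_categories))


# Illustrative tree: only the first two names come from the text.
children = {
    "American politicians": [
        "American women in politics",
        "American politicians by state",  # hypothetical neutral subcategory
    ],
    "American politicians by state": ["Politicians from Ohio"],  # hypothetical
}
non_neutral = {"American women in politics"}

ok = is_covered(
    {"American women in politics", "Politicians from Ohio"},
    "American politicians", children, non_neutral,
)
stuck = is_covered(
    {"American women in politics"},
    "American politicians", children, non_neutral,
)
print(ok, stuck)
```

A bio that is only in the gendered category fails the check even though the container parent is empty for everyone.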
Optional side bar on why you don't always bubble up non-diffusing categories
This particular wrinkle was a bone of contention during the Category-gate discussions, with many arguing "If the category is non-diffusing, it means we must bubble them up!" To show why this is not workable, consider the following simple category structure:
We start by placing Bob and Mary in the Novelists category. Now, someone says "Mary is a woman", so she gets added to the Women novelists category as well. Someone else says "Bob is Scottish", so he gets moved to the Scottish novelists category and is removed from the parent, as is normal for diffusing categories - we regularly diffuse based on nationality. Finally, someone comes along and says "Well, Mary is American, so I'm going to move her to the American novelists category and remove her from the parent (in other words, treating her the same as Bob)" - but an editor opposes: "You can't do that - she's in Women novelists which is non-diffusing, so she has to stay in the parent otherwise she will be ghettoized!" - so she gets placed back in the parent. So now our situation looks like this:
Do you notice anything weird? Mary is the only one in the parent "Novelists" category - this is a rich irony indeed, as she now gloats over her spot at the top of the food chain, while Bob languishes down in the Scottish novelists dregs. There are two ways to fix this problem:
Real-life example: LGBTQ heads of government
So, now that we know what ghettoization and non-diffusing categories are, let's do a real-life example, with a puzzle/quiz at the end.
Today I picked Category:Heads_of_government, which has a subcat Category:LGBTQ heads of government. I think we all agree that being LGBTQ doesn't mean you are somehow less a head of government, so we want to make sure all of those fellows in Category:LGBTQ_heads_of_government are also in a diffusing, neutral subcat (and ideally, several) of the parent. How do we find them? It's rather tricky. We will be using the Category intersection tool to help us. We want to find out who is in LGBTQ heads of government but not in any other neutral category under Category:Heads of government. But there are dozens of articles, and dozens of nested categories - how do we sort this out? Here are the steps I took:
- Get a list of all diffusing, neutral sibling categories of Category:LGBTQ heads of government. You can do so using the Category intersection tool like this. This gives us a list we can copy paste, which we paste into the Negative categories box. The list looks like this:
List of sibling categories
14th-century_heads_of_government 15th-century_heads_of_government 16th-century_heads_of_government 17th-century_heads_of_government 18th-century_heads_of_government 19th-century_heads_of_government 1st-century_heads_of_government 2nd-century_heads_of_government 3rd-century_heads_of_government 4th-century_heads_of_government 6th-century_heads_of_government 7th-century_heads_of_government 9th-century_heads_of_government
- Now, we remove some of the categories from the list, such as any non-diffusing categories (for example Female heads of government - if our bios are in that one too, that's great, but that doesn't de-ghettoize them). The sibling cats I removed are struck in the list above.
- Now we place the target cat, Category:LGBTQ heads of government, in the Categories box, select a depth of 8 or so (or deeper, depending on how far down your categories go), select 'Subset', and then "do it". This link shows you a filled-out form. What this search does is say "Show me all LGBTQ heads of government that aren't in this whole other list of categories, recursively."[3] Using this technique, you can search through trees with hundreds or thousands of bios, and quickly find the ones that are ghettoized.
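The logic of the search in the steps above amounts to a set difference over recursive category membership. The sketch below runs on small in-memory dictionaries and does not call the Category intersection tool; the function names, the `subcats` and `members` structures, the placeholder bios, and the "19th-century prime ministers" subcategory are all assumptions for illustration.

```python
def articles_under(category, subcats, members, depth):
    """Collect articles in `category` and its subcategories, `depth` levels down."""
    seen = set()
    articles = set()
    frontier = [category]
    for _ in range(depth + 1):  # depth 0 means the category itself only
        next_frontier = []
        for cat in frontier:
            if cat in seen:
                continue  # guard against loops in the category graph
            seen.add(cat)
            articles |= members.get(cat, set())
            next_frontier.extend(subcats.get(cat, []))
        frontier = next_frontier
    return articles


def find_ghettoized(target, negatives, subcats, members, depth=8):
    """Articles under `target` that appear under none of the `negatives`."""
    positive = articles_under(target, subcats, members, depth)
    negative = set()
    for cat in negatives:
        negative |= articles_under(cat, subcats, members, depth)
    return positive - negative


# Placeholder data: "Bio A" etc. stand in for real articles.
subcats = {
    "19th-century_heads_of_government": ["19th-century prime ministers"],
}
members = {
    "LGBTQ heads of government": {"Bio A", "Bio B", "Bio C"},
    "19th-century prime ministers": {"Bio A"},
    "18th-century_heads_of_government": {"Bio B", "Bio D"},
}
negatives = [
    "18th-century_heads_of_government",
    "19th-century_heads_of_government",
]

deep = find_ghettoized("LGBTQ heads of government", negatives, subcats, members)
shallow = find_ghettoized("LGBTQ heads of government", negatives, subcats, members, depth=0)
print(sorted(deep), sorted(shallow))
```

With depth 0 the search misses Bio A's membership one level down and wrongly reports it, which is why the depth setting needs to reach the bottom of the tree.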
Ok, that was fun, but now to our puzzling result. We found three fellows who are ghettoized - they are considered "LGBTQ heads of government", but they are nowhere to be found in the Heads of government tree. Here are the questionable characters:
Now, how could a great king like Tiglath-Pileser III be ghettoized? He's in Category:Assyrian kings, Category:Babylonian kings, and even Category:Monarchs of the Hebrew Bible! But the category intersection tool tells us he's ghettoized?? What's going on?
Well, here's where you need to start exploring your tree. If you do so, you will find that it goes something like this: Assyrian kings -> Kings -> Monarchs -> Heads of state - aha! Category:Heads of state is a sibling of Category:Heads of government, so the algorithm was correct - these bios *are* ghettoized... in a manner of speaking. You may notice that there is no Category:LGBTQ heads of state - only Category:LGBTQ heads of government, so we have a bit of a contradiction - our LGBTQ category says they are heads of government, but the parenting of the other categories suggests, not really.
This is a great example because it illustrates something which you will see all over the category tree: inconsistency. Sometimes you will have a female category and an LGBTQ category and an African-American category and sometimes even a combination of same, and then for a slightly different job title, you will have none of that. You will also find the gender/ethnic/religious/sexuality categories placed at all points in the category tree - up high, in the middle, and down low. If you've got a woman and you want to make sure she is in a gendered category for something close to her job, you may have to click up 2 or 3 levels before you find the appropriate "Women X" category; in the case of old Tiglath-Pileser III, someone went to a distant uncle to find the LGBTQ category, placing him, somewhat incorrectly, as a "head of government". Sorting this out I leave as an exercise for the reader, as I frankly don't know what the best path is, but here are a couple of options to consider:
- We could create Category:LGBTQ heads of state, since all kings are heads of state, but not all kings are heads of government. This is probably the most "correct" solution.
- We could just say "Forget it, it's close enough", but it technically violates our rules against ghettoization, so perhaps the rules need changing? Think about it this way - if you classify a woman as a "Woman novelist" and an "American poet", is she ghettoized or not? I think, yes.
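The tree exploration that exposed the Heads of state / Heads of government mismatch can be sketched as a breadth-first walk up the parent links. The `parents` map below encodes only the chain quoted in the text ("something like this"), and the function name `path_to_ancestor` is a hypothetical helper, not part of any tool.

```python
from collections import deque


def path_to_ancestor(start, target, parents):
    """Return one shortest chain of parent categories from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for parent in parents.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


# Parent links as described in the text; the real tree has many more.
parents = {
    "Assyrian kings": ["Kings"],
    "Kings": ["Monarchs"],
    "Monarchs": ["Heads of state"],
}

to_state = path_to_ancestor("Assyrian kings", "Heads of state", parents)
to_government = path_to_ancestor("Assyrian kings", "Heads of government", parents)
print(to_state, to_government)
```

Under these links a path to Heads of state exists and none to Heads of government, matching what the intersection search reported.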
Anyway, please share your comments on the talk page. I welcome your feedback on this meandering...
Footnotes
- ^ Ghettoization has been called out in the popular press as a form of sexism. My essay attempts to explain why this is a bad word choice, because setting up a proper category structure, and then properly categorizing biographies without ghettoizing, is so complex that to call it sexism is sort of like asking someone to solve 20 differential equations about African-American population growth in their head and then calling them racist if they get the wrong answer.
- ^ Important: or a diffusing sub-category of same
- ^ Please note, this is a simple version, and as the category tree gets more complex it becomes harder and harder to do this, but this search will give you a lower bound on the number of ghettoized people.