
ENGVAR and national varieties of English

This one comes up from time to time and I was wondering whether it was worth dealing with it once and for all. The problem is that the value of WP:MOSTIES as currently written and of having individual templates for so many purported sub-types of English is not clear. Yes all these varieties of English (and more) exist when it comes to informal spoken English and dialect, but there are only a limited number of "types" of standard English when it comes to the formal, written language. The main distinction understood across the world is between "British" and "American" English and mostly relates to not much more than a few minor spelling, grammar and vocabulary differences, as well as date formatting (and with vocabulary, WP aims to find common ground anyway; plus the occasional presence of a "foreign" term – or local term, depending on your perspective – doesn't suddenly mean we are dealing with a whole new variety or style of English, just as the presence of the occasional technical term doesn't). The purpose of these templates should be simply to explain what style readers can expect to see and what style editors need to follow when adding content.
However, they seem to be more about declaring national ownership of topics and creating multiple walled gardens within WP than helping with the above; indeed they arguably add confusion, because on seeing them, people might wonder whether there are further differences that they are not aware of between "XX English" and a more familiar standard (per the above, invariably there are not anyway). Equally, several of them talk about the article using "XX dialect", which is completely misleading, suggesting as it does that the page should use local slang or informal terms.
In actual, practical terms, all that are really needed are the Engvar A, B, C and O templates, perhaps with a couple of additional ones and some variation to the text, eg to say probably at most three or four templates which say something like: " .. uses American-style spelling (labor, traveled etc), grammar (write me) and date formatting (mdy), and sometimes vocabulary (sidewalk etc)" and " .. uses British/Commonwealth-style spelling (labour, travelled), grammar (write to me) and date formatting (dmy), and sometimes vocabulary (pavement etc)" etc. The rest of these can probably be deleted, and WP:ENGVAR amended accordingly, as all of them are, in effect, covered under one or other of the composite ones. I guess this might require an RFC here and a TFD debate as well but would be interested in others' thoughts first. Just to repeat, this is about the practical issues involved when it comes to formal writing, not some kind of crusade to deny the existence of other variations in English. N-HH talk/edits 10:12, 31 October 2016 (UTC) Wording amended post-discussion to make point clearer for posterity, FWIW. Relying on Engvar A, B etc to make point caused distraction, not least because it's not 100% clear what they mean currently. N-HH talk/edits 09:50, 3 November 2016 (UTC)

TLDR: are 20-odd different "national" templates really needed to explain to readers and contributors what spelling etc rules a page is following? N-HH talk/edits 10:19, 31 October 2016 (UTC)

  • They're not needed but they certainly are useful, and I think we should be a lot more conservative than we sometimes are when applying TIES. If we have an article on a Mainland Chinese actress who sometimes works in Hong Kong, and the user who originally wrote the article wrote in American English, then "ties" to Hong Kong English should not overrule PRESERVE. TIES should not even apply. I think marking one's own work as being written in the variety of English you were using is a good idea. This doesn't apply to "national varieties" specifically, mind you. I write articles using Oxford spelling, and having a template to tell later editors that this is not "American spelling" or "British spelling" or a random mixture of both is nice. I am assuming the Oxford template would be among the "20-odd different national templates" being referred to. And of course WP:FORMAL bans the use of informal spoken dialect words and grammar when writing Wikipedia, so these templates do not create a problem on that front. Hijiri 88 (やや) 12:19, 31 October 2016 (UTC)
OK, but you haven't explained why they might be "useful". My whole point is that people are taking that for granted when, in actual fact, they don't flag up any actual differences anyway. Hence they are arguably, by definition, very much useless, and if anything distracting and confusing. As for Oxford spelling, that's included broadly under Engvar O, which I said should stay – again, the point is not to remove options but to remove different and confusing ways of flagging up what is, in practical terms, exactly the same option. As for dialect, some of these templates create a problem precisely in that their wording contradicts the expectations expressed in WP:FORMAL. If they're not deleted, that wording has to change at least. N-HH talk/edits 12:37, 31 October 2016 (UTC)
I said I think marking one's own work as being written in the variety of English you were using is a good idea. That is how the templates are useful. Hijiri 88 (やや) 13:06, 31 October 2016 (UTC)
I agree entirely with Hijiri88. If more editors marked pages they created with the ENGVAR in use, a lot of time and hassle could be avoided. Peter coxhead (talk) 13:13, 31 October 2016 (UTC)
I know what you said Hijiri88; just like I know that I have said, twice now, that it makes no actual difference whether someone thinks, for example, they are writing on WP in "Indian English" or "British English", or "American English" or "Israeli English", since each pair have the same style conventions in formal writing. As I said at the start, this seems to be more about declaring something for the sake of it rather than about any practical benefit or clarity. I also agree pages should be marked – it's a question of what with. Multiple, variant templates which essentially say the same thing, but still lead to arguments because people want the "right" flag on something, or a limited number of templates which consolidate identical options under one clear, consistent and discrete, but flag-less, heading? N-HH talk/edits 13:21, 31 October 2016 (UTC)
Except that words are spelled differently even within so-called "national varieties" in formal writing. If you have some specific templates that should be deleted (and I think I might agree with you on Indian and Israeli English), that discussion is for TFD, not MOS. Hijiri 88 (やや) 21:16, 31 October 2016 (UTC)
Agreed, there are always quirks and alternatives within otherwise standard types (eg burned vs burnt), if that's the kind of thing you mean, but we're slightly stuck with that. The point is about the broad principles where there are pretty fixed (and limited) differences, eg -our vs -or and -ise vs -ize. Anyway, I plumped for here for the discussion as ENGVAR sets the basic principles, and refers explicitly to the various "types", and the templates ultimately follow from that. I posted a note on the talk page for the template category, but didn't want multiple fora at once. TFD may be the next step. N-HH talk/edits 13:01, 1 November 2016 (UTC)
Just to comment, briefly, I'd say that all of those are new creations, except 'EngvarB'. EngvarB was created for the purpose of tagging articles that used spellings that were obviously not American, but not identifiable as being specifically British, Canadian, &c., and so hence could not be tagged as 'use British English', &c. These new 'EngvarC' and 'EngvarO' templates, created in September, should be deleted. They serve no purpose and directly defy the existence of EngvarB. RGloucester 23:38, 31 October 2016 (UTC)
From the descriptions I note that:
  • A - "articles that have American English spelling"
  • B - "articles that have non-specific spelling that cannot be identified as American English or Canadian English spelling"
  • C - "articles that have Canadian English spelling"
  • O - "articles that have generic Oxford spelling"
so it appears that British spelling is a "non-specific" usage defined by being not American. Hmmm. Martin of Sheffield (talk) 09:45, 1 November 2016 (UTC)
The ones I was looking at are the ones linked from the Varieties of English templates page, which seem to be slightly different, eg this one. They explicitly refer to American, British/Commonwealth, Canadian and Oxford (although not by name in the latter case) spelling. As I say, they may need some tweaking or even renaming, but I like the way they refer to "spelling" – rather than "[type of] English" or, worse, "dialect" – and don't have flags. Plus as I say, they would limit the proliferation of nation-based templates, in that "Indian English", "British English" etc would all fall under what is currently called "B"; "Israeli", "Philippine", "American" English etc would fall under current A and so on. N-HH talk/edits 12:53, 1 November 2016 (UTC)
Have you noticed though that the first sentence of Template:EngvarB_spelling is "This template may be included on talk pages or edit notices to alert other editors that the associated article is written in EngvarB spelling." where EngvarB spelling links to {{EngvarB}}, that is "non-specific". The whole issue is a minefield though, terming English-English as British-English to appease the Americans is a sure fire way to upset the Scots and Welsh who have their own distinct dialects (and indeed languages). :-o Martin of Sheffield (talk) 15:04, 1 November 2016 (UTC)
As a Scot, I can say with absolute force that standard written English as written in Scotland is not in any way different from standard written English as written in England, Wales, or Ireland. We are not talking about spoken 'languages' and 'dialects', only standard written forms. Encyclopaedias are not written in dialect. This is beyond the pale. EngvarB, once again, serves to identify articles that use spelling that is identifiably not American, i.e. rooted in British usage, but not clearly British, Irish, &c. It has a separate purpose from the likes of the Use British English template. The other brand new templates DO NOT MAKE sense for that reason, as they duplicate existing templates and defy the purpose of EngvarB. RGloucester 19:28, 1 November 2016 (UTC)
Agreed on the first point, and that's part of the reason for my original suggestion: the proliferation of "national" templates and the declarations sometimes found in them that a particular page is written in "dialect" are at best confusing and at worst misleading, in practical terms if nothing else. See this one, for example. As for issues with the A, B etc ones, as I have said from the outset, and again just now, "they may need some tweaking or even renaming" but the broad principle underlying them and the idea of using them as a basis for a rationalisation – which is what I was suggesting we look at – seem sound to me. That said, we're not really getting anywhere, perhaps inevitably in situations like this, where it's always hard to get agreement – but also partly because the usual nationalist considerations are creeping in and partly because people aren't even reading what others are actually saying properly. I'm resigned to this one remaining the usual confusing mess. N-HH talk/edits 10:07, 2 November 2016 (UTC)
  • Let's have U and non-U. EEng 14:38, 1 November 2016 (UTC)
  • {{EngvarB}} is inserted by a bot. It should be removed wherever found and replaced with the appropriate national template. Hawkeye7 (talk) 12:21, 2 November 2016 (UTC)
  • We need to get rid of most of the ENGVAR templates. They keep multiplying off into the land of pointlessness and divisiveness, asserting silly fine-tooth-comb distinctions (like Jamaican, Barbadian, etc.) that simply do not exist in written, formal-register English. These hair-splits are really just a major written variety (usually British, though a handful are American-derived) with some minor local vocabulary differences (most of which should be studiously avoided as colloquialisms), and zero grammar or inflectional morphology differences of any kind at the encyclopedic prose level. While I think we'd be okay with retaining an ENGVAR-recognized distinction between US, British, Canadian, Australian, Irish, New Zealand, South African, Indian, and a catch-all Commonwealth, we can probably dispense with the others, and we might even be able to compress this to US, Canadian, and British/Commonwealth, unless we're certain that in encyclopedic writing there's going to be a marked and important distinction between all these slight variants of British/Commonwealth English. I've been warning for some time that we'd start seeing demands for Scottish English, etc., and here we are already. Next it'll be Philadelphia English, and California English, and U.S. Virgin Islands English, and British Virgin Islands English, and "German English" as picked up from NATO air force bases, etc., etc. This has jack to do with writing an encyclopedia; it's indistinguishable from demanding a set of templates like a badge to collect, and special "territorial rights" over articles and topics for nationalist, local, or ethno-cultural pride reasons. I think it is running in close parallel to various other bouts of territorial article-control chest beating WP has been beset with over the last couple of years. Enough already.

    PS: If anyone thought I was making a fallacious slippery-slope argument, they'd be wrong, since we already have templates of all the sorts I just provided hypothetical examples of, including city (Hong Kong), subnational (Scottish), small island (Barbados), and unofficial ESL (Israel, which has only 2% of its population as native speakers of English, mostly recent American emigrés). For starters, any such templates used on fewer than 100 articles already should just be WP:TFDed as pointless. If a combined British/Commonwealth one isn't practical, they can at least be compressed regionally, e.g. Au/NZ, and the three or four African ones. Even the articles on things like Nigerian English indicate no difference from other dialects other than absorption/invention of some local vocabulary, which is true of all, even very local, dialects. WP needs to delete most of these, and consolidate the remainders in ways that discourage the creation of more territoriality-forking. A side problem we tried to resolve about 2 years ago, and got consensus for here, but then a canvassing effort derailed it later, was to combine the various different sorts of these templates into a single one for each Eng. var., and lower-key than the present huge banners which just seem to serve a "only editors from X are welcome at this article" claim-staking purpose.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:54, 3 November 2016 (UTC)

I think I'd go for the narrowest option: are there really any practical differences that go beyond US vs British/Commonwealth, plus maybe Oxford-style and Canadian (which, in effect, can be seen as hybrids of the initial two)? Spelling etc in formal writing in Ireland and India for example almost certainly follows the same rules as standard B/C. I'd happily limit it to those four, and then require evidence of genuine difference before any more are added. N-HH talk/edits 10:32, 3 November 2016 (UTC)
If we separate orthography from word choice, then it would be possible to combine the templates as you say, into broad categories of 'British/Commonwealth' (including the Oxford variant), 'American', and 'Canadian'. Of course, New Zealand and Australia have vocabulary differences, likewise with all the various countries lumped under 'Commonwealth'. For example, 'Lorry' is no longer used in Australia, but is still used in the likes of Hong Kong. But as I say, if we narrow down ENGVAR to orthographical differences like '-ise/-ize' and 'centre/center', which seems logical, this may be a solution. RGloucester 13:19, 3 November 2016 (UTC)
Sure. I'm not meaning to suggest there should be no way to flag an ENGVAR-recognized issue like Oxford spelling, but that can be done with a template parameter. As for word choice, there are differences at least as large, I would think, between California, Texas, and New Hampshire English as between .au and .nz, but we don't need templates about it. It's surely enough to say "the term 'lorry' isn't right for an Australian article; use culturally appropriate terms per WP:TIES" on the talk page or an edit summary. The ENGVAR problems are when someone comes along and says "WTF is all this -ise and -our stuff?" and makes it all Yankeetalk or vice versa. So, agreed we only need templates (if we really need them at all) for that sort of orthography and grammar thing, not vocabulary.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  16:03, 3 November 2016 (UTC)
I agree that the odd word difference is secondary to the systemic spelling differences. Often the boundaries are quite fuzzy there anyway: eg "truck" is more American than "lorry", but it's not a forbidden word in the UK (although something like "sidewalk" for example might jar more, which is why my suggestion for template wording did refer to vocabulary too, albeit slightly more weakly and as the last category, way after spelling; it probably is worth mentioning, as a note to people unaware of such things). Especially when we are often talking about only one or two words in some cases, and often in a specialist context, I certainly can't see the need to create a separate new template each time for a wholly discrete "type" or "variety" of English simply on that basis. Are Irish and British English wholly different things simply because one uses "Prime Minister" both locally and generically, and one uses Taoiseach when referring to the local one? Or Scottish and British because of "outwith" and the slightly different legal terminology? As noted, when that kind of thing does come up, the wording can easily both be switched to the correct term in context, and be glossed if necessary, without great debate anyway, just as we would with a technical term. It doesn't require flags and templates. N-HH talk/edits 14:59, 4 November 2016 (UTC)
Because people have taken ENGVAR as a licence to create such territorial templates, without regard for the purpose of ENGVAR. We need to rein this in, as SMcCandlish says above. There is no 'Israeli English'. RGloucester 13:21, 3 November 2016 (UTC)
And, yes, it's a dispute weapon. The rationale behind the canvassed thwarting of the deprecation of the worst of these templates back-when was, essentially, "we keep having to fight people at these articles, so we need these giant, stern banners to whack them with."  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  16:03, 3 November 2016 (UTC)
  • In Australia, we have our own style guide for English, which covers a variety of things, including spelling, grammatical issues like the placement of commas, words and units used, and date formats. It's not just for spelling, and I get people fiddling with commas and the like. Hawkeye7 (talk) 21:10, 3 November 2016 (UTC)
    • If you're referring to the AGPM, also known as "the Snook book", puhlease, let's take a rain-check on it. Parts of it are quite good, but it's got some shockingly bad advice; more importantly, it's not recognised as an authority beyond the civil service. Tony (talk) 04:14, 4 November 2016 (UTC)
      • And I agree with the angle above that favours international cohesion and simplicity (RGloucester et al.). This practice of stamping disparate and less-well-known local varieties of English on articles needs to be minimised, although with a little sensitivity where necessary. The strongest argument for privileging a small group of standard, homogenous varieties (largely binary US–UK) is that it makes WP more accessible worldwide, and I mean to second-language speakers as well as native-speakers. For example, writing an article on an Indian town in uncompromising Indian English (itself not a single variety) disadvantages many readers who don't know the variety. If that choice were prompted by the notion that such an article is only read by locals, we've lost the key battles of "anyone can edit" (and of course, reaching out to the world). Tony (talk) 04:25, 4 November 2016 (UTC)
        Yep. I've long had this concern about the "branding" of numerous Caribbean articles as various things like "Barbadian English", "Jamaican English", etc., sometimes with a self-conscious attempt to sprinkle them with colloquialisms that mean nothing to anyone but locals. It's probably the same pattern as you observe in India articles.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  05:28, 4 November 2016 (UTC)
I think that many editors forget WP:COMMONALITY, and that it is high time that we give this principle a more prominent position. The guidelines, as written, already discourage such nonsense as described by SMcCandlish above. As an example, in past attempts to justify the creation of a 'Scottish English' template, editors have used the dialectal word 'outwith' as the prime justification. As I said then, 'outwith' is, first of all, not limited to a so-called 'Scottish English', and, second of all, is a dialectal term that should not be used per WP:COMMONALITY, as the standard 'outside' is also present in the usual Scottish lexicon. RGloucester 15:18, 4 November 2016 (UTC)
  • Some meta-analysis on nationalistic style books, colloquial and formal registers, vocabulary versus grammar distinctions, slow-down of dialectal divergence, and what we need ENGVAR templates for:
The Australian government has published a style guide, Style Manual for Authors, Editors and Printers. Wikipedians (or anyone else) aren't required to follow it, and MoS isn't likely to be changed by it, any more than it is by The Canadian Style and other "nationalist" guides. The principal value in these is identifying spelling and vocabulary differences, not trying to derive grammatical or punctuation "rules", which are more dependent on target publishing market than anything else. Actual Australians' opinions on the merits of the government book, and their willingness to go along with it, vary widely. There are also alternatives, e.g. The Cambridge Australian English Style Guide, adapted from the UK one by its original editor, and diverging very little between these versions. I have a copy of the most recent editions (that I'm aware of) of both of these, at painful shipping expense (exceeded the cover price of the books). Nothing in them that I've seen suggests .au English is so different from .uk English in formal, written form that they're incompatible enough to warrant big template warnings about which ENGVAR is used, so that British people don't accidentally do Briticisation violence to an Australian article or vice versa. To the extent any non-vocabulary-trend differences can be identified at all, they're actually all covered by the divergences between various competing British style guides. That is, the .au ones do not seem to have hit upon anything "uniquely Australian". If they have, it's so minor and obscure it's unlikely to affect WP editing.

Any fiddling with commas and such is much more likely to be either a) editorial judgement based on writing preferences that pre-date the .au govt. book, or b) MoS compliance. The important dialectal differences between Commonwealth varieties (aside from Canadian, which is a mish-mash of American and British with some French influences) are mostly a matter of regional-culture vocabulary, and most of those are colloquial. Same goes with divergence of Irish English and whatnot from English English.

If any group of dialects was poised to take off in its own direction (in a formal not just colloquial register) any time soon, it was probably the South Asian cluster, since there are actual grammatical differences that seem to be evolving, though these too are widely regarded as colloquial at present (e.g. "The software allows to edit PDFs.", a dropping of the verb's object with which we're all familiar via Asian-sourced technical documentation). But a written divergence on the scale of the historical US–UK split is not likely to happen to South Asian English in our lifetimes after all. Nor is the already-wide colloquial split between British English proper and the English of the British and formerly-British Caribbean likely to transform into a clearly distinct formal Caribbean English.

The Internet has not only slowed dialectal divergence, it's erasing some bits of it, and is even increasing cross-dialectal assimilation of innovations at all levels. For example, aside from noting that British slang like "gingers" for "red-haired people" sometimes becomes assimilated almost overnight into US English via easy availability of popular UK TV shows over the Internet, I've also observed a sharp increase in object-free "allows to" in American and British technical writing, inherited directly from South Asian material over just the last five years or so [the vector probably being "app markets"], though still largely confined to product documentation and marketing.

But neither "gingers" nor "allows to" are encyclopedic writing matters, being too colloquial. The real ENGVAR disputes on WP are usually around mass-conversion (or proposed conversion) of American -ize / -or / -er / -ile / Dr. patterns to or from Commonwealth/British -ise / -our / -re / -ilst / Dr (or Oxford-variant -ize / -our / -re / -ilst / Dr ), and similar "you're doing it all wrong" dialect whitewashing. I think this still gives us basically three global dialect clusters we need concern ourselves about at the templating level: 1) US; 2) British/Commonwealth [ignoring the exact politico-legal definition of Commonwealth of Nations membership], with Oxford variant, and 3) Canadian, to the extent that .ca usage has finally more-or-less settled on an -ize / -our / -re / -ile / Dr. hybrid after numerous national surveys have brought about increased consistency in .ca style guides over the last generation, though not without controversy (especially over Dr. versus Dr and -ize versus -ise, but with near unanimity on how to spell colour and theatre, and no preference for whilst).

 — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  05:28, 4 November 2016 (UTC)
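For anyone who wants to check how few features actually separate the clusters listed above, here is a minimal sketch in Python. The suffix values are taken straight from the comment above; the feature names (`verb_suffix`, `colour_suffix` and so on) and the function `consistent_clusters` are made up for illustration and do not correspond to any existing template, bot, or tool.

```python
from __future__ import annotations

# Hypothetical feature names, chosen only for this illustration.
FEATURES = ("verb_suffix", "colour_suffix", "centre_suffix",
            "while_form", "doctor_abbrev")

# Orthographic profiles as listed in the comment above.
CLUSTERS = {
    "US": ("-ize", "-or", "-er", "-ile", "Dr."),
    "British/Commonwealth": ("-ise", "-our", "-re", "-ilst", "Dr"),
    "Oxford variant": ("-ize", "-our", "-re", "-ilst", "Dr"),
    "Canadian": ("-ize", "-our", "-re", "-ile", "Dr."),
}


def consistent_clusters(observed: dict[str, str]) -> list[str]:
    """Return the clusters whose profile agrees with every observed feature."""
    matches = []
    for name, values in CLUSTERS.items():
        profile = dict(zip(FEATURES, values))
        if all(profile.get(feature) == value
               for feature, value in observed.items()):
            matches.append(name)
    return matches


if __name__ == "__main__":
    # -ize with -our alone does not separate Oxford from Canadian.
    print(consistent_clusters({"verb_suffix": "-ize",
                               "colour_suffix": "-our"}))
```

On this reading, a page showing only -ize and -our is consistent with both the Oxford variant and Canadian usage, and it takes a further marker (whilst or while, Dr or Dr.) to tell them apart, which is about as fine as the template-level distinctions would need to go.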

Page move discussion

There is a relevant style/title discussion @ Talk:Mk 14 Enhanced Battle Rifle. Primergrey (talk) 02:16, 7 November 2016 (UTC)

MOS:LQ is self-contradictory

Source text:

Bierstadt Lake is surrounded by a thick pine forest, and is ringed by sedges that give it a very serene appearance.

Wikipedia prose:

A dense pine forest encircles Bierstadt Lake, and the lake "is ringed by sedges that give it a very serene appearance."

This MOS:LQ statement says to put the period inside: "Include terminal punctuation within the quotation marks only if it was present in the original material, and otherwise place it after the closing quotation mark."

This MOS:LQ statement says to put the period outside: "If the quotation is a full sentence and it coincides with the end of the sentence containing it, place terminal punctuation inside the closing quotation mark. If the quotation is a single word or fragment, place the terminal punctuation outside."

Give-and-take is a good thing at Wikipedia, but useful compromise does not mean creating self-contradictory guidelines. How would you go about resolving this problem? RfC? ―Mandruss  19:40, 29 October 2016 (UTC)
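To make the conflict concrete, the two quoted statements can be sketched as two small Python functions and run on the Bierstadt Lake example. This is only an illustration under stated assumptions: the function names are hypothetical, the first function reads "only if it was present" the way the post above does (present in the original, therefore inside), and the second treats anything short of the whole source sentence as a "fragment".

```python
TERMINAL = ".?!"


def place_by_presence(source: str, quoted: str) -> str:
    """First statement, as read above: terminal punctuation goes inside
    when the original has it immediately after the quoted words."""
    idx = source.find(quoted)
    if idx == -1:
        raise ValueError("quoted text not found in source")
    end = idx + len(quoted)
    if end < len(source) and source[end] in TERMINAL:
        return f'"{quoted}{source[end]}"'
    return f'"{quoted}".'


def place_by_completeness(source: str, quoted: str) -> str:
    """Second statement: inside only when the quotation is the full
    sentence; a single word or fragment gets the punctuation outside."""
    stripped = source.strip()
    if stripped and stripped[-1] in TERMINAL and stripped[:-1] == quoted:
        return f'"{quoted}{stripped[-1]}"'
    return f'"{quoted}".'


SOURCE = ("Bierstadt Lake is surrounded by a thick pine forest, and is "
          "ringed by sedges that give it a very serene appearance.")
QUOTED = "is ringed by sedges that give it a very serene appearance"

if __name__ == "__main__":
    print(place_by_presence(SOURCE, QUOTED))      # period inside
    print(place_by_completeness(SOURCE, QUOTED))  # period outside
```

The same input gets the period inside under the first function and outside under the second, which is the disagreement described above. A reading of the first statement that treats "only if" as a necessary but not sufficient condition would need a different first function.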

This might be just me, but I would prefer to avoid this sort of quote altogether. Either quote or don't. Preferably don't in this case; just paraphrase the whole thing. --Trovatore (talk) 19:49, 29 October 2016 (UTC)
Yeah, my concern is more for the guideline than for the article. I'm not here for article help or writing advice. The point is that we have a self-contradictory guideline, and that does more harm than good. ―Mandruss  19:55, 29 October 2016 (UTC)
Well, but it might be self-contradictory only in a case that shouldn't happen at all, which isn't so much of a problem. It's true though that I wouldn't want to actually ban that sort of quote, so sure, I'll buy that there might be at least a theoretical problem. --Trovatore (talk) 21:51, 29 October 2016 (UTC)
@Mandruss: the two extracts are confusing when read together, I agree. I don't think they are strictly logically contradictory if you pay careful attention to the only in the first of your extracts from MOS:LQ. It does not say "Include terminal punctuation within the quotation marks if it was present in the original material". It is attempting to impose a condition, namely that only if the terminal punctuation was present in the original material should it be included within the quotation marks. It would be better to word the first extract in the negative: "Do not include terminal punctuation within the quotation marks if it was not present in the original material; if it was not, place it after the closing quotation mark."
There's also an issue as to what is meant by "fragment" in the second extract; does it mean that removing any number of words from a sentence always leaves a fragment, or (as I interpret it) does "single word or fragment" imply that the fragment is a small part of the sentence, e.g. just a few words. I'd place the period/full stop inside in your example, but outside in Bierstadt Lake has been described as having "a very serene appearance". But whether this is what was intended or not is another matter! Peter coxhead (talk) 21:48, 29 October 2016 (UTC)
Thanks. Perhaps we can all agree that a guideline is not effective if its application requires close study, parsing, and analysis on its talk page. That would be a good starting point. ―Mandruss  21:55, 29 October 2016 (UTC)
  • I have every confidence you guys will work this out. I just want to say two things:
  • This LQ stuff is certainly one of the most effective uses of editor time anywhere on Wikipedia. Millions of articles have been immeasurably improved by the guidance it gives. Really. I really mean that. No kidding. Seriously.
  • Trovatore's idea that "this sort of quote" should be avoided is nonsense. And that I really do mean. I don't want to sound harsh, but I'm constantly amazed by the narrow ideas people have about what constitutes good formal writing.
EEng 23:23, 29 October 2016 (UTC)
Re bullet 1: For Pete's sake, how is it helpful to be sarcastic and then state unequivocally that you are not being sarcastic? Either the guideline is important enough to be made more helpful than harmful, or not important enough to exist. I don't really care which, but I do feel strongly that one or the other needs to be chosen. ―Mandruss  00:51, 30 October 2016 (UTC)
I was being sarcastic when I implied I wasn't being sarcastic. LQ was devised by people who mistake English punctuation for a programming language. But since we seem stuck with it I hope you guys iron this wrinkle out. EEng 03:24, 30 October 2016 (UTC)
LQ was devised by people who value accuracy - what's inside the quotation marks, including punctuation, should match the original text. If it were devised by programmers we'd have punctuation inside and outside the quotation marks:
Did Darla say, "There I am?"?
No, she said, "Where am I?". — Preceding unsigned comment added by Mitch Ames (talkcontribs) 04:08, 30 October 2016 (UTC)
No, LQ was devised by obsessives whose preoccupation with an imaginary problem has put them on a crusade against forms every good publication uses, and with which everyone is familiar e.g.
"I like vanilla," she wrote.
-- instead insisting on idiocy like
"I like vanilla", she wrote.
or maybe (for all I know)
"I like vanilla.", she wrote.
Every reader understands that punctuation at the boundary of a quotation might be modified in certain standard ways as part of the transition to the non-quoted material. LQ doesn't promote "accuracy" but rather turns the familiar and attractive into the unfamiliar and ugly, for no reason. EEng 07:23, 30 October 2016 (UTC)
When you say "everyone" and "Every reader", perhaps you mean "everyone in the US". Quotation marks in English#British practice suggests that the rest of the world might think differently. Mitch Ames (talk) 08:40, 30 October 2016 (UTC)
I gather you've never actually read Fowler, which the article you link draws on. And as that same article makes clear, British usage embraces both "British" and "American" styles in various contexts, so my original statement stands: readers understand that punctuation at the boundary of the quote might not be exactly as in the original. EEng 17:41, 30 October 2016 (UTC)
@EEng: commenting on your post is, like your original comment, just adding to the time that has been wasted on this issue, but since you set the example, I'll join in. Suppose she actually wrote "I like vanilla." Then, in some versions of LQ, it's correct to write "I like vanilla," she wrote. The comma inside the quotation marks replaces the full stop that was there in the original. Suppose instead she actually wrote "I like vanilla but not in coffee." Then, again in some versions of LQ, it's correct (although misleading as a quotation) to write "I like vanilla", she wrote. Obsessive? Sure. Irrelevant to 99.9% of Wikipedia editors? Sure. Should we waste time arguing about it? No. It's not the issue Mandruss raised, which is lack of clarity. Peter coxhead (talk) 10:42, 30 October 2016 (UTC)
I wasn't suggesting anyone argue about it. See [1]. EEng 17:41, 30 October 2016 (UTC)


  • I agree with Mandruss, and I remember raising this issue here twice, starting in mid 2014. The point about LQ is not clear, so it's hardly a surprise to see that MOS:LQ is rarely adhered to. JG66 (talk) 03:35, 30 October 2016 (UTC)

I'd change MOS:LQ to: Include terminal punctuation within the quotation marks only if it was present in the original material and is grammatically required by the quoted text; otherwise place it after the closing quotation mark. So I'd write:

A dense pine forest encircles Bierstadt Lake, and the lake "is ringed by sedges that give it a very serene appearance".

The full stop is outside the quotes because although it is present in the original text it is not grammatically part of or required by the quote - the quoted text is not a full sentence, so does not require a quoted full stop. This is consistent with Keep them inside the quotation marks if they apply only to the quoted material and outside if they apply to the whole sentence. In this case the full stop terminates the entire Wikipedia prose sentence, not "only the quoted material". Mitch Ames (talk) 04:24, 30 October 2016 (UTC)

I would tend to go the other way and include the period in the quote. Maybe it's not "required", but putting it outside suggests to me that the quoted sentence did not end there; and it did. In any case, I think what we have here is an option to do it either way, neither of which is incompatible with the principle of LQ. Maybe we can express it in a way that removes the contradiction, without necessarily being more prescriptive than necessary to keep it "logical". Dicklyon (talk) 05:16, 30 October 2016 (UTC)
The correct way to show an option either way is: "The period may be placed inside or outside the quotation mark, at the editor's discretion." Or just say nothing at all. ―Mandruss  05:50, 30 October 2016 (UTC)
The point of the guideline is not to "show an option either way", but to specify a single consistent way of doing it. If you wanted to change the guideline to "at the editor's discretion" that would be a separate proposed change to the guideline. Mitch Ames (talk) 05:58, 30 October 2016 (UTC)
I agree, and I was not and am not proposing that. I was replying to Dicklyon's comment. As I said, my only concern is eliminating the apparent contradiction to avoid AT conflict and circular editing, both of which are wastes of editor time. Alternatively, scrap the guideline, but that would be a harder sell I think. ―Mandruss  06:03, 30 October 2016 (UTC)
"putting [the period] outside suggests to me that the quoted sentence did not end there" — On the contrary, the point of LQ is that the absence of the period inside the quote does not imply anything about the original text other than what is actually quoted. The source text could have been "... and is ringed by sedges that give it a very serene appearance, reminiscent of ..." and it would not matter; the Wikipedia prose would be just as correct and have exactly the same meaning. Mitch Ames (talk) 05:55, 30 October 2016 (UTC)
  • Comment - In my opinion, people who oppose the existence of a guideline should initiate an RfC for its removal. Otherwise they should refrain from disrupting discussions about its improvement. ―Mandruss  06:13, 30 October 2016 (UTC)

To respond to this entire thread at once:

I'll collapse-box it, so incoming peeps can focus on the proposal below
  • The 'Do an anti-LQ RfC if you want but stop disrupting improvement discussions' sentiment: Or they could just give it a rest, since launching anti-consensus "challenges" every six months or year in hopes they eventually WP:WIN is tendentious. I think that's finally been sinking in, given what happened in Feb. after someone's seven-year campaign on this non-issue finally wore out community patience.
  • Our Quotation marks in English article is a counterfactual OR trainwreck (see previous sentence for a hint as to why). I've been meaning to rewrite it for months, but half the source materials I need are in boxes due to my landlord starting construction and not finishing it yet (and to me procrastinating before that because of the drama surrounding the topic).
  • Even "everyone in the US" [likes typesetters' quotation, TQ, falsely labeled "American style"] would be dead wrong, since it's been Americans who have most promulgated LQ as preferable, not just acceptable, in precision writing. These days, American professors in multiple articles on the topic admit they can't get American students to revert back to TQ unless they directly punish students by assigning bad grades to papers with LQ, which many of them use now as their automatic, natural preference. When a "tradition" has to be coerced it is no longer a tradition but reactionary indoctrination.
  • As for "readers understand that punctuation at the boundary of the quote might not be exactly as in the original" [in TQ]: Readers also understand that in LQ the punctuation inside the quote will be found in the original, removing all doubt. The second option is objectively better practice for a project like WP. It's why LQ was adopted here, and why it has stuck for over a decade in the face of recurrent but misinformed fist-shaking by a few who incorrectly see this as some platform for nationalistic campaigning. They're invariably unaware of LQ's actual history.
  • There is no single British/Commonwealth/International/non-US quotation style, and while most of them were inspired initially by LQ around a century ago, they cannot be equated with it, and they drift further and further apart (most of them are promulgated by competing news organizations that like to set themselves apart from each other, much as the New York Times still maintains its own style guide against the Associated Press Stylebook that dominates North American news publishing). They all have differing rationales for their variances from LQ and from each other. Of all major UK publishers with public style guides, only one actually matches LQ's output, and it does so accidentally, because its rules have nothing to do with LQ's reasoning.
  • '"I like vanilla," she wrote. The comma inside the quotation marks replaces the full stop that was there in the original.' That's a rule from a couple of the British/Commonwealth quotation styles, and has nothing to do with LQ. LQ is just one actual rule: Except for truncation, "..."-indicated elision, and square-bracketed interpolations/alterations, do not change a quotation by insertion into or replacement of its content, including punctuation. Converting a period/stop to a comma inside the quote isn't LQ at all, it's a UK/Commonwealth news thing.
  • The "obsessive programmers invented LQ" fairytale is patently false. LQ was a gradual evolution in the 19th century, in linguistics, philosophy, textual criticism, and other fields with a need for content accuracy. It was adopted (with modifications for expedience over precision, which keep drifting) by publishers pretty much everywhere but the US throughout the late 19th and early 20th centuries. This was followed by subsequent further promulgation in the 20th century by technical and legal writing in the US (the legal situation reversed itself by ca. the 1960s, when the courts started mandating a consistent style and going with TQ because it matched what they were used to in older documents). Quite late came LQ's totally dominant position in online writing, before the Internet was even broadly open to the general public. Programmers did have something to do with that, but it was also advanced by people in many other disciplines, and the basic fact is that only some of them were American. More recently, even the last several editions of Chicago Manual of Style, the last bastion of TQ in nonfiction book publishing, have specifically recommended LQ for "computer writing" (on the strength of the same work that popularized LQ as "the Internet style" since the 1980s), without even narrowing what "computer writing" might mean, reinforcing its online use as "sanctioned" by a leading US style guide. CMoS also recommends LQ for several other things, including quoted material in textual analysis and criticism (see if anyone can guess what direct quotation in an encyclopedia often qualifies as), and linguistic glosses (which WP uses literally millions of times), as well as conceding its use in writing about philosophy (see if you can guess what a lot of other encyclopedia material is).
It would be nuts for WP to bounce back and forth between multiple quotation styles in the same article, especially when the only "reason" to do so would be to introduce less accuracy and precision, all just because someone wants to argue subjectively about whether a particular segment is philosophical, linguistic, technical, or text-analytical "enough", or because "gimme muh 'Merican quotes". Never going to happen.
  • "If [LQ] were devised by programmers we'd have punctuation inside and outside the quotation marks." There is actually a British journalism quotation style that requires exactly that, I kid you not. It's rare, probably limited to a single publisher. (SlimVirgin first mentioned it here, and I doubted her, but I ran across it in style-guide research back around March, I think.) And, yeah, it has nothing to do with LQ.
  • It is further proof, as if any were needed, that there is no equating LQ and "British quotation", TQ and "American", or making any kind of nationalist pseudo-ENGVAR case. There are at least 10 different British/Commonwealth quotation styles, with different rules and reasons but with mostly similar end results and a lot of overlap with LQ. As for the "oh yeah, well there's not 10 American styles, so Americans should get their one style" complaint: Aside from the fact that lots of Americans use LQ when not forced to use TQ, I think a common-sense way to look at this would be: What if tomorrow the Canadian government and various major .ca publishers said "We're going to use French guillemets instead of English quotation marks, in English, just to be Canadian and different; it will be our new national tradition." Would WP adopt that usage here in articles tagged as Canadian English? Not on your life. It wouldn't serve overall reader interests, only a WP:NOTHERE nationalism agenda, and simply lead to complication, confusion, and dispute. If we actually made some "use TQ in American English articles" rule, there would be a massive land-grab by a small number of virulently anti-LQ people to stake US ENGVAR claims on as many articles as possible. I'm hard pressed to think of anything more disruptive and corrosive to collaboration but we all know that's exactly what would happen.
  • LQ is "ugly and unfamiliar". It's been familiar to all English-speakers but Americans for generations, was familiar to well-read Americans that entire time, and now is familiar to all Americans who use the Internet, thus it's familiar to everyone WP reaches. It just may not be preferred by all of them. It's physically impossible for one style guide to please everyone all the time; there are competing would-be rules for virtually every minute aspect of written English, even the most basic matters like whether sentences must start with a capital letter. "Ugly" is just subjective handwaving. For every American who thinks LQ is "ugly" (by no means all of us), there's someone else who thinks TQ is subjectively ugly as well as objectively a constant source of doubt about quotation veracity, the second point obviously being more important.
  • The hand-waving claim that LQ is mostly not complied with has already been disproven. This was actually researched in Jan. this year, or late last year (I forget – check the archives) by LQ's staunchest opponent (the one now not in our company) and by myself. Compliance is actually much higher for MOS:LQ than for many other MoS line-items, even basic stuff like consistent date formatting. A large percentage of "non-compliant" articles are actually mostly compliant, and were formerly compliant, but were just made inconsistent by later additions by new editors who weren't familiar with MOS:LQ (and perhaps a few who know it but refuse to follow it?), so compliance is actually much higher than it looks at first, especially the higher the quality level of the article (= higher likelihood of MoS awareness and gnoming). And it would have to be, for reasons that should be obvious by now.

 — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  15:43, 3 November 2016 (UTC)

Hard to see how some people think MOS is a massive and pointless timesink. If only you would put a fraction of that effort into getting the links-in-quotes reform, which would be a real accomplishment, back on track. EEng 16:38, 3 November 2016 (UTC)
The endless stream of sarcasm from you about how time-wasting MoS is suggests taking a break from it. :-) I said what I needed to (and supported your last proposal) at links-in-quotes, though not in as timely a fashion as I would have liked, due to overwork in Real World Land. The entire thing has stalled with too many competing proposals. I don't think anything will come out of it this time; all commentary on it stopped something like a month ago. I would let it archive, then re-open the topic after a break, picking up with proposals that had some support, and try to merge them.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  17:20, 3 November 2016 (UTC)
I appreciate your contribution re links-in-quotes, but I was hoping you could do more to prod the other participants to return. But what you suggest now seems like a good plan. EEng 17:33, 3 November 2016 (UTC)

Proposal

Let's not start the whole argument about LQ itself again. (EEng is quite right to be sarcastic about the time that has been wasted on this.)

Mandruss has identified a lack of clarity in MOS:LQ. It arises for two reasons which require two changes:

  1. In English, constructions with "only if" are hard to understand unless the context is clear, and the "only" is easily missed. ("It rains only if it is cloudy" does not mean "It rains if it is cloudy". Translating to logical symbols, the first means "rain → cloudy", the second "cloudy → rain".) So I suggest changing Include terminal punctuation within the quotation marks only if it was present in the original material, and otherwise place it after the closing quotation mark to Do not include terminal punctuation within the quotation marks unless it was present in the original material; if it was not, place it after the closing quotation mark.
  2. The word "fragment" is unclear in If the quotation is a single word or fragment, place the terminal punctuation outside. I'd like to change it to short phrase and then add When longer terminal parts of sentences are quoted, editorial judgement should be used.

Peter coxhead (talk) 10:26, 30 October 2016 (UTC)
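
The "if" versus "only if" distinction in point 1 can be checked mechanically. The following is a minimal Python sketch, not part of the proposal; the helper names (implies, rains_only_if_cloudy, rains_if_cloudy) are made up for illustration. It enumerates every combination of rain and cloud and lists the cases in which the two readings disagree.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


def rains_only_if_cloudy(rain: bool, cloudy: bool) -> bool:
    # "It rains only if it is cloudy" reads as: rain -> cloudy
    return implies(rain, cloudy)


def rains_if_cloudy(rain: bool, cloudy: bool) -> bool:
    # "It rains if it is cloudy" reads as: cloudy -> rain
    return implies(cloudy, rain)


# Collect the (rain, cloudy) situations where the two statements differ.
differing = [
    (rain, cloudy)
    for rain, cloudy in product([False, True], repeat=2)
    if rains_only_if_cloudy(rain, cloudy) != rains_if_cloudy(rain, cloudy)
]

for rain, cloudy in differing:
    print(f"rain={rain}, cloudy={cloudy}: the two readings disagree")
```

The two statements disagree in exactly two situations: a cloudy day without rain (allowed by "only if", ruled out by "if") and rain from a clear sky (the reverse). That is why a reader who misses the "only" ends up with a different rule.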

I don't think "short phrase" is appropriate, because that part of the guideline may apply to longer phrases as well. However the existing instances of "fragment" should probably be "sentence fragment".
I think that instead of "... editorial judgement" we should specify that trailing punctuation should be included inside the quotation marks if ... it was present in the original material and is grammatically required by the quoted text. — Preceding unsigned comment added by Mitch Ames (talkcontribs) 14:07, 30 October 2016 (UTC)
  • I concur strongly with four things here:
    1. Agree with inverting the main passage to remove the confusing "only if". I would suggest this:
      "Do not include terminal punctuation within the quotation marks unless it was present in the original material and is needed grammatically or for clarity; otherwise, place it after the closing quotation mark".
      This solves three closely related problems at once:
      • The "if" versus "only if" confusion goes away.
      • Editorial judgement (per Dicklyon and several others above) is retained: If it was in the original, you can include it if it's grammatically needed, or if the meaning might be harmed or dubious without it (e.g. because the phrasing ambiguously seems to indicate a truncation that wasn't one). The "British" (international, Commonwealth, non-American, whatever) systems of quotation that are most similar to LQ also permit this flexibility, naturally.
      • We bury the silly but not infrequent false assumption that if it was present in the original it must be included no matter what. There is no LQ principle that material cannot be removed, only that material cannot be falsely inserted. So, requiring that something be included in "LQ at WP" would not be LQ at all, but WP inventing some new LQ-related style out of thin air. It would also mean nonsensically insisting that a comma or whatever in the original "must" be retained at the end of a quoted bit, even if it mangles the overall sentence quoting the fragment. Self-evidently goofy, yet I keep encountering incorrect assumptions like this about LQ (mostly from people who oppose LQ, which may either explain their opposition, or explain their misinterpretation, or both, since it's a self-reinforcing negative feedback loop based on an error).
    2. Agree that "fragment" should be clarified as "short phrase", "fragmentary phrase" or something similar; that's sufficient clarification, especially if the "and is needed grammatically ..." bit above is also present. It's not really about "sentence fragments", since almost all quotes that are not self-contained, quoted full sentences, without added editorial intros or tails, are fragments (when they're not, it's just accident that they were truncated in a way that happens to still form a grammatical complete sentence that is not the original longer sentence).
    3. Agree it was not at all MOS:LQ contradicting itself, just sub-optimal wording that led a few editors to misinterpret it; and that, regardless, clarifying it is a good idea since it should reduce such confusion and prevent much further disputation.
    4. Agree that MOS:LQ needing a handful of wording tweaks is in no way grounds for re-opening yet another tendentious "challenge" against a decade+ consensus. Someone who wouldn't drop that stick has already been indeffed. The last thing WP needs is to lose even one more editor over "style wars".
 — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  14:08, 3 November 2016 (UTC)
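
For anyone who finds the suggested wording in point 1 easier to follow as a decision procedure, here is a rough Python sketch. It is only an illustration of that one sentence, under the assumption that the editor has already decided whether the mark was in the original and whether it is needed; the function name punctuate_quote and its parameters are hypothetical, and nothing here can make those editorial judgements for you.

```python
def punctuate_quote(
    quoted: str,
    mark: str,
    *,
    in_original: bool,
    needed_grammatically: bool = False,
    needed_for_clarity: bool = False,
) -> str:
    """Place a terminal punctuation mark inside or outside the closing quote.

    Mirrors the suggested wording: do not include the mark inside the
    quotation marks unless it was present in the original AND is needed
    grammatically or for clarity; otherwise place it after the closing mark.
    """
    inside = in_original and (needed_grammatically or needed_for_clarity)
    if inside:
        return f'"{quoted}{mark}"'
    return f'"{quoted}"{mark}'


# Fragment quoted at the end of a longer sentence: the stop was in the
# original but is not needed by the fragment, so it goes outside.
print(punctuate_quote("chocolate is good", ".", in_original=True))

# Present in the original and judged necessary for clarity: it goes inside.
print(punctuate_quote("I deny any wrongdoing, malfeasance", ".",
                      in_original=True, needed_for_clarity=True))

# Never in the original: it can never go inside, whatever else is claimed.
print(punctuate_quote("I like vanilla", ",",
                      in_original=False, needed_grammatically=True))
```

The last call shows the one hard constraint: a mark that was not in the original stays outside regardless of the other two flags, while a mark that was in the original may still be left outside.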
@SMcCandlish: I agree with your proposal except for "or for clarity", which I think will lead to editors unnecessarily including terminal punctuation. Could you provide an example or two where the terminal punctuation is needed for clarity but not grammatically? Mitch Ames (talk) 07:07, 5 November 2016 (UTC)
Well, see the entire discussion basically. The concern is quoted statements that do not necessarily appear to be full sentences and which strongly imply editorial truncation but were not actually subject to any. E.g., anything ending in "just because.", as an obvious example. Maybe there's another way to get at this. LQ is "do not falsify quotations", in its shortest encapsulation. Insertion and in-place alteration are not permitted aside from "..." and [bracketed]. Truncation is permitted, and people have leeway in exactly how they do it. For some reason, some people don't think they do, and incorrectly insist that if I say "Chocolate is good", that the only permissible way under LQ to punctuate a sentence-ending partial quotation of this is in the form "SMcCandlish said 'chocolate is good.'", when of course "SMcCandlish said 'chocolate is good'." is also permissible, is less prone to error, and arguably more sensible, since the period is terminating the surrounding sentence and the quoted part is fragmentary. But the "inside" way is permissible, and opposition to it is hair-splitting (i.e., it's a matter of editorial judgement about clarity, not something we need a "rule" about). The only "risk" with the "...'...is good.'" version is that if a later editor expands it, it might come out as "SMcCandlish said 'chocolate is good,' in a Wikipedia talk page post in 2016." – that's an error under LQ, falsely indicating that my original contained a comma and continued with additional detail. Anyway, another example of the "it really is better inside" case is something like: Johnson stated, "I deny any wrongdoing, malfeasance." Without the period/stop inside the quote, the unusual "and-less" construction would otherwise make it look like a partial quotation of something that continued: "I deny any wrongdoing, malfeasance, ...".

I don't understand your "will lead to editors unnecessarily including terminal punctuation" concern. Can you provide examples of that?  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  12:47, 5 November 2016 (UTC)

anything ending in "just because." — OK that makes sense. My main concern is that - given the option - some editors might simply put the terminating punctuation inside the quotes because it seemed clearer to them because they were used to American style. Being "clearer" might be a subjective thing. Mitch Ames (talk) 13:12, 5 November 2016 (UTC)
Well, it is subjective. Everything that MoS leaves to editorial discretion (explicitly or by simply not saying anything about it, which is often a virtue) is subjective by definition. What I'm getting at is, what harm is there in including internal-to-the-quote punctuation if a) it was present in the original and b) makes sense in the context of the larger sentence? I would prefer to leave it on the outside when it's not needed grammatically or for clarity on the inside, especially if the quoted bit is very short, but that's really just a preference.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  12:22, 7 November 2016 (UTC)

Possessives of scientific Latin names

I notice that some Wikipedia articles attempt to apostrophize Latin taxonomic names (e.g., "Salix alba's leaves" rather than "the leaves of Salix alba.") The latter is favored by scientific literature (e.g., Animal Diversity Web). The apostrophe creates a conflict between English and Latin syntax (alba is an adjective) and can result in absurd constructions such as Rattus osgoodi's where osgoodi is already a possessive (Osgood's rat). If there is agreement that such a guideline is desirable, where in the style manual could it be placed?

Ecol53 (talk) 02:44, 3 November 2016 (UTC)

  • I support discouraging English-style possessive form on Latin words. Where it goes, I have no opinion. Dicklyon (talk) 03:07, 3 November 2016 (UTC)
  • As usual, I'd like to see evidence that this is an actual problem in actual articles, and that editors have not been able to work this out for themselves without additional MOSBLOAT. EEng 03:18, 3 November 2016 (UTC)
  • Consensus formation on common-sense matters of encyclopedic writing doesn't require "proof" of some serious, widespread problem, only agreement that an idea is poor (or good) and should be mentioned. Either high-quality sources do or do not avoid English possessives tacked onto Latin names as a best-practices matter. That is the point to which evidence applies, and evidence has been provided both above and below that they do avoid it. What is your counter-evidence?  :-)  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:18, 3 November 2016 (UTC)
  • Support discouraging English possessive suffixes on New Latin taxons, when used as such: of [the] felidae not felidae's. Does not apply when the common name in regular English coincides with it (gorilla, mastodon, lynx, caracal, etc. – there are many in all branches of biological taxonomy) if the name is being used in that spot in the plain-English way. Also does not apply to Latin-derived English words: since felids' divergence from canids.

    Source: I've pored over Scientific Style and Format (8th ed.), our go-to source off WP for scientific writing rules.

    • It has this to say (§ 22.2.3.1): "Genus names and specific epithets and names may not actually be Latin words, but they are treated as if they were and they have Latin endings. [Capitalization stuff elided] Apply these conventions wherever a species name is used, including running text, titles, tables, indexes, and dictionary entry terms." Throughout, it illustrates the "of X" form, never "X's". Hint: 's is not a "Latin ending", it's an English one.
    • It also does not permit English plurals to be tacked on (§ 22.2.3.6): "Names of genera do not have plural forms. If reference is made to a group of species in one genus, write the genus name followed by the abbreviation for 'species' (spp.): ... Iris spp. not Irises ... Similarly, the binomial name is always singular: Escherichia coli was ... but E. coli strains were" (that last illustrating an add-on modifier alternative to the "of X" style). It also comments that some object to this adjectival use of a binomial (i.e., would prefer "strains of E. coli"), but says the adjectival use "seems to be gaining acceptance".
    • I see no evidence in SSaF that it has any problem with English suffixes added to Latin- or Greek-derived words that have been assimilated into English (e.g., it would not formally object to "her tibia's unusual growth", though I can't find any examples of such a construction in there, probably because it's sloppy-looking writing, of the kind even our own editors would usually avoid). It does use Latin plurals for Latin technical terms, e.g. "supernovae", not "supernovas", but does not appear to push this to extremes, and it always says to follow the style guide of the publisher. So, if a journal expected "indexes" instead of "indices" then SSaF would have nothing to say about that. Likewise, if our own MoS guidelines have some rules that don't agree with that style manual, its authors' heads would not explode.
    • SSaF is consistently applying the "of X" style (or modifier circumlocutions) rather than possessive suffixes to: a) taxonomic terms, and b) things that would be awkward when read aloud (e.g., use "of Dr. Cruces" not "Dr. Cruces's" and definitely not "Dr. Cruces' "), and, from what I can tell, c) technical terms generally where something like "her tibia's unusual growth" seems amateurish. It also recommends d) abandoning possessives for eponymic names, e.g. preferring "Crohn disease" and "the Newtonian laws of motion" rather than "Crohn's disease" and "Newton's laws of motion", but it recognizes that this is an uphill attempt to change a lot of pre-established usage (what WP could call the WP:COMMONNAMEs) to a better, less ambiguous practice.
    • SSaF is also clearly drawing the same distinction I just did above about use of a name as a taxon and use of the same name as a vernacular term (§ 22.2.3.6), in reference to the "Iris spp." versus "Irises" material just quoted above: "note that Iris here is a genus name, not a common name, as indicated by the italic and capitalized first letter".
 — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:18, 3 November 2016 (UTC)
  • This is the handbook of the US-based Council of Science Editors. I have a copy of the 8th ed. It's very good. Tony (talk) 06:59, 4 November 2016 (UTC)
    • It's international, just US-published. Previous editions were published in the UK (and it was originally the Council of Biology Editors before they expanded and included the physical sciences). I don't want anyone to think it's some "American thing"; we've had too much nationalistic punditry here over the last couple of years.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  12:26, 7 November 2016 (UTC)

Use of en dashes in the Bibleverse template

Regarding a specific example in MOS:ENDASH:

Do not change hyphens to dashes in filenames, URLs or templates like {{Bibleverse}}, which formats verse ranges into URLs.

(Emphasis added.) In my experience with {{Bibleverse}}, en dashes have no effect on the URLs because the URLs only link to the first verse in a range. E.g., there is no effective difference between Genesis 1:1-2 (with a hyphen) and Genesis 1:1–2 (with an en dash); the same URL (https://en.wikisource.org/wiki/Bible_(King_James)/Genesis#1:1 ) is generated in each instance. I’m not familiar with other templates, but this may also be the case elsewhere.

May we reword the example to allow en dashes with {{Bibleverse}}, and any other templates that are unaffected by the inclusion of en dashes? —DocWatson42 (talk) 05:50, 4 November 2016 (UTC)

Just as long as the output/display is en dashes for those ranges. Tony (talk) 06:57, 4 November 2016 (UTC)
Right. That template-specific note was about template output breakage. We should not delete the note, but replace it with an example to which it still applies.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  12:28, 7 November 2016 (UTC)
And when someone does that, they need to say something clearer than "templates like foo, which does bar". How is someone supposed to know which templates are "like" the example template? Like how? EEng 01:53, 8 November 2016 (UTC)
The template documentation says "Use a hyphen, not a dash, to separate ranges. The dash does not work in all operating system and browser combinations." Is the second sentence correct or not? Mr Stephen (talk) 08:48, 8 November 2016 (UTC)

One template that the use of an en dash does break is {{Convert}}, but in that case the hyphen used to indicate a range of values is automatically (ahem) converted to an en dash in the output.—DocWatson42 (talk) 03:29, 10 November 2016 (UTC)
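
For gnomes who convert range hyphens in bulk, the practical upshot of the MOS:ENDASH note is "leave template calls and URLs alone". Below is a minimal Python sketch of that precaution, under stated assumptions: the function name endash_ranges is made up, only non-nested {{...}} calls and bare http(s) URLs are protected, and the sample {{Bibleverse}} invocation is illustrative rather than a statement of the template's real parameter layout.

```python
import re

# Spans to leave untouched: template calls such as {{Bibleverse|...}}
# (non-nested only, for simplicity) and bare URLs.
PROTECTED = re.compile(r"\{\{.*?\}\}|https?://\S+", re.DOTALL)

# A hyphen sitting directly between two digits, e.g. the "1-2" in "1:1-2".
NUMERIC_RANGE = re.compile(r"(?<=\d)-(?=\d)")

EN_DASH = "\u2013"


def endash_ranges(wikitext: str) -> str:
    """Replace digit-hyphen-digit with an en dash, except inside templates and URLs."""
    pieces = []
    pos = 0
    for match in PROTECTED.finditer(wikitext):
        # Convert the ordinary prose before the protected span...
        pieces.append(NUMERIC_RANGE.sub(EN_DASH, wikitext[pos:match.start()]))
        # ...and copy the protected span through unchanged.
        pieces.append(match.group(0))
        pos = match.end()
    pieces.append(NUMERIC_RANGE.sub(EN_DASH, wikitext[pos:]))
    return "".join(pieces)


sample = ("See verses 1-2 and {{Bibleverse||Genesis|1:1-2}} "
          "at https://example.org/a-1-2 (pp. 10-12).")
print(endash_ranges(sample))
```

In the sample, only the two prose ranges are converted; the template call and the URL keep their hyphens. The sketch is deliberately crude: it would also convert the hyphens in an ISO date such as 2016-11-04, so a real script would need more exclusions and human review.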

An MoS talk page has been redirected to a wikiproject

I just noticed that Wikipedia talk:Manual of Style/Military history is redirecting to Wikipedia talk:WikiProject Military history. This strikes me as wholly inappropriate for something that has been designated a site-wide guideline and part of MoS. As far as I know it is the only MoS talk page to which such a thing has been done. I would propose that it be given its own actual talk page, per standard operating procedure. (An alternative would be considering the Wikipedia:Manual of Style/Military history page itself to be a wikiproject advice essay and moving it to something like Wikipedia:WikiProject Military history/Style advice, but I doubt that would be a popular option.)  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  12:56, 7 November 2016 (UTC)

PS: The guideline also opens with the wording "The Military history WikiProject's style guide is ..." which is also inappropriate. Either it's a Wikipedia guideline, or it's a wikiproject advice essay. There is no such thing as a wikiproject-controlled page that's a guideline. Even WP:BLP, WP:BIO, and MOS:BIO are not "claim-staked" by WP:WikiProject Biography, other than having a declaration by talk-page banner that they're within that wikiproject's scope of interest. Other more topical MoS pages like MOS:JAPAN, MOS:ISLAM, etc., also make no such wikiproject "ownership" claims. [I did find one other that does, MOS:COMICS, but it has been slated for revision and cleanup for over a year, because it has naming convention stuff commingled into it, needs to split into separate MOS and NC pages, and has been the subject of a lot of back-and-forth squabbling on its talk page about its specifics, probably more so than any other topical MOS page in recent memory. The MOS:MIL page is under no such cloud of "how do we fix this thing?" turmoil.]

I would suggest that the compromise for the latter issue is to take the approach used at MOS:CUE, which begins with "This is a style guide for articles that come within the scope of WikiProject Cue sports". This clarifies the scope of the guideline, "advertises" the wikiproject to potentially interested editors, yet makes no inappropriate insinuation of special authority imbued in a particular topic interest group of editors.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  13:33, 7 November 2016 (UTC)

Agree on both counts: MoS talk page ought not redirect to WikiProject talk page, and MoS guidelines ought not declare themselves as "WikiProject's style guide".
The wording used in MOS:CUE is OK, but not ideal. MOS ought not depend on WikiProjects to define the scope of a particular guide. I suggest replacing "that come within the scope of WikiProject Cue sports" with "about cue sports", giving: "This is a style guide for articles about cue sports." The talk page includes a link to the WikiProject. Mitch Ames (talk) 01:39, 8 November 2016 (UTC)
I also agree that MOS talk pages should not just redirect to a WikiProject. One reason being, while members of a WikiProject doubtless have a lot of useful knowledge, they may be too close to the subject to be the only arbiter of style and need to work things out with regular editors. For instance, if you are well versed in subject X (as most members of WikiProject X probably are) there may well be certain style conventions, used when knowledgeable people write about the subject, that are second nature to you, but which are not necessarily best for our general readership. It's now 3-0 against the redirect so I went ahead and deleted it. Herostratus (talk) 16:22, 10 November 2016 (UTC)
I've also edited MOS:CUE per my suggestion above. Mitch Ames (talk) 01:05, 11 November 2016 (UTC)
G'day, I had a go reworking the military history MOS page. This is my edit: [2] Please let me know if there are any concerns and suggestions for further improvements. Regards, AustralianRupert (talk) 01:14, 11 November 2016 (UTC)

Discussion notice

The question: Which of the following is correct?
1. the President-elect of the United States
2. the president-elect of the United States

The discussion:
Wikipedia talk:Manual of Style/Capital letters#the P/president-elect of the United States

Please comment there. Thank you for your service. ―Mandruss  18:10, 16 November 2016 (UTC)

I agree. But Randy, could you copy this over to the actual discussion? This is just a pointer and asks for comments to go to the linked discussion. oknazevad (talk) 18:59, 16 November 2016 (UTC)

Remove Institutions of the European Union as an example of WP:TIES?

The article is written in British English, and that doesn't seem likely to change, so our saying here that it should be written in either British or Irish English seems inappropriate. The article does not mention Britain, Ireland, England, Scotland, Wales, the United Kingdom or the UK (the only instances of the word "United" are two instances of "United States"). The United Kingdom is on course to leave the EU. If the article has strong national ties to any particular English-speaking country at this point, it is Ireland, but our citing it here as an example of WP:TIES would necessitate changing the variety the article uses. Hijiri 88 (やや) 23:16, 12 November 2016 (UTC)

I agree it's not a good example of TIES, but that's not because of the UK leaving. The EU is a supranational organisation, so doesn't really have TIES as such to any one country, whether any of them are members or not. Plus of course it raises the question, as discussed above, of what the difference between formal, written Irish English and formal, written British English is, and why we would want to be slapping either flag on EU-related pages anyway. As it happens, I initiated the discussion about culling ENGVAR templates and rolling back the apparent obsession in TIES with attaching every specific topic to a specific nation in part after coming across this discussion about the EU, not least because it seemed a rather good example of how TIES and the proliferation of templates also in fact create pointless discussion and dispute, out of nothing, as much as they ever help clarify anything. N-HH talk/edits 11:12, 13 November 2016 (UTC)
I agree that the EU was a bad example. The rationale for using non-OUP (i.e. not lang= en-GB-oxendict) British English spelling would be that it is "EU English", as used by the EU itself (and defined in the EU's style guides). If we wanted to accept this rationale, we should explicitly include a sentence on ties to supranational or international organizations. If we accept that it is a good idea for an article about Tolkien to use OUP spelling, we should perhaps also say something like "In an article about a supranational or international organization, it is often a good choice to use the variety of English used by that organization. For example, the article European Central Bank uses -ise spellings." If we don't want to accept supranational/international ties, we should perhaps also say so explicitly, possibly after advertising the discussion at relevant projects. --Boson (talk) 12:28, 13 November 2016 (UTC)
I'd be happy with adding something like that after the note re Tolkien. And of course in many cases we're going to be stuck with what the body uses from the title onwards anyway (eg "World Trade Organization"), so it shouldn't be controversial. Unless anyone has any objections? N-HH talk/edits 15:01, 13 November 2016 (UTC)
Just a question: If the preference for one variety of English over another is not obvious (say, in the organization's name), how do we find out which one to use? Does anyone have any pointers?—DocWatson42 (talk) 13:54, 18 November 2016 (UTC)
Well, the EU, for instance has a style guide:
but the preferred spelling may be obvious from looking at the institution's website.
--Boson (talk) 16:11, 18 November 2016 (UTC)
Definitely not a good example of TIES, but not for the reason given. There's little if any difference between British and Irish written, formal English in an encyclopedic register. However, various international bodies like the EU, UN, etc. don't consistently use British English, but have been developing their own (often conflicting) "International English" internal standards (and they often include features of American English). WP should not be written in any of these pseudo-dialects. Treat these topics the same as we would other general topics like Botany or Shoe, and apply the first-come-first-served rule of ENGVAR.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  21:11, 18 November 2016 (UTC)

Predecessors and successors in infoboxes

Comments are welcome on an RfC over at Talk:Michael Portillo (once a prominent British politician). The question concerns whether the infoboxes of politicians ought to contain predecessors and successors, in keeping with other articles, or whether the MoS doctrine of 'less is more' means that such information should be excluded from the infobox. Any thoughts would be greatly appreciated. Specto73 (talk) 23:38, 18 November 2016 (UTC)


Measures in support of MOS:CURLY

I was surprised to find in an audit of my recent edits and a sampling of random pages that approximately 15% of articles contain one or more curly quotes or apostrophes. There have been previous discussions (2005, more 2005, 2010, 2016) on the problems with curlies. While promoting curlies has always been rejected I'd like to know how much support there is for MOS:CURLY in discouraging use of curlies or even a partial purge of curlies from mainspace.

Full disclosure: I use dial-up, 20-year-old hardware and occasionally use a text-based web browser. Just to say that editors like myself exist as an active part of the Wikipedia community. Also, I've been doing a lot of apostrophe-related typo fixes.

I would appreciate input on the following approaches:

  1. To have a bot convert curly apostrophes and quotes to straight apostrophes and quotes in the mainspace. (It would have to exclude articles belonging to categories like Category:Punctuation where curlies should be preserved, and exclude cases with adjacent apostrophes to avoid italic and bold issues.)
  2. To have a bot check recent edits for curlies and, when found, leave a message on the editor's talk page (similar to DPL bot when an editor links to a disambiguation page) alerting them to MOS:CURLY issues and linking to a page with instructions for turning off curlies in popular software packages. (Would have to take measures against spamming.)

Like most, I didn't think that curlies were too big of a problem. However, it could be pervasive, affecting some 700k articles. And it wouldn't hurt to inform the small minority of editors introducing curlies into articles. If it's agreed that there are problems with curlies, why not fix it? I'd appreciate your thoughts. - Reidgreg (talk) 17:22, 26 September 2016 (UTC)
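For illustration only, the conversion described in approach 1 might look something like the sketch below. It is not an existing bot or AWB rule: the function name `straighten` and the choice to skip any curly mark that touches an apostrophe (straight or curly) are assumptions made here, intended to avoid creating `''` or `'''`, which wikitext reads as italic or bold markup. It covers only the four basic curly marks and does not handle the category exclusions mentioned above.

```python
import re

# The four basic curly marks and their straight equivalents.
CURLY_TO_STRAIGHT = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

# Assumption for this sketch: leave a curly mark untouched whenever it sits
# directly next to a straight apostrophe or a curly single quote, because
# straightening it could produce '' or ''' (italic/bold wiki markup).
_CURLY = re.compile(
    r"(?<!['\u2018\u2019])[\u2018\u2019\u201c\u201d](?!['\u2018\u2019])"
)


def straighten(wikitext: str) -> str:
    """Replace isolated curly quotes and apostrophes with straight ones."""
    return _CURLY.sub(lambda m: CURLY_TO_STRAIGHT[m.group(0)], wikitext)


if __name__ == "__main__":
    print(straighten("\u201cIt\u2019s fine,\u201d she said."))
```

Marks that are skipped stay in the text, so a human editor (or a maintenance category) would still be needed for those cases.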

Please, not another bot gnoming about. If this isn't already something AWB raises an alert on, that might be a good idea. EEng 19:03, 26 September 2016 (UTC)
Just to point out the "why" of curlies in the present, it's a combination of lazy copy-and-paste of quotes from websites, and/or editing in a word processor and then copying over to the Wikipedia editing window. In both cases, it's just some careless editing and can easily be fixed as part of routine gnomish edits. Is it more pervasive than we'd like? Sure, but I don't think a bot just for that task is needed. oknazevad (talk) 22:04, 26 September 2016 (UTC)
But who wants somebody constantly looking for curly quotes in articles? PhilrocMy contribs 13:10, 7 October 2016 (UTC)
The editor says "15% of articles contain one or more curly quotes or apostrophes" which seems pretty high to me. Some (unknown) number of articles must contain no quotation marks, so of articles which contain quotation marks, curlies must appear in over 15%. However, yeah, it's not a big deal; I fix them when I see them (and feel like it) but I don't know if it's worth worrying about. And, one small advantage of curlies is they signal a possible copy-paste job which alerts me to look for possible copyvio. Herostratus (talk) 22:13, 26 September 2016 (UTC)
Thanks for the replies. The percentage did seem high (15.6% of 500 edits and 14% of 50 random articles), which led me to believe that AWB editors aren't keeping up. Thus the two-pronged approach to clear the backlog and inform lazy careless unaware editors. Copyvio trumps the curly issues, if they can be used as an indicator, though I imagine (or hope) you get a lot of false-positives that way. - Reidgreg (talk) 16:29, 27 September 2016 (UTC)
I don't think that curly quotes are part of AWB's general cleanup. Mr Stephen (talk) 18:01, 27 September 2016 (UTC)

A somewhat related issue I came across today is AWB, as a part of general clean-up, changing &Prime; and &prime; to ″ and ′ respectively. Since these are hard to distinguish from quotation marks visually (curly or straight, depending on the font) I wonder if AWB should be allowed to do this. Jc3s5h (talk) 17:09, 27 September 2016 (UTC)

I've noticed a correlation between curly quotes and copyvios, so it may not be entirely helpful just to "correct" them blindly. --Mirokado (talk) 18:14, 27 September 2016 (UTC)
Agreed. A large percentage of them are due to copy-pastes from other sources. Only a minority are due to people using external editors that auto-curlyize the quotes.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:01, 30 September 2016 (UTC)

I agree too. What should we make the bot do about a possible copyvio? PhilrocMy contribs 13:20, 7 October 2016 (UTC) Actually, I agree with @Reidgreg: on the fact that if we use curlies as a copyvio indicator, there will be a lot of false positives. PhilrocMy contribs 13:25, 7 October 2016 (UTC)

Above: Just to point out the "why" of curlies in the present, it's a combination of lazy copy-and-paste of quotes from websites, and/or editing in a word processor and then copying over to the Wikipedia editing window. / Above: Only a minority are due to people using external editors that auto-curlyize the quotes. Sometimes it's none of these. One article I often edit has had curly quotes since its start. I carefully curlify my quotes when I add them. This change from normal practice amuses me. As far as I know, nobody has objected to either the abundance of curly quotes in that article or to my addition of them. I've never thought either that I ought to decurl the quotes there or that I should curl the quotes elsewhere. Perhaps it would have been better if Mediawiki had provided a preprocessor for the (X)HTML Q tag; it doesn't, and this really doesn't matter, compared with (for example) the endemic PR work that infects articles with "prestigious", "legendary", and similar bullshit. -- Hoary (talk) 00:12, 3 October 2016 (UTC)
Ohconfucius's scripts, which I run, zap the curlies. I think 15% is a conservative estimate. Tony (talk) 00:31, 3 October 2016 (UTC)
If you'd like to zap the curlies within my article, be my guest. But really, why should people bother with this when their intellects are easily up to the removal of "iconic", "is recognized [by whom?] as", and suchlike twaddle? -- Hoary (talk) 01:09, 3 October 2016 (UTC)
@Hoary: Having MediaWiki, or at least en.wp's own site-side CSS and Javascript, do something intelligent with <q>...</q> has been added to my to-do list. I think it would be desirable for numerous reasons to mark up inline quotation with this element and to allow people who insist on curly quotes in their output to get them. However, this won't do anything for non-quotation uses of quotation marks, e.g. "scare quotes", song titles, etc.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  07:32, 3 October 2016 (UTC)

@Reidgreg: @EEng: @Herostratus: @Oknazevad: @Mr Stephen: @Jc3s5h: @Mirokado: @SMcCandlish: @Hoary: @Tony1: Hey, maybe we could make the bot put in this template that I made that tells people that there are curlies that need to be replaced, but also says that curlies are an indicator of a possible copyvio, and that the article needs to be checked as well. PhilrocMy contribs 14:29, 7 October 2016 (UTC)

The template looks good though I'd be concerned about putting it on, potentially, 1 in 6 articles. It seems to me that a particularly clever bot could look for curlified text, google it, and then put up {{copypaste}} or a custom template with its results, though this could generate false-positives from sites that mirror wikipedia. I'll try to investigate more as I'm able. I don't suppose anyone reading this knows whether there are bots looking for copypaste edits? - Reidgreg (talk) 20:26, 15 October 2016 (UTC)
@Reidgreg: Wow, it took you a long time to respond. Now I think we should make the bot remove the curlies, and then make it put in some sort of template about curlies being an indicator of a copyvio. But what template? PhilrocMy contribs 13:06, 17 October 2016 (UTC)
No. Use a category or something. We've got enough shrill templates no one heeds. EEng 13:36, 17 October 2016 (UTC)
@Reidgreg: @EEng: How about a template AND a category? PhilrocMy contribs 16:01, 17 October 2016 (UTC)
@Reidgreg: @EEng: Wait, maybe we could add a parameter to Copypaste for the bot. It would change the template to say that the bot removed curlies, and then say that the curlies were an indicator that what the template says already might've happened. PhilrocMy contribs 16:34, 17 October 2016 (UTC)
I strongly agree with EEng that a hidden maintenance category should be used instead of a loud template at the top of the article. Given that there are 42,000 articles with prominent templates at the top of unreferenced articles in Category:Articles lacking sources from December 2009, and they have been there for seven years, putting a template at the top of articles with curly quotes is extreme overkill.
Anecdotally, I clean up curly quotes where I see them, and it is rare that I see evidence of copyvio. Do we have evidence for that assertion, made above? I find that they are often scattered through the article without a pattern, or copied into citations from titles of articles on the web. I recommend that we first stop the bleeding by working with the writers of automated tools like reFill, which used to copy curly quotes unmodified into citations (and has since been fixed). – Jonesey95 (talk) 15:51, 19 October 2016 (UTC)
The "evidence" is other editors' good-faith statement that they keep finding copyvios this way. I'm one of them. It is okay that your personal experience differs from that of others. It just means your editing doesn't overlap much.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:07, 23 October 2016 (UTC)

As an aside, don't forget that (if you are going to start this) there are several other glyphs that should usually be replaced. We have, in no particular order but starting with the commonest

  • ’ right single quotation mark
  • ‘ left single quotation mark
  • “ left double quotation mark
  • ” right double quotation mark
  • ` grave accent
  • ´ acute accent
  • ‛ single high-reversed-9 quotation mark
  • ′ modifier letter prime
  • ‚ single low-9 quotation mark
  • ‟ double high-reversed-9 quotation mark
  • ″ modifier letter double prime
  • „ double low-9 quotation mark

plus at least two Arabic glyphs that look very similar, and the guillemets. Some of these come in from cut-and-paste, but some are added at the keyboard. The primes also have a valid (in the MOS sense) use, of course. Mr Stephen (talk) 22:07, 17 October 2016 (UTC)

@Mr Stephen: I can't really start this until it's clear what the bot is going to do if it finds one of these glyphs. Is there anything that you think the bot should do? PhilrocMy contribs 22:49, 17 October 2016 (UTC)
Actually, I know what the bot will do if it finds one of those glyphs. It will put in {{Copypaste}} with a parameter which changes the notice to talk about how the bot found curlies and changed them, and it also will say that curlies are the sign that what the template says happened. See its sandbox and its testcases. Also, @Reidgreg:, can you give us a better statistic of how many articles have curlies? When I do a BRFA, I'll need to say how many articles are affected. PhilrocMy contribs 12:29, 19 October 2016 (UTC)
Sorry, Philroc, for some reason I got a notification from your last ping to me but not the several preceding it. (Drop a note on my talk page if my account is active but I'm not responding.) I was also considering a maintenance category but decided to shelve it until I could investigate more and develop a better proposal. There's no point if this doesn't ultimately make it easier for human editors. I found curlies in 15.6% of 500 edits and 14% of 50 random articles. When I have time I'm hoping to get a bot to investigate a larger random sample and generate some data that I could then follow-up with manual implementation of a fix, and see how it works over time. There has been a bot request on this subject which I added to while opening this discussion - and I'm glad I did, as this is much more complex than I'd first thought. I want to be responsible to the community's concerns, and some significant issues need to be addressed. Discussion of technical implementation might be a conversation for the bot request page or elsewhere, though. I don't want to take up any more time from the busy people here who have already patiently explained their concerns. - Reidgreg (talk) 13:34, 19 October 2016 (UTC)
You don't need a bigger sample. The data you have so far tells us that about 14% of articles have curlies, give or take about 5% or so. I think that tells you what you need to know. EEng 14:24, 19 October 2016 (UTC)
@EEng: That's the problem. I have to get a margin of error less than 5% if I want the BRFA for this to get accepted. PhilrocMy contribs 15:47, 19 October 2016 (UTC)
I didn't realize you were working against an imposed standard. Sounds like you know how to do these calculations, but in case you don't, if you add 50 more to your sample then the give-or-take figure will be at most exactly 5%, and almost certainly far, far smaller. Just for my curiosity, how do you draw the sample? EEng 17:16, 19 October 2016 (UTC)
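One possible reading of the "at most exactly 5%" remark, offered as an assumption rather than as EEng's stated working: the standard error of a sample proportion, sqrt(p(1-p)/n), is largest when p = 0.5, so it can never exceed 0.5/sqrt(n). Adding 50 articles to the 50-article random sample gives n = 100, for which that ceiling is exactly 5%. The helper name below is hypothetical.

```python
import math


def max_standard_error(n: int) -> float:
    """Upper bound on the standard error of a sample proportion.

    sqrt(p * (1 - p) / n) peaks at p = 0.5, which gives 0.5 / sqrt(n).
    """
    return 0.5 / math.sqrt(n)


if __name__ == "__main__":
    for n in (50, 100, 2000):
        print(n, round(max_standard_error(n), 4))
```

For a proportion near 15% the actual standard error is well below this ceiling, which fits the "almost certainly far, far smaller" part of the comment.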
@EEng: It's not a standard. In fact, there was no standard in the first place. I just don't want a wildly inaccurate value.
By the way, I'm not the one who made the value, Reidgreg did. PhilrocMy contribs 17:32, 19 October 2016 (UTC)
You know what, I'll just take the value as it is. PhilrocMy contribs 17:37, 19 October 2016 (UTC)
@EEng: As I told you earlier, I wasn't the one who got that value. Reidgreg did that. Anyway, from that percent, I have been able to estimate that about 740,000 articles on all of Wikipedia are affected by curlies. PhilrocMy contribs 19:40, 19 October 2016 (UTC)
For what it's worth, an insource search shows 14,000 articles with one of the four basic curly quote marks in article space. Insource searches do not always work well, though (and this seems way too low based on my experience), so someone should run a search on a database dump if we want data we can trust. If the true count really is 14,000, a supervised bot run should be able to handle it nicely. – Jonesey95 (talk) 19:48, 19 October 2016 (UTC)
Well, there's a teensy discrepancy between your 14,000 figure and the earlier estimate of 14%*5,000,000=700,000. EEng 20:12, 19 October 2016 (UTC)
@EEng: Yeah. Downloading the latest database dump (it's from October 3rd) to Dropbox right now. It should take maybe 6 hours to get my number. PhilrocMy contribs 21:00, 19 October 2016 (UTC)
I make it about 15% (316 in 2000 random pages). Mr Stephen (talk) 22:33, 19 October 2016 (UTC)
Well, now I'm estimating 740,000 - 830,000 articles affected by curlies even though the insource search says about 14,000. I decided to not use the database dump since it would take too long. So which source should we trust? PhilrocMy contribs 00:24, 20 October 2016 (UTC)
Well, if someone would please tell me how the 2000 articles were selected, we could use that figure, but I'm not hearing any answer. EEng 00:32, 20 October 2016 (UTC)
AWB's 'random pages' option. Regards, Mr Stephen (talk) 07:54, 20 October 2016 (UTC)
@EEng: Try pinging people like how I'm pinging you instead of just asking a question. PhilrocMy contribs 01:56, 20 October 2016 (UTC)

@EEng: Now you have your information. So which source should we trust, my estimate from AWB or an insource search? PhilrocMy contribs 12:26, 20 October 2016 (UTC)

Well, I've posted a query at mw:Help_talk:Random_page to see if "random" really means random, but on the assumption that's really true, then based on the 316/2000 sample mentioned, the true proportion is about 16%, give or take about 1% or so i.e. 16%*5,300,000 = 850,000 articles +/- 50,000. (I'm not sure if that was based on just double-quote curlies, or the larger list of curlies Mr. Stephen gives above.) EEng 18:27, 20 October 2016 (UTC)
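As a rough check on the arithmetic above, here is a minimal sketch that treats the 316/2000 result as a simple random sample and uses the usual standard error of a proportion. The function name is hypothetical, and the 5,300,000 article count is simply the figure quoted in the comment; the unrounded results come out a little below the rounded "16%, 850,000 +/- 50,000".

```python
import math


def estimate_affected(hits: int, sample_size: int, population: int):
    """Return (proportion, standard error, estimated total, give-or-take on total)."""
    p = hits / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)  # standard error of the proportion
    return p, se, p * population, se * population


if __name__ == "__main__":
    p, se, total, spread = estimate_affected(316, 2000, 5_300_000)
    print(f"proportion {p:.1%} +/- {se:.1%}")
    print(f"about {total:,.0f} articles, give or take {spread:,.0f}")
```

With these inputs the proportion is 15.8% with a standard error of roughly 0.8%, which scales to about 837,000 articles, give or take about 43,000.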
@EEng: But what about the insource search? PhilrocMy contribs 18:48, 20 October 2016 (UTC)
@EEng: I think we should just go with the insource search because it actually looks at articles, but what do you think?
I'm not sure how I became the arbiter of this, but anyway... At this point, we don't know the technical details of the operation of insource search, and we don't know the technical details of the operation of the random page function, but if I had to choose I'd believe the random sample. Search machinery often has hidden mysteries (doesn't recognize the search string in certain contexts, limits the size of the result set, ...) but even if there's something hinky to the random sample (skewed toward newer articles, skewed toward popular articles, ...) it's very hard to see how any of that would be hugely correlated with presence of curlies.
On the other hand, I have to say that 16% seems awful high. Mr Stephen, did you personally examine the 316 you thought had curlies?
Oh, wait... Mr Stephen, were you looking for your full list of curlies? That might explain the difference in results, since the insource search was only looking for the four basic curlies. I wonder if Mr Stephen's list includes something which is legitimately used in a lot of articles. You'd better go back to your 316 and see exactly what character was being used, and how. EEng 19:49, 20 October 2016 (UTC)
@EEng: Speaking of your query about the random article feature really being random, I found your answer. Here! PhilrocMy contribs 21:21, 20 October 2016 (UTC)
OK, then, despite the (very surprising) oddities mentioned there, that tells us that the random feature is fine for our purposes, which leads me to want to believe the 316/2000 result which I analyze above. However, I still won't be fully comfortable until I hear from Mr Stephen about whether he was searching his whole long list of curlies, and what can be gleaned from looking at the actual hits to see what characters are being found and how they're used. To repeat, I suspect that one of those characters has some technical use we're not thinking of. That would have implications both for the estimate of # of articles affected, and for the design of the bot. EEng 22:58, 20 October 2016 (UTC)
Let me ask him for you. @Mr Stephen:, EEng wants to know if you used your whole list of curlies to make the article count. PhilrocMy contribs 23:03, 20 October 2016 (UTC)
@EEng: BUT WAIT. It's 12:09 am where Stephen lives, so he isn't up all night to answer your questions. PhilrocMy contribs 23:10, 20 October 2016 (UTC)
BUT WAIT... Between us we've pinged him three times, so let's just have a nice break until he gets back to us. EEng 23:19, 20 October 2016 (UTC)
OK. PhilrocMy contribs 23:36, 20 October 2016 (UTC)

OK, got that. I will not be at my PC until later today. I will have a look. Yes, I used the longer list but in my experience the great majority of curly quotes are the ordinary ones. Mr Stephen (talk) 10:00, 21 October 2016 (UTC)

I've decided to download the database dump, except on a 1TB hard drive. I'll get back to you later after I search for curlies. PhilrocMy contribs 21:49, 21 October 2016 (UTC)

Here we go then, 25 in 200 pages, so 12.5%. Of those 25, Olia Lialina, Amer Fort and Warren Sonbert used unusual quotes, the rest looked like ordinary curlys to me. Mr Stephen (talk) 22:24, 21 October 2016 (UTC)

  • diff: Olia Lialina: clean up, straight quotes,see WT:MOS using AWB
  • diff: Warren Sonbert: clean up, straight quotes,see WT:MOS using AWB
  • diff: Amer Fort: /* Early history */clean up, straight quotes,see WT:MOS using AWB
  • diff: Stanwick, Northamptonshire: clean up, straight quotes,see WT:MOS using AWB
  • diff: Edward J. Bles: clean up, straight quotes,see WT:MOS using AWB
  • diff: Karl Ludwig d'Elsa: clean up, straight quotes,see WT:MOS using AWB
  • diff: The Sims 3: Showtime: clean up, straight quotes,see WT:MOS, typo(s) fixed: carrers → careers using AWB
  • diff: Broquiès: /* The seigneurs of Broquiès */clean up, straight quotes,see WT:MOS using AWB
  • diff: LSU Communication across the Curriculum: clean up, straight quotes,see WT:MOS, added orphan, underlinked tags using AWB
  • diff: Adelaide Hasse: clean up, straight quotes,see WT:MOS using AWB
  • diff: Mike McCready: clean up, straight quotes,see WT:MOS, typo(s) fixed: tv → TV using AWB
  • diff: John Shepherd (scientist): clean up, straight quotes,see WT:MOS using AWB
  • diff: 1962–63 Duke Blue Devils men's basketball team: clean up, straight quotes,see WT:MOS using AWB
  • diff: Foster's School: clean up, straight quotes,see WT:MOS using AWB
  • diff: Max du Preez: /* Awards */clean up, straight quotes,see WT:MOS using AWB
  • diff: Information search process: clean up, straight quotes,see WT:MOS using AWB
  • diff: Cair Paravel-Latin School: clean up, straight quotes,see WT:MOS using AWB
  • diff: Averruncus: /* top */clean up, straight quotes,see WT:MOS using AWB
  • diff: Hitchin: clean up, straight quotes,see WT:MOS using AWB
  • diff: Ethnic cleansing in the Bosnian War: clean up, straight quotes,see WT:MOS using AWB
  • diff: 2012 Munster Senior Football Championship: clean up, straight quotes,see WT:MOS using AWB
  • diff: Phil Brown (footballer, born 1959): /* Preston North End */clean up, straight quotes,see WT:MOS using AWB
  • diff: 2013 Bengali blog blackout: /* top */clean up, straight quotes,see WT:MOS using AWB
  • diff: Contentment: clean up, straight quotes,see WT:MOS using AWB
  • diff: George Lefroy: clean up, straight quotes,see WT:MOS using AWB
Why is this percentage so wildly inconsistent anyway? PhilrocMy contribs 23:54, 21 October 2016 (UTC)
If what you're saying is that the 316/2000 = 16% and the 25/200 = 12.5% are inconsistent, they're not, because the small sample of 200 has a give-or-take figure of 2.3% "or so", meaning it wouldn't be surprising if it were twice that i.e. 4.6%. And 12.5% + 2*2.3% takes you to 16% and beyond. I conclude that the sample of 2000's estimate of 16% +/- 1% is reliable, and I suspect that there's something hinky about the count returned by the insource search. (All of this, of course, relies on others' actual examination of individual articles for curlies, which I have to take on faith.) EEng 05:14, 22 October 2016 (UTC)
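The consistency argument above can be sketched as a small check: compute the give-or-take figure (standard error) for the 200-page sample and ask whether the larger sample's estimate lies within two of them. This is only an illustration of the reasoning in the comment, with hypothetical function names, not a formal hypothesis test.

```python
import math


def standard_error(hits: int, n: int) -> float:
    """Standard error of a sample proportion hits/n."""
    p = hits / n
    return math.sqrt(p * (1 - p) / n)


def within_two_se(hits: int, n: int, reference: float) -> bool:
    """True if `reference` lies within two standard errors of hits/n."""
    return abs(reference - hits / n) <= 2 * standard_error(hits, n)


if __name__ == "__main__":
    print(round(standard_error(25, 200), 4))    # give-or-take for the 200-page sample
    print(within_two_se(25, 200, 316 / 2000))   # compare with the 2000-page estimate
```

On these numbers the 200-page sample has a standard error of about 2.3%, and the 2000-page figure falls inside the two-standard-error band, matching the conclusion that the two samples do not conflict.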
  • On the basis of my extensive gnoming, 15% is very believable. The script I use cleans it up. But that's separate to the issue of whether MOS should prescribe curly or straight glyphs. I recall very strong reasons for the rule to use straight. Does anyone have a link to that discussion, which was years ago? Tony (talk) 06:20, 22 October 2016 (UTC)
    Well, you were involved in the discussion back in 2007, but debate goes back to 2005 and 2006. Hawkeye7 (talk) 06:52, 22 October 2016 (UTC)
EEng, if you go back about 9 paragraphs from the top, I started with an audit of 500 edits from my contributions. (I was working on apostrophe-related typos at the time, so I personally checked those). I understand that sample was not totally random (all articles had an obscure typo), so I checked 50 random articles from "Special:Random". I don't know the algorithm, either, but I got consistent results so I felt 15% (or even more roughly, 1/6th) was a good enough figure for discussion. What I wanted from a larger sampling was not to confirm this number, but to generate a list with other fields, personally check for copypaste, and look for any other correlations that might be used as indicators (and possibly also generate a list of editors making the copypaste violations). But before going there I wanted to check with some WikiProjects for issues I may not have anticipated. Tony1, if you look at the first paragraph of this discussion, I linked to four sections of this talk page's archives discussing straight and curly quotes. - Reidgreg (talk) 19:08, 24 October 2016 (UTC)
I reviewed the 25 edits listed above. (I'm not very good at this, still learning how to use the copyvio tools.) I found 3 cases that look like copyvio: Cair Paravel-Latin School appears to be copied from the school's website, and has had copyvio problems in the past; The Sims 3: Showtime has text matching a gaming site, difficult to tell if it's a copypaste or a backwards copy; and Contentment attributes a source but does not indicate a direct quotation. As for typography, in 7 cases the curlies should have been wikified instead of straightened. (For example, in Stanwick, Northamptonshire I believe some of the curly single quotes should have been changed to double quotes and others to italic markup.) I was hoping to treat the disease rather than the symptom, as it were, but curlies are indicators of a lot of different problems, many of these articles having multiple issues. Curlies seem to be an indicator of general unfamiliarity with Wikipedia practices. - Reidgreg (talk) 14:47, 27 October 2016 (UTC)
  • I think somebody should search a database dump so we can settle this count problem. Does anybody want to do that? PhilrocMy contribs 14:59, 3 November 2016 (UTC)
  • You know what, screw it. I'll just say 660,000 - 830,000 pages affected on the BRFA. Let's get to getting consensus for this bot (finally). So, does anybody want a bot which looks for curlies, then puts in Copypaste with a change explanation which mentions the bot? PhilrocMy contribs 15:08, 8 November 2016 (UTC)

Proposal

Create a bot to automate finding the curlies listed above (right single quotation mark, left single quotation mark, left double quotation mark, right double quotation mark, etc.) and making the copypaste change, following the guideline MOS:CURLY.
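For discussion purposes, the detection step of such a bot could look roughly like the sketch below. This is only an illustration, not the code of any existing or approved bot: the function names (`find_curlies`, `straighten`, `needs_review`) are hypothetical, and it covers only the four quotation marks named in the proposal. Since the audit above found cases where curlies should have been wikified rather than straightened, the sketch reports positions for human review and keeps straightening as a separate, optional step.

```python
# Hypothetical sketch of the detection step; not an actual bot implementation.
# Covers only the four characters named in the proposal.
CURLY_TO_STRAIGHT = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_curlies(text):
    """Return (index, character) pairs for every curly quote in the text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in CURLY_TO_STRAIGHT]


def straighten(text):
    """Return a copy of the text with curly quotes replaced by straight ones."""
    return text.translate(str.maketrans(CURLY_TO_STRAIGHT))


def needs_review(text):
    """True if the text contains at least one curly quote."""
    return bool(find_curlies(text))


if __name__ == "__main__":
    sample = "He said \u201chello\u201d and it\u2019s fine"
    for index, char in find_curlies(sample):
        print(index, repr(char))
    print(straighten(sample))
```

A real bot would still need to fetch page text and decide what to do with each hit; those parts depend on the outcome of this discussion and are deliberately left out.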

@Lawrencekhoo: Hello? PhilrocMy contribs 12:19, 21 November 2016 (UTC)
Sorry, real life intrudes, I have to retire from Wikipedia for a while. LK (talk) 02:36, 22 November 2016 (UTC)

Retroactive use of transgender pronouns in indirect quotations?

I noticed our article on The Wachowskis implies that Lana outed "her sister" in 2012, four years before Lilly came out. This seems problematic. The relevant section is The Wachowskis#Lana's gender transition, and the text I'm especially concerned about is "she and her sister are both generally shy about the news media and prefer to maintain their privacy". The use of the present tense in an indirect quotation from 2012 is a problem in general (not the case here, but we don't know if such statements are still accurate) but its use here heavily implies that Lana referred to Lilly as "my sister" at that time.

Is this an IAR scenario, or should we just convert it to a direct quotation (perhaps with square brackets to clarify that referring to Lilly as female is Wikipedia's wording and not the wording used by her sister in 2012), or should we clarify the point here?

Hijiri 88 (やや) 05:21, 21 November 2016 (UTC)

Honestly, we should appoint a committee of Talmudic scholars, Zen Buddhist monks, and Jesuit theologians to think through these conundrums. A couple of phenomenologists and semioticians could assist. EEng 05:54, 21 November 2016 (UTC)
Converting to a direct quotation seems to be the best solution here. Clarification might become unnecessarily wordy. Or do you mean clarifying the point in the MOS? --Florian Blaschke (talk) 01:08, 22 November 2016 (UTC)
Or why not use "sibling", as in the rest of the paragraph? That's the most obvious solution in fact, now that I think of it. (An attentive reader should understand that Lana would not have called Lilly her sister back then, but if there's a way to reduce potential confusion, I say let's go for it.) --Florian Blaschke (talk) 01:13, 22 November 2016 (UTC)
I already changed it.[3] "sibling" looks like a euphemism, and if we used it it wouldn't be clear that Lana didn't use a euphemistic phrase. I checked the source, and obviously she just said "we" most of the time.[4]
It's kinda off-topic, but interestingly enough, in the paragraph in question, either she misspoke or THR misquoted her, since it doesn't make sense unless "if it's a choice between making movies or not doing press, we decided we're not going to not make movies" was meant to be "if it's a choice between making movies or not doing press, we decided we're not going to make movies".
Hijiri 88 (やや) 09:07, 22 November 2016 (UTC)
OK, good solution. But sibling is hardly a euphemism (that would imply that sister and brother are offensive words!), it's simply a handy gender-neutral term that has been in use for over a century. --Florian Blaschke (talk) 16:45, 22 November 2016 (UTC)
Well, depending on the context, referring to someone who identifies as female, but did not do so publicly during the period under discussion and was assigned the male gender at birth, as someone's "brother" would not necessarily be offensive, and the word itself is of course not an offensive word. But don't we use "sister" retroactively for precisely the reason that insistently referring to them as their "original" gender is something that bigots do and is incredibly offensive? By "euphemism" I meant "deliberately ambiguous term consciously used to avoid causing offense". Hijiri 88 (やや) 10:18, 24 November 2016 (UTC)
It's not merely offensive to misgender a person (any person really). It's also wrong, as in plain incorrect. (It's basically like misspelling or mispronouncing one's name.) So the use of the term "sibling" is not only to avoid causing offense, but simply to be correct, because even if Lilly publicly lived as a man back then, she was never truly anyone's brother, and now that we know she is female, in a statement about the present, referring to her as Lana's brother would just be plain nonsensical in addition to offensive (though that's also true about the past, now that we know the truth). --Florian Blaschke (talk) 14:39, 24 November 2016 (UTC)
Yes, that is true now, but in 2012 (almost?) no one knew that Lilly didn't identify as a man, and Wikipedia unabashedly referred to her thus. I don't have a problem with retroactively fixing this, but the problem arises when we are quoting someone else who almost certainly knew more than we did. If Lana had referred to Lilly as her "sibling" when the latter was still publicly identifying as male, in this particular context, this would have been borderline "outing", so I personally doubt she did that (the source itself is unclear). So we shouldn't use wording that implies she used that word. (I'm sorry if I'm making a mistake about the specific biographies of the two individuals in question; I'm going by what our article says and speaking generally about retroactive usage in indirect quotations.) Hijiri 88 (やや) 15:52, 24 November 2016 (UTC)
"Sibling" works fine here.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  23:44, 24 November 2016 (UTC)

We have MOS:HONORIFIC and WP:PBUH, and more generally WP:RNPOV and WP:WORDS, but no guideline specifically discouraging epithets outside Islam-related topics, such as "(the) Holy Bible", "(the) Holy Rigveda", "Lord Shiva" and others, which clearly violate NPOV as they come across as quaint at best to readers who do not adhere to the religion in question. --Florian Blaschke (talk) 00:05, 22 November 2016 (UTC)

It depends. Slippery slope and floodgates and all that. If we remove "Lord" and "Holy", we should remove "Christ" and "Saint" (which in most cases I would probably approve), and then if we interpret RNPOV too strictly then "Old Testament" would have to go and "New Testament" would have to be preceded by "Christian". Hijiri 88 (やや) 09:13, 22 November 2016 (UTC)
That said, singling out any one religion to which we should apply neutral language is absolutely out of the question, and we should be very careful about systemic bias when it comes to Christianity in particular. Hijiri 88 (やや) 09:15, 22 November 2016 (UTC)
Personally I have no problem with epithets used appropriately. Why should religious honorifics be treated as less than secular ones? As an Englishman I don't have a problem with "President Obama" though I never voted for him and he is not my president. Likewise I would expect Americans to refer to "Prince Philip" or "The Duke of Edinburgh". Why then is there a problem with "Jesus Christ" or "Lord Shiva"? Quite apart from the disrespect, such terms serve to disambiguate what are often common names. The problem can get worse, some figures are known almost exclusively by such terms, would it really help to change all references to "The Buddha" to "Siddhartha Gautama"? Martin of Sheffield (talk) 10:01, 22 November 2016 (UTC)
Well, "the Buddha" (when used to refer to the specific figure of Siddhartha Gautama) is less an honorific than an artificial construct developed by non-Buddhist westerners, and is incredibly ambiguous as a term. It should be "Shakyamuni Buddha". :P Hijiri 88 (やや) 12:37, 22 November 2016 (UTC)
Doesn't the definite article disambiguate sufficiently? Your correction is noted, but while "non-Buddhist westerners" may have heard of "Gautama Buddha", I for one would not have associated the name "Shakyamuni Buddha" with him. Please accept my apologies if my earlier post was viewed as offensive; I plead ignorance not malevolence. Martin of Sheffield (talk) 17:35, 22 November 2016 (UTC)
Well, yes, but only if you assume your readers don't live in a traditionally Buddhist society and are familiar with the traditional western construction of "the Buddha". I grew up in Ireland, so I would know what you meant, but if I saw "the Buddha" in the mainspace and it appeared to be referring to the historical Siddhartha Gautama or Shakyamuni Buddha, I would probably change it. And I would definitely change it in the situations where it isn't referring to Shakyamuni but to a different Buddha, and such situations come up more than you might think. Hijiri 88 (やや) 21:09, 22 November 2016 (UTC)
Maybe these are borderline cases (I maintain that the epithets in question are at best unnecessary), but there are certainly some epithets that unambiguously violate RNPOV, such as "The LORD" or "Our Saviour". --Florian Blaschke (talk) 16:53, 22 November 2016 (UTC)
Agreed that those last two are clearly not acceptable in an encyclopedia. Martin of Sheffield (talk) 17:35, 22 November 2016 (UTC)
"President Obama" and like terms are in a different class. It is referring to the post of president rather than adding an honorific like "doctor" or "professor". It would not be acceptable as an article title, but "President Nixon did <foo>" in running text makes it clear that an action of the president is being discussed rather than some other action of Richard Nixon. As for Prince Philip, that is screaming WP:COMMONNAME. What else are you going to call him? That is universally what he is called, even by antiroyalists. That is not the case with religious epithets, at least, not always. Generally, only the believers in that religion will use them. To others they may even be considered offensive. SpinningSpark 18:35, 22 November 2016 (UTC)
Exactly, thanks for clarifying my point. "Bible", "Rigveda", "Jesus" and "Shiva" aren't very common names, and generally not ambiguous in context, so epithets are gratuitous.
"Saint" and analogues in other religions tend to be more acceptable in context (especially because they often accompany common names), even if not usually necessary – it's better to write "Paul of Tarsus" instead of "St Paul", for example (and when repeated, "Paul" suffices anyway). In fact, there are often several homonymous saints, so it's better to use other disambiguators such as toponyms. --Florian Blaschke (talk) 19:32, 22 November 2016 (UTC)
Try RMing Saint Peter again. "Simon Peter" is not a common name, nor is Cephas, and "Peter the Apostle" is about as common as "Saint Peter". Hijiri 88 (やや) 22:00, 22 November 2016 (UTC)
Not sure what your point is. Argumentum ad baculum? ("Peter" is a common name, and "Simon Peter" isn't that common and recognisable, I think, so this example isn't really a counter to what I said.) Anyway, "Peter the Apostle" would clearly be the best title. It's more than awkward that Peter the Apostle redirects to Saint Peter, while Saint Paul redirects to Paul the Apostle. Basically, your example proves my point. --Florian Blaschke (talk) 06:18, 24 November 2016 (UTC)
If you think "Peter the Apostle" would clearly be the best title, why not open an RM? You're not the first one to agree with me on that point. (I say I agree with you, but I've never !voted in any of the previous RMs. That's a rabbit hole I don't much want to go down.) Hijiri 88 (やや) 10:10, 24 November 2016 (UTC)
One of the two which heard John speak, and followed him, was Andrew, Simon Peter's brother. (John 1:40) --Redrose64 (talk) 11:05, 24 November 2016 (UTC)
When Simon Peter saw it, he fell down at Jesus' knees ... (Luke 5:8) --Redrose64 (talk) 12:12, 24 November 2016 (UTC)
Not sure what your point is. I just think that most readers would not recognise "Simon Peter" as readily as "Peter the Apostle". But that's a minor issue anyway; let's not get sidetracked. --Florian Blaschke (talk) 14:08, 24 November 2016 (UTC)
Meh. I think they're all pretty recognizable, and none of them are ambiguous (although I'm sure there have been other saints named Peter). I agree that "Saint Peter" is the least NPOV option. But I also think that we shouldn't be altering the wording of the MOS to explicitly condemn the current title of our article. If this was Hijiripedia the page would be moved by tomorrow (ironic, given the usual meaning of my username), but we can't put the chicken before the egg here. There are too many policies and guidelines (and template documentation pages!) that attempt to be normative on editors' behaviour but were crafted by two or three users years ago and have never been widely observed in practice. Hijiri 88 (やや) 16:00, 24 November 2016 (UTC)
"Saint" and "Holy" are not equivalent; "Holy" is used as an epithet indicating reverence. "Saint" is an official designation of canonization by the Roman Catholic Church, a level above beatification.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  23:42, 24 November 2016 (UTC)
Good point. I also don't object to "Saint" nearly as much as to "Holy" and the like. "Holy Bible" etc. is just not common in contexts that aren't explicitly specific to the religion in question. --Florian Blaschke (talk) 10:57, 25 November 2016 (UTC)

WP:WORDSASWORDS at Cougar (slang) article

Anyone willing to weigh in on a discussion about italics vs. quotation marks at the Cougar (slang) article? See Talk:Cougar (slang)#Quotation mark or italic to denote words as words? When and how to use?. Flyer22 Reborn (talk) 09:29, 29 November 2016 (UTC)

Earth vs. the Earth.

Under Capital letters/Celestial bodies, the example sentence "The Moon orbits the Earth" is given. Under the same heading on the Wikipedia:Manual of Style/Capital letters main page, the same example is given as "the Moon orbits Earth". I removed "the" before "Earth" on this page so it would match the main page but my edit was undone. Seems to me they should match. To me, saying "the Earth" (Earth used as a proper name) is like saying "the Mars". Also, "the" is superfluous since it can be removed without hurting the meaning of the sentence. Just trying to keep things logical & consistent. Unknowntouncertain (talk) 04:51, 23 November 2016 (UTC)

Interesting case. I am of the opinion that both are correct, and both have their uses: for instance, some phrases and idioms use "the earth", e.g. "the face of the earth"; others use just "earth", e.g. "what on earth?". Perhaps laying out when either should be used could help. Iazyges Consermonor Opus meum 05:01, 23 November 2016 (UTC)

My issue is only with "the" before capitalized "Earth". Saying "the earth" is fine. Unknowntouncertain (talk) 05:34, 23 November 2016 (UTC)

It does seem that in some cases you could use either the Earth or Earth when referring to the planet as a whole. I would think that earth as a synonym for dirt, would not be capitalized, but could be wrong. The Moon is interesting. I presume we don't capitalize the moons of Jupiter, or the moons of Mars. Should we capitalize the Moon of the Earth? Just because there is only one? Gah4 (talk) 06:13, 23 November 2016 (UTC)
@Gah4: I believe under that syntax it would not be capitalized, because it is referring to a moon of earth, which while it implies that it is the Moon, isn't a proper noun, so it would be "the moon of the Earth" or perhaps, "Earth's moon". Iazyges Consermonor Opus meum 06:17, 23 November 2016 (UTC)

Or, better yet, "the moon of Earth". Unknowntouncertain (talk) 06:46, 23 November 2016 (UTC)

Interesting. The Earth, the Moon, the Sun; Earth, Luna, Sol. (Mars, etc). Tony (talk) 00:50, 24 November 2016 (UTC)
Stop talking in riddles, Tony. We're waiting for you to sort this out. EEng 01:10, 24 November 2016 (UTC)
:-) I don't think the matter should be set in stone in our guidelines (there's variation in usage out there). But it should be consistent within an article, unless there's a switch from the proper name Earth, Luna, Sol, to the alternative names for these three objects ("the"). Noting that the name of our galaxy carries a "the" whether or not it's used in a formal, titular sense. Usage has grown inconsistently. Tony (talk) 03:28, 24 November 2016 (UTC)
My take on this is to use 'the earth' (with article, lowercase) in the same situation as you would use 'the world' and to use 'Earth' (no article, capital) in similar situations to when you would use 'Mars' (ie, a particular planet, not just the one you happen to be standing on).  Stepho  talk  03:35, 24 November 2016 (UTC)
Wouldn't that be introducing rules that don't exist outside WP? Curly "the jerk" Turkey 🍁 ¡gobble! 03:52, 24 November 2016 (UTC)
Actually, it is generally held that the use of the article "the" indicates that a noun is not a proper noun and therefore not capitalised. "The" indicates that it is a specific thing but not the name of that thing - ie the dog refers to a specific dog but dog is not capitalised unless the dog is called "Dog". "The Smiths" is capitalised because it is derived from a proper name. Similarly, the Parliament of Queensland or the shortened form: the Parliament can be capitalised (but capitals for this shortened form are expressly deprecated here). Cinderella157 (talk) 05:10, 24 November 2016 (UTC)
You're saying it is generally held that people don't call the Earth the Earth? We have a problem with that assertion—people do. People also say "I often go to the library" (even when there are multiple libraries to choose from) and "I need to use the toilet" (even when there are multiple toilets to choose from). Oh, you're missing an example that's already come up: "the Moon". It's incorrect to say "Moon is big and bright tonight". Curly "the jerk" Turkey 🍁 ¡gobble! 06:17, 24 November 2016 (UTC)
The Moon is not a counter-example to the rule that the names of planets are not used with the definite article, since it is not a planet. Nor is, of course, the Sun. --Florian Blaschke (talk) 07:56, 24 November 2016 (UTC)
There is no "rule" that planets do not take a definite article; "The Earth" does, for example. Curly "the jerk" Turkey 🍁 ¡gobble! 09:25, 24 November 2016 (UTC)

WP:THE#Other cases explicitly says: "Definite and indefinite articles should generally be avoided in cases not mentioned above. For example, ... Earth, not The Earth." Mitch Ames (talk) 06:23, 24 November 2016 (UTC)

That's talking about WP article titles though. In ordinary text and usage, as noted, Earth appears both with and without a definite article. It's not a case of one being right and the other wrong. The fact that people do not say "the Mars" is not evidence of a rule that names of planets are never used with the definite article; indeed, the use of "the Earth" is proof that no such rule exists. English isn't always logical or consistent (and of course this planet is a special case anyway, as far as the human perspective is concerned, for rather obvious reasons). Whether the MOS should express a preference is a different question. I'd see it as needless micromanaging. N-HH talk/edits 08:18, 24 November 2016 (UTC)
What I am saying is that when you add the definite article, you generally don't capitalise - "The dog called Dog came to me". Furthermore, this appears to be somewhat consistent with the MOS because, "The moon rose ...". Cinderella157 (talk) 09:27, 24 November 2016 (UTC)
And there are exceptions. For example, "The Earth", "The Moon", "The Lord". You'll also want to click through to the Moon article ... Curly "the jerk" Turkey 🍁 ¡gobble! 09:31, 24 November 2016 (UTC)
There are exceptions, as I have indicated, and "The Moon" is not a good example, firstly, for capitalising "the". "The Earth" is equally bad. Capitalisation, if it is to be considered at all, is very contextual and subjective (to a degree) given the position of deprecating caps. Cinderella157 (talk) 10:33, 24 November 2016 (UTC)
We're not talking about capitalizing "the". Curly "the jerk" Turkey 🍁 ¡gobble! 11:10, 24 November 2016 (UTC)
We don't say "the Luna", "the Sol", "the Mars" etc. because these are all names from Latin - and there is no article in Latin. "Luna" may mean "a moon", "the moon", or simply "moon". --Redrose64 (talk) 18:49, 24 November 2016 (UTC)
In Latin, but in English "Luna" cannot mean "a moon", only "the Moon". We don't translate all of the semantics or grammar of a unit of vocab from another language—for instance, we don't say "the dark side Lunae" for "the dark side of the Moon", but we could say "the dark of Luna". The existence or absence of articles in Latin is irrelevant. There are no articles in Japanese, either, but we translate 「御門」 as "the Mikado" rather than simply "Mikado". Curly "the jerk" Turkey 🍁 ¡gobble! 21:55, 24 November 2016 (UTC)
"Earth" and "the Earth" are both conventional usage, as are "the Moon" and "the Sun" (as celestial bodies), "the Oort Cloud", etc., but not "the Mars" or "the Jupiter". Follow actual usage, don't try to make up a robotic, consistent rule. Human language is not consistent.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  23:39, 24 November 2016 (UTC)
The whole point of having a Manual of Style is to guarantee more consistence than observed "in the wild". Also, some writers like having a rule when they are in doubt which variant to use. --Florian Blaschke (talk) 10:55, 25 November 2016 (UTC)
See The Cambridge Grammar of the English Language, pp. 517–518, on "strong and weak proper names". Some proper names are strong and never have a definite article (e.g. "New York", but not "the New York"); some are weak and typically take the definite article (e.g. "the Alps", "the Parthenon"); but "[i]n some names the is optional, so that we have both strong and weak versions." In short, cases like "(the) Moon" and "(the) Earth" fall into the optional article category within the English language. Imposing a rule in this situation would be an unnecessary limitation upon the normal grammar of English. --EncycloPetey (talk) 03:16, 27 November 2016 (UTC)
But that's exactly what we already do for article titles. Style guides come up with fixed rules to avoid endless debates when several options are accepted outside the relevant medium and there is no clear norm. --Florian Blaschke (talk) 10:01, 29 November 2016 (UTC)
And "more consistenc[y]" is not synonymous with "total, robotic consistency".  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:00, 2 December 2016 (UTC)

Infobox and other editions

@EEng: Why did you do this revert? Is it controversial in some way? The linked guideline clearly states that infoboxes are neither prescribed nor proscribed. It is saying on a style guideline that whether to have them or not is a style issue. Thus, it comes under the general proscription of not unnecessarily switching styles and belongs on the list. I saw this as merely summarising existing guidelines and entirely uncontroversial. Do you disagree?

You reverted my edit in its entirety, not just the infobox addition. Do you object to some other aspect of my edit, or is it only the infobox issue? SpinningSpark 11:14, 4 December 2016 (UTC)

I would object to including infoboxes in RETAIN (or making any other undiscussed changes to that section, which can lead to significant and protracted conflict). MoS has long avoided treating the presence or absence of infoboxes as a style matter; they're a content matter, and even ArbCom has treated them as a content matter. It comes down to editorial consensus at a particular article whether an infobox (at all, or a particular one) is helpful or not, and that assessment often changes as the article develops (which, per WP:SUMMARY / WP:SPLIT / WP:SPINOUT, is not always a one-way march to increased complexity at any particular page). There's also the issue that acceptance of infoboxes has generally gone up, site-wide, further removing the feature from being "style" territory, just as various forms of categorization have become accepted (or, in some cases, rejected) and this also is not a style matter, but essentially a content metadata matter. The principal two sources of long-term and heated conflict over infoboxes are a) bogus claims that all articles should have infoboxes (and similar overgeneralizations, e.g. that any infobox is better than no infobox, that an infobox with more details is automatically better than a smaller one, etc.); and b) the equally bogus misconstruing of RETAIN to imply that if any article does not presently have an infobox then no one can ever add one (or conversely, that if a page has long had one then no one can remove it, even if it's really an unhelpful infobox). Virtually every infobox flamewar is based on one or both of these fallacies. And neither of them are MoS matters (except inasmuch as MoS continues to maintain its careful neutrality/agnosticism with regard to infoboxes at all). There is also MOS:INFOBOX, which is about how to design and place infoboxes, but not whether to have or remove one.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  11:50, 4 December 2016 (UTC)
Although I would like to see infoboxes included in RETAIN in some way, I agree with SMcCandlish that this is a change that shouldn't be made unilaterally; unfortunately it would be a very controversial change, and I doubt it would gain consensus if such a discussion took place. Mike Christie (talk - contribs - library) 12:35, 4 December 2016 (UTC)

English vernacular names

In view of the history, I thought I should not be too bold, but I believe the wording

  • "They are additionally capitalized where they contain proper names ... "

is unclear, if not incorrect. I propose changing it to

  • "If they contain proper names (including proper adjectives), that part is capitalized as usual".

The examples already illustrate the main point:

  • "Przewalski's horse, California condor, and fair-maid-of-France".

The capitalization of proper names would normally be obvious, but probably needs to be stated explicitly because of the rules on taxonomic species names. I was trying to think of an example with a proper adjective. However, I see that dog breeds like "Old German Shepherd" are capitalized (all words), so perhaps we should let sleeping Alsatians lie and not bother with an example. --Boson (talk) 20:04, 4 December 2016 (UTC)

  • Oppose parenthetic addition as redundant and unnecessary. This clarification might be needed if it read "where they contain proper nouns", but it does not. Even if it did, the concise solution would be to change "nouns" to "names" (an edit I frequently make for this very reason). The concept of the proper name already includes (in English – this is not universal among languages) most adjectival derivations, e.g. "Texan hot sauce", "Platonic ideals" (the capitalization is only dropped when the connection of the eponym to the subject has become very attenuated: "like some kind of lynch mob", "their relationship, though close, was platonic"). At any rate, the very examples themselves, like "Przewalski's horse", etc., already make it clear, and there won't be any cases that are not adjectival or prepositional anyway.

    Support syntactic rewrite, but with grammatical correction of "that part is" to "those parts are" (plurality agreement with "they" and the rest of the paragraph, about names in the plural). While "capitalized where they contain proper names" is exact ("Przewalski's horse" and "fair-maid-of-France" only have one spot each where they contain a proper name, "Przewalski" and "France", respectively), the wording is maybe a bit academic and might not be understood by everyone as having the precise meaning intended.

    History: Yes, this was added because it applies to vernacular names but not specific epithets (a change that came about mid-20th century; you can actually find old sources that give things like Andrias Davidianus for Andrias davidianus). There was some concern that people would mix things up, and in fact during discussions about MOS:LIFE, there were occasional posts along the lines "Surely you don't mean it should be 'canada goose'!?!?" While that was hyperbolic posturing, just being clear on the matter nips that kind of stuff in the bud. BTW, the "where they contain proper names" has been there since the original formulation in 2007 [5].

    Breed capitalization: Yes, please don't wake the sleeping "breed caps" giant. While not everyone agrees with it, on WP or in RS, capitalization is being done consistently if informally on WP for standardized animal breeds and plant cultivars, and nearly every regular editor of domestic animal and plant articles would oppose a change to lower-case, leading to a repeat of the "species caps" drama of 2008–2014, but magnified tenfold. The formality level has increased some, with WP:RM routinely siding with capitalization for WP:CONSISTENCY reasons. Maybe this makes it a slow-moving WP:FAITACCOMPLI, but it is what it is. Downcasing would run into complications immediately, such as the fact that the International Code of Nomenclature for Cultivated Plants requires cultivar (plant breed) names to be capitalized. The WP:SSF argument can only be taken so far; in some cases what was a specialist usage has become the dominant one. While no such standard exists for zoological breeding, and publications like newspapers lean toward "German shepherd" not "German Shepherd", this is in flux (capitalization is increasing), and the capitalization is a de facto standard in breeder, fancier, agricultural, etc., publications. Going against it for animals would result in having an inconsistent approach to animal versus plant breeds on Wikipedia, which is highly undesirable. Another issue is that most breeds contain a proper-name element in their names, so most will be capitalized anyway, and it can take nontrivial research to figure out whether one does or does not. It's just simpler and lower-drama to capitalize them. Anyway, I have neutrally(?) analyzed the major pro/con points of that debate here, if you want a look at how unlikely it would be for a clear consensus to emerge to downcase breeds (an idea I initially supported). 
Anyway, MOS:LIFE's examples are sculpted carefully to address every common sort of "group of lifeforms" (species, genus, landrace, dog and horse "type", etc.), except it studiously avoids standardized breeds/cultivars. Attempting to insert an MoS rule to capitalize them, instead of just leaving it alone, might raise a big stink, from a different quarter than and to a lesser degree than inserting a rule to lowercase them. But we may need to address this eventually; for one thing, it is the sole issue that has held up the expanded MOS:ORGANISMS in draft state for years. Given that ArbCom just heard two or three MoS-related WP:ARCAs back to back, I would suggest leaving it alone longer.
     — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  09:19, 8 December 2016 (UTC)

Do language manuals apply across all projects?

Hi. I recently quoted the principle of MOS:COMMONALITY on a talkpage in the Wiktionary and someone claimed that "MOS:COMMONALITY is a Wikipedia policy. This is Wiktionary." Is this correct? Thank you. Rui ''Gabriel'' Correia (talk) 21:22, 5 December 2016 (UTC)

Yup. Another thing that frequently trips people up is that there's no restriction on WP:OR, or WP:V requirement, on Commons. EEng 22:02, 5 December 2016 (UTC)
Other Wikimedia projects have their own style guides.
Wavelength (talk) 22:05, 5 December 2016 (UTC)
About the only thing that is common to all is the Terms of Use. --Redrose64 (talk) 00:37, 6 December 2016 (UTC)
Wiktionary would be much the better for using the en.WP MOS (plus its own extras), in my view; but it doesn't, and is slightly shambolic for it. Tony (talk) 02:09, 9 December 2016 (UTC)