Talk:Marshallese language/Archives/2019/December

This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Peer review requested


Lately I've been doing a lot of work, both here and at wikt:Wiktionary:About Marshallese. However, I know very well that I'm not qualified to maintain a Marshallese linguistics project by myself on one wiki, let alone two. It's not that I lack the skills to contribute; rather, I lack a certain ability to view my own work from the outside, criticize and discuss it, and improve it on that basis. And I can see I'm not alone in sensing an absence of consensus in Marshallese linguistics. In Phonemic Inventory of Marshallese, and some General Properties (Sara Blalock Ng, 2017), Ms. Ng notes other publications addressing Marshallese phonology, by Abo, Bender, Choi, Willson and others, that highlight some of the disagreements in interpreting and isolating the sounds of the language, such as (but not limited to) the disagreement over which phonetic symbols to use for vowel allophones, as well as the fundamental disagreement between Choi (1992) and the rest over whether there are three or four vowel phonemes. And I must admit that Willson's (2003) discussion of "+ATR" vs. "-ATR" (which I'm assuming means "advanced tongue root") with regard to the four vowel phonemes themselves (not just their allophones) goes over my head. And in trying to capture the best of all references, I think I may have committed a cardinal sin in my Wiktionary edits by introducing three separate phonetic transcriptions (MED, Choi, Willson) into every word entry, though at least in a way that could easily be undone at the template level, and I admit as much confusion on the Wiktionary project page.

And though a lot of what I'm discussing is happening at Wiktionary rather than here on Wikipedia, I think much of the problem can be traced to how Marshallese phonology has been addressed (or rather, inadequately addressed) here at Wikipedia. This article uses a phonetic transcription system informed by all these different publications, but it still confuses the hell out of people visiting the article, as evidenced by random comments I've encountered on Reddit, YouTube and elsewhere. The Reddit discussion (since closed to further input) also mentioned past discussions on this very talk page. I admit one of my weaknesses as an editor is my tendency to increase the complexity of information in my efforts to perfect it, even when that makes the information harder to read and understand. I vaguely recall this being discussed on this talk page, when the article as I had written it up to that point was jargon-heavy and plagued with the appearance (and perhaps also the substance) of original research. I've already done some work, both here and on Wiktionary, to simplify some of the beasts I've created, while hoping I'm not creating more in the process.

For now, there is at least one piece of recent work I've deployed on Wiktionary that I can, with whatever modifications are necessary, also redeploy on Wikipedia: wikt:Module:mh-pronunc. That module drastically simplifies Marshallese pronunciation templating by letting template arguments be written in a much simpler code format, doing away with the convoluted token system used in my earlier pronunciation templates. I learnt Lua on the fly to make it happen; it helped that I was already a well-practiced JavaScript programmer.

Yet all this work still doesn't mean much if I'm not actually improving the quality of information on either wiki. I want feedback, I want critiques, I want collaboration, I want to be more than a one-person wikiproject working on a neglected topic. In the absence of meaningful peer review, my edits, no matter how good or bad they are, are effectively the final word on these wikis until someone else takes notice, which can take years. This is an injustice to the topic and to the quality of its wiki work. - Gilgamesh (talk) 12:52, 27 October 2019 (UTC)

@Gilgamesh: Let me first note that your work has not gone unnoticed: Marshallese is on my watchlist (as are, surprise, surprise, most Austronesian languages), so I have been aware of your epic (forgive the pun!) editing work. My main interest is historical-comparative linguistics, but descriptive work such as phonetic and phonological studies of Austronesian languages is just as interesting to me. My own work with the languages of Micronesia has always focused on the non-Oceanic languages Palauan and Chamorro, but after reading this plea of yours, which I can fully relate to, I will gladly take up the task of reading through all the sources you have consulted in order to review your presentation of the material, and maybe to contribute as well. As for OR, as an (ex-)academic researcher I have in theory little tolerance for it on Wikipedia; yet some degree of synthesis is hard to avoid for topics that have not yet been covered much in tertiary sources. In any case, great job! –Austronesier (talk) 18:58, 27 October 2019 (UTC)

Thank you. - Gilgamesh (talk) 19:37, 27 October 2019 (UTC)

You know, I like historical and comparative linguistics, too. Though there's obviously no written record of Marshallese before the colonial period, it is still an Austronesian language, so many of its words can be, and have been, linked to reconstructed proto-language forms. Some other editors over at Wiktionary have filled out etymology sections for a few words, but I'm not privy to those references myself. There are words whose origins I've been curious about, like io̧kwe (hello; goodbye; love), aelōn̄ (atoll; country; non-atoll island), and em̧m̧an (good; to be good). I already know that āne (islet) is the Marshallese reflex of Proto-Austronesian *banua. And I still haven't worked out why sources keep repeating that the language is also called "Ebon," when I can't find that term in the Marshallese-English Dictionary under that or any similar spelling I've been able to think of. - Gilgamesh (talk) 19:56, 27 October 2019 (UTC)

One thing that hinders my progress in describing the language, here to a lesser degree but especially at Wiktionary, is that I don't have the best grasp of the peculiarities of Marshallese grammar. For example, it seems that Marshallese doesn't actually have adjectives, only verbs that can be used adjectivally, which the MED marks as "v. adj." I'm not entirely sure how this works, but I suspect they are stative verbs. And I'm reluctant to expand many more word entries on Wiktionary without paying due attention to their grammatical inflections. The MED describes these at a basic level, but I would be doing the language a disservice if I edited too deeply in areas I don't understand very well. Besides the MED, there's also Practical Marshallese (Peter Rudiak-Gould, 2004), a free language tutorial book. But books like these are written for lay people (that one even discourages readers from delving too deep into the technical details in the MED, so as not to be confused), and I must admit I'm not exactly a lay person; I process a lot of information differently. Reading a book like that, my mind branches in many different directions, wanting more technical details, not less, and I feel the strain when I can't find what I'm looking for. I know there's a certain immersive cultural context that can't be ignored when learning a language, but for me it tends to be secondary to learning the ins and outs of the language's structure (satisfying grammatical curiosities, etc.), even if that's not the order in which most people learn languages. This same characteristic also means I tend to write highly technical articles, and I sometimes need help making them more digestible to lay people. - Gilgamesh (talk) 16:00, 28 October 2019 (UTC)

This comment will be a long one, but there are a lot more information-quality issues I want to air out.

One issue I want to tackle, again more for Wiktionary's sake but also for here, is adopting (or readopting) one transcription system of phonetic (as opposed to phonemic) IPA pronunciation. MED (1976), Choi (1992) and Willson (2003) fundamentally disagree on vowel symbols.

Currently Wikipedia is using more or less the system described in the MED, but while rereading that section I was reminded that it describes sounds in pseudo-IPA, not true IPA. For example, it uses symbols like /tʸ/ and /pᵚ/ instead of /tʲ/ and /pˠ/. For these reasons, I treated the MED's IPA-like symbols loosely, not to be taken absolutely literally, and this led me, without mentioning that I had done so, to ignore its symbol [ə] for the back unrounded allophone of the close-mid vowel phoneme and use [ɤ] instead, which fit orthogonally with the other vowel allophone symbols it uses besides [ə]: [æ ɛ e i ɑ ʌ (ə) ɯ ɒ ɔ o u]. In retrospect, this was inappropriate of me, if only because it slightly misrepresented the vowel system the MED actually uses, all for the convenience of making that one modification and stating in the article that the MED system would be used throughout.

But since reading the pronunciation section of Rudiak-Gould (2004) and recently picking up Ng (2017), I realized I was doing the same thing most of the other publications were doing: improvising a set of pronunciation symbols to my satisfaction, not the language's. Rudiak-Gould (2004) doesn't use pronunciation symbols, but instead describes the pronunciations in layman's terms, though in a reasonably exacting manner, as when he says the spelling ā is pronounced "halfway between pet and pat," which can easily be inferred as [æ̝] or [ɛ̞]. In this sense, all the phonetic transcription symbols are equally correct and equally imprecise in describing the sounds of a language whose vowels exist in a phoneme cloud of sorts, much like the lexical sets of English vowels.

So, up until this point, this article has been using MED vowels with that one modification. But what can we use, and what should we use? The point raised by Ng (2017) that comes to mind is that different sources use different symbols for vowels, but that a plurality or majority of them use the same symbols advocated by Willson (2003), who did not come up with those symbols herself but was citing an earlier source I don't have access to. On the other hand, the MED system is used in the MED, which was, is and remains the only complete Marshallese dictionary in existence, and that makes its notes especially influential. And on a third hand, I can't fully ignore Choi's (1992) reasoned conclusion that while there may be a historic or underlying mechanism for four vowel phonemes, he could only acoustically detect three.

There's also the issue of the phonemic transcription of vowels, which has some basis in the references (in this case Willson (2003)) but was also rather improvised on my part, as I changed Willson's /ɐ ɜ ɘ ɨ/ to /a ɜ ɘ ɨ/ for use in this article. Either way, these vowels' true articulations are never actually central vowels as their symbols imply; the symbols only represent the vowel phonemes' relative heights in a vertical vowel system. And again, this isn't the only way of looking at the vowels: the book Spoken Marshallese (Byron W. Bender, 1969, this time a family heirloom I found on our bookshelf downstairs) prefers to analyze the four vowel phonemes primarily as front and unrounded, on the view that the palatalized (or "light") consonant phonemes that trigger these vowel allophones can be thought of as the default secondary articulation of Marshallese consonants, with the velarized and labiovelarized (both "heavy") consonant phonemes viewed as modifications (though there are actually no palatalized dorsal obstruent or nasal phonemes at all to help fill out this pattern). I think there may be some wisdom in this palatalization-default view, as Proto-Micronesian */w/ became Marshallese /j/, Proto-Micronesian */t/ became /tʲ/, etc. So, if Bender is right, it may be appropriate to treat /æ ɛ e i/ or /ɛ e ɪ i/ (whichever set of symbols is preferred, since Bender did not use IPA symbols to describe these) as phoneme symbols and not just as allophone symbols.

You're right that some degree of synthesis can seem inevitable in trying to write Wikipedia articles based on references that often don't share conventions with one another. So might it be all right to just...arbitrarily choose something, or at least come to a certain consensus to arbitrarily choose something for Wikipedia (and perhaps also Wiktionary) to use?

Vowel phonemes:
- Bender's front vowels (presumably either /æ ɛ e i/ or /ɛ e ɪ i/)
- Choi's /ɐ ə ɨ/
- Willson's /ɐ ɜ ɘ ɨ/
- the ad hoc /a ɜ ɘ ɨ/ which Wikipedia had already been using
- something else
Vowel allophones:
- MED's [æ ɛ e i ɑ ʌ ə ɯ ɒ ɔ o u]
- Choi's [ɛ e i a ʌ ɯ ɔ o u]
- Willson's, Ng's, others' [ɛ e ɪ i a ʌ ɤ ɯ ɔ o ʊ u]
- the ad hoc [æ ɛ e i ɑ ʌ ɤ ɯ ɒ ɔ o u] which Wikipedia had already been using
- something else

And for that matter, what is the most neutral way to describe the obstruent consonant allophones, given that all these different styles of pronunciation are possible in free variation, and some styles differ by gender, social position or lyrical mode (Marshallese singing can sound very different from Marshallese speech)? This affects how voiced all the obstruents are, but it especially affects /tʲ/, whose allophones can also be affricates or fricatives, and can be palatalized alveolar, alveolopalatal, or sometimes even true palatal (a dorsal place of articulation for a phoneme whose defining characteristics include being coronal).

- Use only voiceless symbols in all positions, as in the phonemic transcription.
- Reflect the common habit, cited in several of the references, of voiceless consonants at the beginnings and ends of phrases and when geminated, but voiced consonants after vowels and seemingly also after nasals. Use only [tʲ, dʲ] for /tʲ/.
- Do the above, but use [zʲ] as the principal voiced allophone of /tʲ/, which in modern times seems to be more common than [dʲ]. I can't vouch for how accurate this is, though. When I was growing up on Kwajalein in the 1980s, it was anecdotally reputed that the Marshallese speakers on nearby Ebeye pronounced [dʲ], including in the name Kuwajleen (Kwajalein) itself. But in most of the YouTube videos I've listened to, I hear [zʲ ~ ʑ] in these positions.
- Do the same as above, but use the alveolopalatal symbols [t͡ɕ ʑ t͡ɕː], except in final position, where [tʲ̚] is used because final obstruents tend to be voiceless, unreleased stops.
- Follow the pattern that is common in song, and reportedly also in women's speech: pronounce all obstruents as voiced, and /tʲ/ as a fricative in most cases, even word-initially.

Virtually any decision I make unilaterally on these options will have the inescapable appearance of bias.

And while browsing a bit more through Bender (1969) just now, I noticed something I'd been overlooking for years, an oversight reflected in all my wiki edits on Marshallese to date: his non-IPA phonemic transcription system (which is reused in the MED) doesn't just recognize the sequences {yiy} and {yi'y}, but also a third sequence he transcribes {'yiy}. Before, I'd understood that {yiy} was pronounced with a full vowel [i], while {yi'y} was the same vowel pronounced as a glide [i̯], and I thought {'yiy} was just a typo for {yi'y}. But it's not that simple. {yi'y} is actually pronounced with a full vowel like {yiy} in the Rālik (western) dialect, but as a glide in the Ratak (eastern) dialect, while {'yiy} is pronounced as a glide in both dialects. This is embarrassing... now I have to update the pronunciation templates on Wiktionary again before I consider redeploying them on Wikipedia. This is another reminder that I can't be a one-person wikiproject. Obviously I will have to look more into these two-dialect considerations baked into Bender's transcription.

To (perhaps needlessly) add to the confusion, the original 1976 print version of the MED and the widely cited online version of the MED use slightly different versions of Bender's transcription system. In summary, 1976 uses the degree symbol {°} for labiovelarized secondary articulation, while online uses the more IPA-like {ʷ}; and 1976 uses the symbol {q} where online uses {kʷ}. This should be noted in the article's section describing Bender's transcription system. There is apparently some justification for the online version's creeping changes: unlike the 1976 print version, which does not change, the online version has continually added new word entries, grammatical data and usage examples, and its maintainer apparently made these alterations to Bender's system on his own initiative. I'm not sure exactly how much leeway to give 1976 or online, though. 1976 (and online by extension) can be criticized for using pseudo-IPA instead of true IPA symbols in its chapter explaining the language's phonology, and 1976 and online have diverged on the Bender transcriptions accompanying word entries. On top of that, the online version uses a very non-standard variant of the Marshallese alphabet for orthographic examples, with ḷ ṃ ṇ ñ ọ (with dots and tilde) throughout instead of the standard ļ m̧ ņ n̄ o̧ (with cedillas and macron), seemingly because ḷ ṃ ṇ ñ ọ displayed better in the common Times New Roman font when the online version was set up. This has never changed, and probably can't, since even the HTML anchors for word entries use the ḷ ṃ ṇ ñ ọ spellings, and these anchors are extensively referenced in Marshallese linguistics content on both Wikipedia and Wiktionary, even where it otherwise uses the standard ļ m̧ ņ n̄ o̧ letters. To be clear, the original 1976 version always used the standard ļ m̧ ņ n̄ o̧. Marshallese has long been a headache to display properly in Unicode, since so few font designers have ever given it any real consideration.
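Both spelling differences described above are regular enough to be handled mechanically, for example in a template or a conversion script. Here is a minimal Python sketch of that idea, not the MOD's or anyone else's actual tooling: the function names are hypothetical, and it assumes the listed characters are the only differences between the conventions. It uses the precomposed ļ (U+013C) and ņ (U+0146), but base letter plus combining cedilla or macron for m̧, n̄ and o̧, which have no precomposed Unicode forms.

```python
# Hypothetical converters for the two spelling differences described above.
# Assumption: the characters listed here are the only differences.

CEDILLA = "\u0327"  # COMBINING CEDILLA
MACRON = "\u0304"   # COMBINING MACRON

# MOD online letters (dot below / tilde) -> standard Marshallese letters.
MOD_TO_STANDARD = str.maketrans({
    "\u1e37": "\u013c",       # ḷ -> ļ (precomposed)
    "\u1e43": "m" + CEDILLA,  # ṃ -> m̧
    "\u1e47": "\u0146",       # ṇ -> ņ (precomposed)
    "\u00f1": "n" + MACRON,   # ñ -> n̄
    "\u1ecd": "o" + CEDILLA,  # ọ -> o̧
    "\u1e36": "\u013b",       # Ḷ -> Ļ
    "\u1e42": "M" + CEDILLA,  # Ṃ -> M̧
    "\u1e46": "\u0145",       # Ṇ -> Ņ
    "\u00d1": "N" + MACRON,   # Ñ -> N̄
    "\u1ecc": "O" + CEDILLA,  # Ọ -> O̧
})

# Bender transcription: 1976 print MED -> MED online.
BENDER_1976_TO_ONLINE = str.maketrans({
    "\u00b0": "\u02b7",  # degree sign ° -> modifier letter small w ʷ
    "q": "k\u02b7",      # q -> kʷ
})


def mod_to_standard(word: str) -> str:
    """Respell a word from the MOD online spelling into the standard alphabet."""
    return word.translate(MOD_TO_STANDARD)


def bender_1976_to_online(transcription: str) -> str:
    """Rewrite a 1976-style Bender transcription in the online style."""
    return transcription.translate(BENDER_1976_TO_ONLINE)
```

For example, the MOD-style spelling eṃṃan comes back as the standard em̧m̧an. Going the other way (standard to MOD, e.g. to build anchor links) would need a multi-character replacement rather than `str.translate`, since m̧, n̄ and o̧ are two code points each.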

Whew! That was a lot to cover. - Gilgamesh (talk) 10:55, 29 October 2019 (UTC)

And this is why we double-check our references. I didn't pay close enough attention to the {yiy/yi'y/'yiy} distinction as described in Bender (1969). I wish I could provide a link to scans of the page directly, but this particular book still appears to be commercially sold, so nearly all its pages are not available for online viewing. So I'll paraphrase.

When a word begins with {yiy} (without apostrophes), the sequence is pronounced differently in the two dialects:
- In the Rālik dialect, the vowel is "dwelling on", and is pronounced as if it were {yiyiy}, a long monophthong.
- But in the Ratak dialect, it becomes a "passing over lightly" vowel, and is pronounced as an asyllabic glide like the Y in English yes.
Words that instead start with {yi'y} have the vowel sequence pronounced as "passing over lightly" in both dialects.
But words that start with {'yiy} have the vowel pronounced as "dwelling on" in both dialects.
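Since three word-initial sequences and two dialects give only six combinations, the paraphrase above fits in a small lookup table. This is a hypothetical Python sketch for illustration only: the names are made up, only word-initial position is handled (as in the paraphrase), and it returns Bender's descriptive labels rather than IPA so as not to commit to a transcription.

```python
from typing import Optional

RALIK = "Rālik"  # western dialect
RATAK = "Ratak"  # eastern dialect

# Bender's descriptive labels, as paraphrased above.
DWELLING_ON = "dwelling on"            # long syllabic vowel, as if {yiyiy}
PASSING_OVER = "passing over lightly"  # asyllabic glide, like the y in "yes"

# (word-initial sequence, dialect) -> treatment of the vowel
INITIAL_YIY = {
    ("yiy", RALIK): DWELLING_ON,
    ("yiy", RATAK): PASSING_OVER,
    ("yi'y", RALIK): PASSING_OVER,
    ("yi'y", RATAK): PASSING_OVER,
    ("'yiy", RALIK): DWELLING_ON,
    ("'yiy", RATAK): DWELLING_ON,
}


def initial_yiy(transcription: str, dialect: str) -> Optional[str]:
    """Return the treatment of a word-initial {yiy}-type sequence, or None
    if the transcription does not start with one. Raises KeyError for a
    dialect name other than RALIK or RATAK."""
    for seq in ("'yiy", "yi'y", "yiy"):
        if transcription.startswith(seq):
            return INITIAL_YIY[(seq, dialect)]
    return None
```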

How could I misread something like that? - Gilgamesh (talk) 16:06, 29 October 2019 (UTC)

Sometimes after I've slept on a question, better sense prevails.

Vowel phonemes: /ɐ ɜ ɘ ɨ/.
- My reasoning? The majority of references use central symbols for vowel phonemes, and most of them recognize four phonemes rather than three. Both Choi and Willson use /ɐ/; I am the only one using /a/, and that choice was never really reviewed until now. It helps that /ɐ/ is vertically nearly equidistant between cardinal [ɑ a] and [ɛ̞ ɔ̞]. It is somewhat closer to [ɛ̞ ɔ̞] than to [ɑ a], but /ɐ/ is still the most referenced, stable pick.
Vowel allophones: [ɛ e̞ e̝ i ɑ ʌ ɤ ɯ ɔ o̞ o̝ u].
- My reasoning? Since the front and rounded vowel allophones don't fall neatly on the open, open-mid and close-mid lines but are a little vertically offset, both [æ ɛ e i ɑ ʌ ə ɯ ɒ ɔ o u] and [ɛ e ɪ i a ʌ ɤ ɯ ɔ o ʊ u] are equally correct and equally imprecise. And since Choi only recognizes three phonemes and nine allophones, we can kill two birds with one stone and use separate uptack and downtack diacritics for [e] and [o], with the understanding that they can be separate or merged; it happens that [e̞ e̝ o̞ o̝] are fairly accurate to how Bender (1969) and Rudiak-Gould (2004) describe them anyway.
- And why MED's [ɑ] instead of Willson and Choi's [a]? [ɑ] is a back unrounded vowel like the other [ʌ ɯ], and though it's not the symbol Choi (1992) and Willson (2003) use, it is what the MED uses, and matches Bender's (1969) description of the open vowel becoming like English "ah," and Rudiak-Gould's (2004) description of it being like the "o" in "cot." Of course this still doesn't mean it is exactly cardinal [ɑ], and it's still not cardinal [a] (a front vowel), so [ä] or [ɑ̈] (a true central vowel) may actually be more accurate (it is how many North American English speakers realize the "ah" and "cot" vowels anyway), but since there's even less certainty about this vowel's backness than there is for the reported vowel height of [e̞ e̝ o̞ o̝], it makes more sense to stick with the [ɑ] symbol that already matches its other back unrounded allophones and not make it more complicated than it has to be.
- And why Willson's [ɤ] instead of MED's [ə]? [ɤ] is a back unrounded vowel like the other [ʌ ɯ].
- It could be that /ɐ ɘ/, which Willson describes as -ATR (compared to /ɜ ɨ/ being +ATR), genuinely have relatively more centralized allophones, depending on what "ATR" actually means, but that is untested supposition I can't lean on. It should be noted that Choi, who doesn't recognize four phonemes, uses only [ʌ] where the MED has [ʌ ə] and Willson has [ʌ ɤ].
- What I would give for a proper monophthong chart graphic from any of these sources. Ng published some chart graphics, but she merely follows Willson and the others she referenced, blindly assigning all vowel symbols to their exact cardinal chart positions, including placing [a] as a completely front vowel, which we know it isn't.
Obstruent allophones: Voiceless at phrase edges and when geminated, and semi-voiced medially.
- My reasoning? Several of the references (Choi, Willson, Rudiak-Gould) cite it as the typical everyday speech habit. And real people spend more of their time just speaking rather than reciting prose or singing lyrics.
- And what of /tʲ/'s allophones? [tʲ] for the voiceless allophone is the least controversial. But I'm still less certain of the voiced allophone: [dʲ] or [zʲ]? It may ultimately be moot, since the language's own speakers treat [tʲ dʲ sʲ zʲ] as all the same sound. I'm inclined to stick with [zʲ] for the time being.
Bender's transcription: The online version.
- My reasoning? The transcription system was revised between Bender (1969), MED (1976) and MED online, and the latter reference is most current and actively being updated. But certainly mention the transcription system's history in the article.
- It may also be worth contacting the online maintainer and asking if the changes made to Bender's transcription system are alterations the site made on its own, or are alterations Bender himself has since made and shared with the site. In either case, the site hasn't explained the discrepancy.
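The everyday-speech obstruent rule proposed above (voiceless at phrase edges and in geminates, voiced medially) can be expressed as a short function over a phrase's segments. This is a rough Python sketch under several assumptions: the input is a list of phoneme strings for one phrase, a geminate appears as two identical adjacent segments, "semi-voiced" is simplified to the plain voiced symbol, and every non-edge singleton obstruent is voiced regardless of what precedes it. Only [zʲ] for /tʲ/ comes from the discussion; the other voiced symbols are placeholder assumptions.

```python
from typing import List

# Voiceless obstruent phoneme -> voiced symbol used for medial singletons.
# [zʲ] for /tʲ/ follows the tentative choice above; the rest are
# placeholder assumptions, not sourced transcriptions.
VOICED = {
    "pʲ": "bʲ",
    "pˠ": "bˠ",
    "tʲ": "zʲ",
    "tˠ": "dˠ",
    "k": "ɡ",
    "kʷ": "ɡʷ",
}


def obstruent_allophones(phrase: List[str]) -> List[str]:
    """Apply the proposed everyday-speech voicing rule to one phrase.

    Obstruents stay voiceless at either phrase edge or as part of a
    geminate (two identical adjacent segments); any other obstruent gets
    its voiced symbol. Non-obstruent segments pass through unchanged.
    """
    last = len(phrase) - 1
    result = []
    for i, seg in enumerate(phrase):
        if seg not in VOICED:
            result.append(seg)  # not an obstruent in this sketch
            continue
        at_edge = i == 0 or i == last
        geminate = (i > 0 and phrase[i - 1] == seg) or (
            i < last and phrase[i + 1] == seg
        )
        result.append(seg if at_edge or geminate else VOICED[seg])
    return result
```

With a made-up segment list such as ["tʲ", "a", "tʲ", "a", "k"], only the middle /tʲ/ is voiced, giving tʲ a zʲ a k. Swapping [zʲ] for [dʲ] would be a one-entry change to the table.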

The question is, do my fellow editors agree with my positions? I want to hear their reasoned input so we can hopefully reach a consensus. - Gilgamesh (talk) 05:48, 30 October 2019 (UTC)

This link will partially answer your last bullet point (sorry I can't go into all the other details yet, still haven't finished the necessary reading). –Austronesier (talk) 11:27, 30 October 2019 (UTC)

So Bender and Trussel are working closely together on this. All right.

And this answered another question I had: "Why aren't there more editors fluent in Marshallese? Why isn't there a Marshallese version of Wikipedia yet?" Because internet access there is actually really expensive, and it's never exactly been an affluent country. I mean, over the decades half the Marshallese population has moved to the United States under the Compact of Free Association, but the families they raise there tend to gradually lose their language and speak only English. If you really want to work with a large number of speakers, there's really no substitute for people who actually live where people speak Marshallese every day. - Gilgamesh (talk) 13:14, 30 October 2019 (UTC)

Question + comment: Where does Willson (2003) use the symbols /ɐ ɜ ɘ ɨ/ to represent the four vowel phonemes? After a first glance at her paper, I can only find the binary notations [+hi, +ATR], [+hi, -ATR], [-hi, +ATR], [-hi, -ATR]. In the MOD, Bender uses the notation /a e ẹ i/. –Austronesier (talk) 16:59, 30 October 2019 (UTC)

Oh! You're right, Willson (2003) doesn't use those symbols anywhere. Ng (2017) doesn't use them, either? So where did I remember those symbols? I know Choi uses /ɐ ə ɨ/... I can't find /ɐ ɜ ɘ ɨ/ anywhere. Now I wonder if I OR'ed them into existence and then misremembered where they came from. And now, honestly...I don't know which phonemic symbols to use. It seems logical that at one point I used /a ɜ ɘ ɨ/ because they represent equidistant vowel heights, and maybe I thought I'd originally modified it from /ɐ ɜ ɘ ɨ/ because I was remembering Choi's /ɐ ə ɨ/. I'm afraid you may be seeing me push my mental notes to the limit. I must have synthesized so much at different times over the past decade that I don't even remember what I synthesized. Thank you for bringing this to my attention. So, honestly, which phonemic symbols would you use for the four? The same as the front vowels, like Bender, but in IPA? Central vowel symbols, like Choi, but four of them? Or something else? - Gilgamesh (talk) 17:55, 30 October 2019 (UTC)

Also, Bender actually used a e ȩ i with a cedilla in the MED (1976), but the online version categorically replaces all cedillas with underdots. Now that I know they hope to distribute the dictionary in the Marshalls on CD-ROM, I guess it makes sense that they try to present their work in text that is easy to render with readily-available default fonts. - Gilgamesh (talk) 20:12, 30 October 2019 (UTC)

Seriously, in light of what you just pointed out, /ɐ ɜ ɘ ɨ/ looks like rather arbitrary OR now. But I think I have an idea of how to replace it. In Choi (1992), section 2.2, he writes:

Bender (1968) reports four degrees of vowel height in Marshallese - high, upper-mid, mid and low. There is some question regarding the upper-mid series. Bender (1968: 23-24) states that many of the contrasts involving the upper-mid series can be eliminated, based on alternations associated with the high and mid counterparts. Even when these contrasts are phonologically eliminated, however, there are a number of minimal pairs (albeit the frequency of such pairs is low) between the upper-mid series and the high and mid series. Some of these minimal contrasts were checked with a native speaker who judged some of the pairs as distinct and some of the pairs as homophonous. The confusion relating to these vowels could stem from a historical process that is currently in progress. This is, of course, entirely speculative. Regardless, given the uncertain status of the upper-mid series of vowels, they will be excluded from the current study.

I only reread this section yesterday, and realized it was never as simple as him recognizing only three vowel phonemes. Choi eliminated one of the vowel phonemes to simplify his particular study. And while there is uncertainty as to whether the vowel inventory can be safely scrunched to three, even he does not solidly conclude that there are only three, just that this nebulous and possibly moribund fourth phoneme is more unhelpful than not to the acoustic study his publication focuses on.

So I have an idea. If Choi used /ɐ ə ɨ/ and completely ignored the phoneme between /ə/ and /ɨ/, then /ɘ/ is the one cardinal central vowel symbol between them that can represent the phoneme he excluded. It's still slightly synthetic to use it, but less so than /a ɜ ɘ ɨ/ (three arbitrary symbols) or /ɐ ɜ ɘ ɨ/ (two). The alternative is following Bender's lead and using whatever consensus we arrive at for the front vowel symbols. a e ȩ i established a precedent, but Bender's system is very much not IPA. - Gilgamesh (talk) 02:24, 31 October 2019 (UTC)

Ng (2017) follows Bender in choosing the front realizations to represent the phonemes, but squeezes them into the relatively high positions /ɛ e ɪ i/, corresponding to Bender's (1968) and Willson's phonetic front series [ɛ e ɪ i].

The current use of /ɐ ɜ ɘ ɨ/ in WP indeed looks like OR, inspired by Choi's choice (/ɐ ə ɨ/) of the central position for an abstract phonemic representation. As far as I can see, Choi is the only researcher to do so. (Note also that he does so quite casually in his examples where we literally have to extract them; in Table 2.3. he simply calls them /high/, /mid/, /low/).

Using central vowel symbols is an arbitrary choice; they are in no way more neutral or unmarked than other vowels. And this practice is quite counter-intuitive, since they do not correspond to any real phonetic realizations. Of course, the choice of phonemic symbols is in principle arbitrary, and not bound to IPA symbols, cf. Hale (2000) using wingdings. The intuitive choice for most linguists is to pick one of the common realizations and use its IPA notation. Often, one shifts to cardinal vowels for typographic convenience, like the classic /a e i o u/ for [a ɛ i ɔ u] in Spanish, or /a e i o u/ for [a e i o ɯ] in Japanese.

Personally, I recommend Bender's notation for the phonemic representation, instead of resorting to OR (as in the current state) or an OR-ish extension of Choi's notation. And out of Bender's three historical choices (1968: ampersand; 1976: e cedilla; MOD: e underdot), I opt for the MOD notation. –Austronesier (talk) 10:10, 31 October 2019 (UTC)

A very rational assessment. I can see why Choi used central vowel symbols, since it's a common convention when describing the height-only vowel phonemes of other languages with vertical vowel systems, such as Adyghe or Ubykh. But for Marshallese it may really be arbitrary, since the "light" (palatalized) vowels are treated more or less as the default. All right, agreed: front vowel symbols for the phonemes. Still, which symbols should we use for the front vowels (and the back vowels, and the rounded vowels)? Bender (1968) might have specified [ɛ e ɪ i], which was reused by Willson and then by Ng, but the MED (1976) and the MOD use [æ ɛ e i]. Since those are Bender's later collaborations, maybe they can be considered less controversial, too. I mean, when Ng cited [ɛ e ɪ i] as the majority choice, she was largely citing references, including Willson, that were repeating Bender (1968) anyway, right?

(Come to think of it, Willson (2003)'s vowels in the article's vowel chart ought to be relabeled Bender (1968); I've just never actually seen that particular work because it's not available for free. Most of the references I can access are digital, and the only Marshallese linguistics texts I have in print are family heirlooms from my mother's library: Bender (1969, different from his 1968) and the MED (1976). I'd been using the MOD for some time before discovering we even had the 1976 edition, and that's only because there was some question as to whether my mother's books would be divided up among the family when she passed away in 2003.)

And [ɒ ɔ o u] for rounded vowels, per MED/MOD.

And, this may sound odd, but... I know MED/MOD use [ɑ ʌ ə ɯ] for the back vowels, and Bender 1968/Willson/Ng use [a ʌ ɤ ɯ], but my own synesthesia strongly prefers [ɑ ʌ ɤ ɯ], so that all four allophones can be back vowels, and thus all three columns of vowels can each contain the same vowel "hue." It's not as if either [ɑ] or [ɤ] is an individually controversial symbol for these allophones; it's just that neither source used both at the same time.

As for ẹ...yeah, I guess that does no harm. The standard alphabet still uses cedillas, though, and it would look odd for the Bender-format phonemic guides to use ḷ ṃ ṇ when directly paired with words spelt with ļ m̧ ņ using cedillas, such as in Wiktionary entries. It helps that neither ȩ nor ẹ are actually in the standard Marshallese alphabet, so it matters even less what ẹ actually looks like as long as it's visually distinct from e.

Okay, we're making some progress. I look forward to your next reviewing comments. - Gilgamesh (talk) 13:46, 31 October 2019 (UTC)

I have download access to JSTOR, so here are the vowels (phonemic and phonetic) from Bender (1968:20):

Bender (1968)
Height	Phoneme	Front unrounded	Back unrounded	Back rounded
High	/i/	[i]	[ɨ]	[u]
High-mid	/&/ (sic!)	[ɪ]	[ᵻ]	[ʊ]
Mid	/e/	[e]	[ə]	[o]
Low	/a/	[ɛ]	[ɑ]	[ɔ]

I think that you can actually read it online for free on your "shelf" on JSTOR if you have an account: [1].

Note that Bender here actually uses three central vowel symbols and only one back symbol for the back/unrounded series, and even four central vowel symbols in Bender (1963) (open access). –Austronesier (talk) 15:19, 31 October 2019 (UTC)

Huh...so where exactly did Willson get her vowels from? - Gilgamesh (talk) 17:38, 31 October 2019 (UTC)

Oh, yeah. From what I recall from my mother's bookshelf, in Bender (1969), a e & i were from an early version of Bender's transcription system, and were never themselves actually IPA. He'd modified it by the MED (1976). Since I have both books, I can cite them when I describe the history of Bender's system and the symbols he used in a bit more detail than the article currently covers. - Gilgamesh (talk) 17:46, 31 October 2019 (UTC)

Let's see...

Phoneme	/pʲ/	/pˠ/	/tʲ/	/tˠ/	/k/	/kʷ/	/mʲ/	/mˠ/	/nʲ/	/nˠ/	/nʷ/	/ŋ/	/ŋʷ/	/rʲ/	/rˠ/	/rʷ/	/lʲ/	/lˠ/	/lʷ/	/j/	/ɰ/	/w/	/æ/	/ɛ/	/e/	/i/
Bender (1969)	p	b	j	t	k	q	m	ṁ	n	ṅ	n̈	g	g̈	d	r	r̈	l	ƚ	l̈	y	h	w	a	e	&	i
MED (1976)	p	b	j	t	k	q	m	m̧	n	ņ	ņ°	g	g°	d	r	r°	l	ļ	ļ°	y	h	w	a	e	ȩ	i
MOD	p	b	j	t	k	kʷ	m	ṃ	n	ṇ	ṇʷ	g	gʷ	d	r	rʷ	l	ḷ	ḷʷ	y	h	w	a	e	ẹ	i

- Gilgamesh (talk) 18:04, 31 October 2019 (UTC)
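The correspondence table above is regular enough to apply mechanically. The Lua sketch below (Lua only because wikt:Module:mh-pronunc is written in it) converts a word in Bender's MED notation, such as {nahaj}, into the IPA phonemic row of the table. It is an illustration under stated assumptions, not code from the module: the names MED_TO_IPA and to_ipa are hypothetical, ȩ, ņ and ļ are assumed to be precomposed characters, m̧ is assumed to be m plus a combining cedilla, and ° is assumed to follow the letter it modifies. Trying the longest symbol first is what keeps {ņ°} from being read as {ņ} followed by a stray °.

```lua
-- Hypothetical sketch: MED/Bender notation -> IPA phonemic, per the table above.
-- Assumes precomposed ȩ, ņ, ļ and a combining cedilla in m̧.
local MED_TO_IPA = {
  ["p"] = "pʲ", ["b"] = "pˠ", ["j"] = "tʲ", ["t"] = "tˠ",
  ["k"] = "k", ["q"] = "kʷ",
  ["m"] = "mʲ", ["m̧"] = "mˠ",
  ["n"] = "nʲ", ["ņ"] = "nˠ", ["ņ°"] = "nʷ",
  ["g"] = "ŋ", ["g°"] = "ŋʷ",
  ["d"] = "rʲ", ["r"] = "rˠ", ["r°"] = "rʷ",
  ["l"] = "lʲ", ["ļ"] = "lˠ", ["ļ°"] = "lʷ",
  ["y"] = "j", ["h"] = "ɰ", ["w"] = "w",
  ["a"] = "æ", ["e"] = "ɛ", ["ȩ"] = "e", ["i"] = "i",
}

-- Longest keys first (by byte length), so "ņ°" is tried before "ņ".
local KEYS = {}
for key in pairs(MED_TO_IPA) do
  table.insert(KEYS, key)
end
table.sort(KEYS, function(x, y) return #x > #y end)

local function to_ipa(med)
  local out, i = {}, 1
  while i <= #med do
    local matched = false
    for _, key in ipairs(KEYS) do
      -- byte-wise comparison is safe for UTF-8 literals
      if med:sub(i, i + #key - 1) == key then
        table.insert(out, MED_TO_IPA[key])
        i = i + #key
        matched = true
        break
      end
    end
    if not matched then
      error("unknown symbol at byte " .. i .. " in " .. med)
    end
  end
  return table.concat(out)
end

print(to_ipa("nahaj"))   -- nʲæɰætʲ
print(to_ipa("hawah"))   -- ɰæwæɰ
print(to_ipa("mȩtwȩy"))  -- mʲetˠwej
```

These outputs agree with the phonemic forms given elsewhere on this page for naaj, awa and Metwe.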

I want to turn to the phonetic realization of vowels first, before discussing consonants. I'm a bit torn here between meticulously sticking to the sources and wanting to add a pragmatic grain of synthesis, as you did. The MED/MOD representation as [æ ɛ e i ɑ ʌ ə ɯ ɒ ɔ o u] seems to be a good starting point to me. The only major flaw is – as you have correctly noted – the inconsistency in using central [ə] instead of back [ɤ]. The latter has a precedent in Willson (2003, 2008), so [æ ɛ e i ɑ ʌ ɤ ɯ ɒ ɔ o u] would be a slightly synthesized alternative which we can use in phonetic transcriptions (here and in Wiktionary), with appropriate explanatory notes.

So much for symmetric environments. The big question is what to do in asymmetric environments. Willson's solution is to posit diphthongs which glide from the initial to the final target. She cites Choi (1992) and Bender (1968) for this, but the facts are not that easy. E.g. jok (/tʲekʷ/) is transcribed by Willson as [tʲeokʷ], while Bender has [tʲ^e̯ə̯okʷ]. Choi transcribes lim /lʲimˠ/ as [l^jiɨ^ɯm^ɤ], where Willson would write [lʲiɯmˠ]. So both Bender and Choi glide horizontally over three vowel positions, with Choi generally positing the central value as "peak". For want of a uniform treatment in the lit, choosing the least clumsy (= Willson's) solution seems ok to me. The major synthesis then would be to apply Willson's approach to the (modified) MOD values. This basically is the status quo of the current version of the WP article. –Austronesier (talk) 12:54, 1 November 2019 (UTC)

Generally agreed, and I've been gradually updating wikt:Module:mh-pronunc as we go. But as I've been rereading Choi's (1992) Marshallese acoustic studies and studying more of Ng (2017), it's been observed that vowel-glide and vowel-glide-vowel sequences seem to have different secondary articulation contours, with more emphasis on the center of the arc and less on the sides. I mean, I've never sat down and methodically read Choi (1992) word for word (I'm a scatterbrained skimmer), but he appears to get into this topic (at least) on page 72. And though Ng strikes me as a tad naive about the vowels (at least in her vowel charts), she notes on page 6, apparently citing Bender (1968), that vowel-final words don't behave in the typical way, which seems to echo what Choi is saying. In light of this, I wrote this experimental code in the module:

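		-- experimental
		-- (gsub, gsub2, C0, V and TIE are helpers and pattern
		-- constants defined elsewhere in wikt:Module:mh-pronunc)
		if true then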
			-- convert tied diphthongs after non-glides
			-- and before glides into monophthongs
			text = gsub(text, "("..C0..".)"..V..TIE.."("..V.."ɦ)", "%1%2")
			-- convert tied diphthongs between glides after vowels
			-- into monophthongs
			text = gsub2(text, "("..V.."ɦ.)"..V..TIE.."("..V.."ɦ)", "%1%2")
			-- convert tied diphthongs before non-glides
			-- and after glides after vowels into monophthongs
			text = gsub(
				text, "("..V.."ɦ."..V..")"..TIE..V.."("..C0..")", "%1%2"
			)
		end

I don't expect you to understand the Lua code itself or its pattern syntax, and honestly this code makes even more sense in the context of the whole module anyway, but I think the human-readable comments in this code snippet should illustrate what I'm trying to do here. Basically, by treating these vowels as lingering more strongly towards their glide nucleus, it may be safe (and certainly more readable) to omit the weaker vowels from transcription entirely in these sequences. For instance, tāākji /tˠɐjɐktʲɨj/ can be phonetically transcribed [tˠæːɡ̠(ʌ͡ɛ)zʲi] instead of [tˠɑ͡æː͡ɑɡ(ʌ͡ɛ)zʲi]. This is like what I suggested further up in my rambles: phonetically transcribing mour as [mʲourˠ] instead of [mʲe͡ou͡ɯrˠ], since you can hardly even hear any [e] or [ɯ]. In these cases the orthography seems to gravitate to this nucleus as well. For an extreme example, this algorithm reduces Nuwio̧o̧k 'New York' /nʲiwijæwæk/ from [nʲi͡uː͡iæ͡ɒː͡ɑk̠] to the much more digestible [nʲuiɒːk̠]. I can't say if this is truly a good idea or if it can be justified with any practicality against scrutiny for OR, which is why I shouldn't be the only one deciding whether this is a good idea.

I've also considered changing the way the epenthetic vowels are transcribed. Before, I put them in parentheses because they can be safely omitted without changing a phrase's meaning. But also, importantly, one of the references (okay, I'm ashamed right now for not remembering which) clarifies that, while Marshallese stress is at most weakly phonemic, if it is phonemic at all, its vowel-consonant patterns can still influence stress patterns, yet epenthetic vowels do not affect these patterns at all. I'm not sure whether they could be considered asyllabic, or if a mora-timed language like Marshallese even has a strong sense of syllables outside of song, but it seems tempting to transcribe them as non-syllabic, for example [tˠæːɡ̠ʌ̯͡ɛ̯zʲi]. My main problem is that parentheses can be visually loud and distracting when used so much in IPA transcription (especially when there are also so many tie bars), and it would appear to improve presentation to omit them, but in a way that doesn't imply they are full vowels. Honestly, I would reduce the epenthetic vowels completely to superscript letters if I could, and while there exist precomposed superscript IPA characters for [ᵆᵋᵉⁱᵅᶺᵚᶛᵓᵒᵘ], meaning a transcription like [tˠæːɡ̠ᶺᵋzʲi] could be an option, there isn't a precomposed superscript character that corresponds to [ɤ]. Years ago I toyed with the idea of just using a superscript schwa [ᵊ] for all epenthetic vowels, since their vowel heights don't really matter anyway, but I vaguely recall another editor complaining that it would be inappropriate to use [ᵊ] unless the epenthetic vowel was genuinely a mid central vowel, and it also appeared too much like OR because none of the available references we were discussing had used such a transcription. The objection seemed a tad nitpicky, but then as now, I wanted to seek more consensus (not less), so I dropped it. So, a transcription like [tˠæːɡ̠ʌ̯͡ɛ̯zʲi] seems like the least worst option.
- Gilgamesh (talk) 14:39, 1 November 2019 (UTC)

Frankly, I have just read my way through CVC environments with obstruent C's. Glides, vowel sequences and epenthetic vowels are still on my "to grasp" list. –Austronesier (talk) 16:10, 1 November 2019 (UTC)

That's fine, take your time. :) But just a basic rundown:

While all the consonants have a secondary articulation that colors their neighboring vowels, the glide phonemes /j ɰ w/ are special—their role is largely to color their neighboring vowel and then be silent, all while occupying an extra mora for being there. In this way they are more invisible than true glides, as /j w/ only become literally [j w] allophonically in certain environments mostly at the beginnings and ends of words, while /ɰ/ never surfaces as [ɰ], but its presence neatly explains its effects on vowels, so is still recognized as a phoneme.
Since Marshallese syllable structure is CVC, CVCVC, CVCCVC, etc., every word begins and ends with a consonant, and no two vowels directly neighbor each other...on the phonemic level. In truth, when a single glide phoneme comes between two vowels, the glide vanishes and a phonetic vowel sequence occurs. When the two vowels have the same height, they effectively form a long monophthong. So /æjæ/ becomes [æː], /æɰæ/ becomes [ɑː], etc. (Vertical vowel systems are fun, aren't they?)
Epenthetic vowels are fleeting vowels inserted between the two consonants of a cluster that, for whatever reasons in the language, do not form a stable sequence of sounds. Marshallese is limited in what kinds of consonant clusters it allows without the presence of this epenthetic vowel. (This is sort of like the epenthetic schwa that forms between certain consonants in the Irish language and in Hiberno-English, like how they pronounce 'film' as /ˈfɪləm/.) A minority of possible clusters are stable, like /nʲtʲ/ (a nasal-obstruent cluster with the same primary and secondary places of articulation), and certain other clusters instead undergo some kind of contact assimilation in both primary and secondary articulation. (This is similar to the complex relationship of consonant clusters in Korean phonology.) In Marshallese, these epenthetic vowels and contact assimilations occur even across word boundaries in uninterrupted speech, which means epenthetic vowels can appear in the spaces between words, too.

I don't expect that to fully explain it, but just maybe help you know what to recognize when you see it. - Gilgamesh (talk) 17:29, 1 November 2019 (UTC)

I just made some major edits up through the Vowels section. I also overhauled the vowel table to reduce misrepresentation of the references, this time not even trying to clean up the mess that is Marshallese vowel research. - Gilgamesh (talk) 15:14, 3 November 2019 (UTC)

I would still like to reach a consensus on the most appropriate IPA conventions for describing long and final vowels. An approach resembling Willson's (2003) tied diphthongs throughout may have accuracy, and in the past I preferred this maximally technical approach, but it has since proven very difficult for article readers to read, even if they otherwise have basic knowledge of IPA transcription. The various references also sometimes dabble in simplified phonetic transcriptions, especially when doing so more resembles Marshallese orthography's more limited use of vowel symbols in words. [nʲæ͡ɑɑ͡ætʲ] and [nʲæ͡ɑː͡ætʲ] for naaj are difficult for readers to swallow, while [nʲɑːtʲ] is much easier to digest. Maybe the tied diphthongs are still appropriate for short vowels between non-glide consonants. But whatever approach we use for any of the vowels, it has to be well-crafted, well-referenced and supported by consensus. - Gilgamesh (talk) 09:46, 4 November 2019 (UTC)

I just noticed your /gɪmɪəbɹeɪ̯k/ edit message in this talk page from a few days ago. I understand. Take your time. :) - Gilgamesh (talk) 19:17, 4 November 2019 (UTC)

I am primarily struggling with the analysis of long vowels as /VC_iV/ and final vowels as /VC_i/ with the "virtual" glides C_i = /j w ɰ/. Unlike with the short vowels, we are leaving "first-level" phonemics and entering the realm of transformational "deep-structure" phonemics. What I mean here is that the phonemic analysis of [tʲeokʷ]/[tʲ^e̯ə̯okʷ] (Willson/Bender) as /tʲekʷ/ represents a lesser degree of phonemic abstraction than the analysis of [nʲɑːtʲ] as /nʲaɰatʲ/. But this is my own personal observation. All sources operate on a single phonemic paradigm, so this is what we have to work with.

I am definitely sympathetic to your proposal for a simplified phonetic transcription of long vowels as in [nʲɑːtʲ], because long vowels have a well-defined and stable quality in their peak (Choi's "medial F2 target"), only slightly on- and off-gliding at the edges. And we are in good company: this convention is consistent with Table 2.8. in Choi (1992), where long vowels are not framed by on- and off-glides in phonetic transcription, unlike short vowels in asymmetric environments.

The same holds mutatis mutandis for initial and final vowels, i.e. /mʲaɰ/ could be spelled [mʲɑ] rather than [mʲæ͡ɑ] (the audible pronunciation is probably [mʲæ̯ɑ]).

Btw, I still prefer Bender's /a e ȩ i/ or /a e ẹ i/ for the phonemic notation. –Austronesier (talk) 11:28, 5 November 2019 (UTC)

Thank you for your reflection. I suppose we have no choice but to use the single paradigm. And yes, Bender's phonemes are remarkably clear once you understand how they work. An IPA phonemic transcription largely piggy-backs off them. But the orthography is still more phonetic than phonemic, and while Bender (1969) encourages memorizing the patterns in his own transcription, Rudiak-Gould (2004) takes the opposite view and encourages students to completely ignore it. It would be more perfect if Marshallese had a purely phonemic orthography following Bender's lead. But as long as educators can't agree on how to teach learners of Marshallese, it helps to at least have a phonetic IPA transcription. I mean, it would help if it also followed the deep-structure leads of how modern Marshallese orthography writes long vowels and long diphthongs. (My mind was a little blown when I finally realized that Jālwōj {jalwȩj} and Jālooj {jalȩwȩj}, both meaning 'Jaluit,' are more or less pronounced the same way because of epenthesis, but one spelling follows the shallower paradigm [tʲælʲ(e͡o)o͡etʲ] and the other spelling follows the deeper paradigm [tʲælʲe͡oː͡etʲ] or [tʲælʲoːtʲ].) But I guess sometimes, when you are forced to decide between informing and simplifying, you still need to inform. I'll see if I can find a way to make the single vowel paradigm more digestible, perhaps by alternating above and below tie bars, or ditching the gemination symbol for long vowels which is helpful for pure long vowels but rather visually confounding for long diphthongs. Something like...[tʲælʲe͡oo͜etʲ], so that each single vowel or tied pair can be clearly analyzed as the unit it is. - Gilgamesh (talk) 15:17, 5 November 2019 (UTC)

I would stick to good old [V:] for a simple reason: a phonemic notation is an abstraction of a phonetic transcription. So actually it is prima la fonetica e poi la fonologia ("first phonetics, then phonology"). With Marshallese, however, it appears to be the opposite: we are facing the unusual task of re-translating a relatively coherent phonemic system into a transcription of quite challenging phonetic facts. But we are not compelled to adjust the phonetic transcription just to make it more easily relatable to the phonemic abstraction. That means we don't have to write [e͡oo͜e] instead of [o:] for /ȩwȩ/ only because we write [e͡o] and [o͡e] for short /ȩ/ in CʲVCʷ environments. After all, /ȩwȩ/ is "only" a theoretical abstraction of audible [o:]. –Austronesier (talk) 15:50, 5 November 2019 (UTC)

So you would prefer something more like [tʲælʲoːtʲ]? - Gilgamesh (talk) 16:37, 5 November 2019 (UTC)

Exactly. To sum up my preliminary proposal:

- [æ ɛ e i ɑ ʌ ɤ ɯ ɒ ɔ o u] for short vowels in symmetric environments, per MED (with [ə] → [ɤ])
- the corresponding long vowels for /V_iC_jV_i/ (C_j = /j w ɰ/), per Choi but using MED sounds
- ~~tie bar vowel sequences for short vowels in asymmetric environments, per Willson but using MED sounds~~
- /a e ȩ i/ for the vowel phonemes, per MED.

I still have to consult all sources for how to deal with final and initial vowels, and different-height vowel sequences (/V_iC_jV_k/ with C_j = /j w ɰ/ and i≠k). –Austronesier (talk) 17:09, 5 November 2019 (UTC)
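The preliminary proposal can be restated as a lookup from phoneme height and consonant coloring to a vowel symbol. The Lua sketch below is a hypothetical illustration, not code from wikt:Module:mh-pronunc; it assumes the coloring is fully determined by the neighboring consonants (palatalized consonants and /j/, velarized consonants and /ɰ/, labialized consonants and /w/), and it only covers the first two bullet points: short vowels in symmetric environments and same-height long vowels /V_iC_jV_i/. The function names are made up for this example.

```lua
-- Hypothetical sketch of the proposal: MED symbols, with [ə] replaced by [ɤ].
local HEIGHTS = { a = 1, e = 2, ["ȩ"] = 3, i = 4 }  -- /a e ȩ i/, low to high

local QUALITY = {
  front = { "æ", "ɛ", "e", "i" },  -- next to palatalized consonants and /j/
  back  = { "ɑ", "ʌ", "ɤ", "ɯ" },  -- next to velarized consonants and /ɰ/
  round = { "ɒ", "ɔ", "o", "u" },  -- next to labialized consonants and /w/
}

-- Coloring contributed by each glide phoneme.
local GLIDE_COLOR = { j = "front", ["ɰ"] = "back", w = "round" }

-- Short vowel in a symmetric environment (both neighbors share one coloring).
local function short_vowel(phoneme, color)
  return QUALITY[color][HEIGHTS[phoneme]]
end

-- Long vowel for /V G V/: the same vowel phoneme on both sides of glide G.
local function long_vowel(phoneme, glide)
  return short_vowel(phoneme, GLIDE_COLOR[glide]) .. "ː"
end

print(long_vowel("a", "j"))  -- æː
print(long_vowel("a", "ɰ"))  -- ɑː
print(long_vowel("ȩ", "w"))  -- oː
```

The first two calls reproduce /æjæ/ → [æː] and /æɰæ/ → [ɑː] from the earlier rundown, and the third gives [oː] for /ȩwȩ/.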

Excellent. Once I have the algorithm adequately tweaked at Wiktionary, I intend to bring the module to Wikipedia as well, and finally improve the rest of this main article and update the pronunciation guides in articles. - Gilgamesh (talk) 17:23, 5 November 2019 (UTC)

As of this writing, the algorithm (subject to change) converts tied diphthongs to monophthongs gravitating towards a glide phoneme, so that (not real words) [(w)o͡etʲ] becomes [otʲ] and [tʲe͡o(w)] becomes [tʲo]. And if glides occur at the beginning or end of a phrase, the vowel's secondary articulation always gravitates to it: Initial [o͡e] becomes [o] and final [o͡e] becomes [e]. But when a vowel comes between two glides, currently the algorithm is right-leaning, which the orthography also seems to do, with Nuwio̧o̧k {niwiyawak} leaning right for every vowel until the last one which leans left. Since I finally read Bender (1968) and it reveals epenthetic vowels also sprout up within clusters containing a glide (including a cluster of two glides), this right-leaning vowel behavior also occurs in an epenthetic vowel between two glides. But there are limits to this approach, as obviously awa {hawah} 'hour' cannot just convert from [ɑ͡ɒɒ͡ɑ] to [ɑɑ] as it would delete all evidence of the /w/ glide. So, currently, the behavior is a compromise, becoming [ɑ͡ɒɑ] instead.

And with regard to the automatically generated height of epenthetic vowels, I realized the right-leaning monophthongization of GVG is a tad weird when considering that currently the epenthetic vowel height between two glides is left-leaning, also looking at Bender's (1968) examples and especially how sequences like iyyV actually phonetically expand to iyiyV no matter what the vowel on the right is. This behavior of assuming the entire height of a neighboring vowel only seems to apply to epenthetic vowels neighboring glides: V₁GCV₂ becomes V₁GV₁CV₂ and V₁CGV₂ becomes V₁CV₂GV₂. Epenthetic vowels between two non-glides still assume a height transitional between the two nearest vowels: V₁CCV₃ becomes V₁CV₂CV₃. - Gilgamesh (talk) 19:03, 5 November 2019 (UTC)
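The three height rules for epenthetic vowels reduce to a single decision. In this hypothetical Lua sketch, heights are numbered 1 to 4 for /a e ȩ i/, and the two flags say whether the first and second consonant of the cluster is a glide. The comment only says that a vowel between two non-glides takes a height "transitional" between its neighbors; rounding the midpoint down is purely an assumption made here so the function returns something definite.

```lua
-- Hypothetical sketch. Heights: 1 = /a/, 2 = /e/, 3 = /ȩ/, 4 = /i/.
local function epenthetic_height(left, right, c1_is_glide, c2_is_glide)
  if c1_is_glide then
    -- V1 G C V2 -> V1 G V1 C V2; also covers G G (iyyV -> iyiyV)
    return left
  elseif c2_is_glide then
    -- V1 C G V2 -> V1 C V2 G V2
    return right
  else
    -- V1 C C V3 -> V1 C V2 C V3; the midpoint rounding is an assumption
    return math.floor((left + right) / 2)
  end
end

print(epenthetic_height(4, 1, true, true))    -- 4 (iyyV -> iyiyV)
print(epenthetic_height(1, 3, false, true))   -- 3
print(epenthetic_height(1, 3, false, false))  -- 2
```

Checking the first consonant before the second is what makes the glide-glide case left-leaning, as described above.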

By the way, wikt:User:Erutuon over at Wiktionary has set up testcases to show the results of wikt:Module:mh-pronunc in its current state. On that page, above the code, it displays a table of hundreds of Marshallese words in Bender's system, IPA phonemic and IPA phonetic. You don't have to understand Lua—just look at the table. And maybe we can use that as a basis on which to continue discussing the matter, and any changes I could algorithmically code into the module. I've been editing it repeatedly for days anyway. - Gilgamesh (talk) 16:54, 5 November 2019 (UTC)

I have noticed one potential inconsistency: jain {jahyin} produces [tʲɑːinʲ]/[tʲɑæinʲ] (long + short vowel), while jāij {jayij} yields [tʲæitʲ] (short + short vowel). Is this behavior supported by the sources? Further, I would suggest adding a syllable break between vowels (e.g. [tʲæ.itʲ]). I know this is redundant, since short vowels in asymmetric contexts are tied, but I think it will add some clarity. And probably [tʲɑ.inʲ] is more realistic than [tʲɑːinʲ].

Finally, are you sure that awa {hawah} is pronounced without phonetic [w]? In all other contexts, medial {-awa-} corresponds to -ọọ- in spelling. The spelling awa suggests that there is an audible [w] or [u], so {hawah} might actually be wrong. Do you have resources to check this? –Austronesier (talk) 12:26, 6 November 2019 (UTC)

{ja-h-yi-n} is four morae, {ja-yi-j} is three, per Willson (2003). Since epenthesis actually occurs in clusters with glides (again, thanks to Bender (1968) cross-referenced with the MED), I treat {jahyin} as a word that expands to {jah(a)yin} in speech. That produces a raw phonetic analysis of [tʲæ͡ɑ(ɑ͡æ)in], basically [tʲæ͡ɑː͡æin]. And since, again, transcriptions like that can be very hard to read, both [tʲɑæinʲ] and [tʲɑːinʲ] are possible simplifications. (That being a topic we still need to hammer out, but it doesn't hurt to test algorithms.)
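The two mora counts can be reproduced by a simple rule: each vowel is one mora, and each consonant not immediately followed by a vowel adds one more. That rule is a reconstruction from these examples, not something Willson (2003) is quoted as stating here, and the Lua function names are hypothetical. The sketch reads Bender/MED notation, assumes a precomposed ȩ, and attaches a combining cedilla or ° to the preceding letter.

```lua
-- Hypothetical sketch: count morae in a Bender/MED transcription.
local VOWELS = { a = true, e = true, ["ȩ"] = true, i = true }
-- Combining cedilla (U+0327) and degree sign modify the preceding letter.
local MODIFIERS = { ["\204\167"] = true, ["\194\176"] = true }

-- Split a UTF-8 string into letters, gluing modifiers onto their base.
local function letters(s)
  local out = {}
  for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
    if MODIFIERS[ch] and #out > 0 then
      out[#out] = out[#out] .. ch
    else
      table.insert(out, ch)
    end
  end
  return out
end

local function count_morae(s)
  local seq, n = letters(s), 0
  for k, ch in ipairs(seq) do
    -- a vowel, or a consonant with no vowel right after it
    if VOWELS[ch] or not VOWELS[seq[k + 1]] then
      n = n + 1
    end
  end
  return n
end

print(count_morae("jahyin"))  -- 4
print(count_morae("jayij"))   -- 3
```

Under the same rule, kaharhar comes out two morae longer than kahar (5 versus 3), which fits the reduplication of the last two morae described further down this page.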

And ⟨w⟩ in Marshallese tends to be used rather liberally, especially in a language where it's common for people to write it without any diacritics, including electronically on computers designed for Americans, so spellings like wu (wū) are even clearer than u (which could ambiguously be u or ū). And it's not a surprise that ai or āi form long diphthong sequences, but the spelling uwa in Kuwajleen also forms a long diphthong sequence for the same reason: so that uwa can't be confused with ūa, though the spelling Kuajleen may still be encountered in writing (see the MED's concordances). Bender's transcription is meant to inform on pronunciation, and its use in the MED indicates the pronunciation of awa is {hawah}, phonemically /ɰæwæɰ/. In a word like this, it is the usual case that none of the glide consonants surfaces phonetically (since it has /ɰ/ at the edges and a single /w/ between two vowels), yielding a raw phonetic analysis of [ɑ͡ɒɒ͡ɑ], or [ɑ͡ɒː͡ɑ]. So awa is not actually a two-syllable word, but it's not always about syllables, as Marshallese is a mora-timed language, and the vowel sequence still essentially captures the three basic sounds in English /ˈaʊɚ/. For another example of ⟨w⟩ behaving in a weird manner, there is the word wau, meaning 'Mother Hubbard dress,' and based on the word Oahu, the Hawaiian island the missionaries who introduced the dress came from. The MED indicates wau as {wahwiw}, which is phonemically /wæɰwiw/, phonetically [ɒ͡ɑ(ɑ͡ɒ)u], which could conceivably be simplified to [ɒ͡ɑːu] in transcription. Marshallese basically took the word Oahu, stripped out the h which doesn't correspond to a Marshallese sound, then adapted it into a word that maintained the three vowels even if not as three syllables. - Gilgamesh (talk) 16:32, 6 November 2019 (UTC)

I should probably add, I know Willson didn't use the term "mora," but she did use the symbol μ in her diagram, which is conventionally used for "mora." And it's common for CVCVC words with a glide in the middle to form single (long) syllables in speech and in song. A good example in song is a cover of the song Ij Io̧kwe Ļo̧k Aelōn̄ Eo Aō at https://www.youtube.com/watch?v=A3L9xayHoyE with lyrics provided. - Gilgamesh (talk) 16:59, 6 November 2019 (UTC)

Another point I keep forgetting to bring up is my observation that Marshallese singing tends to be handled very differently from Marshallese speech. Not only is obstruent voicing more consistent in song (as in Ij Io̧kwe Ļo̧k Aelōn̄ Eo Aō, where none seem to be voiced), and not only is /tʲ/ (including geminated or final) commonly pronounced [sʲ] or [zʲ] throughout, especially in pop music,[2][3][4] but it also seems more common to omit epenthetic vowels in song (also observed in Ij Io̧kwe Ļo̧k Aelōn̄ Eo Aō). - Gilgamesh (talk) 17:13, 6 November 2019 (UTC)

I've given this some additional thought. [tʲɑ.inʲ] and [tʲɑːinʲ] are actually both realistic, but for different reasons. [tʲɑ.inʲ] is more like how one would sing or chant the word, breaking it into CVC.CVC sequences and ignoring epenthetic vowels. But [tʲɑːinʲ] is closer to how it is spoken organically in epenthetic CVC(V)CVC fashion. Perhaps using parentheses for the epenthetic vowels is ultimately a better idea after all: [tʲɑ(ɑ)inʲ]. It would let the IPA transcription elegantly capture both how the phonemes are broken down and how they normally flow together, for example [mʷɑ(ɑ)zʲɛ͡ʌlˠ], [kʷuɒzʲ(æ)lɛɛn, kʷuɒzʲ(æ)lɛːn], [bʌɔkʷ(ɑ)ɑk]. Though double vowels are geminates, even Bender (1968) in some places uses double vowel notation to phonetically analyze them. One word especially that comes to mind is kaarar {kaharhar} 'drive a car', which is an inflection of kaar {kahar} 'car.' It is common to synthesize verbs from nouns by duplicating the last two morae. This much I have known for years. But it wasn't until reading Bender (1968) that it was clear that the word isn't just [kɑːrˠ.ɑrˠ], but actually pronounced [kɑːrˠɑːrˠ], or, in another sense, [kɑɑrˠ(ɑ)ɑrˠ]. - Gilgamesh (talk) 03:39, 7 November 2019 (UTC)
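The parenthetical notation makes it possible to derive both readings from one string. This hypothetical Lua sketch assumes that the spoken (combining) reading simply keeps the parenthesized epenthetic vowel, that the sung (isolating) reading replaces it with a syllable break, and that in both readings two identical adjacent vowels are then written as one long vowel. The function names and the vowel list (the twelve proposed phonetic vowels) are choices made for this example only.

```lua
-- Hypothetical sketch: spoken vs. sung renderings of a transcription
-- whose epenthetic vowels are in parentheses.
local VOWELS = { "æ", "ɛ", "e", "i", "ɑ", "ʌ", "ɤ", "ɯ", "ɒ", "ɔ", "o", "u" }

-- Write two identical adjacent vowel symbols as one long vowel.
local function lengthen(s)
  for _, v in ipairs(VOWELS) do
    s = s:gsub(v .. v, v .. "ː")
  end
  return s
end

-- Spoken (combining) mode: keep the epenthetic vowel, drop the parentheses.
local function spoken(s)
  return lengthen((s:gsub("%((.-)%)", "%1")))
end

-- Sung (isolating) mode: replace the epenthetic vowel with a syllable break.
local function sung(s)
  return lengthen((s:gsub("%(.-%)", ".")))
end

print(spoken("tʲɑ(ɑ)inʲ"))    -- tʲɑːinʲ
print(sung("tʲɑ(ɑ)inʲ"))      -- tʲɑ.inʲ
print(spoken("kɑɑrˠ(ɑ)ɑrˠ"))  -- kɑːrˠɑːrˠ
print(sung("kɑɑrˠ(ɑ)ɑrˠ"))    -- kɑːrˠ.ɑrˠ
```

All four outputs match the renderings of jain and kaarar given in the comment above.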

This...actually potentially adds a layer of complexity to pronunciation. In my current algorithms, there are at least two principal situations where I leave tied short vowels intact in transcription: Between two non-glide consonants, and as solitary CVC utterances between two glide consonants.

Take Metwe 'Midway atoll,' which is analyzed as {mȩtwȩy}, phonemically /mʲetˠwej/. In a phonetic analysis, a syllable-isolating approach could render it [mʲe͡ɤdˠ.o͡e]. With epenthetic vowels in parentheses, it is [mʲe͡ɤdˠ(ɤ͡o)o͡e]. But since [ɤ͡oo͡e] forms a long vowel, with different on-glide and off-glide behavior, the whole word can be reanalyzed as [mʲe͡ɤdˠoe] according to the current state of the long vowel simplification algorithm (again, which needs to be hammered out and completed). If that were the case, it wouldn't make sense to mix the parenthetical and vowel simplification approaches at the same time, as a naive reading of [mʲe͡ɤdˠ(o)e] might incorrectly syllable-isolate it to [mʲe͡ɤdˠ.e]. One would almost have to separate words like these into two metric modes, one isolating and one combining. But in a sense, both Bender's transcription and a phonemic IPA transcription already do that—just without notes on how to sing words.

Ultimately, it may not be within our scope either at Wikipedia or Wiktionary to cover the isolating mode as it is sung, chanted or enunciated, any more than a French dictionary optionally indicates every normally silent schwa that is pronounced aloud in song. Consider how the French phrase Frère Jacques is two syllables [fʁɛʁ.ʒɑk] when spoken, but four syllables [fʁɛ.ʁə.ʒɑ.kə] when sung, yet for each word, [fʁɛʁ] and [ʒɑk] are the dictionary pronunciations, not [fʁɛʁə] or [ʒɑkə]. Though this is effectively the opposite situation from Marshallese which has more sounds in speech than in song.

I mean, it all makes more sense to me, now: Jālwōj was never an alternate form of Jālooj, but just the opposite. Jālwōj is made up of individual morphemes jāl and wōj, compounded as Jālwōj, and then fused together as the newer inseparable word Jālooj that refers specifically to Jaluit atoll rather than just a compound of jāl and wōj. So while Jālwōj and Jālooj are normally pronounced identically in speech, they would necessarily be syllable-isolated differently for song, as CVC.CVC Jālwōj can be isolated into neat CVC syllables whereas CVCVCVC Jālooj cannot. - Gilgamesh (talk) 04:31, 7 November 2019 (UTC)

I've been thinking that, for lack of a better source, it may be all right to use the newer orthography as a cue for long vowel simplification.
If you look at the old orthography, it had far more inconsistent spelling, with some words (like emon {yem̧m̧an}) and so many common place names (Bikini {pikinni}, Kwajalein {kʷiwajleyen}, Rongelap {rʷegʷļap}, Wotho {wettew}) straight-up taken from old colonial transcriptions. Double vowels were often ambiguously written once, double consonants were often ambiguously written once, vowels were a mess, but until the 1970s there was no competing system.
The new orthography, which as far as I can tell is contemporary with the MED (1976), is still not perfectly morphophonemic and still has its various flaws, but it has far greater internal consistency in phoneme-to-spelling conversion. As such, most vowels are spelt in a fairly consistent manner, though, as it is a more scientific orthography, epenthetic vowels are not written at all except in certain fused words like Jālooj. In most cases, when long vowels and sequences of vowels are written out, I think it's fairly safe to follow them as cues in pronouncing a word. That is not to say that IPA phonetic vowels should be tweaked manually on a spelling-by-spelling basis, and various words still have homophonous alternate spellings, especially where a second spelling remains historically influential into the new orthography (M̧ajeļ/M̧ajōļ, io̧kwe/iakwe, etc.). But the spellings in general appear to follow cardinal patterns, which I think I mentioned before:

Leave short diphthongs between two non-glides /CV͡VC/ as tied diphthongs.
Leave short diphthongs between two glides /GV͡VG/ in isolated one-syllable utterances as tied diphthongs.
Gravitate towards a glide at the beginnings /GV͡VC/ and ends /CV͡VG/ of words in initial GVC or final CVG sequences, as evidenced by so many words like āt {yat} and mo̧ {maw}.
Gravitate towards a glide nucleus /CV͡VGV͡VC/, evidenced by so many words like naaj {nahaj} and kāān̄ {kayag}.
Otherwise lean towards the right in most cases where there are vowel-glide-vowel-glide sequences /V͡VGV͡VGV͡V/, evidenced by words with such sequences like Nuwio̧o̧k {niwiyawak} [nʲi͡uu͡iæ͡ɒɒ͡ɑk ~ nʲuiɒɒk].
Initial /GV͡VGV/ leans left, as evidenced in words like aelōn̄ {hayȩlȩg}, but only if these two vowels /GV͡VGV/ have different secondary articulations, recognizing the awa {hawah} dilemma and preventing *[ɑ͡ɒɑ] from being reduced to *[ɑɑ]. Note that by this point, this vowel /GV͡VGV/ will have already been simplified by one of the previous rules, and will lean right if the next consonant is also a glide (as is the case with awa).
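The first, third, fourth and fifth rules of this list can be sketched as follows. This is a hypothetical Lua illustration, not the actual code of wikt:Module:mh-pronunc: a word is assumed to be pre-segmented into consonants, glides, and short vowels written as tied onset/offset pairs; glides are assumed to be silent; and the isolated one-syllable GVG case and the awa rule are not modeled. A vowel with a glide on its right keeps its offset (which also yields the right-leaning behavior between two glides), a vowel with a glide only on its left keeps its onset, a vowel between non-glides stays tied, and identical vowels meeting across a silent glide merge into one long vowel.

```lua
-- Hypothetical sketch. Segments: consonants { c = "nʲ" },
-- glides { c = "ɰ", glide = true }, short vowels { v = { onset, offset } }.
local TIE = "\205\161"  -- U+0361 combining double inverted breve (tie bar)

local function is_glide(seg)
  return seg ~= nil and seg.glide == true
end

local function simplify(word)
  local out = {}  -- list of { text = string, vowel = boolean }
  local function push(text, vowel)
    local last = out[#out]
    if vowel and last and last.vowel and last.text == text then
      last.text = text .. "ː"  -- identical vowels across a silent glide
    else
      table.insert(out, { text = text, vowel = vowel })
    end
  end
  for k, seg in ipairs(word) do
    if seg.v then
      local before, after = is_glide(word[k - 1]), is_glide(word[k + 1])
      if after then
        push(seg.v[2], true)                      -- lean right, toward the glide
      elseif before then
        push(seg.v[1], true)                      -- lean left, toward the glide
      else
        push(seg.v[1] .. TIE .. seg.v[2], true)   -- between non-glides: keep tied
      end
    elseif not seg.glide then
      push(seg.c, false)                          -- glides themselves are dropped
    end
  end
  local parts = {}
  for _, p in ipairs(out) do
    table.insert(parts, p.text)
  end
  return table.concat(parts)
end

local naaj = {
  { c = "nʲ" }, { v = { "æ", "ɑ" } }, { c = "ɰ", glide = true },
  { v = { "ɑ", "æ" } }, { c = "tʲ" },
}
print(simplify(naaj))  -- nʲɑːtʲ
```

With the segmentations implied earlier on this page, the same function also turns mour into [mʲourˠ] and Nuwio̧o̧k into [nʲuiɒːk] (leaving out the retraction diacritic on the final k).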

So that covers my vowel simplification algorithm so far, and I've given it some additional thought. The spelling awa tries to square the circle by recognizing the glides at the beginning and ends and inserting a ⟨w⟩ to indicate symbolically that it gets rounded in the middle without having to commit to a vowel symbol. It works well as a disambiguator, but you're right that awa ends up looking very different from o̧o̧.

We could just go ahead and do [ɑwɑ], but it would be ambiguous since it might be read as [ɑu̯ɑ] following the cardinal pronunciation of the IPA symbol [w] as equivalent to [u̯]. That's not a problem with the highly underspecified phonemic IPA transcription where most of the symbols cannot be assumed to be equivalent to their cardinal values, but the phonetic IPA transcription has to be more careful.
Another problem is that [w], as a consonant, pretty much always forms part of a syllable break when it occurs between two vowels, which makes one alternative [ɑɒ̯ɑ] less than ideal. Another possible alternative is [ɑ̯ɒːɑ̯], following the same logic where you suggested that [nʲɑːtʲ] is actually more like [nʲæ̯ɑːæ̯tʲ]. The dilemma there is that even if the most fleeting off-glides can occur, fully-articulated back unrounded glides are said to never actually surface in Marshallese, which makes [ɑ̯ ʌ̯ ɤ̯ ɯ̯] seem just as unlikely as their corresponding consonantal values [ʕ ʁ ɰ]. (Though this no-literal-[ɰ] rule comes from at least one of the references, I've always been at least a tad skeptical that this is an absolute rule, so there's potentially room for additional reference-mining here.)
That leaves the diphthong interpretation [ɑ͡ɒɒ͡ɑ] which could be conceivably (albeit unprettily) simplified to [ɑ͡ɒɑ] or [ɑɒ͡ɑ]. (I must fully admit that I'm on the most OR-heavy ground at this point, even if the earlier points were increasingly far more confident.)

So that's my whole vowel simplification algorithm in a nutshell. - Gilgamesh (talk) 11:56, 7 November 2019 (UTC)

I actually like [ɑɒ̯ɑ] for awa. The nonsyllabic diacritic makes a vowel into a consonant, so [ɒ̯] can serve as a syllable boundary. It's similar to [ˈruskəɪ̯ə], used as the transcription of ру́сская on the Russian Wiktionary (compare to [ˈruskəjə] on the English Wiktionary), and more examples from English Wiktionary of a nonsyllabic vowel between two vowels can be seen here. It would be most appropriate if the rounded part of the vowel sequence is indeed shorter or less prominent than the two unrounded parts. — Eru·tuon 22:03, 7 November 2019 (UTC)

So you don't think [ɑɒ̯ɑ] would necessarily have to be seen as more than one syllable? (Oh, and hello, Erutuon. Didn't expect to see you chime in here on this Wikipedia talk page, though I appreciate it.) - Gilgamesh (talk) 01:20, 8 November 2019 (UTC)

Oh, no, [ɑɒ̯ɑ] would be two syllables. I wasn't reading closely enough; I thought it was supposed to be two syllables. It seems like a very unusual vowel sequence for one syllable: a triphthong in which the middle is rounded. — Eru·tuon 05:35, 8 November 2019 (UTC)

Well, it is a little unusual, but at the same time not that unusual. The English word hour itself (from which Marshallese awa was loaned) was until recently one syllable, and broke into two because of the same kinds of language processes that are also causing words like meal, nail and owl to be pronounced with two syllables in many accents [ˈmiː.əl, ˈneɪ.əl, ˈaʊ.əl]. Since the Great Vowel Shift, hour went from [ˈuːɹ] to [ˈəuɹ] to [ˈaʊɹ] to [ˈaʊ.əɹ]. John C. Wells' proposed lexical sets did not allow for hour and fire as one-syllable words, owing to how commonplace it has become to pronounce them with two syllables, but it's also telling that both words traditionally could, and still can, be treated as one-syllable words in song and poetry.

The Marshallese word awa is phonemically analyzed as {hawah} or /ɰæwæɰ/, and phonetically analyzed in its most basic form as [ɑ͡ɒ‿ɒ͡ɑ]. And since it's a one-syllable word (a single glide between two vowels forms a mora break but not necessarily a syllable break), that usually means something more like [ɑ͡ɒː͡ɑ], or perhaps [ɑ̯ɒːɑ̯] (that's the less-than-certain quantity at the moment), because Marshallese vowels tend to change their on-glide and off-glide behavior whenever they neighbor the (otherwise invisible) glide consonants /j ɰ w/ (the same consonants that tend to disappear between the phonemic IPA and phonetic IPA, if you ever noticed in the very nicely organized tables you set up at wikt:Module:mh-pronunc). A short vowel neighboring two non-glide phonemes has the conventional behavior of transitioning smoothly from one secondary articulation to the other, which is why you see a lot of both [æ ɑ ɒ] (with the same secondary articulation at both beginning and end) and [æ͡ɑ æ͡ɒ ɑ͡æ ɑ͡ɒ ɒ͡æ ɒ͡ɑ] (where the secondary articulations of the beginning and end differ) in the phonetic transcriptions in those nonglide-vowel-nonglide environments. But when vowels neighbor any of the glide phonemes /j ɰ w/, the vowels gravitate more strongly towards the nucleus of the glide and spend a larger portion of time there, including in CVGVC words like awa, though that word is not just CVGVC but actually GVGVG, too. And such sequences can grow even more complex, like CVGVGVGVC in the word Nuwio̧o̧k {niwiyawak} /nʲiwijæwæk/ [nʲi͡u‿u͡i‿æ͡ɒ‿ɒ͡ɑk] → [nʲuiɒːk] 'New York'. Welcome to the strange world of vertical vowel system linguistics. In this particular vertical vowel language, the consonant phonemes /j ɰ w/ usually never actually surface as consonants, but instead make themselves known in how they color the backness and roundedness of the vowels they neighbor: the vowels themselves flow together instead of forming a true syllable hiatus.
Actual syllables break where there is sufficient tension to do so, including between non-glide consonants, and I suppose after a sequence of enough vowels (Nuwio̧o̧k is obviously at least two syllables).

I was also mentioning how Marshallese orthography tends to make liberal use of ⟨w⟩ as a disambiguator in situations that would otherwise be difficult to spell clearly with fewer vowels or with vowels alone, as in the word awa itself. It's never really been the easiest language to write down accurately, as the Latin alphabet isn't really conveniently designed for vertical vowel systems. Marshallese's more closely related languages with non-vertical vowel systems, like Pohnpeian and Gilbertese, and even most of its more distantly related languages like Fijian, Malay and Hawaiian, all use more conventional vowel systems and consequently have all been much, much easier to write in Latin letters. Bender's transcription system ({niwiyawak}, etc.) is a mostly one-to-one phonemic system, but its peculiar height-only a/e/ȩ/i vowel system tends to confuse the hell out of people only familiar with the more common a/e/i/o/u-oriented vowel systems of colonial languages like English, Japanese, German or Spanish. - Gilgamesh (talk) 09:27, 8 November 2019 (UTC)

I looked at your vowel simplification rules and considered the examples in the Wiktionary module documentation page, and wondered if it was too complex. It seems like a glide's effect on the vowel shouldn't be lost in the phonetic output. So if there's a glide on one side and a non-glide consonant on the other, the vowel should be colored by the glide. And perhaps because palatalized consonants are less marked than velarized or rounded consonants, the velarization or rounding should win out in the output, if there are no orthographic biases to apply.

So I applied those ideas in the sandbox module, by removing much of the previous vowel-processing code and inserting code that processes each vowel in order, and looks at the surrounding consonants, their articulation and whether they are glides. This improved the output of aaet {hahayȩt} for instance: the second h glide was being ignored ([ɑæetˠ]) but the new code fixed this ([ɑːetˠ]). The epenthetic vowels are not being handled correctly though. — Eru·tuon 20:09, 8 November 2019 (UTC)
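To make the coloring rule easier to follow, here is a minimal Python sketch of the per-vowel step as described: a glide on exactly one side colors the vowel, and otherwise the more marked articulation wins. It is only an illustration under stated assumptions. The real module is written in Lua and this is not a port of it; the names `Consonant`, `vowel_coloring` and `LOW_VOWEL` are hypothetical; orthographic biases are ignored; and ranking rounding above velarization, as well as letting markedness decide when both neighbors are glides, are my own assumptions rather than anything stated above.

```python
from dataclasses import dataclass

# Hypothetical labels for the three secondary articulations.
PALATALIZED, VELARIZED, ROUNDED = "palatalized", "velarized", "rounded"

# Assumed markedness ranking: palatalization is least marked, so velarization
# or rounding wins. Putting ROUNDED above VELARIZED is an extra assumption.
MARKEDNESS = {PALATALIZED: 0, VELARIZED: 1, ROUNDED: 2}

# Low-vowel realization for each coloring (the /æ/ row of the vowel chart).
LOW_VOWEL = {PALATALIZED: "æ", VELARIZED: "ɑ", ROUNDED: "ɒ"}


@dataclass(frozen=True)
class Consonant:
    articulation: str  # one of PALATALIZED, VELARIZED, ROUNDED
    is_glide: bool     # True for /j ɰ w/


def vowel_coloring(left: Consonant, right: Consonant) -> str:
    """Choose the articulation that colors a short vowel between two consonants."""
    if left.is_glide != right.is_glide:
        # A glide on exactly one side colors the vowel.
        return left.articulation if left.is_glide else right.articulation
    # Two glides or two non-glides: the more marked articulation wins (assumption).
    return max(left.articulation, right.articulation, key=MARKEDNESS.__getitem__)


# aaet {hahayȩt}: the second vowel sits between /ɰ/ and /j/, both glides.
H = Consonant(VELARIZED, is_glide=True)    # {h} = /ɰ/
Y = Consonant(PALATALIZED, is_glide=True)  # {y} = /j/
print(LOW_VOWEL[vowel_coloring(H, Y)])  # prints ɑ
```

On the aaet example the sketch picks [ɑ] for the second vowel, which is consistent with the [ɑːetˠ] output quoted above, though that agreement says nothing about how the sandbox module actually reaches it.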

My experimental module has a different way of resolving the articulation of the vowels, and it more closely matches the orthography in certain cases: Ādkup {yadkʷip} /jærʲkʷipʲ/ → [ærʲ(ɔ)ɡʷupʲ] (my module) vs. [ærʲ(ɛ͡ɔ)ɡ̊ʷu͡ipʲ] (main module), Aelok {hayȩlȩkʷ} /ɰæjelʲekʷ/ → [ɑelʲokʷ] vs. [ɑelʲe͡okʷ], Aelōn̄in Ae {hayȩlȩgin hayey} /ɰæjelʲeŋinʲ ɰæjɛj/ → [ɑelʲɤŋinʲɑːɛ] vs. [ɑelʲe͡ɤŋɯ͡inʲɑːɛ], Aļajka {haļajkah} /ɰælˠætʲkæɰ/ → [ɑlˠɑzʲ(ɑ)ɡɑ] vs. [ɑlˠɑ͡æz̥ʲ(æ͡ɑ)ɡ̊ɑ].

Unfortunately it doesn't always handle long repeated vowel–glide sequences well: hence Nuwio̧o̧k is [nʲuːɒːk] – the /j/ vanishes entirely! I'm trying to think of a way to fix this in a one-pass approach. I'm not sure if this will end up being a practical approach that leads to improvements in the main module though. — Eru·tuon 20:45, 9 November 2019 (UTC)

I just wanted to say again, I really appreciate all your review. It's been transformational to have a mind other than my own putting this much thought into improving Wikipedia's handling of the language. From all that I've seen over the years, not even having as much knowledge of Austronesian languages as you, I get the impression that Marshallese phonology is one of the more unusual as far as Austronesian languages go. - Gilgamesh (talk) 17:36, 7 November 2019 (UTC)

@Gilgamesh~enwiki: I can confirm that Marshallese is rather unusual in an Austronesian and even in a Micronesian context. Many languages of Micronesia have a complex morphophonology, which can be handled in an elegant manner by positing an underlying phonemic level with transformational rules which derive phonemic surface forms. But in none of these languages is the step from surface phonology to phonetic realization as complex and bewildering as in Marshallese.

As for the "W"-question, Bender (1968, p.18) is quite enlightening. Translated into our current notation, he states that /wVC/ is realized as [wVC] if C is unrounded. As an example he gives wōp {wẹp}, which he transcribes as "[wᵻp]", whereas the current algorithm would yield [o͡ep], right? While the auditory distance between [w] and [o̯] is not too wide, I think we run into problems with low vowels. E.g. wañ {wag} which would be [waŋ̍] per Bender (1968) but [ɒ͡ɑŋ̩] in the current algorithm. My field linguist instinct tells me that [ɒ͡ɑŋ̩] (or [ɒ̯ɑŋ̍]) with rounded onset but stable F1 is a bit strange. In other words, I don't think that Bender misheard [ɒ̯] as [w]/[u̯].

[ATTENTION OR]: I have tried to capture the pronunciation of three words in the recording of the New Testament which is available online: wa {wah}, pilawā {pilahway}, and awa {hawah}. I hear something like [wɑ], [piʎɑ:wæ] and [ɑ:wɑ]. The semivowel is not just rounded but clearly raised. And the first a of awa sounds exactly like the a in pilawā. I suspect that awa actually is {hahwah}. [OR END]

–Austronesier (talk) 16:08, 8 November 2019 (UTC)

Yes, this certainly makes Marshallese complex...but I should also say it makes it fun, too. :)

Now, let's see... Huh, so the weird initial wVˠCʲ spellings the newer orthography uses are actually...[wVˠCʲ]? Wow, that is weird. Does that mean we use the back unrounded symbols for the vowel? And actually, the current algorithm as of this writing has been yielding initial [o͡epʲ] as [opʲ]. But this is why I reached out for help—we need information like this, and now I can further improve the algorithm.

I suspect the ⟨w⟩ is still mostly invisible in words like Kuwajleen, which have commonplace alternate spellings (even in the new orthography) like Kuajleen. I mean, I totally believe that it must be pronounced and raised in more environments than I was assuming, but a lot of Marshallese writing is still usually expressed with few or no diacritics, and pretty much zero cedillas or underdots (which weren't part of the old orthography anyway). The newer orthography since the 1970s can look quite nice, and people can write in it manually and such, but it was always extremely ambitious in the machine age of computers and computer-printed signage, etc., with, until recently, a dearth of compatible fonts, robust text rendering environments, etc. Part of this may be naive confirmation bias on my part, as I came from Kwajalein, not Kuwajalein, so I must accept the possibility that the ⟨w⟩ is physically pronounced in Kuwajleen, though anecdotal evidence from my childhood still treats kuwaj as one audible syllable in Marshallese. (By comparison, Ebeye's name was historically mangled upside down and sideways: originally and officially Epjā, and now locally Ibae, which is just Epjā, or rather Ebeje in the old orthography, filtered through layers of Germans and Americans not maintaining a pronunciation guide.)

For now, I'll investigate what can be done with the W situation. - Gilgamesh (talk) 17:23, 8 November 2019 (UTC)

Okay, check Erutuon's demo table at wikt:Module:mh-pronunc now. I overhauled a lot of algorithm rules, especially for vowels at the beginnings of phrases. The long vowel sequence rules are mostly the same, but with one change: /ɰVGV/ is now left-leaning instead of right-leaning. Also (and this is tentative), if a glide finds itself between two vowels that both lean away from it and don't share its secondary articulation, a literal [j] or [w] is inserted, which also seems to be reflected in orthography anyway. This affects awa-like words, including Awai 'Hawaiʻi.' - Gilgamesh (talk) 20:18, 8 November 2019 (UTC)
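The tentative glide rule can be written as a small predicate. This is a hypothetical Python sketch, not the module's Lua code: `glide_surfaces` and `GLIDE_ARTICULATION` are illustrative names, it assumes the lean of each neighboring vowel has already been worked out by the earlier rules, and it hard-codes the claim from the references that a literal [ɰ] never surfaces.

```python
# Hypothetical mapping from glide phonemes to their secondary articulation.
GLIDE_ARTICULATION = {"j": "palatalized", "w": "rounded", "ɰ": "velarized"}


def glide_surfaces(glide: str, left_vowel: str, right_vowel: str) -> bool:
    """Return True if a glide between two vowels should be printed as a literal [j]/[w].

    `left_vowel` and `right_vowel` are the articulations the neighboring vowels
    lean toward; computing them is assumed to be done by earlier rules.
    """
    if glide == "ɰ":
        # A literal [ɰ] is said never to surface in Marshallese.
        return False
    own = GLIDE_ARTICULATION[glide]
    # Both neighbors lean away from the glide and neither shares its
    # articulation, so the glide must surface as a consonant.
    return left_vowel != own and right_vowel != own
```

For pilawā {pilahway}, /w/ sits between a vowel realized as velarized [ɑ] and one realized as palatalized [æ], so `glide_surfaces("w", "velarized", "palatalized")` is True and a literal [w] is printed; if either neighbor were rounded, the /w/ would stay absorbed into the vowels.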

There is one outstanding issue in particular on my mind at the moment. The way Marshallese orthography writes word-initial vowels before non-glides tends to be very consistent. But two words raise an inconsistency: eok {yȩkʷ} and ekwe {yekʷey}. Since the orthography treats {e} and {ȩ} identically, it's not a difference in vowel height. ekwe has an alternate spelling of eokwe, but the fact that the ⟨o⟩ can and does just disappear like that suggests that its presence in words like eok has less to do with its being pronounced, and more to do with the orthography's way of spelling such rounded non-glide consonants that come at the ends of words. The rules of thumb are:

C before a rounded vowel (where C means any appropriate rounded consonant letter).
Cw at the beginnings of words before a non-rounded vowel.
C after a rounded vowel before another vowel, as in m̧uļe {m̧iļʷȩy}, with the exception that {kʷ} and {gʷ} remain kw and n̄w in these environments, as in jukwa {jikʷah}.
C without w at the end of a word, but it must follow a rounded vowel spelling: o̧C/oC/uC.

And yet this last rule appears to be waived in words that start with a non-W glide, a vowel, a rounded non-glide consonant and another vowel, as in ekwe. I'm skeptical that the first vowel is actually pronounced differently, as ekwe vs. eokwe being alternate spellings of a phonemically identical word strikes me as similar to M̧ajeļ vs. M̧ajōļ. I mean, it's possible they really are just different surface realizations of the same phonemic sequence in free variation, but it all makes me suspect that words like eok are really just [ekʷ] a lot of the time. - Gilgamesh (talk) 20:50, 8 November 2019 (UTC)

I honestly wonder how Marshallese came to be a vertical vowel language. They're rare in the world, not just among Austronesian languages. Besides a few other outliers like Irish and some interpretations of Russian or Mandarin, vertical vowel languages are most common in the Northwest Caucasian languages where they are actually the rule rather than the exception. And somehow I'm highly skeptical that the Marshalls received an impromptu settlement of Ubykhs in ancient times. There's established research and literature on language evolution topics like tonogenesis and cheshirization, but how does a vertical vowel system motivate itself into being, from a non-vertical vowel ancestor? - Gilgamesh (talk) 21:14, 8 November 2019 (UTC)

@Austronesier: I think I may have just solved the case of your mystery {hahwah}. This is all very much OR, but I'll do my best to back my reasoning. Now, in one of the recent module updates at Wiktionary, I changed the algorithm's multi-vowel-sequence behavior to simplify into monophthongs that lean left after {h}, right? And though this conveniently fixed a few problems (for instance, you were hearing [ɑːw], not [ɑɒw]), I realized that it is actually entirely logical for that {h} phoneme, /ɰ/, to demand that neighboring vowels lean towards it, maximizing its available time as a vowel. Why? Because a literal [ɰ] never surfaces in the spoken language even when [j] and [w] frequently can, so my hypothesis is that the reason it never surfaces is that /ɰ/ tends to keep itself stretched out as a vowel on both available sides so that it can never be abbreviated and isolated enough to become any kind of truly phonetic [ɰ]. In other words, the phonemic /ɰ/ is "greedy" for vowels. So, sequences of multiple vowels are indeed mostly right-leaning, but after {h} they must be left-leaning, taking priority away from any glide across the vowel to its right. This may be at least one reason why the {pilahway} you also heard had a clearly articulated [w]: the vowels on both sides of it [ɑwæ] were leaning away from it and neither agreed with the secondary articulation of the /w/, so it had to surface as a full approximant to keep from disappearing entirely. So this hypothesis of a "greedy" /ɰ/ brings me to the {hahwah} you were hearing, which I realized was probably actually still {hawah}, and it has to do with epenthetic vowels. One of the first things I ever learnt about epenthetic vowels in Marshallese (per Willson 2003) is that they happen even across word boundaries in uninterrupted speech.
And since consonant clusters containing one or more glides also trigger epenthetic vowels (per Bender 1968), that means glide-triggered epenthetic vowels can happen at word boundaries, too. Since {hahwah} would necessarily expand to {hahawah} when pronounced uninterrupted, my guess is that what you were hearing was actually {-a-hawah}, with the first vowel in awa being compensatorily lengthened by a word-boundary epenthetic vowel connected to the previous word. Now, I can't say for certain if this is what's really happening, since I'm relying on what you described in your own OR, but it might just explain things. - Gilgamesh (talk) 00:53, 9 November 2019 (UTC)

I think I've made significant progress on the module. The phonetic pronunciations have never looked better than they do now. Just in case you're wondering, some pronunciations are paired with a different version of the script hosted in its sandbox, which Erutuon set up to easily compare between two different versions of the script. For now, the sandbox version goes an additional step in radically simplifying short vowels between non-glide consonants, following regular patterns found in the written orthography. The beauty of this is that while the vowel reductions are not the most phonetically accurate, they are also not necessarily wrong, because of free variation. If it ever became desirable to stick to a phonetic IPA transcription that was simpler to read, this is that. Currently, these radically vowel-reduced forms are not at all deployed in Wiktionary entries (and aren't necessarily going to be—this is quite experimental), and only appear in the browsable table at wikt:Module:mh-pronunc. What is your opinion? Linguistically, personally, aesthetically, any opinions at all. - Gilgamesh (talk) 20:59, 10 November 2019 (UTC)

@Gilgamesh~enwiki: Sorry I can't go much into detail yet because of other projects (WP and most of all non-WP), but I think you are approaching a very handy and concise transcription here. I just wonder about one "inconsistency" which I noticed at a quick look: M̧ajōļ {m̧ahjeļ} yields [mˠɑːzʲɛ͡ʌlˠ]/[mˠɑːzʲɛlˠ] with "ɛ", while Nōļ {neļ} appears as [nʲɛ͡ʌlˠ]/[nʲʌlˠ] with "ʌ", in spite of near-identical environments. –Austronesier (talk) 08:19, 11 November 2019 (UTC)

I'm glad you asked. For starters, the more concise transcription goes by regular patterns reflected in the new orthography, the same one promoted by the MED. And though Erutuon put M̧ajōļ in the table, it is actually a common holdover spelling from the old orthography with modern cedillas added (it was traditionally spelt Majōl), and the new orthography recommends M̧ajeļ instead. So in effect, M̧ajeļ and M̧ajōļ have become alternate spellings of the same word in an orthography where alternate spellings have become rarer. Secondly, {CʲeCˠ} spellings between non-glides are a rather special case. In most cases, the vowel would be spelt ō. However, if the palatalized consonant at the left is one of either {d}, {j}, {m} or {p}, the spelling e is used instead. I don't know why the orthography does that, as historically Nōļ was anglicized as 'Nell.' Could be based on one of Bender's analyses, one of which we already discussed seemed to inform spellings like waj wōj wūj. Without reading a more thorough reference or direct access to Mr. Bender, we don't know. I also can't provide a direct non-original reference for the d j m p rule, as this was based on my own direct large-scale statistical analysis of spellings in the MED specifically looking for ō vs. e patterns in both {CʲeCˠ} and {CˠeCʲ} non-glide spellings, back when I was trying to write a similar script years ago (not on the wikis) that could automatically convert phonemes to orthographic spellings with a high degree of accuracy. I can confirm personally that M̧ajōļ is most definitely an irregular spelling according to that statistical analysis, and also a secondary spelling according to the way the MED treats it. I wish I could give you a hard reference that states the d j m p rule outright, but no, it's something I had to synthesize, because I was not comfortable with not being certain when to use ō and when to use e. 
I can say that the other non-glide pattern, {CˠeCʲ}, universally prefers ō in the MED, even after the consonant-like {yi'y}, though not necessarily after an ordinary {y} glide, when it more often becomes e (and, as we discussed, transformational "deep-structure" patterns tend to take over with long vowels and sequences of vowels). Alternate spellings between non-glides that use e are usually holdovers from the old orthography, as in wōtōmjej vs. otemjej 'all,' with the old orthographic spelling appearing in the inscription on the Flag of Bikini Atoll. - Gilgamesh (talk) 10:03, 11 November 2019 (UTC)

I've given thought to what you said about the simplified vowel transcriptions being handy and concise. And, to be honest, littering Marshallese phonetic transcriptions with so many short diphthongs may have been accurate, but it was always going to confuse ordinary people. After all, they are not diphthongs by primary nature, but diphthongs by secondary nature, literally: they come from the secondary articulations of their nearest consonants. And they're not quite like similar-sounding Old English diphthongs like io [i͡u], eo [e͡o] and ea [æ͡ɑ], because although those are vowel-height-harmonized like the Marshallese diphthongs, they were actual phonemic diphthongs that evolved from earlier non-vowel-height-harmonized phonemic diphthongs. Marshallese short diphthongs always seemed more like an accident of circumstance; a curiosity of the vertical vowel system creature. As such, while it is certainly notable to discuss the behavior of Marshallese short diphthongs as a matter of phonology, it may ultimately be more helpful to use concise phonetic transcriptions that stick with one of the allophone vowels. This is the approach the newer orthography went with, and while one may still question the wisdom of writing such a language phonetically when a phonemic approach like Bender's might suffice, it can't really be denied that the newer orthography is adequately readable and does a halfway decent job approximating phonetics. Using, for the most part, just one vowel symbol per short vowel may be acceptable phonetically as long as it is understood that other vowel realizations are possible in free variation: [mˠɑːzʲɛlˠ] vs. [mˠɑːtʲɛ͡ʌlˠ] vs. [mˠɑːdʲʌlˠ]; [ɑelʲɤŋ] vs. [ɑ͡æelʲe͡ɤŋ] vs. [aelʲeŋ]; [wɤdˠɤmʲ(e)zʲetʲ] vs. [o͡ɤtˠɤ͡emʲ(e)tʲetʲ] vs. [odˠemʲ(e)dʲetʲ]. None of them are actually wrong, per se; those single vowel symbols are just approximations that lean more towards legibility than accuracy without actually sacrificing meaningful details.
I would be very tempted to make the more concise phonetic transcriptions into the general practice. I don't ultimately know if it's the better idea, but I would be tempted, just so that another YouTube or Reddit commenter doesn't lament that Marshallese IPA is so difficult to digest. The language's complexity can be beautiful, but that same complexity also makes it formidable to study, and I wouldn't be against making the learning curve of study at least somewhat gentler. - Gilgamesh (talk) 18:35, 11 November 2019 (UTC)

My personal thoughts (aka OR) on short diphthongs vs. single vowels:

In cases like nōļ {neļ}, the onglide [ɛ̯] in [nʲɛ͡ʌlˠ] may well be perceived as a concomitant feature of the palatal [nʲ] preceding a back vowel [ʌ], so the transcription as [nʲʌlˠ] is quite natural. The same also would hold in the opposite direction: tōl {tẹl} → [tˠʌlʲ] with an "implicit" offglide [ɛ̯].

However, this does not explain the {d j m p}-rule. In words like peļ {peļ}, a transcription as [pʲɛlˠ] implies that final [lˠ] triggers an [ʌ̯]-offglide, thus giving more weight to [ɛ] as the syllabic element. But then, back on-/off-glides as concomitant feature of velarized consonants are not uncommon: e.g. Irish naoi [nˠi:] and Arabic طين [tˁi:n]/[tˠi:n] have an audible back on-glide preceding [i:]; English bell [bɛɫ] usually comes with an [ʌ̯]-offglide, pretty much like in Marshallese peļ.

Thus, the single-vowel transcription is not only more "digestible" for students of Marshallese and other people interested in this subject, but also quite natural. Ultimately, the perceived difficulty of Marshallese phonology is, among other things, an artefact of a rather cumbersome transcription which had been upheld in WP for quite some time. Actually, vertical vowel systems are simple; it is the latitude of phonetic realizations (e.g. [ɛ] vs. [ɛ͡ʌ] vs. [ʌ]) which creates superficial complexity.

The crucial point now is: what are the rules for treating one part of the short diphthong as syllabic peak and the other part as concomitant on-/off-glide of the adjacent consonant? And if we can formulate rules (such as the {d j m p}-rule), what is the phonetic motivation behind them? –Austronesier (talk) 13:28, 12 November 2019 (UTC)

Well, first of all, you'll have to understand that this is statistical analysis of the established orthography in the MED, and thus to some degree OR on my part, just like the d j m p rule. I also cannot promise 100% accuracy, as I continue to refine it until it reflects orthographic spellings as regularly as possible. (I wish orthographers would leave even more of their notes behind.) Even still, just because it is reflected in the orthography does not necessarily mean it is always reflected the same way in speech, because of possible differences in dialect, register, free variation, etc.; this is only going by regular patterns in the orthography. Anyway, assuming the first consonant is either a non-glide or {yi'y} and the second consonant is a non-glide...

Before a rounded consonant, [ɒ, ɔ, o, u] are preferred over [æ, ɛ, e, i, ɑ, ʌ, ɤ, ɯ].
After a rounded consonant, [u] is preferred over [i, ɯ].
After a rounded consonant and before a velarized one, [ɔ, o] are preferred over [ʌ, ɤ].
After a rounded consonant and before a palatalized one, [ɑ, ʌ, ɤ] are preferred over [æ, ɛ, e, ɒ, ɔ, o]. (The same rule Bender noted at the beginnings of words after /w/.)
Before a velarized consonant, [ɑ] is preferred over [æ, ɒ].
After a palatalized consonant and before a velarized one, [i] is preferred over [ɯ].
After a velarized consonant and before a palatalized one, [i] is preferred over [ɯ], again.
After [pʲ, tʲ, mʲ, rʲ] and before a velarized consonant, [ɛ, e] are preferred over [ʌ, ɤ]. (The d j m p rule.)
After a different palatalized consonant and before a velarized one, [ʌ, ɤ] are preferred over [ɛ, e].
After a velarized consonant and before a palatalized one, [ʌ, ɤ] are preferred over [ɛ, e].
After a velarized consonant and before a palatalized one, [ɑ] is preferred over [ɒ].

And there you have it, straight from my algorithm's section on radical simplification. I would prefer a reference that actually spells all this out, but lacking that, it is what it is. But now that this is known, it could actually be set up in a spelling chart. Something like...this:

↓V→	d j l m n p			b(w) k ļ m̧(w) ņ n̄ r t			k(w) ļ(w) ņ(w) n̄(w) r(w)
d j m p	ā	e	i	a	e	i	o̧	o	u
i l n	ā	e	i	a	ō	i	o̧	o	u
b(w) k ļ m̧(w) ņ n̄ r t	a	ō	i	a	ō	ū	o̧	o	u
k(w) ļ(w) ņ(w) n̄(w) r(w)	wa	wō	u	wa	o	u	o̧	o	u

And of course, a different set of rules is used if a vowel neighbors a glide /j, ɰ, w/, either at the beginning or end of a word or in vowel-glide-vowel sequences. - Gilgamesh (talk) 14:49, 12 November 2019 (UTC)

You know, I think I'm beginning to see why the Marshallese layman's tutorials by Bender (1969) and Rudiak-Gould (2004) had polar opposite advice when it came to Bender-style pronunciation guides, and it's not just because Bender designed it. Back in 1969, there was only the old orthography which was extremely inconsistent, and Bender's transcription was critical to providing a consultable phonemic guide for each word. But by 2004, the new orthography with its greater regularity had become a teaching aid, and that tutorial recommended sounding out each word as it is spelt and completely disregarding Bender's phonemic transcriptions in the MED. But both orthographies are still in use, though, so in any given public use of written Marshallese you're likely to see mostly one or mostly the other or a mishmash of both. - Gilgamesh (talk) 15:06, 12 November 2019 (UTC)

Actually, I thought of an even better way of implementing that table, using only a few representative letters.

↓V→	d j l m n p			b k ļ m̧ ņ n̄ r t			k(w) ļ ņ n̄ r
d j m p	dāj	dej	dij	dam̧	dem̧	dim̧	do̧n̄	don̄	dun̄
i l n	lāj	lej	lij	lam̧	lōm̧	lim̧	lo̧n̄	lon̄	lun̄
k ļ ņ n̄ r t	taj	tōj	tij	tam̧	tōm̧	tūm̧	to̧n̄	ton̄	tun̄
b m̧	baj	bōj	bwij	bam̧	bōm̧	būm̧	bo̧n̄	bon̄	bun̄
k(w) ļ ņ n̄ r	kwaj	kwōj	kuj	kwam̧	kom̧	kum̧	ko̧n̄	kon̄	kun̄

- Gilgamesh (talk) 15:21, 12 November 2019 (UTC)
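Because the chart above is just a two-dimensional lookup, it can be transcribed directly into data. The following is a minimal Python sketch, not part of the module: the row and column keys (`pal_djmp`, `vel_labial`, `rnd` and so on) and the function `spell_cvc` are hypothetical names, "mid" covers both e and ȩ since the orthography spells them alike, and only the non-glide cells of the chart are covered, so glide environments are out of scope.

```python
HEIGHTS = ("low", "mid", "high")

# Rows: class of the first consonant. Columns: class of the second consonant
# ("pal" = d j l m n p, "vel" = b k ļ m̧ ņ n̄ r t, "rnd" = k(w) ļ ņ n̄ r).
# Each cell gives the vowel spelling for (low, mid, high), copied from the chart.
CHART = {
    "pal_djmp":   {"pal": ("ā", "e", "i"),   "vel": ("a", "e", "i"),  "rnd": ("o̧", "o", "u")},  # d j m p
    "pal_iln":    {"pal": ("ā", "e", "i"),   "vel": ("a", "ō", "i"),  "rnd": ("o̧", "o", "u")},  # i l n
    "vel":        {"pal": ("a", "ō", "i"),   "vel": ("a", "ō", "ū"),  "rnd": ("o̧", "o", "u")},  # k ļ ņ n̄ r t
    "vel_labial": {"pal": ("a", "ō", "wi"),  "vel": ("a", "ō", "ū"),  "rnd": ("o̧", "o", "u")},  # b m̧
    "rnd":        {"pal": ("wa", "wō", "u"), "vel": ("wa", "o", "u"), "rnd": ("o̧", "o", "u")},  # k(w) ļ ņ n̄ r
}


def spell_cvc(c1: str, row: str, c2: str, col: str, height: str) -> str:
    """Spell a C1-V-C2 syllable by looking up the vowel in the chart."""
    vowel = CHART[row][col][HEIGHTS.index(height)]
    return c1 + vowel + c2
```

For example, `spell_cvc("k", "rnd", "j", "pal", "low")` returns "kwaj" and `spell_cvc("t", "vel", "m̧", "vel", "mid")` returns "tōm̧", matching the corresponding cells. Keeping the chart as data rather than as a chain of if/else rules means a corrected cell only has to be edited in one place.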

Actually, I could expand this even a little further, to include GVG, GVC and CVG sequences occurring in isolation.

↓V→	d j l m n p			t			*k r			b k ļ m̧ ņ n̄*			k(w) ļ ņ n̄ r			ā e i			a ō ū			o̧ o u
d j m p	dāj	dej	dij	dat	det	dit	dar	der	dir	dam̧	dem̧	dim̧	do̧n̄	don̄	dun̄	dā	de	di	da	dō	dū	do̧	do	du
i l n	lāj	lej	lij	lat	lōt	lit	lar	lōr	lir	lam̧	lōm̧	lim̧	lo̧n̄	lon̄	lun̄	lā	le	li	la	lō	lū	lo̧	lo	lu
k ļ ņ n̄ r t	ņaj	ņōj	ņij	ņat	ņōt	ņūt	ņar	ņōr	ņūr	ņam̧	ņōm̧	ņūm̧	ņo̧n̄	ņon̄	ņun̄	ņā	ņe	ņi	ņa	ņō	ņū	ņo̧	ņo	ņu
b m̧	baj	bōj	bwij	bat	bōt	būt	bar	bōr	būr	bam̧	bōm̧	būm̧	bo̧n̄	bon̄	bun̄	bwā	bwe	bwi	ba	bō	bū	bo̧	bo	bu
k(w) ļ ņ n̄ r	kwaj	kwōj	kuj	kwat	kot	kut	kwar	kor	kur	kwam̧	kom̧	kum̧	ko̧n̄	kon̄	kun̄	kwā	kwe	kwi	kwa	kwō	kwū	ko̧	ko	ku
ā e i	āj	ej	ij	āt	et	it	ār	er	ir	eam̧	em̧	im̧	eo̧n̄	eon̄	iun̄	ā	e	i	ea	eō	iū	eo̧	eo	iu
a ō ū	aj	ōj	ūj	at	ōt	ūt	ar	ōr	ūr	am̧	ōm̧	ūm̧	*ao̧n̄	*ōon̄	*ūun̄	*aā	ōe	*ūi	a	ō	ū	*ao̧	*ōo	*ūu
(w) o u	waj	wōj	wūj	wat	wōt	ut	war	or	ur	wam̧	om̧	um̧	o̧n̄	on̄	un̄	wā	we	wi	wa	wō	wū	o̧	o	u

I've seen words begin with {yak} that is either āk or eak without a clear pattern.
The asterisked entries are ones I've never actually seen, and possibly may not even occur, but I extrapolated from the ōe seen in the unusual name Āne-jaōeōe.

- Gilgamesh (talk) 15:48, 12 November 2019 (UTC)

Okay, I figured out the āk vs. eak pattern; they are not alternate spellings, and the pattern seems to be real.

āk or ākk occurs if the consonant (single or geminated) occurs before a vowel within a word.
eak occurs in isolation or if the consonant comes before another consonant, whether the resulting consonant cluster causes epenthesis or not.

Here are all the relevant words I could find:

ākā {yakay}
ākiki {yakiykiy}
ākil {yakil}
ākilkil {yakilkil}
ākkil {yakkil}
ākūt {yakit}
eak {yak}
eakeak {yakyak} ([æɡæːk] {yak(a)yak}, because of epenthesis?)
eakļe {yakļȩy} ([æɡʌ̯lˠe] {yak(e)ļȩy}?)
eaklep {yaklȩp} ([æɡʌ̯lʲepʲ] {yak(e)lȩp}?)
eakpel {yakpȩl} ([æɡʌ̯pʲelʲ] {yak(e)pȩl}?)
eakto {yaktȩw} ([æɡʌ̯dˠo] {yak(e)tȩw}?)
Anomaly: eaktuwe {yakitiwey}, but the phonemic transcription may or may not be a typo. ...And it also appears that way in the 1976 edition (I just checked). Even if it is a typo, epenthesis would still occur at kt.

I'll have to write a new algorithm rule for this. I suspect of all these words, solitary eak [æ̯ɑk] is the only one pronounced with [ɑ] as the first dominant surface vowel, if indeed that is what we're dealing with. - Gilgamesh (talk) 16:28, 12 November 2019 (UTC)
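The āk/eak pattern above is regular enough to state as a function over the rest of the Bender transcription after {yak}. Here is a hypothetical Python sketch (`yak_prefix` is an illustrative name), assuming Bender's vowel letters are the a, e, ȩ, ẹ and i seen in the transcriptions above (the last three all start with a plain e after Unicode NFD decomposition), and that a geminate is written as a doubled k.

```python
import unicodedata

LONG_A = "\u0101"  # ā
BENDER_VOWELS = {"a", "e", "i"}  # ȩ and ẹ decompose to e + a combining mark


def yak_prefix(rest: str) -> str:
    """Spell a word-initial {yak}, given the rest of the Bender transcription.

    Assumed rule: āk (or ākk for a geminate) when the k stands before a vowel
    inside the word; eak in isolation or before another consonant.
    """
    rest = unicodedata.normalize("NFD", rest)
    # {yakk} + vowel: the extra k is a geminate.
    geminate = rest.startswith("k") and rest[1:2] in BENDER_VOWELS
    if geminate:
        rest = rest[1:]
    if rest[:1] in BENDER_VOWELS:
        return LONG_A + ("kk" if geminate else "k")
    return "eak"
```

Run over the word list, it gives āk for {yakil}, ākk for {yakkil}, and eak for {yak}, {yaklȩp} and {yakyak}. It predicts āk for {yakitiwey}, which is exactly the eaktuwe anomaly noted above, so a mismatch there is expected rather than a bug in the sketch.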

Anyway, back to short vowels between non-glides. The impression I get from reading Choi (1992) is that even if there are possible circumstances where vowels neighboring glides may linger more towards a glide nucleus, vowels between two non-glides still smoothly transition from one vowel to the other in a truly equal diphthong. In that sense, the orthographic spellings preferring one or the other may truly be arbitrary, which is why until now my algorithm has kept treating them as tied diphthongs by default. Single vowel transcriptions for these diphthongs may be concise and convenient, but are not more accurate. Like I said, I would be tempted to go with the more concise convention, but I don't know if it's wise, so it's not a decision I would feel comfortable making unilaterally. - Gilgamesh (talk) 17:38, 12 November 2019 (UTC)

Since you've had such a supportive tone over the idea, I decided to change the module's default mode to show single vowel symbols between non-glides. And just for comparison, I enabled a feature in the sandboxed version of the module to display full diphthongs for all vowels. And the difference at the table at wikt:Module:mh-pronunc is...night and day. Just looking at all those tied diphthongs fills me with some stress, now. It must have been a similar stress that has filled bewildered readers of this article. The condensed transcription is most definitely friendlier to read, and the underlying short diphthongs can be explained in a section of the article rather than universally shown each and every time. - Gilgamesh (talk) 18:58, 13 November 2019 (UTC)

Also, to make the primary module's phonetic output slightly easier on the eyes, I simplified homorganic nasal-obstruent consonant clusters to only indicate their secondary articulation once, yielding [mbʲ mbˠ nzʲ ɳdˠ ŋɡ ŋɡʷ]. It's a slight improvement that required no original research. - Gilgamesh (talk) 19:25, 13 November 2019 (UTC)
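A minimal Python sketch of that simplification. It assumes that the unsimplified output writes the secondary articulation on both members (for example [mʲbʲ], [ɳˠdˠ], [ŋʷɡʷ]). It also assumes that the homorganic pairs are exactly those in the list above. `simplify_clusters` and `HOMORGANIC` are hypothetical names, and the real module may well do this differently:

```python
import re

# Hypothetical table: each nasal and the obstruent it pairs with,
# read off the simplified outputs [mbʲ mbˠ nzʲ ɳdˠ ŋɡ ŋɡʷ].
HOMORGANIC = {"m": "b", "n": "z", "ɳ": "d", "ŋ": "ɡ"}

# Nasal + secondary articulation + obstruent + the same secondary articulation.
_DOUBLY_MARKED = re.compile(r"([mnɳŋ])([ʲˠʷ])([bzdɡ])\2")


def simplify_clusters(ipa: str) -> str:
    """Write a shared secondary articulation only once, on the obstruent."""

    def repl(match):
        nasal, mark, obstruent = match.groups()
        if HOMORGANIC[nasal] != obstruent:
            return match.group(0)  # not a homorganic pair: leave as-is
        return nasal + obstruent + mark

    return _DOUBLY_MARKED.sub(repl, ipa)
```

The match is restricted to known pairs, so a heterorganic sequence such as a hypothetical [mʲzʲ] is left untouched.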

One thing that gets me (and has for a long time) is the cluster [ɳˠdˠ] being homorganic. One's retroflex, the other's dental. The source is Bender (1969):

an n-sound made with the tongue tip curled back to touch the roof of the mouth (no close equivalent in English); has a dark r-color

...and...

tongue touches the upper teeth instead of the gum ridge behind them; no puff of breath; more like English d singly between vowels, but like d t of had to when double

We know that this cluster is not just considered homorganic, but is an assimilation result (/nʲtˠ/ assimilates to it), and I would be very tempted to phonetically transcribe it [ndˠ], except that we can't assume the nasal isn't still retroflex in this position. In fact, I hadn't known the nasal was retroflex at all until I read Bender (1969). Since that got me paying attention, Rudiak-Gould (2004) also seems to agree:

like English n, but with the tongue pulled back and raised at the back of the mouth, giving it a 'darker' sound

But Rudiak-Gould (2004) says something else about ⟨t⟩ that makes me wonder, and appears to contradict Bender (1969):

at the beginning or end of a word, or when there are two t's in a row, like English t, but with the tongue pulled back and raised at the back of the mouth, giving it a 'darker' sound; everywhere else, like d but with the tongue as described above

Is it possible both consonants' surface realizations are retroflex after all? [ɳˠ] and [ʈˠ~ɖˠ]? - Gilgamesh (talk) 01:27, 14 November 2019 (UTC)

These tables are absolutely great. Fantastic job! I have taken a look at the two transcription systems in the module: the contrast between [nʲæ͡ɑɑ̯͡æ̯nʲæ͡ɑ] and [kɑdˠʌ̯ɡɯ͡izʲi̯͡ɯ̯mˠɯ͡uu͡itʲ] vs. [nʲɑːnʲɑ] and [kɑdˠʌ̯ɡizʲi̯mˠuːtʲ] is indeed like night and day. I think we have to be very aware of the impact we WP-ians can have on the public perception of a certain topic. Many (not all) of the aforementioned Reddit/YouTube language buffs rely on info gleaned from WP, so WP has involuntarily but inevitably contributed to a reputation of Marshallese as being complex/weird/etc. This is not because of the complexity described in Bender (1968) or Choi (1992), but because of the "visual" complexity of transcriptions like [ɛ̯ɛzʲ e̯e͡ɤdˠ ɑ̯ɑ͡æmʲ mʲe͡ou͡ɯrˠ] in WP/Wiktionary.

Going into detail: yes, I agree that "single vowel transcriptions for these diphthongs may be concise and convenient, but are not more accurate". But tied short diphthongs are ultimately also just a discrete approximation of the actual range of phonetic realizations. Listening to the Online Bible, I am still amazed by the latitude of perceived backness within the same word: e.g. for ningning I hear everything from [i] to [ɨ] to [ɯ] (but hardly [i͡ɯ]). Given this latitude, a phonetic transcription leaning towards the spelling convention of the MED (in spite of the inconsistencies that become apparent in the table, like lam̧/lōm̧/lim̧) is a legitimate choice, and an ideal way to avoid OR, since any alternative approach for a phonetic transcription would potentially lead to choices not directly supported by our sources.

Btw, your table shows an interesting constraint: there is no contrast between /yVy/ and /hVy/, nor between /wVw/ and /hVw/ (except for onomatopoeic ōe).

The cluster {ņt} is clearly homorganic. In the MOD, both are described as "heavy dental". My guess is that both are apical, maybe alveolar, but not "retroflex". I recommend writing [nˠ] for ņ. –Austronesier (talk) 12:38, 14 November 2019 (UTC)

So, orthographically-informed short vowels it is.

And yes, Marshallese spelling tends to have the highest level of bias for a ō i in spellings for conflicting palatalized vs. velarized secondary articulations. i was also by far the more common spelling for [ɯ] in the old orthography before ū became more methodically applied for its monophthongal allophone (old Kirijmōj vs. new Kūrijm̧ōj). One would almost have the impression that Marshallese orthographers over the years have seen the back unrounded allophones as the "most" default representations. Bender's 1968 characterization seemingly agreed, describing the allophones in his vowel table (page 20 diagram) as moving from a central position towards the front or towards rounding...until his 1969 tutorial a year later treated the front allophones as the relaxed articulations. And using more centrally situated vowel symbols fits with the general tradition of describing the phonemes of vertical vowel systems using central vowel symbols, as Choi (1992) did and as is the norm when phonemically describing Northwest Caucasian languages. I generally trust Bender's characterizations, and I generally trust that his later characterizations are more authoritative than his earlier ones. But I admit I still feel a certain lasting comfort in using neutral-appearing central vowel symbols for any vertical vowel phonemes. And when those symbols are completely different from all the allophone symbols, at least then it's easy to quickly tell which transcription mode (phonemic vs. phonetic) is being used. If future shifts in consensus ever drove us to readopt such a system, I'd probably recommend /ɐ ə ɘ ɨ/ resembling Choi's (1992) transcription. /ɘ/ would be chosen for the phoneme between /ə/ and /ɨ/ that Choi completely disregarded, simply because /ɘ/ is the one canonical vowel symbol directly between /ə/ and /ɨ/ on the IPA vowel trapezium. But now I'm rambling on with my musings, and I'll get back to the topic at hand.

Not retroflex, eh? I know much of this has been an exercise in adopting your recommendations, but I find it more liberating than having to singularly own each and every editorial decision. Okay, so...not retroflex, I can go with that. So what, in your opinion, does the "dark-r colored" description mean? Free variation? Dialectal variation? Mischaracterization? A typo? If we can agree it is not retroflex, then the transcription discrepancy vanishes. And yet my obsessive-compulsive nature still can't help but wonder what Bender meant in that brief description 50 years ago. - Gilgamesh (talk) 13:57, 14 November 2019 (UTC)

And actually, I was wrong on one front. The {hVCʷ} sequences, which I theorized might be spelt *ao̧n̄ *ōon̄ *ūun̄ in isolation, actually can and do occur (at least some of them), just not in isolation. For example, akwāāl {hakʷayal}. As it is currently written, my algorithm would phonetically render *ao̧n̄ as [ɑkʷ] anyway, and akwāāl as [ɑɡʷæːlʲ]. The algorithm currently gives a neighboring {h} full priority to influence a surface vowel, whether on the vowel's left or right. This generally guarantees that {h} can never be squeezed into any kind of surface glide that would have to be phonemically contrastive, considering such a thing is said not to happen for {h} (at least not in the same way it can for {y} or {w}). Āne-jaōeōe is currently given a phonetic transcription of [ænʲeːzʲɑɤe̯ɤːe̯]. As of this writing, it is the only Marshallese word on Wikipedia whose phonetic transcription ends with an explicit surface glide, and that is to preserve the evidence of both of the name's {h} phonemes while preventing them from becoming surface glides. It just goes to show that just because a phonemic combination is theoretically possible doesn't mean it typically occurs in practice. - Gilgamesh (talk) 14:21, 14 November 2019 (UTC)

I think the module is...mostly ready for daylight on Wikipedia. I mean, I'd have to copy it here and edit relevant templates to make use of it, but it may just be ready.

I made one last tweak. Though the surface height of epenthetic vowels between non-glides is not phonemically significant, I was never that confident interpreting the references to mean that the vowel's height was transitional between the heights of the two nearest vowels. But it is at least clear that the height is predictable from the two nearest vowels, if not necessarily transitional. I decided to read Bender's (1968) notes on excrescence, and I observed the patterns in the epenthetic vowels of the words he transcribed. In all cases, the vowel height appeared to be the max() of the neighboring vowel heights, and not more open than [ɛ ʌ ɔ]. The evidence in question is these words (adjusting Bender's phonetic transcription to the module's):

{m̧m̧anm̧ȩn} [mˠʌ̯mˠɑnʲɤ̯mˠɤnʲ]
{jerbal} [tʲɛrˠʌ̯bˠɑlʲ]
{dapilpil} [rʲæbʲilʲi̯bʲilʲ]
{najnȩj} [nʲæzʲe̯nʲetʲ]
{m̧akm̧ȩk} [mˠɑɡɤ̯mˠɤk]
{kitagtag} [kɯdˠɑŋʌ̯dˠɑŋ]

- So, where each vowel has a vowel height (F1) from 1 to 4, the epenthetic vowel's height between two non-glides is:

max(F1[leftVowel], F1[rightVowel], 2)
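A minimal Python sketch of the formula. It assumes that the heights run from 1 for {a} (most open) to 4 for {i} (closest), with {e} = 2 and {ȩ} = 3, so that the floor of 2 is the [ɛ ʌ ɔ] row. This scale is inferred from the six words above rather than stated outright. `epenthetic_height` is a hypothetical name, and `None` stands for a missing neighbor, as at the start of {m̧m̧anm̧ȩn}:

```python
from typing import Optional

# Assumed height scale (inferred from the examples, not stated in the sources):
# 1 = {a} (most open) ... 4 = {i} (closest).
HEIGHT = {"a": 1, "e": 2, "ȩ": 3, "i": 4}

# Epenthetic vowels are never more open than the {e} row [ɛ ʌ ɔ].
FLOOR = HEIGHT["e"]


def epenthetic_height(left: Optional[str], right: Optional[str]) -> int:
    """Height of an epenthetic vowel between two non-glides.

    `left` and `right` are the nearest full vowel phonemes, or None when
    there is no vowel on that side (e.g. word-initially).
    """
    neighbors = [HEIGHT[v] for v in (left, right) if v is not None]
    return max(neighbors + [FLOOR])
```

Applied to the six words, this gives 2 for the first epenthetic vowel of {m̧m̧anm̧ȩn}, {jerbal} and {kitagtag} ([ʌ̯]), 3 for the second one of {m̧m̧anm̧ȩn}, {najnȩj} and {m̧akm̧ȩk} ([ɤ̯]/[e̯]), and 4 for {dapilpil} ([i̯]). These values match the transcriptions listed.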

Do you think that is a sound assessment? - Gilgamesh (talk) 15:14, 14 November 2019 (UTC)

It's not really that I am sure that ņ is not retroflex, but it's just that I'm not sure whether it is. The truth must lie somewhere at or between 1969's retroflex and the MED's dental. And when resorting to OR again, I just don't hear N's which would be as distinctly retroflex as, let's say, in Swedish (barn [bɑːɳ]). So basically, I'd recommend neutral [n] to stay on the safe side.

For your comfort about the representation of the vowel phonemes: even in NW Caucasian descriptive phonology, plain /a/ is conventionally used, which is strictly speaking not central. The wider the range of phonetic realization is, the more arbitrary the choice of a symbol becomes (← is this syntactically correct? You know I am en:L2...). In Japanese, e.g., the final nasal is usually spelled /N/. So just think of our vowels here as archiphonemes with no default realization. –Austronesier (talk) 17:05, 14 November 2019 (UTC)

True, strictly speaking, cardinal [a] is not central. But because almost no languages distinguish an open front unrounded vowel from an open central unrounded vowel, the symbol is often used for either one, as with the /a ɜ ɘ ɨ/ system this article was using before your involvement. Three times in the IPA's history, [ᴀ] was proposed as the canonical symbol for the open central unrounded vowel, and it was rejected all three times, most recently on the rationale that [a] is unambiguously sufficient to fill that purpose when needed. I only just recently read that the Hamont dialect of the Limburgish language is said to be relatively exotic in having a three-way distinction between front, central and back open unrounded vowels. It uses [a] for the central open unrounded phoneme, but [æ] for the front open unrounded phoneme (even though that symbol's cardinal use is as a near-open vowel). Basically, if an open central unrounded vowel symbol is needed, and it has to be more open than [ɐ], then [a] usually fits the bill, though [ɑ] can as well (technically any one of [ä ɐ̞ ɑ̈], but whatever symbol is selected often goes unmarked). Some of the Northwest Caucasian languages use /a ə/ as symbols for vowel phonemes, but Adyghe uses /a ɜ ə/, and it becomes clear that all three symbols are intended to appear neutral. And in an interplay of [æ ɑ ɒ] symbols like in the MED's Marshallese, /a/ by contrast appears more neutral-looking than the others. So yes, in vertical vowel system phonemics, it actually may be relatively reasonable to use /a/ alongside any of /ɐ ɜ ə ɘ ᵻ ɨ/ (as required) as a representative central symbol for a horizontally neutral vertical vowel phoneme. (In any event, if such an approach were used in the future, I would still recommend /ɐ/ instead.)

If I were structuring it methodically, I would probably say "The wider the range the phonetic realization is, the more arbitrary the choice of symbol becomes." But your wording worked fine, too. I didn't notice anything syntactically amiss until you pointed it out, and I doubt most native English speakers would have noticed or complained, either. English already has a history of prescriptivist vs. descriptivist approaches to the language, and a prescriptivist approach will often reject uses that don't fit a perceived prestige use, even if a rejected use is also indigenous. Since I lean towards the descriptivist camp, there's a wide variety of possible English grammatical and lexical constructs I would be willing to accept just the way they are, rather than trying to change them to something more polished. Ultimately, in studying linguistics, my interest includes studying languages as they actually exist, wherever and however they exist. I will find some languages and language forms more fascinating than others, but I try not to be in the business of saying one form deserves to exist and the other does not. That said, I am extremely guilty of occasionally correcting people's mispronunciations in certain areas (Marshallese place names, Gaelic mythology names, or a few stray words—"bestiary is /ˈbɛstiəri/, /ˈbɛstiˌɛri/ or /ˈbiːstiˌɛri/, not /bɛsˈtaɪəri/").

So, if I had to guess, you are from Indonesia? Even then, it has one official language but many unofficial ones, so it would be a stab in the dark trying to guess what your native language is. Most of the Indonesians I know online are artists who mainly use English in online art communities.

I was already aware of the Japanese final moraic nasal being transcribed /N/, and the gemination mora as /Q/. Using fully uppercase letters as cardinal phonemic symbols seems less than ideal (I generally prefer to use them for variables, like /CVG/), and I already know that /N Q/ are often realized as [ɴ ʔ] in isolation. So in a purely phonemic transcription, I would probably be inclined to render them /ɴ ꞯ/. (And strangely 'ꞯ', the Unicode small capital Q, does not render in any of my fonts...? Apparently this particular Unicode character was assigned in 2018, so maybe it's too new for the fonts I have.) - Gilgamesh (talk) 19:07, 14 November 2019 (UTC)

Well, good guess, and close to the mark (long-time L2 in Indonesian—standard plus two regional varieties—and one regional language of Indonesia; partial descendant of a neighboring AN-speaking country). But this pioneering description (Hernsheim 1880) of Marshallese is written in my mother tongue. The final vowel of the IPA in my user page is iconic for my local dialect.

There is no full coverage of the quality of epenthetic vowels in the lit, but they are definitely there. See how Hernsheim (p. 19) deplores the "missionary" spelling for not writing out Marshallese epenthetic vowels, which according to him creates a "hardness alien to it [=Marshallese]". Your algorithm correctly predicts all examples in Bender (1968), so that's fine for want of a more comprehensive treatment in our sources. The explanation in this WP article is not too specific ("In most situations, the vowel height of an epenthetic vowel is transitional between the two nearest vowels"), but that's actually good, since it keeps the article well-sourced.

Another thing: I'd fully support reviving the page "Marshallese phonology", ideally moving it to "Marshallese phonology and orthography". We could then place the full text of the current sections 4 to 6 in that article (with an appropriate introduction that outlines the history of research), and leave hatnotes and condensed summaries in this article. –Austronesier (talk) 10:08, 15 November 2019 (UTC)

I'm afraid I can't understand much German, but I recognize enough to analyze Südhessischer as "South Hessian." The term "Hessian" has been well established in English since the American Revolution, if for no other reason than because of the Hessian mercenaries who fought for the British crown. Many of them later settled permanently in the early United States, in places like New Jersey. (If I'm not mistaken with my mental notes.)

How ironic that Hernsheim analyzed and deplored the orthographic practice that Bender would later also analyze and specifically reinforce as orthographically necessary.

As for a separate phonology article, this article is already pretty long, and I originally moved the phonology section to its own article for that reason. But someone else later moved it back into the article and turned the phonology page into a sectional redirect. I didn't get the impression the issue was the article's length, but rather that the language itself was not considered notable enough to justify its own separate phonology page. But there are other languages with a low number of speakers and scarce online use that still have their own separate phonology pages, like Icelandic phonology. I might support the re-separation of the Marshallese phonology (and perhaps orthography) sections at a later point if there is broad editorial consensus allowing it. For now, though, I want to focus on revamping the article itself to a better overall state of quality. - Gilgamesh (talk) 10:59, 15 November 2019 (UTC)

One observation: whenever short vowels are written as tied diphthongs, long vowels also fall back into the "pluriphthong" transcription: [mˠɑːzʲɛlˠ ~ mˠɑɑ̯͡æ̯zʲɛ͡ʌlˠ ~ mˠɑːzʲʌlˠ]. Based on Bender (1968) and especially Choi (1992), I'd recommend leaving long vowels unchanged regardless of the treatment of short vowels: [mˠɑːzʲɛlˠ ~ mˠɑːzʲɛ͡ʌlˠ ~ mˠɑːzʲʌlˠ]. –Austronesier (talk) 13:16, 15 November 2019 (UTC)

Will do. - Gilgamesh (talk) 15:28, 15 November 2019 (UTC)

All the informed checking of my edits has been extremely, extremely helpful, both from you on the linguistic side and from User:Erutuon. Obviously there is still more that needs to be done besides just cleaning up some IPA and templating. I guess some of those orthography charts would be nice to adapt, too. For now, I'm personally trying to focus on just a few things at a time so I don't tempt my scatterbrain with too many simultaneous targets. - Gilgamesh (talk) 15:59, 15 November 2019 (UTC)

I expanded the section on Bender's orthography. I've also entered that sort of information zombie regurgitation mode, where I'm writing new sections but not leaving references (though easily referenceable) or even really thinking in layman's terms. My edits to this article from years ago used to suffer heavily from this before the occasional other editor tried smoothing them over. But in this case today, I do recall that most or all of the information concerning special sequences in Bender's orthography came from Bender (1969), yet ask me for page numbers and my brain will spontaneously take a nap. - Gilgamesh (talk) 18:21, 15 November 2019 (UTC)

Okay, took a break, ate something, and I realize I really should have taken my own earlier advice not to think in too many different directions at once. I'm not entirely sure how much of my edits in the Bender's orthography section can be salvaged or reworked (some of it, at least), and for now I really should focus on IPA and templating as I originally intended. - Gilgamesh (talk) 19:09, 15 November 2019 (UTC)

I have a dilemma involving the phonetic expansion of {niwiyawak} Nuwio̧o̧k 'New York' vs. {jȩw(ȩ)yiw} joiu 'soy sauce' vs. {m̧aļew(e)yeļap} M̧aļoeļap 'Maloelap atoll'. Currently, the default rule for vowel-glide-vowel-glide sequences is that the second vowel assumes the second glide's secondary articulation. Nuwio̧o̧k neatly encapsulates this: [nʲuwiɒːk]. Of course, there are exceptions even to this rule. In a /jVCʷV/ sequence (where the first vowel is rounded and the second vowel is not rounded), the first vowel becomes a front vowel, in words like {yekʷey} ekwe [ɛɡʷɛ] 'okay' and {yanȩy(ȩ)way(a)tak} Ānewātak [ænʲeːwæːdˠɑk] 'Enewetak atoll', so that they don't instead become [ɛ̯ɔɡʷɛ] or [ænʲeowæːdˠɑk]. For joiu and M̧aļoeļap, the default rules yield [tʲowei̯u] and [mˠɑlˠɔwɛːlˠɑpʲ], but I'm not confident about these results. Intuitively (though with some admitted naivete), I would expect [tʲoːi̯u] and [mˠɑlˠɔː(ɛ̯)ɛlˠɑpʲ]. So I crafted a tentative rule where /VwVjV/ sequences (where the first vowel is round and the second vowel is front) have the second vowel become round. This changed the behavior of joiu and M̧aļoeļap as expected, but it changed Nuwio̧o̧k to [nʲuːi̯ɒːk], which is incongruent with the spelling—if it were actually [nʲuːi̯ɒːk], I'd expect a spelling of Nuuio̧o̧k...right...? If [nʲuwiɒːk] is correct to begin with, it makes me suspect [tʲowei̯u] and [mˠɑlˠɔwɛːlˠɑpʲ] may have always been correct and did not need an exceptional rule. The rules have served practically every Marshallese word well, given only the phonemes. But my confidence in the results declines the more exceptions to the rules are involved, because I increasingly craft the rules to suit a minority of words I can't easily find in audio. - Gilgamesh (talk) 01:49, 17 November 2019 (UTC)

I think I may have solved part of the problem. According to the Naan (good luck trying to browse that beast of a .docx file), words like Jālwōj {jalwȩj} and Jālooj {jalȩwȩj} are not strictly homophones after all, at least in careful speech. The pronunciation it prescribes for Jālwōj is a short vowel syllable, then a full epenthetic vowel, then another short vowel syllable, closer to something like [tʲælʲo̯wɤtʲ] or [tʲælʲu̯wɤtʲ] than to [tʲælʲoːtʲ]. It seems that, per Bender (1968), an epenthetic vowel in this circumstance does indeed produce a long vowel in many words, like kaarar {kaharhar} becoming [kɑːrˠɑːrˠ] as if it were {kaharahar}. But in other circumstances, as with Jālwōj, I can only guess that a certain "reluctance" of the epenthetic vowel to form part of a long vowel allows the following glide and vowel to take on a reflex associated with a word-initial environment, allowing a full [w] to congeal. The epenthetic vowel is still there, and still non-syllabic, but directly neighbors a glide that has fully surfaced. The implication that epenthetic vowels neighboring glides can behave differently from full vowels neighboring glides, means that {niwiyawak} Nuwio̧o̧k actually may be in a different class from {jȩw(ȩ)yiw} joiu and {m̧aļew(e)yeļap} M̧aļoeļap, and if I treat them as different cases, I can avoid logic rippling between one class of words and the other. In finding out all these things, I've tried to rewrite portions of the module's algorithm, only to run into some murky bugs. I can probably still find a way to implement this behavior, but it may take some time.

This has also made me consider how strange notation like [tʲælʲo̯wɤtʲ] or [tʲælʲu̯wɤtʲ] can be because of sequences like [lʲo̯w] or [lʲu̯w], which are basically equivalent to [lʲo̯u̯] or [lʲu̯u̯], but the first vocalic entity is more of a non-syllabic vowel than a glide, and the [u̯] is more of a glide than a non-syllabic vowel. Yet this distinction feels artificial, so this condition seems dubious to me. (Am I right to doubt this?) A non-syllabic vowel between consonants feels like a different creature from a semi-vowel neighboring vowels, yet in many cases they are notated identically in IPA. And when a non-syllabic vowel actually neighbors a semi-vowel, the logic appears to break down. (Doesn't it?) So even if I can find a way to notate this ([tʲælʲo̯wɤtʲ] or [tʲælʲu̯wɤtʲ] may be perfectly adequate), how is a sequence like this even realistically stable, in any language? Of course, I could be overthinking it all. At this point, there is only one thing I can say for certain: I have a headache. - Gilgamesh (talk) 00:46, 18 November 2019 (UTC)

Epenthetic vowels appear between non-syllabic segments, so it is natural from a purely phonetic viewpoint to treat them as weak, yet fully syllabic vowels. The IPA extra-short breve is actually more appropriate here than the inverted breve: [tʲælʲɔ̆wɤtʲ]. Unfortunately, our sources are not consistent here: Bender (1968) writes them with an inverted breve, while Willson (2003) spells them out without any diacritic. –Austronesier (talk) 11:01, 18 November 2019 (UTC)

I can see why Bender notated them as non-syllabic, though—to emphasize that they can never affect rhythm or receive stress. Naan also sidesteps the issue entirely by notating them all as [ə], a symbol it does not use for any other vowel.

On another matter, I've seen evidence of generational inconsistencies that may or may not render a lot of critical research by Bender (1968) and the MED (1976) somewhat outdated.

As a mini-dictionary and teaching aid, Naan seems to gloss over many phonological details by seemingly discouraging phonetic analyses that go much deeper than the orthography. This is an approach similar to Rudiak-Gould (2014), who categorically discouraged any use of Bender's phonemes as a reference. Except for [k kʷ ŋ ŋʷ w], Naan does not recognize a distinction between velarized and rounded consonants, treating them as phonetic sequences of velarized consonants involving rounded vowels or [w] and treating the rest as unimportant. Naan has been somewhat useful as a source of additional vocabulary examples not found in the MED or MOD, and it occasionally does acknowledge the MED as another reference. But it is ambiguous enough that some of its examples I can't easily convert to Bender values. I mean, it wasn't difficult to determine that O̧kōnjo̧ 'Arkansas' is {wawkenjaw}, or that Nujiiļōn 'New Zealand' is {niwjiyiļen}. But Tubaļu 'Tuvalu' is a bit more opaque, as it could be {tiwbaļiw}, {tiwbahļiw} or {tiwbahļʷiw}—I don't always feel comfortable just guessing and picking one as that introduces unnecessary OR. There are also prescribed epenthetic vowels in places where Bender (1968) and H. Willson (2003) described regressive assimilations, like {jt} becoming [zʲədˠ] instead of [tˠː].

I know that language change in the half century since Bender's influential analysis is a natural and likely occurrence, as the ways younger generations analyze their language have inevitable differences from the way older generations expressed it. Naan seems to evidence the possibility that spelling pronunciations (based on the new orthography) may be more common than before, and that educated pronunciations introduce more epenthetic vowels where they previously did not occur. Of course, this is also largely supposition on my part—Naan was written by Nik Willson, who, unlike Bender, Choi or Heather Willson (no relation?), is not describing Marshallese the way it is (or at least the way it was) on a phonological level in the same manner as linguistic papers. Like Rudiak-Gould, he focuses more on teaching it on a purely phonetic level. Their scope and emphasis seem to affect both of their published materials in ways that limit their usefulness. Linguists' analyses, even as recently as Ng (2017), still reference a lot of Bender (1968), and for the most part they reinforce him, so it would not seem safe, from a Wikipedia referencing standpoint, to cast much doubt on Bender's central conclusions. But Rudiak-Gould (2004) and Naan (2014) somewhat trouble me, as they do seem to contradict, eschew or deliberately ignore a lot of Bender's data, including the entire vertical vowel system Bender himself had used to drastically uncomplicate earlier linguists' grammatical analyses. What Rudiak-Gould and Naan describe could reflect a somewhat changed, more modern Marshallese language. However, these sources leave out enough details as to make it difficult to construct an adequately updated linguistic model of the language. - Gilgamesh (talk) 15:05, 18 November 2019 (UTC)

I find myself toying with phonetic transcriptions for the phoneme /tʲ/. Currently the article uses [tʲ] for voiceless and [zʲ] for voiced. Different sources have also described these allophones as being alveolopalatal ([ɕ] per Choi (1992), [t͡ɕ] per Naan (2014)), or straight-up like English SH/ZH (Rudiak-Gould (2004), who doesn't use IPA). I've also seen YouTube Marshallese tutorial videos describe it as being Z. When I listen to Marshallese speech, the reality sounds a lot more nuanced: It doesn't sound like a full-on alveolopalatal, like Standard Polish Ć/DŹ/Ś/Ź or Standard Mandarin J/Q/X, but more of a palatalized alveolar. And the voiceless allophone I hear is usually neither pure plosive [tʲ] nor full affricate [t͡sʲ], but between these two as a plosive with a sibilant release [tˢʲ]. The (partially) voiced allophone is often indeed [zʲ], and I hear this a lot among female speech samples, but most male speech samples appear to be saying [dᶻʲ]—this difference is suggested to be an indicator of "feminine" vs. "masculine" Marshallese speech, though I won't pretend I'm certain the difference is that simple. In any event, considering the strong association of /tʲ/ with loaned sibilant consonants and sibilant realizations in Marshallese, a part of me thinks it would be more intuitive to indicate it with some kind of sibilant-related symbol, which bare [tʲ] is not. Problem is, there isn't much published precedent for notating it [tˢʲ], or for that matter even [t͡sʲ], and while there is precedent for transcribing it [t͡ɕ], that symbol appears to overshoot the target. 
Ultimately it may not matter whether it's notated as alveolar, alveolopalatal or generic postalveolar—not only because Marshallese does not make that distinction, but also that when I was growing up, it was common for us as English speakers, when pronouncing Marshallese names and terms, to pronounce J as English /d͡ʒ/ and non-initial J in less well-assimilated loanwords as English /ʒ/ (like in iroij /ˈiːrɔɪʒ/ from irooj {yirewej} [irˠɔːtʲ]), overall implying a primary perception (whether heard or learnt) that J was postalveolar, as in English. It was also said anecdotally that Marshallese speakers pronounced Kwajalein (English /ˈkwɑːdʒəlɪn/) as "Kwadalein," though in retrospect this must have been referring to [kʷuwɑdʲɛ̯lʲɛːnʲ] in those speech registers that use a sound closer to [dʲ] than to [zʲ]. - Gilgamesh (talk) 16:18, 19 November 2019 (UTC)

I know it's been a little while since I last spoke here, but I've spent every day working on this, mostly at Wiktionary. I want wikt:Module:mh-pronunc in a more efficient condition before I consider adapting it for use on Wikipedia. Worst case scenario would be, it's used in several template embedments on an article like Kwajalein Atoll and it noticeably lags page refresh and unnecessarily taxes the servers. I want to prevent that from being an issue, so I'm having to rewrite most of it. - Gilgamesh (talk) 16:43, 29 November 2019 (UTC)

I finally reached a significant milestone in rewriting wikt:Module:mh-pronunc, and it runs much much faster than it did before, and may now be ready to adapt for deployment in Wikipedia articles with a large number of Marshallese term headings, such as Kwajalein Atoll, without significantly slowing down article load times. - Gilgamesh (talk)