The origin and development of Chinese tones

by Ash Henson

Note: this is an exploration of where tones come from in Chinese.

What you will learn here:

  • Where tones come from in general
      • The tonal relationship between Hindi and Punjabi
  • Where tones come from in Chinese
      • How Mandarin tones are related to Middle Chinese tones
      • The relationship between the tones in Mandarin and the tones in Cantonese
      • The modern theory of tonogenesis (how tones originated)

One of the things that drew me to Chinese was tones. At the time, I was studying engineering and planning on learning Japanese, but the idea of a language where the meaning changes when the tone changes was deliciously intriguing. I have since learned that Japanese has pitch accent, which is actually even more complicated than tones, but that's a story for another day.

Where do tones come from?

It's commonly assumed that Chinese has always been tonal, but new data has come to light. Linguists have discovered that tones arise from simplification of the syllable. What that means is, when a syllable loses one or more of its sounds, one of the things that can happen is the birth of a tone. Now, I'm going to use some linguistic terms here, but if that isn't your thing, fear not. Each one will be explained. You can still get the gist even if you don't know what they are, so it shouldn't be a problem even if you ignore the footnotes.

It is a little-known fact in most social circles that consonants (sounds like b, g, and t, as opposed to vowels like a, e, and i) at the beginning of a syllable (what linguists call onsets) and those that come after the main vowel (post-nucleus consonants, a.k.a. codas) can leave behind a difference in pitch on their way out the door.

Take the case of Hindi and Punjabi:

Hindi words with a voiced, aspirated consonant¹ correspond to Punjabi words whose consonant is unvoiced and unaspirated, but which have a tone. We'll come back to this in a second, but first, let's review the concepts of aspiration and voicing.

In Mandarin, t-, p-, k-, ch-, q- are aspirated, while their partners d-, b-, g-, zh-, j- are not. (Despite the spelling, pinyin d-, b-, g-, zh-, j- are not voiced; they are simply unaspirated.) If you say those out loud, you should hear the difference, and the difference is a puff of air. So, t- and d- are basically the same sound, but t- has a puff of air, and d- does not. The same goes for:

  • p- vs. b-
  • k- vs. g-
  • ch- vs. zh-
  • q- vs. j-

Linguists commonly use h to denote aspiration: the little puff of air that comes out when you say t- is like an h trying to escape. I used to think this was just a convention, but in Taiwanese Mandarin, people often drop the t- sound. For example:
 我要把它放那裡 (我要把它放那里). Wǒ yào bǎ hā fàng nàlǐ. “I want to put it over there.”

The standard pronunciation of 它 is tā (note: t- is aspirated), but in quick speech, it often becomes hā. So, even when the t- is deleted, its aspiration (represented by the h) remains.

To understand voicing, let's look to English for some examples. The th in "this" is voiced, but the th in "thought" is not. If you aren't a native speaker of English, use to listen to some examples.

Going back to our Hindi & Punjabi example:

#   Hindi    Punjabi (tone)   Meaning
1   rāhī     rāī (H)          passenger
2   lābh     lāp (H)          profit
3   ghoṛā    kòṛā (L)         horse
4   ghar     kàr (L)          house
5   dhol     tòl (L)          drum


If you don't like looking at phonetic symbols, just skip to the next paragraph (though I recommend that you tough it out). Starting with word #2: the Hindi word lābh corresponds to Punjabi lāp, in which the h is lost (i.e., the aspiration is lost) and voiced b becomes unvoiced p.

The first word just loses its h.

Note the (H) next to those first two words. That indicates a high tone.

In word #3, the h is lost, and voiced g becomes unvoiced k.

In word #4, g → k and h → Ø (i.e., h disappears).

In word #5, d → t and h → Ø.

The last three words have a low tone, marked with an (L).

What does all this mean?

Well, first of all, note that Punjabi isn't derived from Hindi. Rather, the two share a common parent but changed differently over time. And the simplification of the syllable (read: the syllable losing sounds) in Punjabi resulted in tones. If the loss came after the main vowel, it resulted in a high tone (H); if it came before the main vowel, a low tone (L).

In the terminology used for Chinese, then, both initials and endings can influence tones.

That example comes from Hans Henrich Hock's 1999 book Principles of Historical Linguistics, which is an excellent book if you're interested in historical linguistics.
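The Hindi/Punjabi rule above (loss after the main vowel gives a high tone, loss before it gives a low tone) can be sketched as a toy Python function. This is only an illustration under strong assumptions: it models just the voiced aspirates gh, dh, bh and a bare h, it treats "not word-initial" as "after the main vowel," and it returns Punjabi forms without tone diacritics, with the tone as a separate letter. The name `hindi_to_punjabi` is hypothetical; real sound change involves many more conditions.

```python
# Toy sketch of the Hindi -> Punjabi correspondence in the table above.
# Assumptions: only gh/dh/bh and a bare h are modeled, and Punjabi forms
# are returned without tone diacritics (the tone comes back as a letter).
VOICED_ASPIRATES = {"gh": "k", "dh": "t", "bh": "p"}


def hindi_to_punjabi(word: str) -> tuple[str, str]:
    """Return (Punjabi form, tone), where tone is "H", "L", or "" (none)."""
    # Loss BEFORE the main vowel: the onset devoices and loses its h -> low tone.
    for cluster, plain in VOICED_ASPIRATES.items():
        if word.startswith(cluster):
            return plain + word[len(cluster):], "L"
    # Loss AFTER the main vowel: the consonant devoices and loses its h -> high tone.
    for cluster, plain in VOICED_ASPIRATES.items():
        if cluster in word[1:]:
            return word[0] + word[1:].replace(cluster, plain), "H"
    # A bare h after the vowel simply disappears -> high tone.
    if "h" in word[1:]:
        return word[0] + word[1:].replace("h", ""), "H"
    return word, ""  # nothing was lost, so no tone arises


for hindi in ["rāhī", "lābh", "ghoṛā", "ghar", "dhol"]:
    print(hindi, "->", hindi_to_punjabi(hindi))
```

Running the loop gives H for rāhī and lābh and L for the other three words, matching the tone column of the table.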

Where do Chinese tones come from?

[Image: an old illustration of the four Middle Chinese tone categories, which were traditionally represented on a hand]


So now we've looked at how tones can arise in a non-Chinese language, and please note that this is not an isolated case. Tibetan has dialects that are tonal (Lhasa) and those that are not (Amdo). And as we'll learn shortly, Vietnamese is tonal, while some other Mon-Khmer languages related to it are not.

Tones are first mentioned in ancient Chinese literature by Shěn Yuē (沈約/沈约; 441–513 AD) and Zhōu Yóng (周顒/周颙; active in the late 5th century AD). Their interest in tones was related to poetry: it was at this time that tones began to play a systematic role in verse. Eventually, as these rules became more and more precisely defined, they resulted in what is known as Lǜshī (律詩/律诗), or “Regulated Poetry.”

Shěn Yuē and Zhōu Yóng also coined the names for Middle Chinese (MC) tones:

  • píngshēng (平聲/平声)
  • shǎngshēng (上聲/上声, and yes, 上 is pronounced in the 3rd tone here!)
  • qùshēng (去聲/去声)
  • rùshēng (入聲/入声)

These are commonly referred to as píng (平), shǎng (上), qù (去), and rù (入).

The names these tones usually get in English are: level, rising, departing, and entering, respectively.

It is thought that the names of the tones reflect how they sounded, which makes a lot of sense, but it would be basically impossible to recreate Middle Chinese tones from their names alone.

How Mandarin tones are related to Middle Chinese tones

Note that the Middle Chinese tones do not map to Mandarin one-to-one.

  1. The first tone in Mandarin comes from píngshēng words whose initials were unvoiced (once again, we see that the voicing of an initial affects tones!).
  2. The second tone in Mandarin comes from píngshēng words whose initials were voiced.

You may see the first tone in Chinese referred to as yīnpíng (陰平/阴平) and the second tone as yángpíng (陽平/阳平). So, to state that clearly: Mandarin tones 1 and 2 come from the same tone in Middle Chinese.

  3. The third tone in Mandarin comes from shǎngshēng words whose Middle Chinese initials weren't “completely voiced.”
  4. The fourth tone in Mandarin comes from qùshēng words AND from shǎngshēng words whose Middle Chinese initials were “completely voiced.”
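The four rules above amount to a small lookup, sketched here in Python. The three initial categories are hypothetical labels chosen for this sketch: "voiceless," "sonorant" (voiced but not “completely voiced,” like l-, n-, m-), and "full" (“completely voiced”). Rùshēng returns None because these rules don't predict its Mandarin outcome.

```python
from typing import Optional

# Hypothetical labels for the voicing of a Middle Chinese initial:
#   "voiceless" - unvoiced initials
#   "sonorant"  - voiced, but not "completely voiced" (l-, n-, m- type)
#   "full"      - "completely voiced" initials
INITIAL_TYPES = ("voiceless", "sonorant", "full")


def mandarin_tone(mc_tone: str, initial: str) -> Optional[int]:
    """Predict a Mandarin tone number from a Middle Chinese tone category."""
    if initial not in INITIAL_TYPES:
        raise ValueError(f"unknown initial type: {initial!r}")
    if mc_tone == "ping":
        # píngshēng split on voicing: yīnpíng (1) vs. yángpíng (2)
        return 1 if initial == "voiceless" else 2
    if mc_tone == "shang":
        # only "completely voiced" shǎngshēng moved over to tone 4
        return 4 if initial == "full" else 3
    if mc_tone == "qu":
        return 4
    if mc_tone == "ru":
        return None  # not predictable from these rules
    raise ValueError(f"unknown Middle Chinese tone: {mc_tone!r}")
```

For example, `mandarin_tone("ping", "sonorant")` returns 2, which is why a píngshēng word with an l-, n-, or m- initial should surface as second tone.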

So what happened to rùshēng?

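Rùshēng still exists in a lot of southern dialects, like Cantonese, Hakka, Minnan, etc. A rùshēng syllable in Middle Chinese is one that ends in -p, -t, or -k. Some modern dialects also have a glottal stop ending (which comes from an earlier -p, -t, or -k). In standard Mandarin, these final consonants disappeared entirely, and rùshēng syllables were scattered among the four modern tones.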

Another interesting aspect of this is that different Mandarin dialects (i.e., northern dialects) have different tones. In other words, they evolved differently from standard Mandarin. One of the funniest ones to me personally is that in Shāndōnghuà (山東話/山东话), the first and third tones are switched from standard Mandarin. So, if they say 我是山東人 (我是山东人), they pronounce it like Wō shì Shándǒngrén (instead of Wǒ shì Shāndōngrén).

Practical usage #1

Yes! There are practical things you can do with this knowledge!

For instance, since we know that píngshēng split based upon the voicing of the Middle Chinese initial, we can surmise that there will be very few 1st tones in Mandarin with a voiced initial: l-, n-, m-, r- (technically, these are sonorants, but we'll leave that for now).

We can do a quick test to see if our expectation is correct. Let's look at the number of characters ending in -īng vs. -íng and see if it holds:

nīng 0 vs. níng 39
mīng 1 vs. míng 32
līng 5 vs. líng 202
jīng 74 vs. jíng 0
qīng 29 vs. qíng 26
xīng 39 vs. xíng 40
tīng 24 vs. tíng 42
dīng 16 vs. díng 1
pīng 16 vs. píng 60
bīng 16 vs. bíng 0

Reading from the top, there are zero characters with nīng as a reading, and 39 that are read níng. Mīng has one character, while míng has 32. Līng has 5, while líng has 202.

So, it holds!

What this means is, if you're ever having a hard time remembering whether a word starting with l-, n-, m-, or r- has a first or second tone, guess second tone!

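The rest of the results are pretty evenly spread, except for jíng, díng, and bíng, which are rare or nonexistent. This fits what we've learned: Middle Chinese voiced stops in píngshēng generally became aspirated in Mandarin (q-, t-, p-), so second-tone syllables with unaspirated j-, d-, b- are rare, and first-tone jīng, dīng, bīng evidently come from unvoiced Middle Chinese initials (and knowing that information is one road to being the life of the party!).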
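The counts in the -īng/-íng table can be turned into a second-tone share per initial with a few lines of Python. The numbers are copied from the table; grouping l-, n-, m- as sonorants follows the text (the table gives no r- data). The names `COUNTS` and `second_tone_share` are just illustrative.

```python
# Counts copied from the -īng / -íng table: initial -> (1st tone, 2nd tone)
COUNTS = {
    "n": (0, 39), "m": (1, 32), "l": (5, 202),
    "j": (74, 0), "q": (29, 26), "x": (39, 40),
    "t": (24, 42), "d": (16, 1), "p": (16, 60), "b": (16, 0),
}
SONORANTS = {"l", "n", "m"}


def second_tone_share(initial: str) -> float:
    """Fraction of -ing characters with this initial that are read in the 2nd tone."""
    first, second = COUNTS[initial]
    return second / (first + second)


for initial in COUNTS:
    label = "(sonorant)" if initial in SONORANTS else ""
    print(f"{initial}-ing: {second_tone_share(initial):.2f} {label}")
```

Every sonorant initial comes out above 0.95, while no other initial gets past 0.8, which is the pattern the table shows.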

Practical usage #2, or How Mandarin and Cantonese tones are related

Another way to use this information practically has to do with learning another Chinese dialect. For instance, if you know the Mandarin tone for a character, you can usually narrow the possible Cantonese tones down to two. And if you know the Cantonese tone, you can usually predict the Mandarin tone. Let's look at some examples:

Cantonese has two rising tones, one low rising (the Canto 5th tone), and one high rising (the Canto 2nd tone). These correspond to Middle Chinese shǎngshēng (上聲/上声), with the low rising tone coming from Middle Chinese words with voiced initials, and the high rising words coming from the unvoiced initials.

If you know Cantonese 我 ngo5, then you can be fairly certain that the Mandarin tone will be the 3rd tone, and indeed it's wǒ. The same goes for Cantonese words spoken in the high rising tone, like 等 dang2, which also corresponds to the Mandarin third tone: děng.

There are relations like this for all the tones, but I just wanted to give you a feel for how these relations work.

It's important to note that these relations hold about 85% of the time. When they don't hold, it's usually because the Cantonese word is rùshēng (it ends in -p, -t, or -k), like 德, which is pronounced dak1. Rùshēng words were distributed more or less randomly among the four Mandarin tones, so we can't predict how they will behave.

Or, it could be that there were multiple ancient readings of a character, and Mandarin went with one reading while Cantonese went with the other. Sometimes Cantonese retains a historical reading, like 於, which is pronounced yu1. In Mandarin, it should also be pronounced in the first tone, yū, but at some point it got reinterpreted as yú (probably because of 于 yú; note that even though 于 today is considered the "simplified version" of 於, they were originally different characters with different pronunciations).
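As a rough illustration, the rising-tone relationship and the rùshēng exception can be combined into a small checker for Cantonese syllables written like ngo5, dang2, and dak1 (Jyutping). This sketch encodes only the two rising tones discussed here; every other tone returns None, and the roughly 85% reliability mentioned above still applies. The function name is hypothetical.

```python
import re
from typing import Optional


def mandarin_tone_from_jyutping(syllable: str) -> Optional[int]:
    """Guess the Mandarin tone of a Cantonese syllable such as "ngo5".

    Returns None when this sketch cannot make a prediction.
    """
    match = re.fullmatch(r"([a-z]+)([1-6])", syllable)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {syllable!r}")
    body, tone = match.group(1), int(match.group(2))
    # Syllables ending in -p, -t, -k are rùshēng: no reliable Mandarin tone.
    if body[-1] in "ptk":
        return None
    # Both rising tones (2 = high rising, 5 = low rising) come from
    # Middle Chinese shǎngshēng and usually surface as Mandarin tone 3.
    if tone in (2, 5):
        return 3
    return None  # other tones are not covered in this sketch
```

With this, `mandarin_tone_from_jyutping("ngo5")` and `mandarin_tone_from_jyutping("dang2")` both return 3, while `"dak1"` returns None because it is a rùshēng syllable.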

Tonogenesis – the beginning of tones

Given that oracle bone script dates back to roughly 1250 BC, and that the first mention of tones in ancient Chinese literature comes around the 5th century AD, roughly 1,750 years separate the genesis of Chinese writing from the first mention of tones. This seems to point to Old Chinese not having tones.

However, that in and of itself doesn't mean that Old Chinese didn't have tones. It could be that people were just not aware of them (in the same way native speakers aren't really aware of what the word “the” means, but they still use it correctly anyway).

Back in 1954, the French linguist André Haudricourt noticed that in very early Chinese loanwords into Vietnamese, words that correspond to qùshēng in Middle Chinese had the hỏi or ngã tones, while in later loanwords (from around the time of the Táng dynasty), Middle Chinese qùshēng corresponded to the Vietnamese sắc or nặng tones.

This is going to sound familiar (or it should): the words that came to have the hỏi tone had voiceless initials, and those that came to have the ngã tone had voiced initials.

So once again, we see that the voicing of the initial is important.
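Haudricourt's observation about Middle Chinese qùshēng can be summarized as a lookup keyed on the age of the loan and the voicing of the initial. The early-layer pairing (voiceless → hỏi, voiced → ngã) is stated above; the pairing for the later Táng-era layer (voiceless → sắc, voiced → nặng) is the standard Sino-Vietnamese correspondence, added here as background that the text doesn't spell out. The stratum names "early" and "tang" are hypothetical labels.

```python
# (loan stratum, voicing of the MC initial) -> Vietnamese tone,
# for words that were qùshēng in Middle Chinese.
QUSHENG_TO_VIETNAMESE = {
    ("early", "voiceless"): "hỏi",
    ("early", "voiced"): "ngã",
    ("tang", "voiceless"): "sắc",  # later (Sino-Vietnamese) layer
    ("tang", "voiced"): "nặng",
}


def vietnamese_tone(stratum: str, voicing: str) -> str:
    """Look up the expected Vietnamese tone of a qùshēng loanword."""
    try:
        return QUSHENG_TO_VIETNAMESE[(stratum, voicing)]
    except KeyError:
        raise ValueError(f"no rule for {stratum!r}, {voicing!r}") from None
```

Laid out this way, the same voicing split shows up in both layers; only the pair of tones it maps onto changes with the age of the loan.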

Vietnamese belongs to the Mon-Khmer language family. In native Vietnamese words that have counterparts in other Mon-Khmer languages, the hỏi and ngã tones correspond to a final -h in the non-tonal Mon-Khmer languages. This -h reflects an earlier -s (or ).

On this basis, many scholars reconstruct syllables that are qùshēng in Middle Chinese as having an *-s ending in Old Chinese (* just means that the form is a reconstruction). Furthermore, it is also thought that the *-s ending has morphological significance, i.e., it means something.

That should be easily accepted by English speakers! We have third person singular -s (as in goes), plural -s (as in cars), and genitive -s (as in his, hers or Bob's).

Please note that I'm not saying the -s in English is related to the OC *-s!!

Edwin Pulleyblank (1962) even provides some examples where Chinese transcriptions of foreign words make more sense with the *-s:

  • 波羅奈 pa-la-najH for Sanskrit Vārāṇasī
  • 阿魏 ʔa-ngjwɨjH and 央匱 ʔjang-gwijH for Tocharian B aṅkwaṣ
  • 阿迦貳吒 ʔa-kja-nyiH-trae for Akaniṣṭha
  • 都賴 tu-lajH for Talas
  • 對馬 twojH-maeX for Tsushima

The Middle Chinese is given after the characters here. Note that -H means qùshēng (去聲/去声) and -X means shǎngshēng (上聲/上声).

The reason we're showing Middle Chinese in a discussion about Old Chinese is that Middle Chinese is often used as evidence for OC. Note that in each case, the qùshēng syllable lines up with an s in the foreign word. You'll notice that some of the vowels don't match up, but that's because we're looking at Middle Chinese. For instance, 都 (Middle Chinese: tu) corresponds to the Ta in Talas, but in Old Chinese, 都 is read *tˤa. Note too that this is merely one line of evidence.
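The transcription convention in the list (a trailing -H for qùshēng, -X for shǎngshēng) is easy to read mechanically. The sketch below also treats a final -p, -t, or -k as rùshēng, following the definition given earlier, and everything else as píngshēng. It splits a transcription on hyphens and reports which syllables are qùshēng; the function names are hypothetical.

```python
def mc_tone(syllable: str) -> str:
    """Classify one Middle Chinese syllable written in this notation."""
    if syllable.endswith("H"):
        return "qù"
    if syllable.endswith("X"):
        return "shǎng"
    if syllable[-1] in "ptk":
        return "rù"
    return "píng"


def qusheng_positions(transcription: str) -> list[int]:
    """Return the 0-based positions of the qùshēng syllables in a hyphenated form."""
    syllables = transcription.split("-")
    return [i for i, syl in enumerate(syllables) if mc_tone(syl) == "qù"]


examples = {
    "pa-la-najH": "Vārāṇasī",
    "ʔa-kja-nyiH-trae": "Akaniṣṭha",
    "tu-lajH": "Talas",
    "twojH-maeX": "Tsushima",
}
for mc, foreign in examples.items():
    print(f"{mc:<18} {foreign:<10} qùshēng at {qusheng_positions(mc)}")
```

In each case the flagged syllable is the one that lines up with the s (or ṣ) in the foreign word: najH with -nasī, nyiH with -niṣ-, lajH with -las, and twojH with Tsus-.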

In the same way that Old Chinese *-s corresponds to Middle Chinese qùshēng, a glottal stop in Old Chinese (ʔ) corresponds to Middle Chinese shǎngshēng, but we'll save that for another article.
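The two Old Chinese correspondences mentioned here (*-s → qùshēng, *-ʔ → shǎngshēng) can be combined with the stop endings of rùshēng into one function. Treat this as a sketch of a reconstruction hypothesis, not settled fact. It checks *-s first so that a stop followed by *-s (e.g., *-ks) counts as qùshēng, which is the usual assumption in reconstructions that use this ending. Apart from 都 *tˤa, the syllables in the test are hypothetical shapes.

```python
def mc_tone_from_oc(oc_form: str) -> str:
    """Map a reconstructed Old Chinese syllable to its Middle Chinese tone.

    The leading * is optional. Assumes the *-s / *-ʔ hypothesis.
    """
    form = oc_form.lstrip("*")
    if form.endswith("s"):
        return "qù"     # *-s (including *-ks, *-ts) -> qùshēng
    if form.endswith("ʔ"):
        return "shǎng"  # final glottal stop -> shǎngshēng
    if form[-1] in "ptk":
        return "rù"     # stop endings survive as rùshēng
    return "píng"       # everything else -> píngshēng
```

For example, 都 *tˤa has none of these endings, so the function returns píng, which agrees with its Middle Chinese form tu.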

One big takeaway here is that tones are an integral part of a Chinese syllable. Western learners of Chinese often think of tones as add-ons, as if zhāng were actually "zhang + the first tone."

If you've ever talked to native Chinese speakers about tones, or tried to make jokes based on saying the right syllable, but the “wrong” tone, you'll notice that they certainly do NOT see tones as an add-on. They see zhāng and zhàng, for instance, as being two completely separate entities.

One big mistake that comes from this type of “foreign thinking” is that people think of the syllable first and then try to add the tone on later. This strategy will only lead to bad tones. It's best to think of zhāng as a complete entity, and zhàng as a complete entity too.

I hope you find this discussion as interesting as I do! I was blown away by a lot of this stuff when I first learned it.

Let's do a quick recap:

  • tones arise from syllables losing sounds
  • tones are first mentioned in ancient Chinese literature around the 5th century AD
  • tones became an integral part of Chinese poetry around the 5th century AD and reached their apex in the Táng dynasty (618–907 AD)
  • syllables in Mandarin starting with l-, n-, m-, r- rarely have first-tone readings
  • tones in different dialects of Chinese are related to one another (something very useful to know if you want to learn a dialect)
  • Chinese loanwords into Vietnamese have predictable tones (depending on when they entered Vietnamese), which is very useful if you want to learn Vietnamese (which I highly recommend; Vietnamese is a wonderfully fun language to learn!)

Congratulations, you now have a deeper understanding of Chinese tones!

1. The t in teach is unvoiced, while the d's in dad are voiced. The t in teach is also aspirated (followed by a puff of air), while the p in stop is usually unaspirated in normal speech.



