What is Etymology, and Is it Useful for Learning Chinese Characters? P – Outlier Linguistics

What is Etymology and Is it Useful for Learning Chinese Characters? (Part 1)

By Ash Henson

So, what is Chinese character etymology? It’s usually defined as something like “the story of a character’s origin and development,” but that definition doesn’t give the full picture. Be that as it may, if we accept this definition for the moment, the first question someone learning Chinese should ask is “How much of that story do I need to know to effectively learn Chinese characters?” There is no single answer to this question, because there is no single way to learn anything. There are many different learning styles, and each learner has a unique background and way of viewing the world. Having said that, there are also similarities in how we learn, and there are principles of effective learning that apply to all of us as human beings. As such, my answer to the question of how much of the story you need to know is: it depends. I’ve identified six aspects of character etymology that are pertinent to learning Chinese characters:

#1: Identifying the functional components in a given character.
#2: Identifying how the functional components function.
#3: Identifying corrupted components.
#4: Identifying the meaning that a given character was invented to represent and its relationship to the character form.
#5: Restoring the pictorial quality of character components.
#6: The full story.

In this post, I explain aspects #1 to #3, that is, I explain what each one is and its level of importance. Aspects #4 to #6 will be explained in part 2 of this post (it’s a constant battle to keep posts short!). As it turns out, there is a core of things one needs to know about a given character in order to learn it effectively, and then there are things which are interesting (well... to me anyway!) to know, but not strictly speaking necessary.

According to the Outlier philosophy, the goal of character learning is predictability and long-term recall. Predictability means that when you come across a new character within a meaningful context, you can make intelligent guesses about the range of sounds and the range of meanings that character might have. Ideally, you could use that knowledge to make a connection with a word or words in the spoken language. Long-term recall refers to the ability to recall a character form long after it’s been learned, by way of understanding how Chinese characters as a system represent sound and meaning. Since spoken words are combinations of sound and meaning, you can use these two clues in conjunction with understanding characters on a systemic level to pluck your memory strings and recall a character’s form. This is accomplished by understanding the functional components of each character and by understanding how characters work on the system level.

Studies have shown that native Chinese speakers have an intuition about how a given unknown character may sound or what it may mean, but they often find it difficult to articulate. This intuition comes from learning thousands of characters. It is a reflection of the logic inherent to the Chinese writing system. And, it is imperfect. It also takes a long time to acquire. In the Outlier Dictionary of Chinese Characters, our aim is to instill the abilities required for long-term recall and predictability from day one, but that can only be done if we understand characters on their terms, not on ours. Now, let’s look at the first three of the six aspects of etymology:

Aspect #1: Identifying the functional components in a given character.

This is by far the most important aspect of etymology. Knowing how a character represents sound and meaning is the basis for understanding Chinese characters as a system. It is also the basis for being able to detect real (as opposed to superficial) relationships between characters: sound and meaning relationships. And, last but not least, it’s the key to understanding individual characters. So, understanding what a character’s functional components are is crucial for all learners, and they (in combination with Aspect #2) are what make predictability possible for beginners (when you need it the most!). And, while they aren’t the only route to long-term recall, they are the most effective and they have the most positive side effects.

Example: The functional components for 識 shì [1] “to know” are 言 yán “speech” and 戠 zhí “to gather together.” 言 is the meaning component and 戠 is the sound component (check out this article for more about the different types of functional components). People oftentimes view this character as 言 + 音 + 戈 and then create a story to combine the meanings “speech” + “sound” + “lance,” but doing so hides the sound connections between 識 and other characters that share the sound component 戠 zhí: 職 zhí, 織 zhī, 幟 zhì. Creating a new story for how meaning is represented in this character (i.e., by breaking it into parts that aren’t giving a meaning, then assigning a meaning to them) not only obscures the real way meaning is expressed, it also gives a false impression of how Chinese characters represent meaning in general. Not to mention, an infinite number of stories can be created for any one character, but only a story based upon the functional components will bring the benefit of seeing (from an early stage) the real sound and meaning connections between characters.
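To make the idea of a shared sound component concrete, here is a minimal sketch in Python. The table layout and the function name `same_sound_series` are my own assumptions for illustration, not part of the Outlier Dictionary. The readings and sound components are the ones mentioned in this post (往 and its sound component 王 come from the discussion of Aspect #3).

```python
# Illustrative data only: each character is mapped to its sound component
# and its reading, as given in this post. The layout is an assumption.
SOUND_COMPONENT = {
    "識": "戠",
    "職": "戠",
    "幟": "戠",
    "往": "王",
}
READING = {
    "識": "shì",
    "職": "zhí",
    "幟": "zhì",
    "往": "wǎng",
}


def same_sound_series(char):
    """Return (character, reading) pairs that share char's sound component."""
    target = SOUND_COMPONENT[char]
    return [
        (other, READING[other])
        for other, component in SOUND_COMPONENT.items()
        if component == target and other != char
    ]


print(same_sound_series("識"))
```

If you split 識 into 言 + 音 + 戈 instead, there is no 戠 entry to look up, and the link to 職 and 幟 never shows up. That is the practical cost of a surface-level breakdown.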

Aspect #2: Identifying how the functional components function.

In other words, this aspect is about how sound components express sound and how meaning components express meaning. While knowing what the functional components are in a character is very important, so is understanding how they function in that character. For instance, most people do not distinguish between a component expressing meaning by way of its meaning (i.e., meaning components) vs. expressing meaning by way of form (i.e., form components).

What does that mean exactly? Each functional component has three attributes: form, meaning and sound (or pronunciation). Take 自 “self” for example. Its form is a picture of a person’s nose. Its meaning is “self” and its sound is zì. If 自 expresses meaning by form (i.e., it’s a form component), then the meaning it expresses has to do with “nose,” as in 息 xí [2] “to breathe,” 鼻 bí “nose,” 臭 chòu “to stink,” 嗅 xiù “to smell.” So, in the characters 息, 鼻, 臭 and 嗅, 自 is a form component. Characters with meaning components appeared rather late in the game and as such, they are small in number.

Ex. 歪 wāi “not straight, crooked (literally 不正).” It’s easy to see that this character is based upon the combination of the meanings 不 “not” and 正 “straight.” 不’s form is either “part of a plant” or “roots of a plant” and 正’s form is “feet marching towards a city.” These two forms obviously have nothing to do with the meaning of 歪. So, in 歪, 不 and 正 are meaning components. Most semantic components give meaning by form and only a minority give meaning by meaning, yet most people interpret all semantic components as meaning components. Worse yet, they don’t consider the original meanings of the components, but rely instead on their modern meanings. This way of thinking is almost guaranteed to be inaccurate (read: it does not help with predictability, long-term recall or seeing real connections between characters).

Understanding how components express meaning and sound is very important to understanding both how individual characters work and how characters work as a system.
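The distinction between the three attributes of a component and the role it plays in a particular character can be sketched as a small data model. This is only an illustration under my own assumptions: the names `Component`, `Role` and `contribution` are hypothetical, and the readings bù and zhèng are added here for completeness rather than taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """How a component functions inside one particular character."""
    FORM = "form"        # expresses meaning by what it depicts
    MEANING = "meaning"  # expresses meaning by what it means
    SOUND = "sound"      # expresses sound


@dataclass(frozen=True)
class Component:
    """A component's three attributes: form, meaning and sound."""
    glyph: str
    form: str
    meaning: str
    sound: str


def contribution(component, role):
    """Return the attribute a component contributes in the given role."""
    if role is Role.FORM:
        return component.form
    if role is Role.MEANING:
        return component.meaning
    return component.sound


ZI = Component("自", form="nose", meaning="self", sound="zì")
BU = Component("不", form="part of a plant", meaning="not", sound="bù")
ZHENG = Component("正", form="feet marching towards a city",
                  meaning="straight", sound="zhèng")

# In 鼻, 自 is a form component, so it contributes "nose", not "self".
print(contribution(ZI, Role.FORM))
# In 歪, 不 and 正 are meaning components: "not" + "straight".
print(contribution(BU, Role.MEANING), contribution(ZHENG, Role.MEANING))
```

The same component gives a different clue depending on its role, which is why knowing the role matters as much as knowing the component.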

Aspect #3: Identifying corrupted components.

Technically speaking, this should be part of Aspect #1, but since most people aren’t familiar with this concept, I’ll handle it as a full aspect. Character corruption means that a “character changes form in such a way that the original form intended by the inventor of that character is altered.” In other words, things aren’t always what they seem. There are several advantages to knowing whether a component is corrupted.

1. To clear up misunderstandings and answer questions such as:

  • What does 往 wǎng “toward” have to do with 主 zhǔ “master, owner?”
    The answer is: nothing, except a superficial connection. The right side of 往 was originally a different form [image: earlier form of the right side of 往] [3], made up of a foot (representing “to go”) and 王 wáng, which gives the sound. The 彳 was added later. The “foot” got corrupted into 丶, making that earlier form look like 主.
  • How do the two mountains 山 shān in 出 chū “to go out” represent the notion “to go out?”
    Answer: they don’t. The top is a corruption of 止 zhǐ. “To go out” was originally represented by a foot (止 zhǐ, which means “stop” in modern Chinese) walking out of a cave. Through the process of stylization, the character came to look like two mountains 山 on top of one another, although it makes more sense to analyze it as 屮 (an empty/corrupted component) on top of 凵 (a form component depicting a cave opening). That’s exactly how the Outlier Dictionary explains the character.

2. To give your mind closure. If you know that a given component is a corruption, you know that it is not adding a sound and meaning to the character. There’s no need to look any further for an explanation.

3. To give you a clearer understanding of how modern characters work. Predictability and long-term recall come quickest and most efficiently by understanding characters on an individual as well as systemic level. If you reinterpret corrupted components with your own meaning, you’re simply adding noise to the system.

But isn’t this just making the whole thing more complicated and harder to learn? I would argue no. The number one rule for memorizing anything is understanding. The more you understand the object of learning, the easier it is for you to remember that thing. Understanding which character components are corrupted is increasing your understanding. And, it’s not necessary to know the whole story behind the corruption. The main thing is knowing that the corrupted component is not giving a sound or meaning to the character.
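One way to picture “closure” is to tag each component with its role and simply skip the empty (corrupted) ones when looking for sound or meaning clues. The sketch below is an assumption-laden illustration: the `ANALYSES` table and the function `informative_components` are hypothetical names, and the entries come from the analyses of 出 and 歪 given in this post.

```python
# Each entry: (component, role, clue). A role of "empty" marks a corrupted
# component that gives neither sound nor meaning to the character.
ANALYSES = {
    "出": [("屮", "empty", None), ("凵", "form", "cave opening")],
    "歪": [("不", "meaning", "not"), ("正", "meaning", "straight")],
}


def informative_components(char):
    """Return only the components that carry a sound or meaning clue."""
    return [
        (glyph, role, clue)
        for glyph, role, clue in ANALYSES[char]
        if role != "empty"
    ]


print(informative_components("出"))
print(informative_components("歪"))
```

You still need 屮 to write 出 correctly, but once it is marked as empty there is nothing more to explain about it.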

Some examples:

Ex. 高 gāo “tall”: It’s enough that you know that 亠 tóu “lid, cover,” 口 kǒu “mouth” and 冋 jiǒng have nothing to do with why 高 looks the way it does. It is actually just a picture of a tall building and has a 口 on the bottom to distinguish it from 京 jīng, which is also a picture of a tall building. Of course, you do need to remember that these components are necessary for correctly writing 高, but you can’t use them to understand why 高 looks the way it does.

Ex. 粦 lín “ghost fire”: Since 粦 is only used as a sound component in modern Chinese, it’s enough to know how to write it and its pronunciation. However, understanding what it was originally a picture of is interesting to some people (and can aid in remembering how to write it). If you’re one of those people (I know I am!), then you need to remember that 米 “rice” isn’t giving a meaning or sound; it’s merely a placeholder for a picture of a corpse that is on fire (okay, so knowing the real story isn’t always more pleasant!) and that 舛 is a picture of two feet.

Check out Part 2 here!

  1. In the PRC, the standard reading for this character is shí.

  2. xī is the standard pronunciation for 息 in the PRC.

  3. Character form from 小學堂.


  • Outlier is the best method and your dictionary is, in at least one respect, even better than Pleco(!): etymology. Great posts.

    I think the corruption / simplification / standardization with the Song font created characters like 出 and 止 which could be seen as ideograms in addition to their original forms as pictophonetic and pictograms. Some of the simplifications are clearly just to make it easier to write by hand but occasionally they seem to change the logic of the character. To me chu is a pair of mountains and going out means passing through mountain ranges; and zhi is a person stopped still in front of something high on the right 上

    I have no evidence and my Chinese still isn’t good enough to look into the song standardization/simplifications to figure out if there is any logic other than reducing variation by standardization and making writing easier by simplification. Another wrong example is 重 which i still want to see as a heavy cart because it has extra large wheels.

    I know these three are not etymologically correct, but thought to point it out since it might be interesting for you.

    Your method is correct, though I think the term “form component” is unclear - do you mean form as in visual as in picto? or do you mean form as in formal as in idea as in semantic? This is a vagary of the word “form” in English. So … if you mean “pictorial” then I would suggest using that term but if you mean otherwise then I would suggest using an unequivocal term.

    Thank you,


    Eric Engle
  • I love approaching etymology as a memory and conceptual aid.

    Is there a prominent understanding about this in Chinese-speaking countries? Often when I ask native speakers, they seem not to know.

    I understand that they might not need to know (as a North American would rarely be taught these matters), but I’m curious whether anyone in these regions learns these matters in school, to your knowledge.

    Anthony Metivier
