What is Kanji Etymology? Part 1

What is Etymology, and Is it Useful for Learning Kanji? Part 1

By Ash Henson

So, what is kanji etymology? It’s usually defined something like “the story of a kanji’s origin and development,” but that definition doesn’t give the full picture. Be that as it may, if we accept this definition for the moment, the first question someone learning Japanese should ask is “How much of that story do I need to know to effectively learn kanji?” There is no single answer to this question, because there is no single way to learn anything. There are many different learning styles, and each learner has a unique background and way of viewing the world. Having said that, there are also similarities in how we learn, and there are principles of effective learning that apply to all of us as human beings. As such, my answer to the question of how much of the story you need to know is: it depends. I’ve identified six aspects of kanji etymology that are pertinent to learning kanji:

#1: Identifying the functional components in a given kanji.
#2: Identifying how the functional components function.
#3: Identifying corrupted components.
#4: Identifying the meaning that a given kanji was invented to represent and its relationship to the kanji form.
#5: Restoring the pictorial quality of kanji components.
#6: The full story.

In this post, I talk about aspects #1 to #3. That is, I explain what each one is and its level of importance. Aspects #4 to #6 will be discussed in part 2 of this post (it’s a constant battle to keep posts short!). As it turns out, there is a core of things one needs to know about a given kanji in order to learn it effectively, and then there are things which are interesting (well, to me anyway!) to know, but not strictly speaking necessary.

According to the Outlier philosophy, the goal of kanji learning is predictive ability and long-term recall. Predictive ability means that when you come across a new kanji in a meaningful context, you can make intelligent guesses about the range of sounds and the range of meanings that kanji might have. Ideally, you could use that knowledge to make a connection with a word or words in the spoken language. Long-term recall refers to the ability to recall a kanji’s form long after you’ve learned it, by way of understanding how kanji as a system represent sound and meaning. Since spoken words are combinations of sound and meaning, you can use these two clues in conjunction with understanding kanji on a systemic level to pluck your memory strings and recall a kanji’s form. This is accomplished by understanding the functional components of each kanji and by understanding how kanji work on the system level.

Studies have shown that native speakers have an intuition about how a given unknown kanji may sound or what it may mean, but they often find it difficult to articulate. This intuition comes from learning thousands of kanji. It is a reflection of the logic inherent to the kanji on a system level. And, it is imperfect. It also takes a long time to acquire. In the Outlier Kanji Dictionary, our aim is to instill the abilities required for long-term recall and predictive ability from day one, but that can only be done if we understand kanji on their terms, not on ours. Now, let’s look at the first three aspects of etymology:

Aspect #1: Identifying the functional components in a given kanji.

This is by far the most important aspect of etymology. Knowing how a kanji represents sound and meaning is the basis for understanding kanji as a system. It is also the basis for being able to detect real (as opposed to superficial) relationships between kanji: sound and meaning relationships. And, last but not least, it’s the key to understanding individual kanji. So, understanding what a kanji’s functional components are is crucial for all learners, and those components (in combination with Aspect #2) are what make predictive ability possible for beginners (when you need it the most!). And, while they aren’t the only route to long-term recall, they are the most effective and they have the most positive side effects.

Example: The functional components for 識 シキ “to know” are 言 ゲン “speech” and 戠 シキ “to gather together.” 言 is the meaning component and 戠 is the sound component (check out this article for more about the different types of functional components). People often view this kanji as 言 + 音 + 戈 and then create a story to combine the meanings “speech” + “sound” + “halberd,” but doing so hides the sound connections between 識 and other kanji that share the sound component 戠 シキ, such as 職 ショク and 織 シキ (also ショク). Creating a new story for how meaning is represented in this kanji (i.e., by breaking it into parts that aren’t giving a meaning, then assigning a meaning to them) not only obscures the real way meaning is expressed, it also gives a false impression of how kanji represent meaning in general. Not to mention, an infinite number of stories can be created for any one kanji, but only a story based upon the functional components will bring the benefit of seeing (from an early stage) the real sound and meaning connections between kanji.
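To make the idea of “real sound connections” a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how the Outlier Kanji Dictionary is implemented. The tiny dataset, the pairing of 職 and 織 with the readings listed above, and the function names `group_by_sound_component` and `guess_readings` are all my own assumptions.

```python
from collections import defaultdict

# Illustrative entries only: kanji -> (sound component, on readings).
# The pairing of 職 and 織 with these readings is an assumption made for
# this sketch; it is not data taken from any dictionary file.
ENTRIES = {
    "識": ("戠", ["シキ"]),
    "職": ("戠", ["ショク"]),
    "織": ("戠", ["シキ", "ショク"]),
}


def group_by_sound_component(entries):
    """Group kanji by the sound component they share."""
    groups = defaultdict(list)
    for kanji, (sound_component, _readings) in entries.items():
        groups[sound_component].append(kanji)
    return dict(groups)


def guess_readings(sound_component, entries):
    """Collect the readings seen so far for one sound component.

    The result is a range of plausible guesses for a new kanji that
    contains the same sound component, not a guaranteed answer.
    """
    seen = []
    for _kanji, (component, readings) in entries.items():
        if component != sound_component:
            continue
        for reading in readings:
            if reading not in seen:
                seen.append(reading)
    return seen


if __name__ == "__main__":
    print(group_by_sound_component(ENTRIES))
    print(guess_readings("戠", ENTRIES))
```

Notice that if 識 were stored as 言 + 音 + 戈, the lookup by 戠 would find nothing, which is exactly the connection that the made-up story hides.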

Aspect #2: Identifying how the functional components function.

In other words, how sound components express sound and how meaning components express meaning. Knowing what the functional components are in a kanji is very important, but so is understanding how they function in that kanji. For instance, most people do not distinguish between a component expressing meaning by way of its meaning (i.e., meaning components) vs. expressing meaning by way of form (i.e., form components).

What does that mean exactly? Each functional component has three attributes: form, meaning and sound (or pronunciation). Take 自 “self” for example. Its form is a picture of a person’s nose. Its meaning is “self” and its sound is ジ. If 自 expresses meaning by form (i.e., it’s a form component), then the meaning it expresses has to do with “nose,” as in 息 ソク “to breathe,” 鼻 “nose,” 臭 シュウ “to stink,” and 嗅 キュウ “to smell.” So, in the kanji 息, 鼻, 臭, and 嗅, 自 is a form component.

Kanji with meaning components appeared rather late in the game and as such, they are small in number.

Ex. 歪 ワイ “not straight, crooked” (literally 不 “not” + 正 セイ “straight”). It’s easily seen that this kanji is based upon the combination of the meanings 不 “not” and 正 “straight.” 不’s form is either “part of a plant” or “roots of a plant” and 正’s form is “feet marching towards a city.” These two forms obviously have nothing to do with the meaning of 歪. So, in 歪, 不 and 正 are meaning components. Most semantic components give meaning by form and only a minority give meaning by meaning, yet most people treat all of them as meaning components. Worse yet, they don’t consider the original meanings of the components, but depend rather on their modern meanings. This way of thinking is almost guaranteed to be inaccurate (read: it does not help with predictive ability, long-term recall or seeing real connections between kanji).

Understanding how components express meaning and sound is very important to understanding both how individual kanji work and how kanji work as a system.
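If it helps to see the distinction spelled out mechanically, here is a small Python sketch. It assumes a simplified model where each component carries the three attributes described above and a role decides which attribute it contributes to a given kanji. The class names, the `contribution` function and the short gloss strings are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    FORM = "form"        # gives meaning by what the component depicts
    MEANING = "meaning"  # gives meaning by what the component means
    SOUND = "sound"      # gives the pronunciation
    EMPTY = "empty"      # gives nothing (e.g., a corrupted component)


@dataclass(frozen=True)
class Component:
    glyph: str
    form: str                    # what the component is a picture of
    meaning: str                 # what the component means as a kanji
    sound: Optional[str] = None  # reading, where the text gives one


def contribution(component: Component, role: Role) -> Optional[str]:
    """Return what a component contributes to a kanji in a given role."""
    if role is Role.FORM:
        return component.form
    if role is Role.MEANING:
        return component.meaning
    if role is Role.SOUND:
        return component.sound
    return None  # an empty component contributes neither sound nor meaning


# Glosses taken from the examples in this post.
JI = Component("自", form="nose", meaning="self", sound="ジ")
FU = Component("不", form="part of a plant", meaning="not")
SEI = Component("正", form="feet marching towards a city",
                meaning="straight", sound="セイ")

if __name__ == "__main__":
    # 自 in 息, 鼻, 臭 and 嗅 is a form component: it contributes "nose".
    print(contribution(JI, Role.FORM))
    # 不 and 正 in 歪 are meaning components: "not" + "straight".
    print(contribution(FU, Role.MEANING), contribution(SEI, Role.MEANING))
```

The point of the sketch is that the same component can contribute different things depending on its role, so treating 自 as “self” everywhere would give the wrong answer for 息, 鼻, 臭 and 嗅.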

Aspect #3: Identifying corrupted components.

Technically speaking, this should be part of Aspect #1, but since most people aren’t familiar with this concept, I’ll handle it as a full aspect. Kanji corruption means that a “kanji changes form in such a way that the original form intended by the inventor of that kanji is altered.” In other words, things aren’t always what they seem. There are several advantages to knowing whether a component is corrupted.

1. To clear up misunderstandings and answer questions such as:

  • What does 往 オウ “toward” have to do with 主 シュ “master, owner?”
    The answer is: nothing, except a superficial connection. The right side of 往 is actually an older form [image: 往主.png] [1], which is made up of a foot (representing “to go”) and 王 オウ, which gives the sound. The 彳 was added later. The “foot” got corrupted into 丶, making that older form look like 主.
  • How do the two mountains 山 サン in 出 シュツ “to go out” represent the notion “to go out?”
    Answer: they don’t. The top is a corruption of 止. “To go out” was originally represented by a foot (止, which means “stop” in modern Japanese) walking out of a cave. Through the process of stylization, it came to look like two mountains 山 on top of one another, although it makes more sense to analyze it as 屮 (an empty/corrupted component) on top of 凵 (a form component depicting a cave opening). That’s exactly how the Outlier Kanji Dictionary explains the kanji.

2. To give your mind closure. If you know that a given component is a corruption, you know that it is not adding a sound or meaning to the kanji. There’s no need to look any further for a form explanation.

3. To give you a clearer understanding of how modern kanji work. Predictive ability and long-term recall come quickest and most efficiently by understanding kanji on an individual as well as a systemic level. If you reinterpret corrupted components with your own meaning, you’re simply adding noise to the system.

But isn’t this just making the whole thing more complicated and harder to learn? I would argue no. The number one rule for memorizing anything is understanding. The more you understand the object of learning, the easier it is for you to remember that thing. Understanding which kanji components are corrupted is increasing your understanding. And, it’s not necessary to know the whole story behind the corruption. The main thing is knowing that the corrupted component is not giving a sound or meaning to the kanji.

Some examples:

Ex. 高 コウ “tall”: It’s enough that you know that 亠 トウ “lid, cover,” 口 コウ “mouth” and キョウ have nothing to do with why 高 looks the way it does. It is actually just a picture of a tall building and has a 口 on the bottom to distinguish it from 京 キョウ, which is also a picture of a tall building. Of course, you do need to remember that these components are necessary for correctly writing 高, but you can’t use them to understand why 高 looks the way it does.

Ex. 粦 リン “ghost fire”: Since 粦 is only used as a sound component in modern Japanese, it’s enough to know how to write it and its pronunciation. However, understanding what it was originally a picture of is interesting to some people (and can aid in remembering how to write it). If you’re one of those people (I know I am!), then you need to remember that 米 マイ “rice” isn’t giving a meaning or sound, it’s merely a placeholder for a picture of a corpse that is on fire (okay, so knowing the real story isn’t always more pleasant!) and that 舛 is a picture of two feet.
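For readers who like to see ideas as data, here is one way the “corrupted component” flag could be sketched in Python. It is a toy model under my own assumptions: the `Part` class, the `PARTS` table and the two helper functions are hypothetical, and the notes are just short summaries of the explanations given in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    glyph: str
    corrupted: bool  # True: gives neither sound nor meaning to the kanji
    note: str


# Toy data summarizing two examples from this post.
PARTS = {
    "出": [
        Part("屮", True, "corruption of a foot (止)"),
        Part("凵", False, "form component depicting a cave opening"),
    ],
    "粦": [
        Part("米", True, "placeholder for a picture of a corpse on fire"),
        Part("舛", False, "picture of two feet"),
    ],
}


def parts_to_write(kanji):
    """Every part is still needed to write the kanji correctly."""
    return [part.glyph for part in PARTS[kanji]]


def parts_that_explain(kanji):
    """Only uncorrupted parts help explain why the kanji looks as it does."""
    return [part.glyph for part in PARTS[kanji] if not part.corrupted]


if __name__ == "__main__":
    for kanji in PARTS:
        print(kanji, parts_to_write(kanji), parts_that_explain(kanji))
```

The two lists differ on purpose: you still have to write 屮 and 米, but you can stop looking to them for a sound or meaning.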

Keep an eye out for Part 2!


  1. Kanji form from 小學堂.
