Understanding Corruption in Kanji (Part 1) – Outlier Linguistics

Looking at the Etymologies for 面 and 友

By Ash Henson

First, a note on our use of the term “corruption” (given in full after the main article, below).

Note: In this article I use onyomi (音読み) exclusively to indicate the pronunciation of kanji. This is simply a convention we use for the sake of consistency, and because, as the Chinese-derived pronunciations, the onyomi are the only ones relevant when discussing kanji formation.

In our kanji dictionary (you can reserve your copy on Kickstarter until 22 June!) we use the term “empty component” to refer to components which don't indicate meaning or sound in a particular kanji. One of the primary reasons for the existence of empty components is kanji corruption, so we thought it would be interesting to talk about that a bit.

This is the first of two posts on the topic of kanji corruption. If you really want to understand how kanji work, you cannot overlook corruption and the role it plays. We're not saying learners need to know this stuff to learn Japanese, but that we as researchers do in order to explain how kanji work, and we thought you might find it interesting!

So what does it mean for a kanji to become corrupt? Basically, it means that the kanji changes form in such a way that the original form intended by the inventor of that kanji is altered.

Take 面メン “face” for example:

The oracle bone form (a) is a picture of an eye inside a larger frame: a face. According to Li Xiaoding (李孝定), the eyes are the most representative of the face’s sensory organs, hence form (a). Form (b) is from a Qín dynasty excavated text, where the eye 目モク has been replaced with a head (an earlier form of 首シュ). Forms (d)-(g) are from Hàn dynasty 漢朝カンチョウ steles (stone tablets) (杜忠誥2002:138-143). In (d), you can see that the top line has been greatly exaggerated, and this is the origin of the top stroke on the modern form of 面. Since this stroke was not intended by the inventor(s) of this kanji, it is a form of corruption. An uncorrupted form of 面 may have looked like this: [image].

Sometimes these changes that occur via corruption are neutral; in other words, they don’t affect the kanji’s functional components (sound components or meaning components), but oftentimes corruption actually causes damage to a kanji’s ability to express sound and/or meaning.

Why is this important?

Why is kanji corruption important? One reason is that if you’re trying to understand a kanji form, and part of that kanji is corrupted, then any explanation you give for that part (other than that it’s the result of corruption) is going to be inaccurate. Another major reason has to do with being able to spot spurious etymologies (there will be a future post dedicated solely to explaining how to spot them). If an author or book never mentions kanji corruption in their etymological explanations, chances are very good that you are reading or hearing spurious etymologies. That’s not to say that all kanji are corrupted, but a significant number are. Let’s take a look at the etymologies of some common kanji to better understand the different ways that kanji can become corrupted.

I’m going to be following along with Tu Chung-kao’s (杜忠誥) book Examples of Corrupted Forms in the Shuōwén’s Small Seal Script[1] [《說文篆文訛形釋例》],[2] since he does an excellent job of outlining the different types of corruption. According to Prof. Tu, one of the main reasons for kanji corruption is the actual process of writing kanji.

When manuscripts are being copied by hand, it is easy for mistakes to happen, either because the manuscript being copied isn’t clear to begin with or because the scribe isn’t being particularly careful.[3]

Corruption by way of writing (i.e., copying manuscripts)

Prof. Tu gives an example related to 友ユウ “friend(s)”:

友 (oracle bone script: [image][4]) was originally a picture of two right hands together (two 又), indicating friendship. 又ユウ also acts as a sound component.

The Setsumon Kaiji (説文解字セツモンカイジ) lists [image][5] as one of 友’s ancient forms (古文コブン).[6] Though it looks very similar to 習シュウ “to review”, the two are not related. It actually evolved from this bronze inscription 金文キンブン form [image][7], which was also used in the Chu script of the Warring States period, as can be seen in these examples:[8]


(I love the Chu script!)

According to Chi Hsiu-sheng (季旭昇), the bottom half of this form is 一 and 白 (but pronounced like [image], not ハク), which is a corruption of an earlier 甘カン.[9] Had it survived into modern times, it may have looked like this [image],[10] with the 甘 “sweet” component presumably emphasizing the pleasurable feeling of having a good friend. Prof. Tu shows a possible path of the corruption from “two hands” to “wings” in the ancient (古文) form of 友:

Each step in this diagram shows a step towards corruption. (a) and (b) are still easily recognizable as a pair of hands, but then the roundness of the outer fingers becomes more and more square in (c) and (d), such that (d) is already completely square and looks like something in between the “two hands” form and the form for “wings”. Later scribes then interpret it to be something similar to “wings” and help it along by making it look more like “wings”, until finally in (e) and (f), all resemblance to “two hands” is lost. This is one of the ways that kanji corruption happens (see 杜忠誥2002:33-34).


By looking at parts of the etymologies for 面 and 友, we learned a little about one of the most common reasons for kanji corruption: the process of writing itself. Stay tuned for our next post which will explore the various types of kanji corruption by way of explaining the etymologies of 黒, 無, 舞, and 粦!


A note on our use of the term "corruption"

Before we begin, let's define the special use of the word "corruption" as it is used in this article. The use of "corruption" is reserved for kanji form changes that result in a degradation of the ability of a kanji to express sound and meaning. It is a translation of the Chinese term 訛變.

Important points:

1. It does not describe all kanji form changes, only ones that result in a degradation of meaning/sound representation.

2. It is not meant to convey the idea of getting back to a previous perfect state. It is merely saying that an unhindered ability to represent a sound or meaning is better than the lack of such ability.

Take a specific example, 做:

做 is derived from 作+攵 or 亻+𢼎. Let's take a look at the sounds of the parts:

乍 サ、 サク

作 サ、 サク

做 サ、 サク、 ソ

故, 古 コ

In the modern form, the middle part 古 was originally the sound component 乍. It changed from 乍 to 古 as the result of graphical confusion. Now you have 做 with a structure that makes very little sense. It is not related to 故 or 古 in any meaningful way (i.e., it does not have the sound コ, and its meaning is not related to 故 or 古). Yes, "corruption" has a negative connotation, but it is an accurate description: 乍 gives the sound in 作, 做 is a corruption of 作+攵, and 古 gives neither sound nor meaning in 做. 做 has lost part of its ability to express sound.
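The reasoning in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how our dictionary is built: the `ONYOMI` table simply copies the readings listed above, and the function names and the "shares at least one onyomi" test are hypothetical, made up for this sketch. Real sound components often match only approximately, so a shared reading is a hint, not proof.

```python
# Onyomi copied from the readings listed above; nothing else is assumed.
ONYOMI = {
    "乍": {"サ", "サク"},
    "作": {"サ", "サク"},
    "做": {"サ", "サク", "ソ"},
    "故": {"コ"},
    "古": {"コ"},
}


def shared_readings(kanji: str, component: str) -> set:
    """Return the onyomi that a kanji and a component have in common."""
    return ONYOMI[kanji] & ONYOMI[component]


def could_be_sound_component(kanji: str, component: str) -> bool:
    """Hypothetical heuristic: treat a component as a sound-component
    candidate only if it shares at least one onyomi with the kanji."""
    return bool(shared_readings(kanji, component))


if __name__ == "__main__":
    # Compare the original component (乍) with the one that replaced it (古).
    for component in ("乍", "古"):
        print(
            component,
            sorted(shared_readings("做", component)),
            could_be_sound_component("做", component),
        )
```

Run as a script, this reports that 乍 shares サ and サク with 做, while 古 shares no reading at all, which is the sense in which 做 has lost part of its ability to express sound.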

Prescriptivism vs. Descriptivism

We are not prescriptivist. I, myself, come from Texas and pronounce the word "get" as if it rhymed with "sit" (not "set"). I like my pronunciation even though it is not standard. When I learn foreign languages, I'm okay with making mistakes as long as they are mistakes that a native speaker would make. We are seeking to describe linguistic phenomena, not tell everyone what to do.

Since written records are only incomplete reflections of spoken language and since they appear late in history, it makes no sense to talk about "the original meaning" of a spoken word. It also doesn't make sense to say that semantic, syntactic, or phonological change is necessarily bad. Language change is necessary and happens in all languages at all times (though at varying speeds). The use of "corruption" to describe a character form that has lost some or all of its ability to record sound and meaning is in no way similar to describing regular language change as being a "corruption."

The notion of corruption is relevant to language learning

According to memory experts, the number one rule for effective memorization is understanding the thing you are trying to remember. Kanji corruption is one of the major reasons for empty components (components that express neither sound nor meaning in a kanji). If you go giving a meaning to every empty component, you add noise to your learning system. Any kanji has an infinite number of possible stories, but only the real story will help you see the overall semantic and sound patterns that kanji express. Knowing those patterns is useful for both learning and recall.

  1. The English translation here is my own.

  2. 杜忠誥,《說文篆文訛形釋例》,台北市:文史哲出版社,2002年。

  3. This obviously doesn’t apply to the process of listening and copying, which is subject to its own set of problems.

  4. This oracle bone form was taken from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/).

  5. This form was taken from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/).

  6. The Setsumon defines ancient forms (古文) as all characters created before the forms that appear in the Shǐzhòupiān [史籀篇], which according to tradition was written during the reign of King Xuān of Zhōu (周宣王; 827 to 782 BCE).

  7. This character comes from the 毛公旅方鼎. The digital image used here was taken from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/).

  8. #1 comes from 江陵天星觀1號墓卜筮簡, #2 from 荊門郭店楚墓竹簡‧六德 and #3 from 荊門郭店楚墓竹簡‧語叢3; their digital images come from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/).

  9. 季旭昇《說文新證》上冊,藝文印書館印行,第196-197頁。

  10. Note that the 甘カン “sweet” and 曰エツ “to say” forms are very similar and have often been confused for one another historically. Both forms derive from 口コウ “mouth”, showing something in the mouth: “something sweet and pleasant” for 甘 and a “symbol showing movement” for 曰 (季旭昇《說文新證》上冊,藝文印書館印行,第379-381頁). The Chu forms shown above contain 曰.


  • Hello. I write today merely to say ‘thank you’ for undertaking this research and making it available via this website, and via mobile apps like KANJI STUDY (for Android), produced by Chase Colburn. Both Outlier content and Kanji Study have been indispensable for my self-education in Japanese, what with having relocated to Japan suddenly a few years ago with only rudimentary formal (classroom) study. Two years later, I’m preparing to sit the JLPT N2 exam. Knowing that I can comprehend kanji more deeply with both Outlier and the accessibility of Kanji Study, I feel very confident in my preparations and chances for success both now and in the future. Thank you to all involved, much love and light and self-awareness to all sentient beings. Gassho.

  • @Naufal,

    Good question. An empty component is a component that doesn’t express either sound or meaning. Since a corrupted component’s original connection to the character’s sound or meaning has usually been completely obscured, we call them empty components.

    There are some cases in which a component corrupts into something else because they’re both 1) similar in form and 2) similar in meaning or sound. In those cases, we don’t call it an empty component, but we do say the component has been “semanticized” or “phoneticized”.

    So it just depends on the case in question.

    John Renfroe
  • Dear Outlier,
    First, you all are doing a great project. I can’t fully express my appreciation towards you.
    Because of this project, I can acquire a lot of new knowledge and more easily remember the words, pronunciations, and meanings of kanji than before.
    Second, I still don’t understand why you chose the word “empty” for corrupted components, when it’s just a (let us say) metamorphosis or evolution from the earlier form, not the filling-in of an empty field.
    That is all. I would be grateful if you could answer this question.
    Sincerely yours.

    Naufal Al-Haidar
  • I have identified 銀 being used on a banknote title which was printed using a keisho typeface and noted another kanji of similar construct but lacking the print stroke order within the same title. However, after comparisons with another different banknote which used a tensho typeface for its title, I noticed the second use of 銀 as the kanji representative and the first taking on the second when expressed in keisho typeface. I have heard of shorthand kanji, but as I am neither a scholar nor a native speaker, I cannot make any definitive assertion on the title while being unable to explain the variants. If there is a way for me to provide a jpeg or pdf attachment to illustrate, please reply, provided this inquiry is worth your attention. I am a novice collector of foreign banknotes and have an appreciation for history and language. Despite the English translations, I was hoping to understand how the kanji was translated by dissecting each character by assigning each kanji its furigana in proper on- or kun- reading to bring the word or collection of word meaning into a proper translation. This has proved to be a challenge with nuances in script type as well as finding out that there is a list of uncommon kanji versus kanji used for names. And now I am convinced it might be that this kanji was written in some form of shorthand in order to account for its variation, most notably that its radical ⾦ did not conform when expressed in keisho typeface.

    Edward L Mendiola
