Why you need to start thinking of kanji in terms of functional components.
By John Renfroe
This advice is going to rub some people the wrong way, but that’s alright. We’re on a mission to teach you Japanese properly – not to rehash the way “things have always been done.”
Hopefully by the end of the article you’ll understand why I say this.
Kanji radicals have a single purpose: indexing kanji in a dictionary.
They are not designed to help you learn Japanese kanji, and they are not the building blocks of kanji. There, I said it.
There’s a huge misconception about how kanji work. You see this sort of advice all the time: “Kanji are made up of radicals, so you should learn the radicals first,” or “Make sure you learn the radicals. They’re the building blocks of kanji characters.”
This is not true.
People who say this are well-intentioned but ill-informed about the nature of the kanji.
Let’s take a look at the difference between kanji radicals and functional components.
The word “radical” is best understood as “a kanji component that sometimes plays the role of radical,” NOT “a kanji component that has the nature of being a radical”.
For example, 大 【ダイ】 “big” is a component that is on the list of radicals, but that doesn’t mean that 大 is always a radical when it appears in a kanji. A single kanji only has one radical, no matter how many kanji components it has.
And since the choice of which component will play the role of radical is up to the editor of a given dictionary, it may be different in different dictionaries—and may differ between Chinese and Japanese! That’s because a radical’s role is to organize dictionaries, not to explain kanji structure!
And yes, many of the components on the list of radicals do show up a lot in kanji and therefore should be learned, but they should be learned as part of a system of functional components — components which express sound and meaning.
Just memorizing common radicals or radical names is going to leave you lost without a path towards literacy.
So, if you’re talking about radicals, the conversation should focus on dictionary lookup. If you’re talking about how kanji work, or about etymology, then it should be about semantic components and sound components. Getting the terminology straight helps to prevent confusing statements like “radicals are the building blocks of kanji.” They're not. Functional components are.
The concept of radical, or 部首 【ブシュ】 (bushu), didn’t even exist until after the publication of the Setsumon Kaiji (説文解字 【セツモンカイジ】; Shuōwén Jiězì in Chinese) in 100 CE, at which point the writing system had already been around for well over 1500 years. The vast majority of kanji in use today were invented before the Setsumon was published.
Read that again and let it sink in.
If that’s the case, then there’s no way that “radicals” were what people had in mind when they were creating kanji, because radicals didn’t even exist yet! There must have been something else going on.
The word “radical” is really a poor translation into English of the Japanese (actually, Chinese) word 部首 【ブシュ】 in the first place. Bushu literally means “section head.” Following the model of the Setsumon, kanji dictionaries are traditionally arranged into sections containing similar graphic components. These sections are called 部 【ブ】. The first kanji in that section is the section head (部首 【ブシュ】), or the “first of the section.”
Each kanji in that section is filed under that bushu. Note that I didn’t say the kanji “has” one 部首. It’s an important distinction to make. The kanji is filed under a 部 【ブ】, or section. This is a choice made by the editor of a kanji dictionary, not an inherent part of the nature of kanji.
Think about it: usually when you see a kanji’s radical listed, you’ll also see a stroke count. In a dictionary entry for 家, you’ll likely see “宀 + 7” (side note: 宀 is often called ウかんむり u kanmuri because it sort of looks like the katakana ウ). This is because traditionally arranged Japanese dictionaries first sort kanji by radical, and then by the number of strokes left over once the radical is taken away (家 has ten strokes, three of which belong to 宀). So if you know the stroke count of the kanji you’re looking for, and can guess at which radical it might be filed under, then you’ll have an easier time finding that kanji in a dictionary.
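To make the lookup procedure concrete, here is a minimal sketch of a radical-plus-remaining-strokes index. The names (`ENTRIES`, `build_index`, `lookup`) are made up for illustration, and the section assignments are just the ones mentioned in this article; a real dictionary's editor might file some kanji differently.

```python
from collections import defaultdict

# Stroke counts of the section heads used below.
RADICAL_STROKES = {"宀": 3, "刀": 2, "羊": 6, "小": 3, "大": 3}

# Hypothetical mini-dictionary: (kanji, section it is filed under, total strokes).
ENTRIES = [
    ("家", "宀", 10),
    ("到", "刀", 8),
    ("美", "羊", 9),
    ("尖", "小", 6),
    ("太", "大", 4),
    ("天", "大", 4),
]


def build_index(entries):
    """Group kanji by section head, then by strokes remaining after the radical."""
    index = defaultdict(lambda: defaultdict(list))
    for kanji, radical, total in entries:
        residual = total - RADICAL_STROKES[radical]
        index[radical][residual].append(kanji)
    return index


def lookup(index, radical, residual):
    """Return every kanji filed under `radical` with `residual` extra strokes."""
    return index.get(radical, {}).get(residual, [])


INDEX = build_index(ENTRIES)
print(lookup(INDEX, "宀", 7))  # the "宀 + 7" entry
print(lookup(INDEX, "大", 1))  # several kanji can share one slot
```

Notice that nothing in this index says anything about sound or meaning. It is purely a filing system, which is the whole point.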
Which section to file a kanji under can be a fairly arbitrary decision. Most people’s understanding is that the bushu gives a hint about meaning, and that the sound component (声符 【セイフ】) gives a hint about the sound, and that the two are different entities. But that’s not always the case!
Sometimes, the bushu is the sound component. For example, 刂 (刀 【トウ】, “knife”) on the right side of 到 【トウ】 “to arrive” is both the sound component and the radical in 到, but it is not the meaning component. 至 【シ】 (the component on the left side of 到) is the meaning component, and it means “to arrive,” just like 到. Intuitively, you might think that radicals are assigned in a consistent manner, but sometimes the way they’re assigned can be pretty random, as we've seen.
Note that while 刂 and 刀 may look like different radicals, they are actually variants of each other—many radicals have one or more variants that are considered to be essentially the same radical.
You might be thinking, “Sure, there are exceptions, but kanji radicals are usually related to the meaning of the kanji!” But actually, that’s only true about 64% of the time. That means that in 36% of kanji, the radical is not related to meaning. Would you ask a friend for driving directions if you knew he gets lost 36% of the time? I wouldn’t!
So again, kanji are filed into a given section. This is a choice made by a human being, not an inherent part of the nature of kanji, and it’s a flawed — but workable — system.
So hopefully, you can see that “radicals” (remember: section headings!) are useful for organizing and looking things up in a dictionary, but they’re not especially useful for explaining how kanji work.
You should look at all Japanese kanji in terms of their functional components. These are the real building blocks of kanji, because they’re how kanji were originally designed in the first place!
Kanji components can serve a few different functions, and you need to understand those functions instead of lumping them all under one category called “radicals,” as most people do.
There are three attributes that all kanji have (using 大 as an example):
Form: What is it a picture of? 大 is a picture of a person (specifically, an adult).
Meaning: What does it mean? 大 means big, because adults are big in comparison to children.
Sound: What is its pronunciation? (Or, if it’s a sound component, what is the range of sounds it can represent?) 大 is pronounced ダイ dai in Japanese.
Note that I'm using onyomi here and in the rest of the article, since those readings are the only ones that are relevant when discussing sound components—remember, kanji came from Chinese, so the sound relationships don’t work for kunyomi words, which are native to the Japanese language.
The possible functions that a component can have derive directly from these three attributes. Let’s take a look at those functions now, and you’ll see how much sense it makes to learn the functional components that make up a kanji, rather than thinking in terms of radicals.
A component can express meaning by way of its form.
Example: 大 is a picture of a person, and that is its function in, for example, the kanji for “beautiful” 美 【ビ】. 美 is not a “big” 大 “sheep” 羊, but a depiction of a person 大 wearing a headdress (the headdress 𦍌 now resembles 羊, but it's unrelated). This is by far the most common way of expressing meaning.
Other examples of 大 functioning in this way include:
天 【テン】 “heavens” (originally “person with a mark indicating the forehead”)
夫 【フ】 “husband, man”
A component can express meaning by way of meaning. Example: 大 means “big,” and it expresses the meaning “big” in kanji like 尖. This is how most people explain all semantic components, but in reality this function is much less common!
尖
Form: “small” over “big”
Meaning: “sharp”
Sound: セン sen
As for why “small” over “big” means “sharp,” take a look:
A component can express sound. Example: 大 is pronounced 【ダイ】 dai in Japanese, and it’s the sound component in the kanji 太【タイ】 tai “great, large”.
Then there is a fourth function that derives from the way kanji evolved in form over time. A component can also serve as a placeholder for an earlier form that has now been corrupted.
This one is difficult to figure out without academic training in paleography, but the Outlier Kanji Dictionary explains which components have been corrupted and how. Continuing with 大 as an example, there are 1) instances in which a component was originally 大 but has now changed to something else, and 2) instances in which a component started as something else but has corrupted to look like 大 today. That means you can’t trust your eyes—you need a reliable source to tell you what’s what!
The sound component in 達 is 𦍒 【タツ、ダ】 tatsu, da. The top part today looks like 土 【ド、ト】 “earth,” but it was originally 大, which was then corrupted over time. An uncorrupted version of this component would look like 羍 today.[1]
The form above is written in small seal script (小篆【ショウテン】). This is what 大, 土, and 達 looked like in small seal, for comparison:
In the kanji 莫【バク、ボ】 (“do not,” but it originally represented the word “sunset,” which is now written 暮【ボ】), what today looks like 大 on the bottom was originally 艸 【ソウ】 “grass” (there was 艸 on both the top and bottom, and the kanji depicted the sun setting behind the grass), which then corrupted over time to look like 大.
So now you’ve seen how the same component can serve completely different functions in different kanji, and how components can become corrupted over time, obscuring their original purpose. Here’s the interesting thing: out of the kanji I’ve just discussed, 大 is only the radical in 天, 夫, and 太. In the others, it’s not a radical, no matter which function it’s serving! The radical in the other kanji is:
尖:小
美:羊
達:辶
莫:艹
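One way to keep the two ideas apart is to record them as separate fields. The sketch below is only an illustration (the `Function` and `Analysis` names are invented here, not taken from the Outlier dictionary). It stores, for each kanji discussed above, what the 大-shaped component is doing and which section the kanji is filed under. 達 is left out because its 大 no longer looks like 大.

```python
from dataclasses import dataclass
from enum import Enum


class Function(Enum):
    FORM = "expresses meaning by way of its form"
    MEANING = "expresses meaning by way of its meaning"
    SOUND = "expresses sound"
    PLACEHOLDER = "stands in for a corrupted earlier form"


@dataclass(frozen=True)
class Analysis:
    kanji: str
    role_of_dai: Function  # what the 大-shaped component does in this kanji
    radical: str           # the section the kanji is filed under


# Data taken from the examples in this article.
ANALYSES = [
    Analysis("美", Function.FORM, "羊"),
    Analysis("天", Function.FORM, "大"),
    Analysis("夫", Function.FORM, "大"),
    Analysis("尖", Function.MEANING, "小"),
    Analysis("太", Function.SOUND, "大"),
    Analysis("莫", Function.PLACEHOLDER, "艹"),
]


def filed_under(radical):
    """Kanji whose dictionary section head is `radical`."""
    return [a.kanji for a in ANALYSES if a.radical == radical]


def roles_when_radical(radical):
    """The functions 大 serves in the kanji filed under `radical`."""
    return {a.role_of_dai for a in ANALYSES if a.radical == radical}


print(filed_under("大"))
print(sorted(r.name for r in roles_when_radical("大")))
```

Even within the kanji filed under 大, the component serves more than one function, so knowing the radical tells you nothing about what the component is actually doing.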
It’s important to note that which side of a kanji a component shows up on has zero bearing on what its function is. There are some general trends, but lots of exceptions! So whether a component is on the right side or the left side of a kanji, you need a resource like the Outlier Kanji Dictionary to help you figure out what the kanji’s real structure is!
Again, all this is not to say that you should completely throw radicals out the window. They’re good to know, but you should keep in mind what they’re used for: looking up kanji in traditionally-arranged dictionaries. That’s it. They’re not the “building blocks of kanji.” Functional components are! Radicals are an imperfect, man-made system of arranging and looking up kanji in a dictionary. The concept of 部首 didn’t even exist when the vast majority of kanji were being created.
But sound and semantic components did exist. Sound and semantic components are the building blocks of kanji. Sound and semantic components are what people were thinking of whenever they made a new kanji. When you’re learning a new kanji, thinking in terms of these functional components (rather than radicals) will clarify a lot of confusing things about kanji, and help you a lot with the memorization you need to do in order to learn all of the jōyō kanji. And whether you enjoy using flashcards or mnemonics (or both!) to learn kanji, you’ll find that learning their real structure via functional components will make it much easier to get through that list of kanji you need to memorize!
Anything that tries to explain kanji and uses terms like “dotted cliff radical” or “water radical” is going to be inaccurate and will (unintentionally) lead you astray.
[1] 𦍒 is also a semantic component. 达 is a picture of a guy walking across the road. The original meaning was “arrive at point B from point A.” 達 is the same thing, but has a guy leading a sheep from point A to point B.
Long time no see! As you’ve probably noticed, we haven’t posted anything in a while (several months, actually). Well, that’s because there are a lot of exciting things coming our way. We should have some big announcements coming up in the next few weeks.
In this post, I’ll explain the origins of 東 and 西. Part 2 will discuss the origins of 南 & 北.
Overview:
Since the following explanation of 東 is rather involved, I’ll start here with a simple overview. There are three characters with similar forms, development, and meanings: 東 【トウ】, 束 【ソク】 and 橐 【タク】. Their pronunciations were also much more similar thousands of years ago, when these characters were created, than they are in modern Japanese. Each of these characters was originally a picture of a bag tied at both ends. 東 【トウ】 and 束 【ソク】 were even used interchangeably in bronze inscriptions, while 橐 【タク】 is basically 束 with 石 【セキ、シャク、コク】 added as a sound component (though that isn’t obvious from modern Japanese). 東 【トウ】 ended up meaning “east” via sound loan.
The details:
Well, we can’t really talk about 東 without first talking about 束 【ソク】 “bind; bundle” and 橐 【タク】 “a bag.” What does “east” have to do with bags, binding and bundles? We’re about to find out!
According to Chou Fa-Kao [周法高 Zhōu Fǎgāo], 束 is “a picture of a bag that is tied at both ends.”[1] Its original meaning was “bind.” So, a picture of a bag that is bound at both ends to represent the meaning bind. Pretty straightforward, right?
Note that in the diagram above, the full lines represent a change in time period, while the broken line represents forms that were contemporary to one another. It was very common for a given character to have several different forms at any given time, similar to how English words would have several different spellings before spelling was standardized.
It is easy to see that the Bronze Inscription [金文 【キンブン】] form (1c) and the Small Seal script [小篆 【ショウテン】] form (1d) are the direct descendants of the Oracle Bone [甲骨文 【コウコツブン】] form (1a), while (1b) is an alternate form. There are actually quite a few other alternate forms, but here, I’m trying to keep it simple and just show the main branch. Note that though 束 looks somewhat like a combination of 木 and 中 in modern Japanese, in reality, it’s not related to either 木 or 中.
In Outlier terms, we say that the similarity between 束 and 木 + 中 is a surface structure similarity. That is to say, the similarity has nothing to do with the meaning or sound of the components involved; it’s basically a fluke of history. 束 probably ended up looking like 木 either 1) because the three lines on the top part of characters tended to flatten out while those on the bottom didn’t, or 2) by analogy with 木.
Even a quick glance at the forms (2a) - (2e) shows that they are very similar to (1a) - (1d) above, except that (2a) - (2e) have an extra line or lines through the main body. According to Lín Yìguāng [林義光], the forms for 束 and 東 were used interchangeably in Bronze Inscriptions. As such, he proposed that the two forms are actually merely variants of the same character.
So, here, we have yet another character that has a similar origin, 橐 【タク】 “bag.” The Shang dynasty Oracle Bone forms (3a) and (3b) look very similar to the Oracle Bone forms for 束 and 東. We can see in (3d) that by the time of the Qin dynasty, 石 【セキ、シャク、コク】 “rock” had been added to the character. Though not obvious from a modern perspective, 石 was added as a sound component. Looking at the Japanese pronunciation, 石 【セキ、シャク、コク】 acting as sound component for 橐 【タク】 seems ludicrous, but if we look at Cantonese pronunciations, sehk and tok respectively, we see that they share a common ending, -k.[2] Japanese simply adds a vowel because Japanese phonology doesn't allow for syllables ending with -k.
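As a rough illustration of how that added vowel can be read backwards, here is a small heuristic sketch. It relies on the pattern described in the footnote on entering tones (-k surfacing as -ku/-ki, -t as -tsu/-chi). The function name `guess_coda` is made up, and this is a rule of thumb for spotting the pattern, not a reconstruction method.

```python
# Final kana that typically reflect an old stop ending in onyomi readings.
ENDING_TO_CODA = {"ク": "-k", "キ": "-k", "ツ": "-t", "チ": "-t"}


def guess_coda(onyomi):
    """Guess the old stop ending behind a katakana onyomi, or None.

    One-kana readings are skipped: there the kana is the whole
    syllable, not a vowel added after a final consonant.
    """
    if len(onyomi) < 2:
        return None
    return ENDING_TO_CODA.get(onyomi[-1])


# Readings quoted in this article.
for kanji, reading in [("橐", "タク"), ("石", "セキ"), ("束", "ソク"),
                       ("達", "タツ"), ("東", "トウ")]:
    print(kanji, reading, guess_coda(reading))
```

Seen this way, セキ and タク both point back to a -k ending, which is exactly what the Cantonese readings sehk and tok still show.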
The main vowels in both Cantonese and Japanese are different, but it’s helpful to point out that the sounds of all languages change over time, some more slowly, some more quickly. During Chaucer’s time (1343 to 1400), the words food, good and blood all rhymed (sounding like goad). Then, during Shakespeare’s time (1564 to 1616), they still rhymed, but by that time they all rhymed with how we now say food.[3] Nowadays, they have diverged from one another and all sound different.
If we look at the Old Chinese (OC) reconstructions for these words, 橐 *tak and 石 *dak,[4] we can see that the initials (the t- and the d-) are pronounced in the same part of the mouth, and therefore are very closely related sounds, as are the main vowels and -k endings. As such, 石 made a rather suitable sound component for 橐. Don’t let the * symbol scare you! It just means that these sounds are reconstructed and have not been attested directly.
Interestingly, there is yet another character with similar origins, though to keep things simple, I’ll leave it out of the analysis: 柬 【カン、ケン】 “card, letter; choice” (in ancient times, it meant the same thing as 束, but had a different pronunciation. In other words, the characters 束 and 柬 represented two different spoken words that had the same or very similar meaning). In Japanese, this component is usually simplified to 東 (for example, in kanji like 練 and 錬, both pronounced 【レン】).
Looking at the OC reconstructions for 東 *tong and 束 *s-tok,[5] we can see that they both share the same main vowel “o” and an initial “t-.” Though the endings “-ng” and “-k” aren’t exactly the same, they are pronounced at the same place in the mouth (which indicates that they are closely related sounds). So what is up with the “s” at the beginning of 束? The dash after the “s” indicates that it is a prefix.
This is similar to English, where we have series of related words that differ phonologically because of the addition of prefixes and suffixes. Take the root word “get”: forget, beget, got, gotten, begotten. This series of words has related meanings, which are expressed through different grammatical affixes (i.e., things which you can attach to root words that express meanings) and through changes to the main vowel.
Just like the prefixes for- and be- in the word family for the root word get, the OC *s- prefix is attached to the root word tok. Sharing the same root word (though not necessarily the same affixes) was the main requirement for characters sharing the same sound component. This is one of the reasons we see a significant amount of sound variation in Chinese character sound series.
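The comparison above can be sketched in a few lines of code. This is a toy, written under loud assumptions: it works only on the simplified reconstructions quoted in this post, the `PLACE` table covers only the four consonants that appear in them, and `compatible` is a made-up check, not the actual criterion scholars use for sound series.

```python
# Place of articulation for the consonants in the simplified forms above.
PLACE = {"t": "alveolar", "d": "alveolar", "k": "velar", "ng": "velar"}
VOWELS = "aeiou"


def parse(reconstruction):
    """Split a simplified form like '*s-tok' into (prefix, initial, vowel, coda)."""
    body = reconstruction.lstrip("*")
    prefix, _, root = body.rpartition("-")  # no hyphen means no prefix
    start = next(i for i, ch in enumerate(root) if ch in VOWELS)
    end = start
    while end < len(root) and root[end] in VOWELS:
        end += 1
    return prefix, root[:start], root[start:end], root[end:]


def compatible(a, b):
    """True if the roots share a main vowel and have initials and
    endings made at the same place in the mouth. Prefixes are ignored."""
    _, init_a, vowel_a, coda_a = parse(a)
    _, init_b, vowel_b, coda_b = parse(b)
    return (PLACE[init_a] == PLACE[init_b]
            and vowel_a == vowel_b
            and PLACE[coda_a] == PLACE[coda_b])


print(parse("*s-tok"))
print(compatible("*tak", "*dak"))     # 橐 and 石
print(compatible("*tong", "*s-tok"))  # 東 and 束
print(compatible("*tak", "*tong"))    # 橐 and 東: different main vowel
```

The key design choice is that the prefix is split off before anything is compared, mirroring the idea that a shared root, not identical affixes, is what mattered.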
So why does 東 mean “east” if it’s a picture of a bag tied at both ends? Well, because the word for “east” sounded similar to the word that the character 東 represented, i.e., “bag,” so it was borrowed by way of sound loan to write “east.”
In our dictionary, 東 is explained like this:
The explanation for 西 is far simpler than the one for 東. There’s an agreement among scholars that 西 is a picture of a bird’s nest. According to the Setsumon Kaiji 説文解字, the first character dictionary that tries to explain character forms (published in AD 121), the connection to “west” is because birds return to their nests at sundown and the sun sets in the west.[6] This is somewhat confirmed by the fact that the character 棲 【セイ】 “resting place for birds; nest” was also often used to write “west.” However, it’s also possible that it is simply a sound loan (i.e., the word for “west” sounds similar to the word meaning “bird’s nest”).
Be sure to check back for the forthcoming part 2 of this series, where we take a look at the origins of 南 & 北. We also have several very exciting announcements to make soon! So stay tuned!
[1] I’m consulting Chi Hsiu-Sheng [季旭昇 Jì Xùshēng]’s (2014) 《說文新證》, page 512.
[2] Many southern Chinese dialects, such as Cantonese, Hakka and Southern Min, retain the entering tone [入声 【ニッショウ】] endings -k, -p and -t, and maintain three nasal endings: -ng, -n and -m. In Japanese, these entering tone endings are generally retained in onyomi readings ending with -ku, -ki, -tsu, -chi, and so on.
[3] Source: http://grammar.about.com/od/fh/g/GreatVowelShift.htm
[4] These are simplified versions of the Baxter-Sagart Old Chinese reconstructions, version 1.1. Each entry reads as: {char} *{OC} ({OC rhyme}) > ({fǎnqiè spelling}切) {MC}
橐 *tʰˤak (鐸部) > (他各切) thak
石 *dAk (鐸部) > (常隻切) dzyek
I’m using a simplified version of the reconstructions, not because they are problematic, but rather to point out to people not trained in historical linguistics the sound similarities between the two words. Explaining the basics of OC reconstruction is a can of worms I’d rather not open just yet.
[5] As above, these are simplifications of the Baxter-Sagart reconstructions, version 1.1:
東 *tˤoŋ (東部) > (德紅切) tuwng
束 *s-tʰok (屋部) > (書玉切) syowk
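[6] 《說文》: 「西,鳥在巢上。象形。日在西方而鳥棲,故因以爲東西之西。凡西之屬皆从西。」 (Roughly: “西: a bird on its nest. A pictograph. When the sun is in the west, birds roost, and so the character came to be used for the ‘west’ of ‘east and west.’ All characters in the 西 group are built on 西.”)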
Just wanted to post a quick note to say we've released a huge update to our dictionary!
This update adds another 500+ Essentials kanji entries to the dictionary, bringing the total number of completed entries to over 2250!
If you're a Kickstarter backer, then you already have early access to the dictionary in Pleco, so you should be prompted to download the update automatically when you open Pleco. If not, just go to Menu > Add-ons > Updated and you should be able to download it there.
If you didn't back the dictionary on Kickstarter but want to get it via our app when it's released (soon!), you can pre-order it here!
Stay safe and enjoy your studies!
By Ash Henson
In the previous post, I went over the first three aspects of etymology as it relates to learning kanji, namely identifying the functional components in a given kanji, identifying how the functional components function, and identifying corrupted components. In this post, I’ll go over the remaining three:
#4: Identifying the meaning that a given kanji was invented to represent and its relationship to the kanji form.
#5: Restoring the pictorial quality of kanji components.
#6: The full story.
Aspect #4: Identifying the meaning that a given kanji was invented to represent and its relationship to the kanji form, a.k.a. a kanji’s “original meaning.”
This is important because it’s the only meaning that is directly related to a kanji’s form. Some kanji still represent their original meanings, like 人 “person,” 火 “fire” or 山 “mountain,” for example. Some represent secondary meanings, like 木 “wood” (originally “tree”), 取 “to take” (originally “to take an ear off a dead soldier after a battle”) or 段 “section, part” (originally “to break into pieces”). And some are rather distantly related, like 漢 “Chinese ethnicity,” which was originally the name of a river (located in what is now Shaanxi province). Then, it became the name of a Chinese dynasty, and then of the Chinese ethnicity. In situations like this, it’s not important to memorize this relationship. It is, however, important to at least read it. That way, your brain realizes that there is a rational reason for the form component of 漢 being 氵 “water.” For the purposes of memory, brains tend to prefer patterns to randomness.
Note: Check out this article for more about the different types of functional components
But there are also a significant number of kanji whose modern meanings aren’t related at all to their forms. The main reason for this is sound loan. Because it is difficult to come up with pictures that represent grammatical concepts, the kanji used to write grammatical words are very often sound loans. For instance, 其 “that” was originally a picture of a winnowing basket. The word for winnowing basket (now written 箕 【キ】) was similar in sound to the word for “that,” so the kanji for winnowing basket 其 was borrowed to write “that.” Since the word for “basket” had lost its original kanji, a new kanji was invented by adding 竹 “bamboo” to 其 to emphasize the meaning “basket”: 箕. These are rather common, actually.
It is important to know when the kanji form is not related to its meaning. This is another form of closure. It’s like if you were to do an internet search and then never get a result. It would just leave a nagging feeling in your gut. Getting an answer, even if the answer is “no results for this search” brings closure. It also keeps you from assigning a false meaning to that component and thereby adding noise to the system of meaning representation.
Aspect #5: Restoring the pictorial quality of kanji components.
Due to the high degree of stylization in modern kanji, much of their pictorial quality has been lost. However, if you have the chance to compare the modern forms, especially on a component level, to their ancestral forms, much of that pictorial quality can be restored. Or even better, have someone else (uh...*coughs* someone like Outlier!) do the hard work for you and show you the main nodes of the form’s evolution with an accompanying explanation. Then it becomes very easy to see. Here are a few examples:
又 “again; furthermore”
(1a) is a fairly typical picture of a hand from the oracle bone script. It’s also important to note that the ancient Chinese didn’t have three fingers. Rather than being an actual picture of a hand, this is a pictorial-like symbol that captures the important features of a hand (and it’s easier to draw than an actual hand). Though the orientation and space between the fingers vary, forms (1b) through (1e) are essentially the same. It’s the bending of the top finger that starts in (1f), which is then connected to the middle finger in the modern form 又 that makes the modern form difficult to recognize as a hand.
So, the form of “a right hand” was used to express the original meaning “right hand.”
It came to mean “again; furthermore” by way of sound loan. In other words, the sound of the word “right hand” was the same as or similar to the sound of another word that meant “again; furthermore.” The word “right hand” came to be written 右 “right-hand side,” which came to indicate “right-hand side” instead of “right hand.” The 口 is actually just a mark to distinguish 右 from 又, so it does not express a meaning or sound here. Note that the 𠂇 in 右 and the 𠂇 in 左 “left-hand side” have different origins: the 𠂇 in 左 was a picture of a left hand.
並 “and; besides”
Forms (2a) and (2b) show a front view of two people standing (立) shoulder to shoulder. The line on the bottom represents the ground. (2c) adds another line, probably for beautification, since it’s not adding a meaning or sound. In (2d), the two 立 are written so close together that they are touching, giving someone the idea for making their arms a single line as seen in (2e). Note how there is a trade off between ease of understanding and ease of writing. (2a) through (2c) represent their meaning very clearly, while (2e) is the easiest to write, but completely opaque, unless you know the earlier forms.
The form “two people standing shoulder to shoulder” was used to represent the original meaning “to stand shoulder to shoulder.” From this meaning evolved the meanings “to put together,” “to put on par with (i.e., to stand as equals)” and “simultaneous (i.e., to happen side by side in time).” The meanings “and” and “besides” most likely evolved from the meaning “to put together.”
秉 “to grasp, hold”
In (3a), the left side is a stalk of grain, and the right side is a hand about to grab it for the harvest. The left side of (3b) is a simpler-looking stalk of grain that is being grabbed by a hand, while (3c) is a more stylized version of (3b). The form “a hand grasping a stalk of grain” was used to represent the original meaning “to grasp grain with the hand,” which evolved to mean “to grasp or to hold” in general.
兼 “concurrently; and”
If you compare (4a) to (3b), you’ll see that (4a) is a single hand simultaneously grabbing two stalks of grain. (4b) and (4c) are more stylized versions of (4a). The form of “a hand simultaneously holding two stalks of grain” was used to represent the original meaning “to put together,” from which the meanings “compatible (i.e., two things that fit together)” and “simultaneous (i.e., two things done at the same time)” were later derived.
監 “to supervise, oversee”
The left side of (5a) is a vessel (i.e., a container of some kind), while the right side is a person sitting in a kneeling position. The eye is exaggeratedly large to emphasize the idea of inspection. In (5b), the person appears to be standing up and leaning over the vessel, which has water in it. Ancient people used water for its mirror-like quality to inspect their own faces. Notice that the eye is now unattached to the body (this type of corruption is called “disintegration” and it is explained in an earlier post). (5c) is a more stylized version of (5b). So, the form “a person with an exaggerated eye looking down into a container of water” is used to express the original meaning “to look downwards and inspect.” Later, this meaning evolved to the more general “supervise, oversee.”
Aspect #6: The full story:
This is the “real” story of how any given kanji was born and evolved into modern times. This might be hard to believe, but our posts on kanji etymology are far from comprehensive. The real stories behind kanji can get really complex. The first time I was introduced to actual paleography and saw pre-Qin dynasty kanji, I was shocked. So shocked, in fact, that I realized that if I wanted to make a kanji dictionary, I absolutely could not do so without first getting some serious training in paleography. The “real” story is most certainly not for the faint of heart. It’s full of twists and turns, and all manner of crazy phenomena that one would never conceive of having seen only modern kanji. Researching the full story takes a lot of hard work and a lot of training (I’ve been acquiring the training to do our dictionary since 2006). The trick is to understand how kanji actually work, use the knowledge of kanji evolutions that are well understood in order to research those that aren’t, follow the evidence, and not be swayed by your own personal biases.
How etymology is used in the Outlier Kanji Dictionary:
There are two editions of our dictionary: Essentials and Expert.
The Essentials Edition contains everything you need to know about 3000 kanji without overwhelming you with extra detail. It’s great for students of all levels and backgrounds. It contains exactly what’s required in order to master a given kanji: a form explanation, component breakdown, pronunciation, meanings, example vocabulary, and stroke order for each kanji.
The Expert Edition is for those with an inquiring mind (a.k.a., crazy people like us!). The Expert Edition is perfect for students who enjoy being able to dive deep into the history and etymology of the writing system. All of the Essentials information appears in the main entry, and the Expert Info is just one tap away! This is the ultimate tool for kanji etymology enthusiasts. It includes a highly condensed version of the “real” story that focuses on the parts of that story that will help you better understand how that kanji came to be in a way that aids in learning. It shows the basic evolution of the kanji form and throws in other interesting tidbits.
Make sure to check out this demo of the dictionary to get an idea of what it actually looks like!
By Ash Henson
So, what is kanji etymology? It’s usually defined something like “the story of a kanji’s origin and development,” but that definition doesn’t give the full picture. Be that as it may, if we accept this definition for the moment, the first question someone learning Japanese should ask is “How much of that story do I need to know to effectively learn kanji?” There is no single answer to this question, because there is no single way to learn anything. There are many different learning styles, each learner has a unique background and way of viewing the world, etc. Having said that, there are also similarities in how we learn and there are principles of effective learning that apply to us as human beings. As such, my answer to the question of how much of the story you need to know is: it depends. I’ve identified six aspects of kanji etymology that are pertinent to learning kanji:
#1: Identifying the functional components in a given kanji.
#2: Identifying how the functional components function.
#3: Identifying corrupted components.
#4: Identifying the meaning that a given kanji was invented to represent and its relationship to the kanji form.
#5: Restoring the pictorial quality of kanji components.
#6: The full story.
In this post, I talk about aspects #1 to #3. That is, I explain what each one is and its level of importance. Aspects #4 to #6 will be discussed in part 2 of this post (it’s a constant battle to keep posts short!). As it turns out, there is a core of things one needs to know about a given kanji in order to learn it effectively, and then there are things which are interesting (well, to me anyway!) to know, but not strictly speaking necessary.
According to the Outlier philosophy, the goal of kanji learning is predictive ability and long-term recall. Predictive ability refers to when you come across a new kanji within a meaningful context, being able to make intelligent guesses about the range of sounds and the range of meanings that kanji might have. Ideally, you could use that knowledge to make a connection with a word or words in the spoken language. Long-term recall refers to the ability to recall a kanji form long after it’s been learned by way of understanding how kanji as a system represent sound and meaning. Since spoken words are combinations of sound and meaning, you can use these two clues in conjunction with understanding kanji on a systemic level to pluck your memory strings and recall a kanji’s form. This is accomplished by understanding the functional components of each kanji and by understanding how kanji work on the system level.
Studies have shown that native speakers have an intuition about how a given unknown kanji may sound or what it may mean, but they often find it difficult to articulate. This intuition comes from learning thousands of kanji. It is a reflection of the logic inherent to the kanji on a system level. And, it is imperfect. It also takes a long time to acquire. In the Outlier Kanji Dictionary, our aim is to instill the abilities required for long-term recall and predictive ability from day one, but that can only be done if we understand kanji on their terms, not on ours. Now, let’s look at the first three aspects of etymology:
Aspect #1: Identifying the functional components in a given kanji.
This is by far the most important aspect of etymology. Knowing how a kanji represents sound and meaning is the basis for understanding kanji as a system. It is also the basis for being able to detect real (as opposed to superficial) relationships between kanji: sound and meaning relationships. And, last but not least, it’s the key to understanding individual kanji. So, understanding what a kanji’s functional components are is crucial for all learners, and they (in combination with Aspect #2) are what make predictive ability possible for beginners (when you need it the most!). And, while they aren’t the only route to long-term recall, they are the most effective and they have the most positive side-effects.
Example: The functional components for 識 “to know” are 言 “speech” and 戠 “to gather together.” 言 is the meaning component and 戠 is the sound component (check out this article for more about the different types of functional components). People oftentimes view this kanji as 言 + 音 + 戈 and then create a story to combine the meanings “speech” + “sound” + “halberd,” but doing so hides the sound connections between 識 and other kanji that share the sound component 戠: 職, 織 (also ショク), 幟. Creating a new story for how meaning is represented in this kanji (i.e., by breaking it into parts that aren't giving a meaning, then assigning a meaning to them) not only obscures the real way meaning is expressed, it gives a false impression as to how kanji represent meaning in general. Not to mention, an infinite number of stories can be created for any one kanji, but only a story based upon the functional components will bring the benefit of seeing (from an early stage) the real sound and meaning connections between kanji.
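For readers who like to see an idea as data, here is a minimal sketch of why the choice of breakdown matters. It records only what the example above states (that 識, 職, 織 and 幟 share the sound component 戠). The table and function names are my own illustrative inventions, not part of any real dictionary or tool.

```python
from collections import defaultdict

# Kanji -> sound component, taken from the 識 example in the text.
# The table name and layout are hypothetical, for illustration only.
SOUND_COMPONENT = {
    "識": "戠",
    "職": "戠",
    "織": "戠",
    "幟": "戠",
}


def group_by_sound_component(table):
    """Collect kanji that share the same sound component."""
    groups = defaultdict(list)
    for kanji, component in table.items():
        groups[component].append(kanji)
    return dict(groups)


def breakdown_hides_series(kanji, parts, table):
    """Return True if a breakdown omits the kanji's real sound component."""
    return table[kanji] not in parts


groups = group_by_sound_component(SOUND_COMPONENT)
print(groups["戠"])

# The story-based split 言 + 音 + 戈 never mentions 戠, so the link to the
# other three kanji is invisible; the functional split 言 + 戠 keeps it.
print(breakdown_hides_series("識", ["言", "音", "戈"], SOUND_COMPONENT))
print(breakdown_hides_series("識", ["言", "戠"], SOUND_COMPONENT))
```

The point of the sketch is only that the sound series becomes a simple lookup once 戠 is treated as one unit. Split it into 音 and 戈 and there is nothing left to look up.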
Aspect #2: Identifying how the functional components function.
In other words, how sound components express sound and how meaning components express meaning. While knowing what the functional components are in a kanji is very important, so is understanding how they function in that kanji. For instance, most people do not distinguish between a component expressing meaning by way of its meaning (i.e., meaning components) vs. expressing meaning by way of form (i.e., form components).
What does that mean exactly? Each functional component has three attributes: form, meaning and sound (or pronunciation). Take 自 “self” for example. Its form is a picture of a person’s nose. Its meaning is “self” and its sound is ジ. If 自 expresses meaning by form (i.e., it’s a form component), then the meaning it expresses has to do with “nose,” as in 息 “to breathe,” 鼻 “nose,” 臭 “to stink,” and 嗅 “to smell.” So, in the kanji 息, 鼻, 臭, and 嗅, 自 is a form component.
Kanji with meaning components appeared rather late in the game and as such, they are small in number.
Ex. 歪 “not straight, crooked (literally 不正).” It’s easily seen that this kanji is based upon the combination of the meanings 不 “not” and 正 “straight.” 不’s form is either “part of a plant” or “roots of a plant” and 正’s form is “feet marching towards a city.” These two forms obviously have nothing to do with the meaning of 歪. So, in 歪, 不 and 正 are meaning components. Most semantic components give meaning by form and only a minority give meaning by meaning, yet most people treat every semantic component as a meaning component. Worse yet, they don’t consider the original meanings of the components, but depend rather on their modern meanings. This way of thinking is almost guaranteed to be inaccurate (read: it does not help with predictive ability, long-term recall or seeing real connections between kanji).
Understanding how components express meaning and sound is very important to understanding both how individual kanji work and how kanji work as a system.
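The three attributes and the different ways of functioning can be sketched as a small data model. This is just one way to picture it, assuming a made-up `Component` record and `Role` enum. The entries for 自, 不 and 正 come from the examples above, and sounds the text doesn't give are left as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """How a component functions inside one particular kanji."""
    FORM = "form"        # expresses meaning by way of what it depicts
    MEANING = "meaning"  # expresses meaning by way of its own meaning
    SOUND = "sound"      # expresses sound


@dataclass(frozen=True)
class Component:
    glyph: str
    form: str             # what the glyph was originally a picture of
    meaning: str          # what the glyph means as a word
    sound: Optional[str]  # onyomi; None where the text gives none


def contribution(component, role):
    """Return the attribute a component contributes when playing a role."""
    if role is Role.FORM:
        return component.form
    if role is Role.MEANING:
        return component.meaning
    return component.sound


JI = Component("自", "a person's nose", "self", "ジ")
FU = Component("不", "part (or roots) of a plant", "not", None)
SEI = Component("正", "feet marching towards a city", "straight", None)

# In 息, 鼻, 臭 and 嗅, 自 is a form component: it contributes "nose".
print(contribution(JI, Role.FORM))

# In 歪, 不 and 正 are meaning components: they contribute "not" + "straight".
print([contribution(c, Role.MEANING) for c in (FU, SEI)])
```

The same component record gives different answers depending on the role it plays, which is why knowing what the components are (Aspect #1) is not enough without knowing how they function (Aspect #2).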
Aspect #3: Identifying corrupted components.
Technically speaking, this should be part of Aspect #1, but since most people aren’t familiar with this concept, I’ll handle it as a full aspect. Kanji corruption means that a “kanji changes form in such a way that the original form intended by the inventor of that kanji is altered.” In other words, things aren’t always what they seem. There are several advantages to knowing when a component is corrupted or not.
1. To clear up misunderstandings and answer questions such as:
2. To give your mind closure. If you know that a given component is a corruption, you know that it is not adding a sound and meaning to the kanji. There’s no need to look any further for a form explanation.
3. To give you a clearer understanding of how modern kanji work. Predictive ability and long-term recall come quickest and most efficiently by understanding kanji on an individual as well as a systemic level. If you reinterpret corrupted components with your own meaning, you’re simply adding noise to the system.
But isn’t this just making the whole thing more complicated and harder to learn? I would argue no. The number one rule for memorizing anything is understanding. The more you understand the object of learning, the easier it is for you to remember that thing. Understanding which kanji components are corrupted is increasing your understanding. And, it’s not necessary to know the whole story behind the corruption. The main thing is knowing that the corrupted component is not giving a sound or meaning to the kanji.
Some examples:
Ex. 高 “tall”: It’s enough that you know that 亠 “lid, cover,” 口 “mouth” and 冋 have nothing to do with why 高 looks the way it does. It is actually just a picture of a tall building and has a 口 on the bottom to distinguish it from 京, which is also a picture of a tall building. Of course, you do need to remember that these components are necessary for correctly writing 高, but you can’t use them to understand why 高 looks the way it does.
Ex. 粦 “ghost fire”: Since 粦 is only used as a sound component in modern Japanese, it’s enough to know how to write it and its pronunciation. However, understanding what it was originally a picture of is interesting to some people (and can aid in remembering how to write it). If you’re one of those people (I know I am!), then you need to remember that 米 “rice” isn’t giving a meaning or sound, it’s merely a placeholder for a picture of a corpse that is on fire (okay, so knowing the real story isn’t always more pleasant!) and that 舛 is a picture of two feet.
Keep an eye out for Part 2!
Kanji form from 小學堂. ↩
Understanding Corruption in Kanji (part 2)
Looking at the Etymologies of 黒, 粦, 無, and 舞
By Ash Henson
First, a note on our use of the term “corruption.”
In the previous post in this series, the notion of corruption was introduced as well as one of the main reasons it occurs: the process of writing itself (杜忠誥 2002:32)1. Now, we’ll start looking at some of the types of corruption that occur. Specifically, we’ll be looking at the first of the eight types of corruption identified by Prof. Tu Chung-kao (杜忠誥) via exploring the etymologies of some common kanji.
Type 1 Corruption: Disintegration
Corruption by way of Disintegration is defined as a kanji form or the form of one of its components that was originally a single piece being broken into two pieces (2002:53)2. Let’s take a look at the etymologies of 黒 & 粦 to get a better understanding of what this means.
Example 1: 黒 “black”
Above are two early forms of 黒. (1a) is from Shang dynasty (1700 to 1100 BCE) oracle bone inscriptions (甲骨文), while (1b) is from the early Zhou dynasty3 (1059 to 255 BCE). (1a) and (1b) are pictures of the front view of a person whose face had been tattooed as punishment4 (墨刑) for committing a crime (2002:54)5. The body is the same form as 大 (a picture of an adult person from the front), with the head exaggerated. According to Chinese History: A New Manual by Endymion Wilkinson6, this was one of the "five punishments (五刑)"7. A person receiving 墨刑 had the name of their crime tattooed on their body. If the tattoo was on the face, as it is here with 黒, it was called 黥面. Forms from the Spring and Autumn and Warring States periods sometimes added dots, such as can be seen in (1c) and (1d)8 below (note that in pre-Han dynasty scripts, kanji change and variation was extremely common and also varied by geographic area):
According to Prof. Tu, the addition of dots here was to reinforce the notion of being marked by the 墨刑 tattoo (2002:55).
The bottom of 黒 as stated above was originally 大. During the Eastern Zhou, 大 became more stylized and was written as follows9:
Forms (1e) to (1h) are very typical ways of writing 大 during Warring States; and as can be clearly seen, 大 has already broken into two parts. Forms (1e) and (1g) are still almost a single piece, while forms (1f) and (1h) show a significant amount of separation. What caused this change?
Form (1i) is already very close to the modern form. (1k) is the typical Warring States form similar to forms (1e) to (1h) above. (1j) is a superimposition of (1i) in red, onto (1k) in black, showing that these forms are still essentially the same. The most likely cause for this change is convenience and ease of writing. Though (1k) has one more stroke, it’s actually easier to write and can be written in a more fluid motion (because the way the brush is lifted between strokes is less laborious). At the same time, we can also see how forms can get broken into smaller pieces.
The addition of the extra dots as seen in the forms , combined with writing 大 in two separate parts, like above, caused the bottom of 黒 to look very similar to 炎 “the top of a burning flame.” This is readily seen in (1l), which is the form that appears in the Setsumon Kaiji 説文解字.
In forms (1m) and (1n), the top 火 “fire” gets straightened out, while the bottom retains its 火-ness. In (1n), the top part looks like 里, while the bottom looks like a taller version of 灬, the form that 火 usually takes in modern kanji when it appears at the bottom of a kanji. (1o) looks mostly like the modern form. Had 黒’s form not become corrupted, it would probably look like now. The original form , a single “piece” being corrupted into , which has multiple parts, is an example of corruption by way of Disintegration.
So, 黒 originally depicted a person with a tattooed face. Dots were added to emphasize the tattoo. The dots, along with the person’s body, separated over time and corrupted into 炎. The top 火 “straightened out” and combined with the face to resemble 里, while the bottom 火 became 灬. Thus, the 灬 in the modern form is simply a corruption of an earlier form, and actually has nothing to do with fire.
And if you’re wondering how we boil this information down in our dictionary, here’s a screen shot:
Example 2: 粦 “friar’s lantern, ignis fatuus, jack-o’-lantern”
Another example of corruption by way of disintegration is 粦, which is phonetic in kanji such as 隣 “neighbor,” 憐 “pitiful,” and 麟 “(Chinese) unicorn; giraffe.” If you aren’t familiar with any of those terms given as the definition for 粦 above (I wasn’t), Dictionary.com gives this definition: “A pale flame or phosphorescence sometimes seen over marshy ground at night. It is believed to be due to the spontaneous combustion of methane or other hydrocarbons originating from decomposing organic matter.” In Japanese this type of fire is called 燐火.
The earliest form for 粦 is the oracle bone (甲骨文) form (2a) and the bronze inscription (金文) (2b). This is a picture of a person represented by 大 , with dots (or small lines) above and below each arm. Notice the difference at the bottom. Form (2a) doesn’t have the feet explicitly drawn, while (2b) does.
Side note: Getting off on the right foot:
An interesting thing about the feet is, in the oracle bone and bronze inscription scripts, there are many kanji which have variants that have feet, but the meaning expressed is the same whether the feet are there or not. We know this because of the meaning they are used to express in a given context. A good example is 無 “not, none” & 舞 “dance.”
無 is the original form of 舞. Its earliest forms (2c) and (2d)10 are pictures of a person doing a rain dance, holding either ox tails or bird feathers as ornaments (季旭昇 2004:470)11. It originally meant “rain dance” and later came to mean just dancing in general. It wasn’t until after the Western Zhou dynasty that 無 was borrowed by way of sound loan to mean “not, none” (2004:493). The meaning “dance” was then represented by adding feet (舛) to the 無 form, producing 舞, as seen below in (2e):
So, these two variant forms (at one time or another) both represented the meaning “rain dance” or “dance,” as such it can be seen that the addition of feet didn’t affect the meaning.
A quick detour en route back to the dots:
As to the explanation for the dots (or little lines), first we must take a rather gruesome detour. Rotting corpses produce phosphine gas. According to Wikipedia: “Phosphine gas is more dense than air and hence may collect in low-lying areas. It can form explosive mixtures with air and also self-ignite.” The Setsumon Kaiji’s definition for 㷠 (粦) is “㷠 (粦) is the blood of dead soldiers as well as that of their cows and horses. 㷠 is ghost fire.”12 So, basically, the rotting corpses of soldiers, cows and horses on a battlefield release phosphine gas and under hot conditions, the phosphine gas may self-ignite. When it does, it gives off a green colored flame, hence the name 鬼火 or ghost fire.
Now back to the dots (for real):
Returning to the explanation for the dots, Shirakawa Shizuka (白川静) believes the dots to represent blood13, while Chi Hsiu-Sheng (季旭昇) sees them to be the actual flames of the ghost fire. In the images below, you can see that dots have been used to represent both fire (火) and blood (血):
From a kanji form analysis perspective, both explanations are reasonable, since each has a precedent. However, from a meaning perspective, the fire explanation has the upper hand because it is a more direct explanation that is tied directly into the original meaning, basically “pale green flame.” Blood is also related in the sense of the role it plays in gruesome, bloody battlefields, which are one of the places where the “ghost fire” appears, but its role is indirect, whereas the fire explanation is direct. As such, the dots representing fire is the better explanation.
In the form that appears in the Setsumon Kaiji, the top part has already been corrupted into 炎 “flame” by roughly the same process that corrupted the bottom of 黒 into 炎. The fact that the meaning of 㷠 has to do with fire most likely also played a role (even though flames don’t usually have feet).
Forms (2h) - (2k) are Han dynasty forms of 隣 & 鄰 – variant forms, both pronounced lín (杜忠誥 2002:67). Interestingly, the left half of (2h) is still very close to the original form (2b) above. The left half of (2i) is close to the version that appears in the Setsumon Kaiji, while the top of the right half of (2j) has been corrupted into 土 with dots above it. The (2k) form is basically the same as the modern form: the top part has been corrupted into 米 (rice), while the feet (舛) on the bottom are still intact.
So, to recap, the original form (on the left above) was a single, unified form. By the time of the Setsumon Kaiji (middle form), it had already been broken into several pieces and the modern form (on the right) remains in separate pieces, with the further corruption of 炎 into 米. So, what was originally a picture of a person on fire became corrupted into a flame with feet, then into uncooked rice with feet! In Outlier terms, 米 is an empty component (i.e., doesn’t give a sound or meaning; it’s like a placeholder for an older form) for the original form of a burning body.
In a future corruption post, we’ll discuss the 2nd type of corruption: Corruption by way of Connection, where we’ll visit the etymologies for 折 & 制.
Before we begin, let's define the special use of the word "corruption" as it is used in this article. The use of "corruption" is reserved for kanji form changes that result in a degradation of the ability of a kanji to express sound and meaning. It is a translation of the Chinese term 訛變.
Important points:
1. It does not describe all kanji form changes, only ones that result in a degradation of meaning/sound representation.
2. It is not meant to convey the idea of getting back to a previous perfect state. It is merely saying that an unhindered ability to represent a sound or meaning is better than the lack of such ability.
Take a specific example, 做:
做 is derived from 作+攵 or 亻+𢼎. Let's take a look at the sounds of the parts:
乍 サ、 サク
作 サ、 サク
做 サ、 サク、 ソ
故, 古 コ
In the modern form, the middle part 古 was originally the sound component 乍. It changed from 乍 to 古 as the result of graphical confusion. Now, you have 做 with a structure that makes very little sense. It is not related to 故 nor 古 in any meaningful way (i.e., it does not give the sound コ, and it does not give a meaning related to 故 or 古). Yes, corruption has a negative connotation, but that's an accurate description. 乍 gives a sound in 作. 做 is a corruption of 作+攵. 古 does not give sound or meaning in 做. 做 has lost part of its ability to express sound.
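The comparison of readings above can be written out as a quick check. This is only a rough sketch: it uses the onyomi exactly as listed above and treats "gives a sound" as "shares at least one listed reading," which is a crude stand-in of my own (real sound components often give similar rather than identical sounds). The names `ONYOMI` and `shared_readings` are made up for the illustration.

```python
# Onyomi as listed in the text above; nothing here comes from a real database.
ONYOMI = {
    "乍": {"サ", "サク"},
    "作": {"サ", "サク"},
    "做": {"サ", "サク", "ソ"},
    "故": {"コ"},
    "古": {"コ"},
}


def shared_readings(component, kanji, table=ONYOMI):
    """Return the readings a component and a kanji have in common."""
    return table[component] & table[kanji]


# 乍 still accounts for the sound of 作 ...
print(sorted(shared_readings("乍", "作")))

# ... and would account for 做 too, had it not been replaced by 古.
print(sorted(shared_readings("乍", "做")))

# 古, which is what the modern form actually shows, shares nothing with 做.
print(sorted(shared_readings("古", "做")))
```

An empty result for 古 and 做 is what "corruption" means in our sense: the visible component no longer tells you anything about the sound.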
Prescriptivism vs. Descriptivism
We are not prescriptivist. I, myself, come from Texas and pronounce the word "get" as if it rhymed with "sit" (not "set"). I like my pronunciation even though it is not standard. When I learn foreign languages, I'm okay with making mistakes as long as they are mistakes that a native speaker would make. We are seeking to describe linguistic phenomena, not tell everyone what to do.
Since written records are only incomplete reflections of spoken language and since they appear late in history, it makes no sense to talk about "the original meaning" of a spoken word. It also doesn't make sense to say that semantic, syntactic, or phonological change is necessarily bad. Language change is necessary and happens in all languages at all times (though at varying speeds). The use of "corruption" to describe a character form that has lost some or all of its ability to record sound and meaning is in no way similar to describing regular language change as being a "corruption."
The notion of corruption is relevant to language learning
According to memory experts, the number one rule for effective memorization is understanding the thing you are trying to remember. Kanji corruption is one of the major reasons for empty components (components that express neither sound nor meaning in a kanji). If you go giving a meaning to every empty component, you add noise to your learning system. Any kanji has an infinite number of possible stories, but only the real story will help you see the overall semantic and sound patterns that kanji express. Knowing those patterns is useful for both learning and recall.
↩
杜忠誥,《說文篆文訛形釋例》,台北市:文史哲出版社,2002年。 ↩
My definition here is a more general version of Tu Chung-kao’s definition. His original definition: 凡商、周以來一脈相承的古篆文字,其形體或部件,本當連合為一體的,在《說文》篆文中,則離析或斷裂為二,因而導致字形與原義之乖離者,是為「離析之訛」。And my (quite literal) translation: If any kanji or component forms for any ancient Seal kanji since the Shang and Zhou dynasties, derived from the same origin, that used to form a single unit, but within the Seal kanji forms in the Shuōwén have disintegrated or broken into two, and consequently cause the kanji form and its original meaning to become separated, this is Corruption by Way of Disintegration (2002: 53). ↩
Images taken from 季旭昇2004年《說文新證‧下冊》,藝文印書館印行,第113頁。 季旭昇 labels them as 1 and 2. ↩
This explanation was first put forth by Tang Lan (唐蘭). ↩
Prof. Tu’s original description: 象顔面被墨刑之人的正面形(2002:54)。 ↩
Endymion Wilkinson, 2012. Chinese History: A New Manual. Cambridge: Harvard University Press. ↩
The other four punishments were: 劓 “cutting off the nose,” 刖 “cutting off one or both feet,” 宫 “castration,” and 大辟 “the death penalty” (Wilkinson 2013:311). Cheery bunch, the ancients. ↩
(c) and (d) are taken from 杜忠誥:2002, page 53. Our (c) is his 7 and our (d) is his 9. ↩
These images come from 杜忠誥:2002, page 53. Our (e) is his 1, (f) his 2, (g) his 3 and (h) his 5. ↩
Images taken from 季旭昇2004年《說文新證‧上冊》,藝文印書館印行,第470頁。 季旭昇 labels them as 2 and 3. ↩
The original description:「人持牛尾、鳥羽等舞具跳舞求雨;引伸為一切跳舞」(English translation is mine). ↩
This is my own translation. 《說文》「兵死及牛馬之血為㷠,㷠、鬼火也。」 ↩
“(They) represent the image of being dripping wet with blood”「表示鮮血淋漓之象」(my translation) (杜忠誥2002:67). ↩
Looking at the Etymologies for 面 and 友
By Ash Henson
First, a note on our use of the term “corruption.”
Note: In this article I use onyomi (音読み) exclusively to indicate the pronunciation of kanji. This is simply a convention we use for the sake of consistency, and because as the Chinese-derived pronunciations, the onyomi are the only ones relevant when discussing kanji formation.
In our kanji dictionary (you can reserve your copy on Kickstarter until 22 June!) we use the term “empty component” to refer to components which don't indicate meaning or sound in a particular kanji (click here to learn about the other types of components). One of the primary reasons for the existence of empty components is kanji corruption, so we thought it would be interesting to talk about that a bit.
This is the first of two posts on the topic of kanji corruption. If you really want to understand how kanji work, you cannot overlook corruption and the role it plays. We're not saying learners need to know this stuff to learn Japanese, but that we as researchers do in order to explain how kanji work, and we thought you might think it's interesting!
So what does it mean for a kanji to become corrupt? Basically, it means that the kanji changes form in such a way that the original form intended by the inventor of that kanji is altered.
Take 面 “face” for example:
The oracle bone form (a) is a picture of an eye inside of a larger frame – a face. According to Li Xiaoding (李孝定), of the face’s sensory organs, the most representative of a face is the eyes, hence form (a). Form (b) is from a Qín dynasty excavated text, where the eye 目 has been replaced with head (an earlier form of 首). Forms (d) - (g) are from Hàn dynasty 漢朝 steles (stone tablets) (杜忠誥2002:138-143). In (d), you can see that the top line has really been exaggerated and this is the origin of the top stroke on the modern form of 面. Since this stroke was not intended by the inventor(s) of this kanji, it is a form of corruption. An uncorrupted form of 面 may have looked like this: .
Sometimes these changes that occur via corruption are neutral; in other words, they don’t affect the kanji’s functional components (sound components or meaning components), but oftentimes corruption actually causes damage to a kanji’s ability to express sound and/or meaning.
Why is this important?
Why is kanji corruption important? One reason is that if you’re trying to understand a kanji form, and part of that kanji is corrupted, then any explanation you give to it (other than that it’s the result of corruption) is going to be inaccurate. Another major reason has to do with being able to spot spurious etymologies (there will be a future post dedicated solely to explaining how to spot spurious etymologies). If any given author or book never mentions kanji corruption in their etymological explanations, chances are very good that you are reading or hearing spurious etymologies. That’s not to say that all kanji are corrupted, but a significant number are. Let’s take a look at the etymologies of some common kanji to better understand the different ways that kanji can become corrupted.
I’m going to be following along Tu Chung-kao’s (杜忠誥) book Examples of Corrupted Forms in the Shuōwén’s Small Seal Script1 [《說文篆文訛形釋例》] 2 since he does an excellent job of outlining the different types of corruption. According to Prof. Tu, one of the main reasons for kanji corruption is the actual process of writing kanji.
When manuscripts are being copied by hand, it is easy for mistakes to happen, either because the manuscript being copied isn’t clear to begin with or because the scribe isn’t being particularly careful3.
Corruption by way of writing (i.e., copying manuscripts)
Prof. Tu gives an example related to 友 “friend(s)”:
友 (oracle bone script: 4) was originally a picture of two right hands together (two 又), indicating friendship. 又 yòu also acts as a sound component.
The Setsumon Kaiji (説文解字) lists 5 as one of 友’s ancient forms (古文)6. Though it looks very similar to 習 “to review”, the two are not related. actually evolved from this Bronze Inscription 金文 form 7, which was also used in the Chu 楚 script of the Warring States period, as can be seen in these examples:
(I love the Chu script!)
According to Chi Hsiu-sheng (季旭昇), the bottom half of is 一 and 白 (but pronounced like 自, not 白), which is a corruption of an earlier 甘.9 Had it survived into modern times, it may have looked like this 10, the 甘 “sweet” component presumably emphasizing the pleasurable feeling of having a good friend. Prof. Tu shows a possible path of the corruption from “two hands” to “wings” in the ancient (古文) form of 友:
Each step in this diagram shows a step towards corruption. (a) and (b) are still easily recognizable as a pair of hands, but then the roundness of the outer fingers becomes more and more square in (c) and (d), such that (d) is already completely square and looks like something in between the “two hands” form and the form for “wings” 羽. Later scribes then interpret it to be something similar to “wings” and help it along by making it look more like “wings”, until finally in (e) and (f), all resemblance to “two hands” is lost. This is one of the ways that kanji corruption happens (see 杜忠誥2002:33-34).
Summary
By looking at parts of the etymologies for 面 and 友, we learned a little about one of the most common reasons for kanji corruption: the process of writing itself. Stay tuned for our next post which will explore the various types of kanji corruption by way of explaining the etymologies of 黒, 無, 舞, and 粦!
Footnotes
The English translation here is my own. ↩
杜忠誥,《說文篆文訛形釋例》,台北市:文史哲出版社,2002年。 ↩
This obviously doesn’t apply to the process of listening and copying, which is subject to its own set of problems. ↩
This oracle bone form was taken from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/). ↩
This form was taken from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/). ↩
The Setsumon defines ancient forms (古文) as all characters created before the forms that appear in the Shǐzhòupiān [史籀篇] (which, according to tradition, was written during the reign of King Xuān of Zhōu (周宣王; 827 to 782 BCE)). ↩
This character comes from the 毛公旅方鼎. The digital image used here was taken from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/). ↩
#1 comes from 江陵天星觀1號墓卜筮簡, #2 from 荊門郭店楚墓竹簡‧六德 and #3 from 荊門郭店楚墓竹簡‧語叢3; their digital images come from Academia Sinica’s 小學堂 (http://xiaoxue.iis.sinica.edu.tw/). ↩
季旭昇《說文新證》上冊,藝文印書館印行,第196-197頁。 ↩
Note that the 甘 “sweet” and 曰 “to say” forms are very similar and are often confused for one another historically. Both forms derive from 口 “mouth”, showing something in the mouth; “something sweet and pleasant” for 甘 and a “symbol showing movement” for 曰。季旭昇《說文新證》上冊,藝文印書館印行,第379-381頁。The Chu forms shown above contain 曰. ↩
Why you should think of kanji in terms of functional components.
By John Renfroe
I know this advice is going to rub some people the wrong way, but hopefully by the end of the article you’ll understand why I say this: radicals are of little use for learning how kanji work. Their purpose is indexing kanji in a dictionary.
There’s a huge misconception about how kanji work. You see this sort of advice all the time: “Kanji are made up of radicals, so you should learn the radicals first,” or “Make sure you learn the radicals. They’re the building blocks of kanji.” This is not true. People who say this are well-intentioned but ill-informed about the nature of the kanji.
The word “radical” is best understood as “a kanji component that sometimes plays the role of radical,” NOT “a kanji component that has the nature of being a radical”. For example, 大 【ダイ】 “big” is a component that is on the list of radicals, but that doesn’t mean that 大 is always the radical when it appears in a kanji. A single kanji only has a single radical, no matter how many kanji components it has. And since the choice of which component will play the role of radical is up to the editor of a given dictionary, it may be different in different dictionaries (and may differ between Chinese and Japanese!). And yes, many of the components on the list of radicals do show up a lot in kanji and therefore should be learned, but they should be learned as part of a system of functional components — components which express sound and meaning.
So, if you’re talking about radicals, the conversation should focus on dictionary lookup. If you’re talking about how kanji work, or about etymology, then it should be about semantic and sound components. Getting the terminology straight helps to prevent confusing statements like “radicals are the building blocks of kanji.” They're not. Functional components are.
The concept of radical, or 部首 【ブシュ】 (bushu), didn’t even exist until after the publication of the Setsumon Kaiji (説文解字 【セツモンカイジ】; Shuowen Jiezi in Chinese) in 100 CE, at which point the writing system had already been around for well over 1500 years. The vast majority of kanji in use today were invented before the Setsumon. Read that again and let it sink in. If that’s the case, then there’s no way that “radicals” were what people had in mind when they were creating kanji. There must have been something else going on.
So what are radicals, really?
That’s an interesting question. The word “radical” is really a poor translation of 部首 【ブシュ】 in the first place. Bushu literally means “section head.” Following the model of the Setsumon, kanji dictionaries are traditionally arranged into sections containing similar graphic components. These sections are called 部 【ブ】. The first kanji in that section is the section head (部首 【ブシュ】), or the first of the section. Each kanji in that section is filed under one bushu. Note that I didn’t say the kanji “has” one 部首. It’s an important distinction to make. The kanji is filed under a 部 【ブ】, or section. This is a choice made by the editor of a kanji dictionary, not an inherent part of the nature of kanji.
Which section to file a kanji under can be a fairly arbitrary decision. Most people’s understanding is that the bushu gives a hint about meaning and the sound component (声符 【セイフ】) gives a hint about the sound, and that the two are different entities. That’s not always the case. Sometimes, the bushu is the sound component. For example, 刂 (刀 【トウ】, “knife”) is both the sound component and the radical in 到 【トウ】 “to arrive,” but it is not the meaning component. 至 【シ】 is, and it means “to arrive,” just like 到. Intuitively, one would think that radicals are assigned in a consistent manner, but sometimes the way they’re assigned can be very haphazard, as we've seen.
So again, kanji are filed into a given section. This is a choice made by a human being, not an inherent part of the nature of kanji, and it’s a flawed — but workable — system.
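To make the filing idea concrete, here is a minimal Python sketch. The names (`SECTIONS`, `COMPONENTS`, `VARIANT_OF`, `section_of`, `radical_function`) are invented for illustration and do not come from any real dictionary; the only data in it is the 到 example above.

```python
# Hypothetical illustration: a dictionary's sections (部) are just an index.
# Section head (部首) -> kanji an editor has chosen to file there.
SECTIONS = {
    "刀": ["到"],
}

# What each component actually does inside the kanji, per the 到 example.
COMPONENTS = {
    "到": {"至": "meaning", "刂": "sound"},
}

# 刂 is the shape 刀 takes when it sits on the right-hand side of a kanji.
VARIANT_OF = {"刂": "刀"}


def section_of(kanji):
    """Return the section head a kanji is filed under, or None if unlisted."""
    for head, members in SECTIONS.items():
        if kanji in members:
            return head
    return None


def radical_function(kanji):
    """Return the function played by the component that serves as the radical."""
    head = section_of(kanji)
    for component, function in COMPONENTS.get(kanji, {}).items():
        if VARIANT_OF.get(component, component) == head:
            return function
    return None


print(section_of("到"))        # 刀
print(radical_function("到"))  # sound
```

The two tables are deliberately separate: `SECTIONS` records an editor's filing decision, while `COMPONENTS` records how the kanji is built. For 到 the component chosen as the filing key happens to be the sound component, not the meaning component.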
So hopefully, you can see that “radicals” (remember: section headings!) are useful for organising and looking things up in a dictionary, but they’re not especially useful for explaining how kanji work.
But there’s a better way
You should look at kanji in terms of their functional components. Kanji components can serve a few different functions, and you need to understand those functions rather than lump them all under one category called “radicals.”
There are three attributes that all kanji have (using 大 as an example):
Form: What is it a picture of? 大 is a picture of a person (specifically, an adult).
Meaning: What does it mean? 大 means big, because adults are big in comparison to children.
Sound: What is its pronunciation? (Or, if it’s a sound component, what is the range of sounds it can represent?) 大 is pronounced ダイ dai in Japanese (I'm using onyomi here, since those readings are the only ones that are relevant when discussing sound components).
The possible functions that a component can have derive directly from these three attributes.
There are three primary functions:
A component can express meaning by way of its form. Example: 大 is a picture of a person, and that is its function in kanji like 美 【ビ】 “beautiful.” 美 is not a “big” 大 “sheep” 羊, but a depiction of a person wearing a headdress (the headdress 𦍌 now resembles 羊, but it's unrelated). This is by far the most common way of expressing meaning.
Other examples of 大 functioning in this way include:
天 【テン】 “heavens” (originally “person with a mark indicating the forehead”)
夫 【フ】 “husband, man”
A component can express meaning by way of its meaning. Example: 大 means “big,” and it expresses the meaning “big” in kanji like 尖. This is how most people explain all semantic components, but in reality this function is very uncommon!
尖
As for why “small” over “big” means “sharp,” take a look:
A component can express sound. Example: 大 is pronounced ダイ dai in Japanese, and it originally served as the sound component in the kanji 達 タツ、ダ tatsu, da “to arrive”.
Then there is a fourth function that derives from the way kanji evolved in form over time. A component can also:
Serve as a placeholder for an earlier form that has now been corrupted (note: that article is for Chinese learners, but the content is relevant; an adapted version of the article for people learning Japanese is forthcoming!).
This one is difficult to ascertain without training in palaeography, but the Outlier Kanji Dictionary explains which components have been corrupted and how. Continuing with 大 as an example, there are 1) instances in which a component was originally 大 but has now changed to something else, and 2) instances in which a component started as something else but has corrupted to look like 大 today (that is, you can’t trust your eyes!).
The sound component in 達 is 𦍒 【タツ、ダ】 tatsu, da. The top part today looks like 土 【ド、ト】 “earth,” but it was originally 大, which was then corrupted over time. An uncorrupted version of this component would look like 羍 today.1
The form above is written in small seal script (小篆【ショウテン】). This is what 大 and 土 looked like in small seal, for comparison:
In the kanji 莫【バク、ボ】 (“do not,” but it originally represented the word “sunset,” which is now written 暮【ボ】), what today looks like 大 on the bottom was originally 艸 【ソウ】 “grass” (there was 艸 on both the top and bottom, and the kanji depicted the sun setting behind the grass), which then corrupted over time to look like 大.
So now you’ve seen how the same component can serve completely different functions in different kanji, and how components can become corrupted over time, obscuring their original purpose. Here’s the interesting thing: out of the kanji I’ve just discussed, 大 is only the radical in 天 and 夫. In the others, it’s not, no matter which function it’s serving. The radical in the other kanji is:
尖:小
美:羊
達:辶
莫:艹
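The comparison can also be laid out as a small lookup table. This is a rough Python sketch using only the kanji discussed in this article; the `ENTRIES` structure, its field names, and the `filed_under` helper are my own invention for illustration, not the format of any actual dictionary.

```python
# Each entry records what 大 (or the part that looks like, or used to be, 大)
# is doing in the kanji, next to the radical the kanji is filed under.
ENTRIES = {
    "美": {"role": "form (a person)", "radical": "羊"},
    "天": {"role": "form (a person)", "radical": "大"},
    "夫": {"role": "form (a person)", "radical": "大"},
    "尖": {"role": "meaning (big)", "radical": "小"},
    "達": {"role": "sound (originally 大, since corrupted)", "radical": "辶"},
    "莫": {"role": "placeholder (corrupted from 艸)", "radical": "艹"},
}


def filed_under(radical):
    """List the kanji in ENTRIES that are filed under the given radical."""
    return [kanji for kanji, entry in ENTRIES.items() if entry["radical"] == radical]


for kanji, entry in ENTRIES.items():
    print(kanji, "|", entry["role"], "| radical:", entry["radical"])

print(filed_under("大"))  # ['天', '夫']
```

Knowing the radical tells you where to find each kanji, but the `role` column is what tells you how the kanji works.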
Summary
Again, all this is not to say that you should completely throw radicals out the window. They’re good to know, but you should keep in mind what they’re used for: looking up kanji in traditionally-arranged dictionaries. That’s it. They’re not the “building blocks of kanji” (that’s functional components!). They’re an imperfect, man-made system of arranging and looking up kanji in a dictionary. The concept of 部首 didn’t even exist when the vast majority of kanji were being created.
But sound and meaning components did exist. Sound and meaning components are the building blocks of kanji. Sound and meaning components are what people were thinking of whenever they made a new kanji. When you’re learning a new kanji, thinking in terms of these functional components rather than radicals will clarify a lot of confusing things about kanji. Anything that tells you otherwise is inaccurate and (unintentionally) leading you astray.
𦍒 is also a meaning component. 达 is a picture of a guy walking across the road. The original meaning was “arrive at point B from point A”. 達 is the same thing, but has a guy leading a sheep from point A to point B. ↩
By John Renfroe
Note: In this article I use onyomi (音読み) exclusively to indicate the pronunciation of kanji. This is simply a convention we use for the sake of consistency, and because as the Chinese-derived pronunciations, the onyomi are the only ones relevant when discussing kanji formation.
So, how do kanji actually work?
It’s fairly simple, believe it or not. Most kanji are made up of components, and those components can play different roles within the kanji.
Our dictionary breaks each kanji into its components and tells you exactly what each component’s function is.
Most components are related to the kanji’s meaning or sound (or sometimes both!), although some are completely unrelated to the kanji’s meaning or sound.
Let’s take a look at how that works.
You can think of a spoken word as a combination of sound and meaning. Take the word “grass.” Its sound in American English is /græs/ and its meaning is this:
Writing adds another element to the equation: form. “Form” refers to what the writing looks like. So we can say that writing is a combination of sound, meaning, and form.
Put another way, a written word is a form that indicates a sound and meaning (a word). In this case, the form is:
That form indicates the sound /græs/ and the meaning “grass.”
It’s the same with kanji. Let’s look at the three attributes (form, meaning, and sound) for the kanji 大.
So we can say that the form 大 indicates the sound ダイ dai and the meaning ‘big.’ So far so good, right?
As I said earlier, most kanji are made up of components. These components can have different functions, so we call them “functional components.”
The three main types of functional components in kanji are directly related to the three attributes we talked about above: form, meaning, and sound. That’s why we’ve called them form components, meaning components, and sound components. Let’s look at an example of 大 playing the role of each type of component.
Form Component
When 大 shows up as a form component in another kanji, that means its form (a picture of a person) is what’s contributing to the kanji’s meaning. For instance, the kanji 美 (ビ bi, “beautiful”) depicts a person (大) wearing a headdress (which today looks like 羊). You can see that more clearly in the ancient form:
So here the form of 大 (a person) is what’s contributing to the kanji’s meaning. The meaning of 大 (“big”) is irrelevant, as is the sound (dai is unrelated to bi).
Meaning Component
When 大 shows up as a meaning component in another kanji, its meaning (“big”) is what’s contributing to the kanji’s meaning. An example is the kanji 尖 (セン sen, “sharp”). 尖 consists of 小 (ショウ shō, small) over 大 (“big”).
And why does “small” over “big” mean “sharp”?
So here it’s clearly the meaning of 大 (“big”) that’s contributing to the kanji’s meaning. The form of 大 (a person) is irrelevant, as is the sound (dai is unrelated to sen).
Sound Component
When 大 shows up as a sound component in another kanji, its sound (dai) contributes to the kanji’s pronunciation. An example is the kanji 太 (タイ tai, “fat, thick; grand”). Obviously, the sound of 大 (dai) and the sound of 太 (tai) are related.
One Component, Multiple Functions
Sometimes a component can have multiple functions. In the example of 太 “fat, thick; grand” above, 大 “big” is not just a sound component, it’s also a meaning component. But the form of 大 (a person) is irrelevant here.
So we’ve covered the three main categories of functional components. Form components and meaning components can be grouped under a single category called semantic components, since they’re both related to the kanji’s meaning. Sound components are in a category of their own, and are related to the kanji’s sound.
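As a rough illustration of this classification, the Python sketch below models the three functions as flags, so that a single component can carry more than one at once. The class and variable names are hypothetical, and the data is limited to the three 大 examples above.

```python
from enum import Flag, auto


class Function(Flag):
    FORM = auto()     # contributes meaning through what it depicts
    MEANING = auto()  # contributes meaning through what it means
    SOUND = auto()    # contributes to the pronunciation


# Form and meaning components together make up the semantic components.
SEMANTIC = Function.FORM | Function.MEANING

# The role 大 plays in each kanji, as described in this article.
ROLE_OF_DAI = {
    "美": Function.FORM,
    "尖": Function.MEANING,
    "太": Function.SOUND | Function.MEANING,
}


def is_semantic(role):
    """True if the role includes a form or meaning function."""
    return bool(role & SEMANTIC)


def is_sound(role):
    """True if the role includes the sound function."""
    return bool(role & Function.SOUND)


for kanji, role in ROLE_OF_DAI.items():
    print(kanji, "semantic:", is_semantic(role), "sound:", is_sound(role))
```

Using flags rather than a single label is a design choice that keeps the 太 case honest: 大 is counted as both a sound component and a semantic component there, instead of being forced into one box.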
However, there are some components which have nothing to do with the sound or meaning of a kanji. We call those empty components, and we’ll cover those in another post!