Cantonese (linguistics)

This article is on all of the Yue dialects. For the dialect of Guangzhou and Hong Kong, see Standard Cantonese.

Cantonese (廣東話/广东话, lit. "Guangdong speech", colloquial; 粵語/粤语, lit. "Yue dialect", formal) is one of the major dialects of the Chinese language. It is mainly spoken in the south-eastern part of Mainland China, in Hong Kong and Macau, by Chinese minorities in Southeast Asia, and by many overseas Chinese worldwide. Its name is derived from Canton, the former name of Guangzhou, the capital city of Guangdong Province. It is a tonal language.

It is the lingua franca of the overseas Chinese diaspora, spoken by about 70 million people worldwide. While that is far fewer than the nearly one billion speakers of Mandarin, Cantonese is rivalled overseas only by the 40 million speakers of Hokkien, or Southern Fujianese dialects, many of whom are located throughout Southeast Asia. Cantonese is most commonly spoken in Hong Kong, the financial and cultural capital of the Cantonese diaspora, and in one form or another in many if not most Chinatowns around the world. For instance, the sei yap or siyi (四邑) dialect, from the Guangdong counties from which a majority of Exclusion-era Chinese immigrants emigrated, continues to be spoken both by recent immigrants from the mainland and by third-generation Chinese Americans.

As with all other varieties of Chinese, there is considerable dispute as to whether Cantonese is a language or a dialect. See Is Chinese a language or a family of languages? for the issues surrounding this dispute.

Cantonese (粵語/粤语)
Spoken in: China, Singapore, Indonesia, Malaysia, Canada, Australia, New Zealand and other countries where Cantonese migrants have settled.
Region: in China: central Guangdong province (the Pearl River Delta, including Hong Kong and Macau); eastern Guangxi Autonomous Region
Total speakers: 66 million
Ranking: 16
Genetic classification: Sino-Tibetan


Official status
Official language of: Hong Kong, Macau
Regulated by: -
Language codes
ISO 639-1: zh
ISO 639-2: chi (B) / zho (T)


Dialects of Cantonese

There are at least four major dialect groups of Cantonese: Yuehai, which includes the dialect spoken in Guangzhou, Hong Kong and Macau as well as the dialects of Zhongshan and Dongguan; Siyi (sei yap), exemplified by the Taishan (台山 Toisaan, Hoisaan) dialect, which was ubiquitous in American Chinatowns before 1970; Gaoyang, as spoken in Yangjiang; and Guinan (Nanning dialect), spoken widely in Guangxi. However, "Cantonese" generally refers to the Yuehai dialect.

For the last 150 years, Guangdong Province has been the home of most Chinese emigrants. One county near its center, Taishan (where the siyi or sei yap dialect of Cantonese is spoken), may alone have been the home of more than 60% of Chinese immigrants to the US before 1965. As a result, Guangdong dialects such as sei yap (the dialects of Taishan, Enping, Kaiping and Xinhui Counties) and what is generally understood to be mainstream Cantonese (with a heavy Hong Kong influence) have been the major dialects spoken abroad. As more and different groups of Chinese emigrate, however, the situation is changing: speakers of Min (Hokkien, or Fujianese) and Wu dialects are now also heard, as is Mandarin, in increasing numbers, from Taiwanese and mainland immigrants.

In addition, there are at least three other major Chinese languages spoken in Guangdong Province: Putonghua, the official standard Mandarin, which is spoken on official occasions, used in education, and spoken among the many internal migrants from the north seeking work in the relatively prosperous south; Min-nan (Southern Min), spoken in the eastern regions bordering Fujian, such as Chaozhou and Shantou; and Hakka, the language of the Hakka minority, with whom the Cantonese- and Min-speaking majority (or bendi, natives) fought bloody wars during the Qing Dynasty. Hanyu is mandatory throughout the state education system, but in the household, the popularity of Cantonese-language media (most notably Hong Kong films, television serials, and Cantopop), isolation from the north, and the economic strength of the Cantonese diaspora ensure that the language has a life of its own. Most wuxia films are originally filmed in Cantonese and then dubbed into Mandarin or English.


See Standard Cantonese for a discussion of the sounds of Standard Cantonese and pages on individual dialects for their phonologies.

Cantonese versus Mandarin

In some ways, Cantonese is a more conservative dialect than Mandarin. This can be seen, for example, by comparing the words for "I/me" (我) and "hunger" (餓). They are written using very similar characters, but in Mandarin their pronunciation is quite different ("wǒ" vs. "è"), whereas in Cantonese they are pronounced identically except for their tones (ngo5 vs ngo6 respectively). Since the characters hint at a similar pronunciation, it can be concluded that their ancient pronunciation was indeed similar (as preserved in Cantonese), but in Mandarin the two syllables acquired different pronunciations in the course of time.

Cantonese sounds quite different from Mandarin, mainly because it has a different set of syllables. The rules for syllable formation are different; for example, there are syllables ending in non-nasal consonants (e.g. "lak"). It also has a different set of tones. Cantonese is generally considered to have 6 or 7 tones, depending on who is doing the counting, whereas Mandarin has 4 plus a "neutral tone."

Cantonese preserves many syllable-final sounds that Mandarin has lost or merged. For example, the characters 裔, 屹, 藝, 艾, 憶, 譯, 懿, 誼, 肄, 翳, 邑 and 佚 are all pronounced yi4 in Mandarin, but they are all different in Cantonese (jeoi6, ngat6, ngai6, ngaai6, yik1, yik6, yi3, yi4, si3, ai3, yap1, and yat6, respectively). However, Mandarin's vowel system is somewhat more conservative than Cantonese's, in that many diphthongs preserved in Mandarin have merged or been lost in Cantonese.
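The contrast in the example above can be checked mechanically. The following is a minimal Python sketch rather than a linguistic tool: the dictionary simply transcribes the twelve Cantonese readings listed above, in the romanization given there, and the names `CANTONESE_READINGS` and `count_distinct_readings` are illustrative assumptions.

```python
# Cantonese readings copied from the example above; the romanization is
# left exactly as given there. All twelve characters share the single
# Mandarin reading yi4.
CANTONESE_READINGS = {
    "裔": "jeoi6",
    "屹": "ngat6",
    "藝": "ngai6",
    "艾": "ngaai6",
    "憶": "yik1",
    "譯": "yik6",
    "懿": "yi3",
    "誼": "yi4",
    "肄": "si3",
    "翳": "ai3",
    "邑": "yap1",
    "佚": "yat6",
}


def count_distinct_readings(readings):
    """Return how many different syllables (including tone) occur."""
    return len(set(readings.values()))


if __name__ == "__main__":
    total = len(CANTONESE_READINGS)
    distinct = count_distinct_readings(CANTONESE_READINGS)
    print(f"{total} characters, {distinct} distinct Cantonese readings")
```

With the readings as listed, no two of the twelve characters share a Cantonese syllable, while Mandarin has a single syllable for all of them.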

There is another obvious difference between Cantonese and Mandarin: Mandarin lacks the syllable-final sound "m". Final "m" and final "n" in Cantonese are merged into "n" in Mandarin, as in Cantonese "taam4" (譚) and "taan4" (壇) versus Mandarin tán, Cantonese "yim4" (鹽) and "jin4" (言) versus Mandarin yán, Cantonese "tim1" (添) and "tin1" (天) versus Mandarin tiān, and Cantonese "ham4" (含) and "hon4" (寒) versus Mandarin hán. The examples are too numerous to list.
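The merger can be illustrated with the four pairs above. This is a small Python sketch under stated assumptions: the Cantonese syllables are written without tone numbers because only the final consonant matters here, and the names `PAIRS` and `finals_by_mandarin` are hypothetical.

```python
import unicodedata
from collections import defaultdict

# (character, Cantonese syllable without tone, Mandarin pinyin),
# taken from the pairs listed above.
PAIRS = [
    ("譚", "taam", "tán"), ("壇", "taan", "tán"),
    ("鹽", "yim", "yán"), ("言", "jin", "yán"),
    ("添", "tim", "tiān"), ("天", "tin", "tiān"),
    ("含", "ham", "hán"), ("寒", "hon", "hán"),
]


def finals_by_mandarin(pairs):
    """Map each Mandarin syllable to the set of Cantonese final consonants
    found among the characters that share that Mandarin reading."""
    grouped = defaultdict(set)
    for _char, cantonese, mandarin in pairs:
        # Normalize so that composed and decomposed diacritics compare equal.
        key = unicodedata.normalize("NFC", mandarin)
        grouped[key].add(cantonese[-1])
    return dict(grouped)


if __name__ == "__main__":
    for mandarin, finals in finals_by_mandarin(PAIRS).items():
        print(mandarin, sorted(finals))
```

Each Mandarin syllable in the sample maps back to both a Cantonese final "m" and a Cantonese final "n", which is the merger described above.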

There are clear sound correspondences in, for instance, the tones. For example, a fourth-tone word in Cantonese is usually second tone in Mandarin.
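As a rough illustration of this tone correspondence, the Python sketch below reads the Cantonese tone from the trailing digit and the Mandarin tone from the pinyin diacritic. The sample is limited to syllables already cited in this article, so it demonstrates the pattern without proving it; the function and variable names are assumptions made for this example.

```python
import unicodedata

# Combining diacritics used in pinyin, mapped to Mandarin tone numbers.
PINYIN_TONE_MARKS = {
    "\u0304": 1,  # macron
    "\u0301": 2,  # acute accent
    "\u030c": 3,  # caron
    "\u0300": 4,  # grave accent
}


def mandarin_tone(pinyin):
    """Return the tone marked by the diacritic, or 0 if none is present."""
    for ch in unicodedata.normalize("NFD", pinyin):
        if ch in PINYIN_TONE_MARKS:
            return PINYIN_TONE_MARKS[ch]
    return 0


def cantonese_tone(syllable):
    """Return the trailing tone digit of a numbered Cantonese syllable."""
    return int(syllable[-1])


# (character, Cantonese, Mandarin) examples cited elsewhere in this article.
EXAMPLES = [
    ("壇", "taan4", "tán"),
    ("鹽", "yim4", "yán"),
    ("言", "jin4", "yán"),
    ("含", "ham4", "hán"),
    ("寒", "hon4", "hán"),
]


def tone_pairs(examples):
    """Return (Cantonese tone, Mandarin tone) for each example."""
    return [(cantonese_tone(c), mandarin_tone(m)) for _ch, c, m in examples]


if __name__ == "__main__":
    for (char, _c, _m), pair in zip(EXAMPLES, tone_pairs(EXAMPLES)):
        print(char, pair)
```

Every example in this small sample pairs Cantonese tone 4 with Mandarin tone 2.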

Despite the broad area over which Cantonese is spoken, most universities in the US do not teach Cantonese and historically have not done so. They teach Mandarin instead, which is used officially by both the People's Republic of China and the Republic of China, and which was formerly used in Imperial China as the court dialect.


There are several major romanization schemes for Cantonese: Barnett-Chao, Gwohngdongwaa Pengyam, Meyer-Wempe, Penkyamp, and Yale. While they do not differ greatly, Yale is the one most commonly seen in the West today. The Hong Kong linguist Sidney Lau modified the Yale system for his popular Cantonese-as-a-second-language course, so his is another system used by contemporary Cantonese learners. The system advocated by the Linguistic Society of Hong Kong (LSHK) is called Jyutping. It solves many of the inconsistencies and problems of the older, favored, and more familiar Yale romanization, but departs considerably from it in a number of ways unfamiliar to Yale users. Some effort has been made to promote Jyutping, but it is too early to tell how successful it will be.

However, learners may feel frustrated that most native Cantonese speakers, no matter how educated they are, are not familiar with any romanization system. There is apparently little motive for local people to learn any of these systems, and romanization is not part of the education system in either Hong Kong or Guangdong province.

Written Cantonese

Colloquial Cantonese is rarely used in formal writing; formal written communication is almost always in standardized hanyu, albeit still pronounced in Cantonese. However, written colloquial Cantonese does exist; it is used mostly for transcription of speech in tabloids, in some broadsheets, for some subtitles, and in other informal forms of communication. It is not uncommon to see the front page of a paper written in hanyu, while the entertainment sections are, at least partly, in Cantonese. The vernacular writing system has evolved over time through a process of modifying characters to express lexical and syntactic elements found in Cantonese but not in the standard written language. In spite of their vernacular origin and informal use, these characters have become so important for communication that the Hong Kong Government has incorporated them into a special Supplementary Character Set (HKSCS).

A problem for the student of Cantonese is the lack of a widely accepted, standardized transcription system. Another problem is with Chinese characters: Cantonese uses the same system of characters as Mandarin, but it often uses different words, which have to be written with different characters. At least this is the case in Hong Kong, but in mainland China, Cantonese is written with the exact same characters as Mandarin, though the characters stand for words not actually used in Cantonese. An example may help to clarify this:

The word for "to be" is 是 in spoken Mandarin (pronounced shì) but 係 in spoken Cantonese (pronounced hai6). In formal written Chinese, only 是 is used, and 係 appears only in classical literature. However, in Hong Kong, 係 is sometimes used in colloquial written Cantonese.

Many characters used in colloquial Cantonese writing are made up by putting a mouth radical (口) on the left-hand side of another, better-known character. This indicates that the new character is read like the right-hand component, which is used purely phonetically in the Cantonese context. The characters 叻, 吓, 吔, 呃, 咁, 咗, 咩, 哂, 哋, 唔, 唥, 唧, 啱, 啲, 喐, 喥, 喺, 嗰, 嘅, 嘜, 嘞, 嘢, 嘥, 嚟, 嚡, 嚿, 囖 etc. are commonly used in Cantonese writing. Because not all Cantonese characters are available in current encoding systems, or because users simply do not know how to enter them on a computer, very informal written Cantonese tends to use extremely simple romanization (e.g. D for 啲), symbols (an English letter "o" placed in front of another Chinese character; e.g. 㗎 is defined in Unicode but will not display in Microsoft Internet Explorer 6.0, hence the proxy o架 is often used), homophones (e.g. 果 for 嗰), and Chinese characters that have a different meaning in Mandarin (e.g. 乜, 係, 俾 etc.) to compose a message. For example, "你喺嗰喥好喇, 千祈咪搞佢啲嘢。" is often written in the easier form "你o係果度好喇, 千祈咪搞佢D野。" (character by character, approximately 'you, being, there (two characters), good, (final particle), thousand, pray, don't, mess with, him, (genitive particle), things'; translation: 'You'd better stand there, and don't touch his stuff.')
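The substitutions in the example sentence can be sketched as a simple character mapping. The Python below is only an illustration: the table holds the replacements mentioned or implied by the example above (it is not a standard or exhaustive list), and the names `INFORMAL_SUBSTITUTES` and `to_informal` are hypothetical.

```python
# Substitutions taken from the examples above. This is an illustrative
# table, not a standard or complete list of informal spellings.
INFORMAL_SUBSTITUTES = {
    "啲": "D",    # simple romanization
    "㗎": "o架",  # letter "o" standing in for the mouth radical
    "喺": "o係",  # same "o" device
    "嗰": "果",   # homophone
    "喥": "度",   # mouth radical dropped
    "嘢": "野",   # mouth radical dropped
}

# str.maketrans accepts a dict whose keys are single characters and whose
# values may be strings of any length.
_TABLE = str.maketrans(INFORMAL_SUBSTITUTES)


def to_informal(text):
    """Replace hard-to-type Cantonese characters with informal stand-ins."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_informal("你喺嗰喥好喇, 千祈咪搞佢啲嘢。"))
```

Applied to the formal spelling of the example sentence, the mapping yields the easier form quoted above.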

Other common characters are unique to Cantonese or deviate from their Mandarin usage. They include 乜, 冇, 仔, 佢, 佬, 係, 俾, 靚 etc.

The words represented by these characters are sometimes cognate with pre-existing Chinese words. However, their colloquial Cantonese pronunciations have diverged from formal Cantonese pronunciations. For example, in formal written Chinese, 無 (mou4) is the character used for "without". In spoken Cantonese, 冇 (mou2) has the same usage, meaning, and pronunciation as 無, differing only in tone. 冇 represents the spoken Cantonese form of the word "without", while 無 represents the word used in Mandarin (pinyin: wú) and formal Chinese writing. However, 無 is still used in some instances in spoken Cantonese, as in 無論如何 ("no matter what happens"). Another example is 來/嚟, which means "to come". 來 (loi4) is used in formal writing; 嚟 (lei4) is the spoken Cantonese form.

