Persian language

Persian (فارسی), also known as Farsi, Parsi, Tajiki or Dari, is a language spoken in Iran, Tajikistan, Afghanistan and Uzbekistan. It has official-language status in the first three countries and over 75 million[1] native speakers. It belongs to the Indo-European language family and has subject-object-verb (SOV) word order.



Persian is a member of the Indo-European family of languages, and within that family it belongs to the Indo-Iranian (Aryan) branch. Scholars believe the Iranian sub-branch followed this chronological path: Old Persian (Avestan and Achaemenid Persian), then Middle Persian (Pahlavi, Parthian, and Sassanid Persian), then Modern Persian (Dari, circa 900 CE to present).

Old Persian, the main language of the Achaemenid inscriptions, should not be confused with the non-Indo-European Elamite language (see Behistun inscription). Over this period, the morphology of the language was simplified from the complex conjugation and declension system of Old Persian to the almost completely regularized morphology and rigid syntax of modern Persian, a change often described as paralleling the development of English. Many words were also introduced from neighboring languages, including Aramaic and Greek in earlier times, and later Arabic and, to a lesser extent, Turkish. In more recent times, some Western European words have entered the language (notably from French and English).

The language has developed considerably over the centuries. As in any other language, technological developments lead to new words and idioms that enter Persian. In Iran, the Academy of Persian Language and Literature evaluates these new words and proposes Persian equivalents for them. In Afghanistan, the Academy of Sciences of Afghanistan does the same for Afghan Persian (among other languages).


Persian, the more widely used and official name of the language in English, is the Hellenized form of the native term Parsi. Farsi is the Arabicized form of Parsi, and its use in English is very recent. Native Persian speakers typically call the language "Farsi" in modern usage. ISO, the Academy of Persian Language and Literature, and many other authoritative sources call the language Persian. The government of Afghanistan uses both "Dari" and "Persian" in English communications.

The Academy of Persian Language and Literature, as well as most linguists and lexicographers, holds that "Farsi" is not the appropriate term for the Persian language in English. "Farsi" is the Arabicized form of "Parsi", which arose because Standard Arabic lacks the /p/ phoneme [2]. Incidentally, the Persian Wikipedia is sometimes referred to as the Farsi Wikipedia, as its "fa." prefix suggests, because "fa" is the ISO 639-1 code for the language.

Dialects and close languages

Iranians, Tajiks, and Persian-speaking Afghans can generally understand one another; however, by popular definition:

  • Dari is the local name for the eastern dialect of Persian, one of the two official languages of Afghanistan. It includes Hazaragi, spoken by the Hazara people of central Afghanistan.
  • Tajik could also be considered an eastern dialect of Persian but, unlike Iranian and Afghan Persian, is written in the Cyrillic script.

The following are some of the dialects of various Iranian peoples within modern Iran proper:

There also exists the following 'dialect':

  • Judeo-Persian: known among Iranian Jews themselves as Latorayi (meaning 'not the language of the Torah'), this is an informal dialect that uses Persian grammar and structure but borrows heavily from Hebrew. It is not common among Iranian Jews in general; it is mainly the 'bazaar language' that Iranian Jewish merchants and store owners, especially butchers and meat merchants, use among themselves. In a way, it is comparable to how Yiddish developed out of German and Hebrew, but on a much smaller scale.

Orthography and vocabulary

Modern Persian uses a modified version of the Arabic alphabet (see below). After the conversion of Persia to Islam, it took approximately one hundred fifty years before Persians adopted the Arabic alphabet as a replacement for the older alphabet. Before that, the Persian language (Middle Persian, or Pahlavi, at that time) used two different alphabets: a modified version of the Aramaic alphabet, and a native Iranian alphabet called Dīndapirak (literally: religion script).

A language and the alphabet used to write it are two different things, and no alphabet is intrinsic to a language. Although they share a script, Persian and Arabic are entirely different languages from different linguistic families, with different phonology and grammar.

Persian adds four letters to the Arabic alphabet, because four sounds that exist in Persian do not exist in Arabic. It also changes the shape of another two letters. Some people call this modified alphabet the Perso-Arabic alphabet. The additional four letters are:

sound       shape   Unicode name
[p]         پ       Peh
[tʃ] (ch)   چ       Tcheh
[ʒ] (zh)    ژ       Jeh
[g]         گ       Gaf
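
As a small illustration of the table above, the following Python sketch looks up the four additional letters in the Unicode character database. The dictionary name and the helper function are hypothetical names chosen for this example; the code points are the standard Unicode ones for these letters, and the full character names carry the prefix "ARABIC LETTER".

```python
import unicodedata

# The four letters Persian adds to the Arabic alphabet, keyed by sound.
# (Dictionary and function names are illustrative, not from any library.)
PERSIAN_EXTRA_LETTERS = {
    "p": "\u067e",   # Peh
    "tʃ": "\u0686",  # Tcheh
    "ʒ": "\u0698",   # Jeh
    "g": "\u06af",   # Gaf
}


def describe_extra_letters():
    """Return (sound, letter, code point, Unicode name) for each letter."""
    return [
        (sound, letter, f"U+{ord(letter):04X}", unicodedata.name(letter))
        for sound, letter in PERSIAN_EXTRA_LETTERS.items()
    ]


for row in describe_extra_letters():
    print(*row)
```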

The letters different in shape are:

sound                          original Arabic letter   modified Persian letter   name
[k]                            ك                        ک                         Kaf
[j] and [i:], or rarely [a:]   ي or ى                   ی                         Yeh
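
Because the Arabic and Persian shapes of these two letters are separate Unicode characters, software that handles Persian text often maps one to the other. The sketch below is one minimal way to do that in Python; the function name is hypothetical, and the blanket mapping of both Arabic forms of Yeh to the Persian one is an assumption that suits Persian text but would not be appropriate for Arabic text.

```python
# Arabic code points and the Persian forms that replace them.
# Assumption: the input is Persian text, so every occurrence is replaced.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u0643": "\u06a9",  # Arabic Kaf          -> Persian Kaf (Unicode: KEHEH)
    "\u064a": "\u06cc",  # Arabic Yeh          -> Persian Yeh (Unicode: FARSI YEH)
    "\u0649": "\u06cc",  # Arabic Alef Maksura -> Persian Yeh (Unicode: FARSI YEH)
})


def to_persian_letters(text: str) -> str:
    """Replace the Arabic shapes of Kaf and Yeh with the Persian ones."""
    return text.translate(ARABIC_TO_PERSIAN)


print(to_persian_letters("\u0643\u064a\u0649") == "\u06a9\u06cc\u06cc")
```

Characters outside the mapping, including Latin text and the four additional Persian letters, pass through unchanged.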

The diacritical marks used in the Arabic script, also known as harakat, are also used in Persian, although some of them are pronounced differently. For example, an Arabic Damma is pronounced as /u/, while in Persian it is pronounced as /o/.

Persian also adds the notion of a pseudo-space to the Arabic script, called the Zero Width Non-Joiner (ZWNJ) by the Unicode Standard. It acts like a space in disconnecting two adjacent letters that would otherwise join, but it has no visual width.
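
The behaviour of the ZWNJ can be sketched in Python as follows. The helper names are hypothetical, and the two word parts are an illustrative choice rather than an example from the text above. The sketch shows that the ZWNJ is a real character in the string (it counts toward the length) even though it is invisible and is not classified as whitespace.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def join_unconnected(left: str, right: str) -> str:
    """Concatenate two parts with a ZWNJ so their letters do not join."""
    return left + ZWNJ + right


def strip_zwnj(text: str) -> str:
    """Remove every ZWNJ, letting adjacent letters join again."""
    return text.replace(ZWNJ, "")


# Illustrative parts: a two-letter prefix and a five-letter stem.
word = join_unconnected("\u0645\u06cc", "\u062e\u0648\u0627\u0647\u0645")

print(unicodedata.name(ZWNJ))      # the character's Unicode name
print(unicodedata.category(ZWNJ))  # a format character, not a space
print(len(word), len(strip_zwnj(word)))
```

One consequence is that two strings that look similar on screen can differ by a ZWNJ, so searching and comparison code may need to decide whether to keep or strip it.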

Many Persian words with an Arabic root are spelled differently from the original Arabic word. Alef with Hamza Below (إ) always changes to Alef (ا); Teh Marbuta (ة) usually, but not always, changes to Teh (ت) or Heh (ه); and words using various Hamzas are spelled with yet another kind of Hamza (for example, مسؤول becomes مسئول).

Other languages, such as Pashto and Urdu, have adopted these modifications and have sometimes extended them with new letters or punctuation.

There are many loanwords in the Persian language, mostly from Arabic, English, French, and the Turkic languages. Words that originated in the languages spoken in the region before the Arab invasion have usually changed in pronunciation.


The functional contrast for vowels appears to be between long {/i:/, /u:/, /ɑ:/} and short {/e/, /o/, /æ/}. Therefore, it seems possible to represent the vowels as {/i:/, /u:/, /a:/} and {/i/, /u/, /a/}. Note also that /tʃ/ and /dʒ/ are affricates, not stops. The following chart is adapted from a Structural Sketch of Persian.






Consonant chart rows: voiceless stops; voiced stops; voiceless fricatives; voiced fricatives; l, r.


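Normal sentences are structured as "(S) (PP) (O) V", where S is the subject, PP a prepositional phrase, O the object and V the verb, and parentheses mark optional elements. If the object is definite, the order is "(S) (O + "rɑ:") (PP) V".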
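
The two orderings can be written out as a short Python sketch. The function and parameter names are hypothetical, and joining the object and "rɑ:" with a space is an assumption made only for display; the sketch arranges constituents that the caller supplies and does no linguistic analysis.

```python
def order_clause(verb, subject=None, pp=None, obj=None, definite=False):
    """Arrange constituents in the order described above.

    Indefinite object: (S) (PP) (O) V
    Definite object:   (S) (O + "rɑ:") (PP) V
    Optional constituents that are None are simply left out.
    """
    if obj is not None and definite:
        # The definite object, marked with "rɑ:", moves ahead of the PP.
        slots = [subject, f"{obj} rɑ:", pp, verb]
    else:
        slots = [subject, pp, obj, verb]
    return [slot for slot in slots if slot is not None]


print(order_clause("V", subject="S", pp="PP", obj="O"))
print(order_clause("V", subject="S", pp="PP", obj="O", definite=True))
print(order_clause("V"))
```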

Persian and Urdu

The Persian language was crucial in the formation of a common language of the Central, North and Northwest regions of the Indian subcontinent. Following the Mughal conquest of India and the resulting vast Islamic empire, especially in the northern and central areas, a hybrid language of Hindi and Persian began to form around the 10th and 11th centuries CE, one that would eventually be known as Urdu ("tent" in Turkish, in allusion to the army barracks of visiting troops). It grew from the interaction of (often Persian-speaking) Muslim soldiers and native Hindu peoples, merging with the local Prakrit and Sanskrit-based Khari boli (standing tongue), a proto-Hindi dialect of the north. Soon, the Persian script and the Nasta'liq form of cursive were adopted, with additional characters added to accommodate the Indian sound system, and a new language emerged, based on Indian grammar with a vocabulary largely divided between Persian (and indirectly some Arabic) and Hindi. Elements peculiar to Persian, such as the enclitic izaafat and the use of the takhallus, were readily absorbed into Urdu literature, both religious and secular.

Urdu soon gained distinction as the preferred language in the Persian courts of India, and to this day it retains an important place in literary and cultural spheres. Many distinctly Persian forms of literature, such as ghazals and nazms, came both to influence and to be affected by Indian culture, producing a distinct melding of Middle Eastern and South Asian heritages. A famous cross-over writer was Amir Khusro, whose Persian and Hindvi (proto-Hindi-Urdu) couplets are read in India to this day. Because of its role in Indian tradition, Persian has often been termed an adopted classical language of India beside Sanskrit.

