Urdu alphabet

From Infogalactic: the planetary knowledge core
Urdu alphabet
اردو تہجی
Example of writing in the Urdu alphabet: Urdu
Languages: Urdu, Balti, Burushaski, others
Parent systems: Arabic → Persian
Unicode range: U+0600 to U+06FF, U+0750 to U+077F, U+FB50 to U+FDFF, U+FE70 to U+FEFF
The Urdu alphabet is the right-to-left alphabet used for the Urdu language. It is a modification of the Persian alphabet, which is itself a derivative of the Arabic alphabet. With 38 letters and no distinct letter cases, the Urdu alphabet is typically written in the calligraphic Nastaʿlīq script, whereas Arabic is more commonly written in the Naskh style. Bare transliterations of Urdu into Roman letters (called Roman Urdu) usually omit many phonemic elements that have no equivalent in English or in other languages commonly written in the Latin script. The National Language Authority of Pakistan has developed a number of systems with specific notations to signify non-English sounds, but these can be read properly only by someone already familiar with the loan letters.[citation needed]


History

The Urdu language emerged as a distinct register of Hindustani well before the Partition of India. It is distinguished most by its extensive Persian influences (Persian having been the official language of the Mughal government and the most prominent lingua franca of the Indian subcontinent for several centuries before the consolidation of British colonial rule in the 19th century). The standard Urdu script is a modified version of the Perso-Arabic script and has its origins in 13th-century Iran. It is closely related to the development of the Nastaʿlīq style of the Perso-Arabic script. In its extended form, the Urdu script is known as Shahmukhi and is also used to write other Indo-Aryan languages of the northern Indian subcontinent, such as Punjabi and Saraiki.

Despite the invention of the Urdu typewriter in 1911, Urdu newspapers continued to publish prints of handwritten scripts by calligraphers known as katibs or khush-navees until the late 1980s. The Pakistani national newspaper Daily Jang was the first Urdu newspaper to use Nastaʿlīq computer-based composition. There are efforts under way to develop more sophisticated and user-friendly Urdu support on computers and the internet. Nowadays, nearly all Urdu newspapers, magazines, journals, and periodicals are composed on computers with Urdu software programs.

Apart from differing in their degree of Persian influence, Urdu and Hindi are mutually intelligible, though there are differences in pronunciation.

Countries where Urdu is spoken

Afghanistan, Bahrain, Bangladesh, Botswana, Burma, Canada, France, Fiji, Germany, Guyana, India, Kenya, Malaysia, Malawi, Mauritius, Norway, Oman, Pakistan, Qatar, Saudi Arabia, Singapore, South Africa, Thailand, Tajikistan, the UAE, the USA, the UK, Uganda, Uzbekistan, and Zambia.[1]


Nastaʿlīq

The Nastaʿlīq calligraphic style began in Persia as a blend of the Naskh and Taʿlīq scripts. After the Mughal conquest, Nastaʿlīq became the preferred writing style for Urdu. It is the dominant style in Pakistan, and many Urdu writers elsewhere in the world use it. Nastaʿlīq is more cursive and flowing than its Naskh counterpart.


Alphabet

The Urdu script is an abjad derived from the Perso-Arabic script of Iran. The Urdu alphabet was standardized in 2004 by the National Language Authority, which is responsible for standardizing Urdu in Pakistan. According to the National Language Authority, Urdu has 57 letters, of which 39 are basic letters and 17 are digraphs representing aspirated consonants, formed by attaching a basic consonant letter to a variant of He called Do Chashmi He.[2][3] Tāʼ marbūṭah is also sometimes considered a letter, though it is rarely used except in certain loan words from Arabic.

Like its parent Arabic and Persian alphabets, Urdu uses Alif, Waw, Ye and He for vowels. Another Urdu letter used as a vowel is Baṛī Ye, which represents the sounds /ɛː, eː/ in final position. Baṛī Ye is an archaic variant of Ye found in old Persian manuscripts, though it is no longer used in the modern Persian alphabet. In medial position Baṛī Ye is written like the standard Persian Ye and, in fully vocalized Urdu, is distinguished from the other sounds of Ye only by diacritics.

Urdu uses a variant of the letter Nun, called Nun Ghunnah, to represent nasalized vowels in final position. In medial position Nun Ghunnah is written just like the standard Nun, and in fully vocalized Urdu a diacritic called Maghnoona distinguishes it from the standard sound of Nun. Nun Ghunnah is an archaic variant of Nun that is no longer used in the modern Arabic alphabet.

No. Name[4] ALA-LC[5] Hunterian[6] IPA Isolated glyph
1 الف alif ā, ʾ, – /ɑː, ʔ, ∅/ ا
2 بے be b /b/ ب
3 پے pe p /p/ پ
4 تے te t /t̪/ ت
5 ٹے ṭe t /ʈ/ ٹ
6 ثے s̱e s /s/ ث
7 جیم jīm j /d͡ʒ/ ج
8 چے ce c ch /t͡ʃ/ چ
9 بڑی حے baṛī ḥe h /h, ɦ/ ح
10 خے khe kh kh /x/ خ
11 دال dāl d /d̪/ د
12 ڈال ḍāl d /ɖ/ ڈ
13 ذال ẕāl z /z/ ذ
14 رے re r /r/ ر
15 ڑے ṛe r /ɽ/ ڑ
16 زے ze z /z/ ز
17 ژے zhe zh zh /ʒ/ ژ
18 سین sīn s /s/ س
19 شین shīn sh sh /ʃ/ ش
20 صواد ṣwād s /s/ ص
21 ضواد ẓwād z /z/ ض
22 طوئے t̤oʾe t /t̪/ ط
23 ظوئے z̤oʾe z /z/ ظ
24 عین ʿain ā, o, e, ʿ, – /ɑː, oː, eː, ʔ, ʕ, ∅/ ع
25 غین ghain gh gh /ɣ/ غ
26 فے fe f /f/ ف
27 قاف qāf q /q/ ق
28 کاف kāf k /k/ ک
29 گاف gāf g /ɡ/ گ
30 لام lām l /l/ ل
31 میم mīm m /m/ م
32a نون nūn n /n, ɲ, ɳ, ŋ/ ن
32b نون غنہ nūn ghunnah n /◌̃/ ں
33 واؤ wāʾo v, ū, o, au w, ū, o, au /ʋ, uː, oː, ɔː/ و
34 چھوٹی ہے choṭī he h /h, ɦ/ or /∅/ ہ
35 دو چشمی ہے do-cashmī he h /ʰ/ or /ʱ/ ھ
36 ہمزہ hamzah ʾ, – /ʔ/, /∅/ ء
37 چھوٹی یے choṭī ye y, ī, á /j, iː, ɑː/ ی
38 بڑی یے baṛī ye ai, e /ɛː, eː/ ے
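
Several letters in the table have Arabic look-alikes that are distinct Unicode characters: Urdu kāf is U+06A9 (ARABIC LETTER KEHEH) rather than Arabic kāf U+0643, choṭī ye is normally U+06CC (ARABIC LETTER FARSI YEH) rather than U+064A, and choṭī he is normally U+06C1 (ARABIC LETTER HEH GOAL) rather than U+0647. Text typed with an Arabic keyboard layout may contain the Arabic code points, which makes searching and sorting unreliable. The Python sketch below folds them into the Urdu ones. The mapping table and the name `fold_to_urdu` are hypothetical, and folding every U+0647 into choṭī he is an assumption that may not suit every text.

```python
import unicodedata

# Hypothetical folding table: Arabic code points -> Urdu code points.
# Assumption: every U+0647 in the input is meant as choṭī he.
ARABIC_TO_URDU = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (kāf)
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH (choṭī ye)
    "\u0647": "\u06C1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL (choṭī he)
}

# str.maketrans accepts a dict whose keys are single-character strings.
_FOLD = str.maketrans(ARABIC_TO_URDU)


def fold_to_urdu(text: str) -> str:
    """Replace Arabic look-alike letters with the Urdu letters from the table."""
    return text.translate(_FOLD)


if __name__ == "__main__":
    typed = "\u0643\u0627\u0641"  # "kāf" typed with the Arabic kaf
    for ch in fold_to_urdu(typed):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Letters that are already Urdu code points pass through unchanged, so the function can be applied to mixed input.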

The digraphs for aspirated consonants are as follows:

Digraph[5] Transcription[5] IPA
بھ bh [bʱ]
پھ ph [pʰ]
تھ th [t̪ʰ]
ٹھ ṭh [ʈʰ]
جھ jh [d͡ʒʱ]
چھ ch [t͡ʃʰ]
دھ dh [d̪ʱ]
ڈھ ḍh [ɖʱ]
رھ rh [rʱ]
ڑھ ṛh [ɽʱ]
کھ kh [kʰ]
گھ gh [ɡʱ]
لھ lh [lʱ]
مھ mh [mʱ]
نھ nh [nʱ]
وھ wh [ʋʱ]
یھ yh [jʱ]
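
Every digraph in the table is a base consonant followed by do-chashmī he (ھ, U+06BE ARABIC LETTER HEH DOACHASHMEE), and its transcription is the base consonant's transcription followed by h. A minimal Python sketch of that pattern follows. The function names are hypothetical, the base transcriptions are read off the digraph table (with ر taken as r, its value in the alphabet table), and only the 17 bases listed in the table are accepted.

```python
DO_CHASHMI_HE = "\u06BE"  # ھ  ARABIC LETTER HEH DOACHASHMEE

# Base consonant -> transcription. Appending "h" gives the digraph's
# transcription (e.g. ب b -> بھ bh).
BASE_TRANSCRIPTION = {
    "\u0628": "b",  # ب
    "\u067E": "p",  # پ
    "\u062A": "t",  # ت
    "\u0679": "ṭ",  # ٹ
    "\u062C": "j",  # ج
    "\u0686": "c",  # چ
    "\u062F": "d",  # د
    "\u0688": "ḍ",  # ڈ
    "\u0631": "r",  # ر
    "\u0691": "ṛ",  # ڑ
    "\u06A9": "k",  # ک
    "\u06AF": "g",  # گ
    "\u0644": "l",  # ل
    "\u0645": "m",  # م
    "\u0646": "n",  # ن
    "\u0648": "w",  # و
    "\u06CC": "y",  # ی
}


def aspirate(base: str) -> str:
    """Return the aspirated digraph for a base consonant: base + ھ."""
    if base not in BASE_TRANSCRIPTION:
        raise ValueError(f"no aspirated digraph listed for {base!r}")
    return base + DO_CHASHMI_HE


def transcribe_digraph(digraph: str) -> str:
    """Transcribe an aspirated digraph, e.g. بھ -> 'bh'."""
    if (len(digraph) != 2 or digraph[1] != DO_CHASHMI_HE
            or digraph[0] not in BASE_TRANSCRIPTION):
        raise ValueError(f"not an aspirated digraph: {digraph!r}")
    return BASE_TRANSCRIPTION[digraph[0]] + "h"


for base in BASE_TRANSCRIPTION:
    print(aspirate(base), transcribe_digraph(aspirate(base)))
```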


Vowels

Urdu has 10 oral vowels and 10 nasalized vowels. Every vowel has four forms depending on its position: initial, medial, final and isolated. As in its parent Arabic alphabet, vowels are represented using a combination of digraphs and diacritics. Alif, Waw, Ye and He and their variants are used to represent vowels.

Vowel chart

Urdu doesn't have standalone vowel letters. Vowels are represented either by diacritics on the preceding consonant (often omitted unless ambiguity could arise) or by the consonants y and w (with or without diacritics) used as long vowels. The letter alif acts as a placeholder for vowels beginning a syllable and as the long vowel ā within and at the end of a syllable. Urdu has no word-final short vowels except in loan words from Arabic. This is a list of Urdu vowels:

Romanization Pronunciation Initial Form Middle Form Final Form Isolated Form
a /ə/ اَندر سَر اَ
ā /aː/ آم باغ وفا آ
i /ɪ/ اِدھر دِن اِ
ī /iː/ اِینٹ تِیسرا گھڑی اِی
u /ʊ/ اُلفت سُلطان اُ
ū /uː/ اُوپر دُور قابُو اُو
e /eː/ ایک میرا لڑکے اے
ai /ɛː/ اَیسا کَیسا ہَے اَے
o /oː/ اوس دوست کو او
au /ɔː/ اَور مَوسم نَو اَو
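
The isolated forms in the table follow a regular pattern: alif as the carrier, an optional short-vowel diacritic (zabar, zer or pesh), then و, ی or ے for the long vowels, with alif madd (آ) for ā. The Python sketch below rebuilds the isolated column from that pattern. The decomposition is read off the table; the helper name `isolated_vowel` is hypothetical.

```python
ALIF, ALIF_MADD = "\u0627", "\u0622"              # ا, آ
ZABAR, ZER, PESH = "\u064E", "\u0650", "\u064F"   # fatḥah, kasrah, ḍammah
WAW, CHOTI_YE, BARI_YE = "\u0648", "\u06CC", "\u06D2"  # و, ی, ے

# Romanization -> (diacritic on alif, long-vowel letter), from the table.
PARTS = {
    "a": (ZABAR, ""),
    "i": (ZER, ""),
    "u": (PESH, ""),
    "ī": (ZER, CHOTI_YE),
    "ū": (PESH, WAW),
    "e": ("", BARI_YE),
    "ai": (ZABAR, BARI_YE),
    "o": ("", WAW),
    "au": (ZABAR, WAW),
}


def isolated_vowel(roman: str) -> str:
    """Return the isolated form of an Urdu vowel as listed in the table."""
    if roman == "ā":
        return ALIF_MADD  # long ā at the start of a word is written with alif madd
    diacritic, letter = PARTS[roman]
    return ALIF + diacritic + letter


for roman in ["a", "ā", "i", "ī", "u", "ū", "e", "ai", "o", "au"]:
    print(roman, isolated_vowel(roman))
```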


Alif

Alif is the first letter of the Urdu alphabet, and it is used exclusively as a vowel. At the beginning of a word, alif can be used to represent any of the short vowels: اب ab, اسم ism, اردو Urdū. For long ā at the beginning of a word, alif madd is used: آپ āp; a plain alif is used in the middle and at the end: بھاگنا bhāgnā.
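
Alif madd (آ) is a single precomposed Unicode character, U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE, which canonically decomposes into alif (U+0627) followed by the combining maddah above (U+0653). Both spellings display the same but compare as different strings, so a minimal Python sketch normalizes to NFC before comparing. The helper `long_a`, which picks the spelling of long ā by position as the paragraph describes, is a hypothetical illustration.

```python
import unicodedata

ALIF = "\u0627"       # ا
ALIF_MADD = "\u0622"  # آ


def long_a(word_initial: bool) -> str:
    """Spell long ā: alif madd at the start of a word, plain alif elsewhere."""
    return ALIF_MADD if word_initial else ALIF


# آپ (āp) stored two ways: precomposed, and as alif + combining maddah.
precomposed = "\u0622\u067E"
decomposed = "\u0627\u0653\u067E"

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```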


Wāʾo

Wāʾo is used to render the vowels "ū", "o", "u" and "au" ([uː], [oː], [ʊ] and [ɔː] respectively), and it is also used to render the labiodental approximant, [ʋ].


Ye

Ye is divided into two variants: choṭī ye ("little ye") and baṛī ye ("big ye").

Choṭī ye (ی) is written in all forms exactly as in Persian. It is used for the long vowel "ī" and the consonant "y".

Baṛī ye (ے) is used to render the vowels "e" and "ai" (/eː/ and /ɛː/ respectively). Baṛī ye is distinguishable in writing from choṭī ye only when it comes at the end of a word/ligature. Additionally, Baṛī ye is never used to begin a word/ligature, unlike choṭī ye.

Chotti He

In final position, when preceded by a short vowel, Chotti He becomes silent and the preceding vowel is pronounced long: /ə/ before Chotti He is pronounced /aː/, /ɪ/ is pronounced /eː/, and /ʊ/ is pronounced /oː/. In final position, two Chotti He are often written instead of one to distinguish a pronounced /h/ from the silent vowel marker.


Ayin

In initial and final position, Ayin is silent and is replaced in pronunciation by the sound of the preceding or following vowel.

Nun Ghunnah

Nasalized vowels are represented by writing Nun Ghunnah after the corresponding non-nasalized vowel; for example, ہَے becomes ہَیں when nasalized. In medial position Nun Ghunnah is written just like Nun and is distinguished by a diacritic called Maghnoona.
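
The two spellings of Nun Ghunnah can be sketched in Python as follows. The helper names are hypothetical. Rewriting a final ے as ی before ں is inferred from the example ہَے → ہَیں, since baṛī ye is distinguishable only at the end of a word or ligature. U+06BA is ARABIC LETTER NOON GHUNNA, and U+0658 is ARABIC MARK NOON GHUNNA, used here for Maghnoona.

```python
NUN = "\u0646"          # ن
NUN_GHUNNAH = "\u06BA"  # ں  ARABIC LETTER NOON GHUNNA
MAGHNOONA = "\u0658"    # ARABIC MARK NOON GHUNNA
CHOTI_YE = "\u06CC"     # ی
BARI_YE = "\u06D2"      # ے


def nasalize_final(word: str) -> str:
    """Nasalize a word-final vowel by appending nun ghunnah.

    A final baṛī ye is first rewritten as choṭī ye, because the ye is
    no longer the last letter once ں follows it (ہَے -> ہَیں).
    """
    if word.endswith(BARI_YE):
        word = word[:-1] + CHOTI_YE
    return word + NUN_GHUNNAH


def medial_nun_ghunnah() -> str:
    """Medial nun ghunnah: an ordinary nun marked with Maghnoona."""
    return NUN + MAGHNOONA


hai = "\u06C1\u064E\u06D2"  # ہَے
print(nasalize_final(hai))  # ہَیں
```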


Urdu Transcription
میں maiṉ
کنواں kuṉwāṉ


Hamza

In Urdu, Hamza is silent in all its forms except when it is used as Hamza-e-Izafat. Its main use in Urdu is to indicate a vowel cluster.


Diacritics

Following Persian conventions, Urdu uses the same subset of Arabic diacritics as Persian, and it uses the Persian names of the diacritics instead of the Arabic ones. Urdu also has some specialized diacritics not found in the Arabic or Persian alphabets, though these are not commonly used. The commonly used diacritics are Zabar (Arabic Fatḥah), Zer (Arabic Kasrah) and Pesh (Arabic Ḍammah), which clarify the pronunciation of vowels; Jazam (Arabic Sukūn), which indicates a consonant cluster; and Shad (Arabic Tashdīd), which indicates gemination. Other diacritics include Khari Zabar (Arabic dagger alif) and Do Zabar (Arabic Fatḥatān), which are found in some common Arabic loan words. Other Arabic diacritics are also used, though very rarely, in loan words from Arabic. Zer-e-Izafat and Hamza-e-Izafat are described in the next section.

Besides the common diacritics, Urdu also has special diacritics, some of them irregular, that are often found only in dictionaries to clarify pronunciation. These include Kasrah-e-Majhool, Fathah-e-Majhool, Dammah-e-Majhool, Maghnoona, Ulta Jazam, Alif-e-Wavi and some other very rare diacritics. Of these, only Maghnoona is commonly used in dictionaries, and it has a Unicode representation at U+0658. The others are rarely printed and appear only in some advanced dictionaries.[7]
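
The common diacritics named above, and Maghnoona, correspond to combining marks in the Unicode Arabic block. The Python sketch below maps their Urdu names to code points and strips them to produce the usual unvocalized spelling. It assumes the generic Arabic marks; some Urdu text may use other code points (for example for jazam), and the dictionary and function names are hypothetical.

```python
import unicodedata

# Urdu names of diacritics -> assumed Unicode combining marks.
DIACRITICS = {
    "zabar": "\u064E",        # ARABIC FATHA
    "zer": "\u0650",          # ARABIC KASRA
    "pesh": "\u064F",         # ARABIC DAMMA
    "jazam": "\u0652",        # ARABIC SUKUN
    "shad": "\u0651",         # ARABIC SHADDA
    "khari zabar": "\u0670",  # ARABIC LETTER SUPERSCRIPT ALEF (dagger alif)
    "do zabar": "\u064B",     # ARABIC FATHATAN
    "maghnoona": "\u0658",    # ARABIC MARK NOON GHUNNA
}

_MARKS = set(DIACRITICS.values())


def strip_diacritics(text: str) -> str:
    """Remove the listed diacritics, leaving the unvocalized spelling."""
    return "".join(ch for ch in text if ch not in _MARKS)


if __name__ == "__main__":
    for name, mark in DIACRITICS.items():
        print(f"{name:12} U+{ord(mark):04X} {unicodedata.name(mark)}")
    print(strip_diacritics("\u0627\u064E\u0646\u062F\u0631"))  # اَندر -> اندر
```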


Iẓāfat

Iẓāfat is a syntactical construction of two nouns, where the first component is a determined noun and the second is a determiner. This construction was borrowed from Persian. A short vowel "i" connects the two words. It may be written as zer (ــِ) at the end of the first word, but usually it is not written at all. If the first word ends in choṭī he (ہ) or ye (ی), hamzah (ء) is written above the last letter (ۂ or ئ). If the first word ends in a long vowel, baṛī ye (ے) with hamzah on top (ئے) is written.[8]

Forms Example Transliteration Meaning
ــِ شیرِ پنجاب sher-i Panjāb the lion of Punjab
ۂ قطرۂ آب qat̤rah-yi āb (a) drop of water
ئ ولئ کامل walī-yi kāmil perfect saint
ئے روئے زمین rū-yi zamīn surface of the Earth
صدائے بلند ṣadā-yi buland a high voice
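
The four izafat rules can be expressed as a small Python function. It is a sketch under stated assumptions: the name `izafat` is hypothetical, a word-final ا or و is assumed to mark a long vowel (a final و can also be the consonant /ʋ/, which this ignores), and zer is written only when `write_zer` is true, since it is usually omitted.

```python
ZER = "\u0650"                  # zer (ــِ)
CHOTI_HE = "\u06C1"             # ہ
HE_WITH_HAMZA = "\u06C2"        # ۂ
YE_LETTERS = ("\u06CC", "\u064A")  # ی (and the Arabic look-alike ي)
YE_WITH_HAMZA = "\u0626"        # ئ
HAMZA_BARI_YE = "\u0626\u06D2"  # ئے
# Assumption: a word-final alif or wāʾo marks a long vowel.
LONG_VOWEL_LETTERS = ("\u0627", "\u0648")


def izafat(first: str, second: str, write_zer: bool = True) -> str:
    """Join two nouns with the izafat linker."""
    if not first:
        raise ValueError("first word must not be empty")
    last = first[-1]
    if last == CHOTI_HE:
        head = first[:-1] + HE_WITH_HAMZA      # قطرہ -> قطرۂ
    elif last in YE_LETTERS:
        head = first[:-1] + YE_WITH_HAMZA      # ولی -> ولئ
    elif last in LONG_VOWEL_LETTERS:
        head = first + HAMZA_BARI_YE           # صدا -> صدائے
    else:
        head = first + (ZER if write_zer else "")  # شیر -> شیرِ
    return f"{head} {second}"


print(izafat("\u0634\u06CC\u0631", "\u067E\u0646\u062C\u0627\u0628"))  # شیرِ پنجاب
print(izafat("\u0642\u0637\u0631\u06C1", "\u0622\u0628"))              # قطرۂ آب
print(izafat("\u0648\u0644\u06CC", "\u06A9\u0627\u0645\u0644"))        # ولئ کامل
print(izafat("\u0635\u062F\u0627", "\u0628\u0644\u0646\u062F"))        # صدائے بلند
```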

Computers and the Urdu alphabets

In the early days of computing, the Urdu alphabet was not properly represented in any code page. One of the earliest code pages to represent Urdu was IBM code page 868, which dates from 1990.[9] Other early encodings that represented Urdu letters were Windows-1256 and MacArabic, both dating from the mid-1990s. In Unicode, Urdu is encoded in the Arabic block. Another code page for Urdu, used in India, is the Perso-Arabic Script Code for Information Interchange. In Pakistan, the 8-bit code page developed by the National Language Authority is called Urdu Zabta Takhti (اردو ضابطہ تختی) (UZT).[10] It represents Urdu in its most complete form, including some of its specialized diacritics, though UZT is not designed to coexist with English.
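
The difference between an 8-bit code page and Unicode shows up in storage size: in an 8-bit code page such as UZT each character occupies one byte, while in UTF-8 every character of the Arabic block (U+0600 to U+06FF) takes two bytes. The Python sketch below checks that a string stays within that block and compares its character and byte counts; the helper name is hypothetical.

```python
from typing import List

ARABIC_BLOCK = range(0x0600, 0x0700)  # U+0600 to U+06FF


def outside_arabic_block(text: str) -> List[str]:
    """List code points in text (ignoring whitespace) outside the Arabic block."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if not ch.isspace() and ord(ch) not in ARABIC_BLOCK
    ]


# اردو ضابطہ تختی (Urdu Zabta Takhti), written with Unicode escapes
sample = "\u0627\u0631\u062F\u0648 \u0636\u0627\u0628\u0637\u06C1 \u062A\u062E\u062A\u06CC"

print(outside_arabic_block(sample))                   # []
print(len(sample), "characters")                      # 15 characters
print(len(sample.encode("utf-8")), "bytes in UTF-8")  # 28 bytes in UTF-8
```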

Romanization standards and systems

There are several romanization standards for writing Urdu, though they are not very popular because most of them fail to represent the Urdu language properly. Instead of standard romanization schemes, people on the Internet, on mobile phones and in the media often use a non-standard form of romanization that tries to mimic English orthography. The problem with this kind of romanization is that it can be read only by native speakers, and even they read it with great difficulty. Among the standardized schemes, the most accurate is ALA-LC romanization, which is also supported by the National Language Authority. Other schemes are often rejected because they either cannot represent the sounds of Urdu properly or disregard Urdu orthography, favoring pronunciation over orthography.[11]
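
An orthography-based scheme maps written letters and diacritics rather than pronunciation. The Python sketch below shows that idea for fully vocalized words, using a handful of values from the ALA-LC column of the alphabet table. It is not an implementation of ALA-LC: the letter subset and the name `romanize` are hypothetical, and context-dependent cases such as word-initial alif, wāʾo, ye, silent he and izafat are not handled.

```python
# Hypothetical subset of ALA-LC letter values from the alphabet table,
# plus the three short-vowel diacritics.
LETTERS = {
    "\u0627": "ā",   # ا  alif inside or at the end of a word (long ā)
    "\u0628": "b",   # ب
    "\u062A": "t",   # ت
    "\u062F": "d",   # د
    "\u0631": "r",   # ر
    "\u0633": "s",   # س
    "\u063A": "gh",  # غ
    "\u06A9": "k",   # ک
    "\u0644": "l",   # ل
    "\u0645": "m",   # م
    "\u0646": "n",   # ن
    "\u064E": "a",   # zabar
    "\u0650": "i",   # zer
    "\u064F": "u",   # pesh
}


def romanize(vocalized: str) -> str:
    """Romanize a fully vocalized word letter by letter."""
    out = []
    for ch in vocalized:
        if ch not in LETTERS:
            raise ValueError(f"no mapping for U+{ord(ch):04X} in this sketch")
        out.append(LETTERS[ch])
    return "".join(out)


print(romanize("\u0633\u064E\u0631"))  # سَر -> sar
print(romanize("\u062F\u0650\u0646"))  # دِن -> din
print(romanize("\u0628\u0627\u063A"))  # باغ -> bāgh
```

Because the mapping follows the spelling, the unvocalized سر comes out as "sr", which illustrates why orthography-based schemes depend on the diacritics being present.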

References
  1. "Urdu". Omniglot.com.
  2. "Controversy over number of letters in Urdu alphabet".
  3. "Corpus Based Urdu Lexicon Development".
  4. Delacy 2003, pp. XV–XVI.
  5. "Urdu romanization" (PDF). The Library of Congress.
  6. Geographical Names Romanization in Pakistan. UNGEGN, 18th Session. Geneva, 12–23 August 1996. Working Papers No. 85 and No. 85 Add. 1.
  7. "Proposal of Inclusion of Certain Characters in Unicode".
  8. Delacy 2003, pp. 99–100.
  9. "IBM 868 code page".
  10. "Urdu Zabta Takhti".
  11. "اردو رومن نقل حرفی ۔ ایک ابتدائی تعارف" [Urdu Roman transliteration: an introduction].


  • Delacy, Richard (2003). Beginner's Urdu Script. McGraw-Hill.
  • Delacy, Richard (2010). Read and write Urdu script. McGraw-Hill.
  • "Urdu romanization" (PDF). The Library of Congress.
  • Ishida, Richard. "Urdu script notes".

External links