
Because Arabic has a number of phonemes with no equivalent in English or other European languages, several different transliteration methods have been devised to represent certain Arabic characters; they differ because their goals conflict.


Any transliteration system for Arabic has to make a number of decisions that depend on its intended field of application. The root of the problem is that unvocalized Arabic writing does not contain enough information for a reader unfamiliar with the language to pronounce it accurately. An exact equivalent of, e.g., صدام حسين would be ṣdʾm ḥsyn, which is meaningless to an untrained reader. A "full transliteration" adds information that is not in the text and has to be supplied by a speaker of Arabic: ṣaddām ḥusayn. Newspapers and popular books usually use not a transliteration but a transcription: instead of transliterating each written letter, they try to reproduce the sound of the words according to the orthographic rules of the target language, e.g. Saddam Hussein. For spelling differences that depend on the target language, compare English Omar Khayyam with German Omar Chajjam, both for عمر خيام (unvocalized ʿmr ḫyʾm, vocalized ʿumar ḫayyām).
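The letter-by-letter rendering described above can be sketched in a few lines of Python. The table `STRICT` and the function `transliterate` are hypothetical names introduced for illustration; the table covers only the letters in the two examples and uses the strict symbols quoted in the text (alif rendered as ʾ). It is a minimal sketch, not an implementation of any particular standard.

```python
import unicodedata

# Hypothetical table: only the letters needed for the two examples,
# using the strict renderings quoted in the text (alif as ʾ).
STRICT = {
    "ص": "ṣ", "د": "d", "ا": "ʾ", "م": "m",
    "ح": "ḥ", "س": "s", "ي": "y", "ن": "n",
    "ع": "ʿ", "ر": "r", "خ": "ḫ",
}

def transliterate(text: str) -> str:
    """Map each written letter to one symbol; add nothing that is not written."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(" ")
        elif ch in STRICT:
            out.append(STRICT[ch])
        else:
            raise ValueError(f"letter not in the illustrative table: {ch!r}")
    # Normalize so that letters with diacritics compare consistently.
    return unicodedata.normalize("NFC", "".join(out))

print(transliterate("صدام حسين"))  # ṣdʾm ḥsyn
print(transliterate("عمر خيام"))   # ʿmr ḫyʾm
```

Because the function only sees written letters, it cannot produce the short vowels or the doubled consonants of the vocalized forms; those have to come from a speaker or from a vocalized source text.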

Most issues in romanization concern the choice between transliterating and transcribing; others concern what should be romanized:

  • A transliteration ignores assimilation (sandhi) of the article before "solar letters": al-shams, not the transcribed ash-shams / aš-Šams / asch-Schams (German) / asj-Sjams (Dutch) / ach-chams (French).
  • A transliteration must render the "tied tā" (tāʼ marbūṭa, ة) faithfully, whereas a transcription must render the sound: "a" like any other "a", or "at" like any other "at" (in a vocalized text, nothing vs. t).
  • The "broken alif" (alif maqṣūra, ى) must be transliterated with a special symbol, but when it stands for a long a (ā) it is transcribed like a standing alif.
  • Nunation follows the same principle as everything else: a transliteration renders what you see, a transcription what you hear.
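The first point in the list, assimilation of the article, can be illustrated with a small sketch that turns a letter-faithful "al-" form into the pronounced form. The list `SUN_ONSETS` and the function `transcribe_article` are hypothetical, and the sketch assumes a digraph-based romanization such as the English-style "sh"; the word al-qamar is added here only as a non-assimilating contrast.

```python
# Romanized "solar letters" in a digraph-based scheme (an assumption of this
# sketch). Digraphs come first so that "sh" is matched before "s".
SUN_ONSETS = ["th", "dh", "sh", "t", "d", "r", "z", "s",
              "ṣ", "ḍ", "ṭ", "ẓ", "l", "n"]

def transcribe_article(translit: str) -> str:
    """Turn 'al-X' into the pronounced form when X starts with a solar letter."""
    if not translit.startswith("al-"):
        return translit
    stem = translit[3:]
    for onset in SUN_ONSETS:
        if stem.startswith(onset):
            # The l of the article is replaced by a copy of the onset.
            return "a" + onset + "-" + stem
    return translit  # other onsets: the l is pronounced, nothing changes

print(transcribe_article("al-shams"))  # ash-shams
print(transcribe_article("al-nur"))    # an-nur
print(transcribe_article("al-qamar"))  # al-qamar
```

The mapping only goes one way: from ash-shams alone a program cannot tell whether the written form has an l, which is why a transliteration keeps al-.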

A transcription may reflect the language as spoken by the people of Baghdad, or the official standard as spoken by a preacher in the mosque or a TV news reader. A transcription is free to add phonological information (such as vowels) or morphological information (such as word boundaries). A transliteration is ideally fully reversible: a machine must be able to convert it into Arabic and back.
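Reversibility can be checked mechanically with a round trip. The sketch below is a minimal illustration under two assumptions: every romanization in the table is a single character, and the tables `STRICT` and `LOOSE` are hypothetical examples rather than any published system.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def round_trip(text: str, table: dict) -> str:
    """Romanize text with `table`, then map the result back to Arabic."""
    inverse = {}
    for arabic, latin in table.items():
        # On a collision the first letter wins, so information is lost.
        inverse.setdefault(nfc(latin), arabic)
    latin_text = nfc("".join(table.get(ch, ch) for ch in text))
    return "".join(inverse.get(ch, ch) for ch in latin_text)

# One symbol per letter: reversible.
STRICT = {"ص": "ṣ", "د": "d", "ا": "ʾ", "م": "m", "س": "s"}
# Two letters share "s": not reversible.
LOOSE = {"س": "s", "ص": "s", "د": "d", "ا": "a", "م": "m"}

word = "صدام"
print(round_trip(word, STRICT) == word)  # True
print(round_trip(word, LOOSE) == word)   # False
```

With the loose table the word comes back spelled with sīn instead of ṣād, which is exactly the loss a reversible transliteration is meant to rule out.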

A transliteration may be criticized as flawed for any of the following reasons:

  • A "loose" transliteration is ambiguous: it renders several Arabic phonemes with an identical transliteration, or it uses digraphs for a single phoneme (such as sh) that may be confused with two adjacent phonemes;
  • Symbols representing phonemes may be considered too similar (e.g., ` and ' or ʿ and ʾ for ayin and hamza);
  • ASCII transliterations using capital letters to disambiguate phonemes are easy to type but may be considered unaesthetic.
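The first criticism in the list can be tested for any given scheme. The following sketch looks for two kinds of ambiguity: letters that share a romanization, and digraphs that could also be read as two single-letter romanizations. The scheme `LOOSE` and both function names are hypothetical and chosen only to trigger the two cases.

```python
from collections import defaultdict

def collisions(scheme: dict) -> dict:
    """Group Arabic letters that share one romanization."""
    groups = defaultdict(list)
    for arabic, latin in scheme.items():
        groups[latin].append(arabic)
    return {latin: letters for latin, letters in groups.items()
            if len(letters) > 1}

def ambiguous_digraphs(scheme: dict) -> list:
    """Find digraphs whose two characters are each a romanization on their own."""
    singles = {latin for latin in scheme.values() if len(latin) == 1}
    return sorted(
        latin for latin in set(scheme.values())
        if len(latin) == 2 and latin[0] in singles and latin[1] in singles
    )

# Hypothetical loose ASCII scheme covering seven letters.
LOOSE = {"س": "s", "ص": "s", "ش": "sh",
         "ه": "h", "ح": "h", "ك": "k", "خ": "kh"}

print(collisions(LOOSE))          # two groups: "s" and "h"
print(ambiguous_digraphs(LOOSE))  # ['kh', 'sh']
```

A scheme for which both functions return empty results avoids this particular criticism, although it may still fall under the other two.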

A further problem is that a transliteration which represents the letters exactly may easily be misread by non-Arabs, in particular where the definite article is involved (written "al" in Arabic, but not necessarily pronounced as such). For instance, an-nur (or an-nuur, or an-noor) would be more correctly transliterated along the lines of alnnur, but a hyphen is added and the unpronounced 'l' removed for the convenience of the uninformed non-Arab reader, who would otherwise pronounce an 'l', probably not understand the word to be nur, pronounce only one 'n', and be confused by the role of the double 'n'. Alternatively, if the shadda is not transliterated (since it is strictly not a letter), a hypercorrect transliteration would be alnur, which presents similar problems for the uninformed non-Arab reader.

A final problem is that a lot of time may be wasted worrying about transcription when it does not really matter. A reader who knows Arabic will normally be able to reconstruct the original however it has been transliterated or transcribed, while a reader who does not know Arabic will normally not understand any of the systems anyway and will simply be confused.

A table comparing romanizations using DIN 31635, ISO 233, ISO/R 233, UN, ALA-LC, and Encyclopaedia of Islam systems is available here: [9].

Letter Name SATTS UNGEGN ALA-LC DIN-31635 ISO 233 ISO/R 233 Qalam SAS SM IPA
hamza E ʼ, — —, ’ ʾ ˈ, ˌ —, ’ ' ʾ
(zero word-initially)
' (disappears after 'al-' and where alif wal is. [ʔ]
ʼalif A ā ʾ ā aa a, i, u
aa various, including [aː]
bāʼ B b b b b b b b b [b]
tāʼ T t t t t t t t t [t]
ṯāʼ C th th th ç [θ]
ǧīm, jīm, gīm J j j ǧ ǧ ǧ j ŷ j [ʤ] / [ʒ] / [ɡ] / [j]
ḥāʼ H H [ħ]
ḫāʼ O kh kh kh j x [x]
dāl D d d d d d d d d [d]
ḏāl Z dh dh dh đ [ð]
rāʼ R r r r r r r r r [r]
zāy  ; z z z z z z z z [z]
sīn S s s s s s s s s [s]
šīn  : sh sh š š š sh š š [ʃ]
ṣād X ş S [sˁ]
ḍād V D [dˁ]
ṭāʼ U ţ T [tˁ]
ẓāʼ Y Z đ̣ [ðˁ] / [zˁ]
ʻayn ` ʻ ʻ ʿ ʿ ʿ ` ʿ ř [ʕ] / [ʔˁ]
ġayn G gh gh ġ ġ gh g ğ [ɣ] / [ʁ]
fāʼ F f f f f f f f f [f]
qāf Q q q q q q q q q [q]
kāf K k k k k k k k k [k]
lām L l l l l l l l l [l], [lˁ] (in Allah only)
mīm M m m m m m m m m [m]
nūn N n n n n n n n n [n]
hāʼ ~ h h h h h h h h [h]
wāw W w w w w w w w [w], [uː]
yāʼ I y y y y y y y [j], [iː]
ʼalif mamdūda AEA ā ā, ʼā ʾā ʾâ ā, ʼā ā 'aa [ʔaː]
tāʼ marbūṭa @ h, t h, t h, t h, t h, t t
(zero when in absolute state)
ŧ [a], [at]
ʼalif maqṣūra / y y ā ae à à [aː]
lām ʼalif LA laʾ la
(with hamza)

(with lengthening alif)
treated as laam then alif usually: laa [laː]
ال ʼalif lām AL al- al- al- ʾˈal al- al al- al- When assimilation occurs: ál-

Online communication is often restricted to an ASCII environment in which neither the Arabic letters themselves nor Roman characters with diacritics are available. This problem is faced by most speakers of languages that use non-Roman alphabets, or heavily modified ones. An ad hoc solution consists of using Arabic numerals that mirror or resemble the relevant Arabic letters.
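As a rough sketch of such an ASCII fallback, the function below takes a romanization with diacritics and reduces it to plain ASCII. The three digit substitutions (2 for hamza, 3 for ʿayn, 7 for ḥāʼ) are common in informal online writing, but usage varies between writers and regions, so the table `DIGITS` and the function `ascii_chat` should be read as assumptions for illustration only.

```python
import unicodedata

# Digit substitutions assumed for this sketch; real usage varies.
DIGITS = {
    unicodedata.normalize("NFC", symbol): digit
    for symbol, digit in {"ʾ": "2", "ʿ": "3", "ḥ": "7"}.items()
}

def ascii_chat(romanized: str) -> str:
    """Replace selected symbols with digits, then drop remaining diacritics."""
    text = unicodedata.normalize("NFC", romanized)
    text = "".join(DIGITS.get(ch, ch) for ch in text)
    # Decompose letters such as ā or ṣ and discard the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(ascii_chat("ʿumar"))          # 3umar
print(ascii_chat("ṣaddām ḥusayn"))  # saddam 7usayn
```

Dropping the remaining diacritics merges letters such as s and ṣ, so the result is a loose, non-reversible rendering of the kind criticized earlier in the article.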

