Language Support in LaTeX
By default, LaTeX is set up for English, meaning that on compilation the document is typeset according to English typesetting conventions. Other languages have different typesetting conventions and almost always need more characters than the 128 available in ASCII (the native encoding of TeX).
The following packages are commonly employed for language support:
Inputenc
The inputenc package translates input characters into TeX's internal language, generally composing them from ASCII characters and control sequences: for instance, with the ISO Latin-1 encoding, â would be represented internally as \^a. With UTF-8 the representation becomes more convoluted, since complex characters such as Japanese kanji need to be accommodated.
The syntax is \usepackage[encoding]{inputenc}. In pretty much all cases UTF-8 should be used (\usepackage[utf8]{inputenc}). Current LaTeX distributions use UTF-8 by default, but it is still good practice to specify it in case your .tex file is compiled on a machine with an older distribution.
Encodings other than UTF-8 can still be used, but they exist mostly for backwards compatibility. A few are mentioned here for historical reasons:
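As a minimal sketch of the syntax just described, the following document declares UTF-8 input explicitly. The sample sentence is only an illustration, and the file is assumed to be saved as UTF-8 by your editor.

```latex
\documentclass{article}
% Declare that this source file is saved as UTF-8.
% Redundant on recent distributions, harmless on all of them.
\usepackage[utf8]{inputenc}

\begin{document}
Accented letters can be typed directly: â, é, ü, ñ.
\end{document}
```

If the declared encoding does not match the encoding the file is actually saved in, the accented letters will typically come out wrong or raise an error, so the option should always describe the real file.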
ASCII The American Standard Code for Information Interchange is one of the earliest character encodings. First published in 1963, it encodes each character in 7 bits (often stored with 1 additional parity bit), for a total of 128 characters.
The option for ASCII is (unsurprisingly) [ascii].
MAC OS Roman Encoding used by Apple's classic Mac OS (introduced in 1984 and superseded by Mac OS X in 2001). Since it does not reserve a parity bit, it can encode a total of 256 characters, the first 128 being the ASCII ones. The option for MAC OS Roman is [applemac].
UTF-8 UCS UTF-8 encodes the Universal Character Set with a variable length of 1 to 4 bytes per character, which is enough for all 1,112,064 valid Unicode code points (the original design allowed longer sequences and could address about 2,000,000,000; obviously not all of them are in use as of now). The first 128 are still the ASCII characters. Its flexibility has made it by far today's most used character encoding. The option for UTF-8 is [utf8].
Fontenc
The fontenc package takes the interpretations produced by inputenc and converts them into actual characters by looking up command sequences in an established table of 128 or 256 glyphs, depending on the chosen encoding. For instance, â (if not already present as a glyph in the current font encoding) would be converted by inputenc into \^a; fontenc would then read it and place the accent on the "a" glyph.
The first encoding, OT1, was designed by Donald Knuth and uses a table of 128 glyphs (as do all encodings whose name starts with O). It is still what pdfLaTeX uses when fontenc is not loaded. T1, with 256 glyphs, covers most European languages written in the Latin script and is generally the recommended choice.
Three encodings are dedicated to languages written in the Cyrillic script because of their large number of characters: T2A, T2B and T2C. All of their Cyrillic glyphs are also contained in the X2 encoding.
The package is called with the following syntax \usepackage[encoding]{fontenc}.
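A minimal sketch of that call, choosing T1 as the encoding; the sample word is only an illustration.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
% Select the 256-glyph T1 font encoding.
\usepackage[T1]{fontenc}

\begin{document}
% With T1 the letter "â" is a single glyph in the font instead of an
% "a" with a separately placed accent, which generally lets TeX
% hyphenate words that contain accented letters.
Ch\^ateau and château produce the same output.
\end{document}
```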
Special Characters
Taken directly from the fontenc documentation, the following command sequences produce special characters that most likely are not available on a standard keyboard:
Command              Encodings  Output
\`                   OT1,T1     ` (grave)
\'                   OT1,T1     ´ (acute)
\^                   OT1,T1     ˆ (circumflex)
\~                   OT1,T1     ˜ (tilde)
\"                   OT1,T1     ¨ (umlaut)
\H                   OT1,T1     ˝ (Hungarian umlaut)
\r                   OT1,T1     ˚ (ring)
\v                   OT1,T1     ˇ (haček)
\u                   OT1,T1     ˘ (breve)
\=                   OT1,T1     ¯ (macron)
\.                   OT1,T1     ˙ (dot)
\b                   OT1,T1     ˍ (underbar)
\c                   OT1,T1     ¸ (cedilla)
\d                   OT1,T1     . (dot under)
\k                   T1         ˛ (ogonek)
\AE                  OT1,T1     Æ
\DH                  T1         Ð
\DJ                  T1         Đ
\L                   OT1,T1     Ł
\NG                  T1         Ŋ
\OE                  OT1,T1     Œ
\O                   OT1,T1     Ø
\SS                  OT1,T1     SS
\TH                  T1         Þ
\ae                  OT1,T1     æ
\dh                  T1         ð
\dj                  T1         đ
\guillemotleft       T1         « (guillemet)
\guillemotright      T1         » (guillemet)
\guilsinglleft       T1         ‹ (guillemet)
\guilsinglright      T1         › (guillemet)
\i                   OT1,T1     ı
\j                   OT1,T1     ȷ
\l                   OT1,T1     ł
\ng                  T1         ŋ
\oe                  OT1,T1     œ
\o                   OT1,T1     ø
\quotedblbase        T1         „
\quotesinglbase      T1         ‚
\ss                  OT1,T1     ß
\textasciicircum     OT1,T1     ^
\textasciitilde      OT1,T1     ~
\textbackslash       OT1,T1     \
\textbar             OT1,T1     |
\textbraceleft       OT1,T1     {
\textbraceright      OT1,T1     }
\textcompwordmark    OT1,T1     (invisible)
\textdollar          OT1,T1     $
\textemdash          OT1,T1     —
\textendash          OT1,T1     –
\textexclamdown      OT1,T1     ¡
\textgreater         OT1,T1     >
\textless            OT1,T1     <
\textquestiondown    OT1,T1     ¿
\textquotedbl        T1         "
\textquotedblleft    OT1,T1     “
\textquotedblright   OT1,T1     ”
\textquoteleft       OT1,T1     ‘
\textquoteright      OT1,T1     ’
\textregistered      OT1,T1     ®
\textsection         OT1,T1     §
\textsterling        OT1,T1     £
\texttrademark       OT1,T1     ™
\textunderscore      OT1,T1     _
\textvisiblespace    OT1,T1     ␣
\th                  T1         þ
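To show how these command sequences are used in running text, here is a short sketch. Accent commands take the letter they modify as their argument, while the other commands stand on their own. The selection of commands and the sample text are only an illustration; T1 is loaded because several of the commands used exist only in that encoding.

```latex
\documentclass{article}
% T1 is required for the T1-only commands (\k, \dh, \th, \ng, guillemets).
\usepackage[T1]{fontenc}

\begin{document}
% Accents: the command is followed by the letter it applies to.
\`a \'e \^o \~n \"u \c{c} \k{a} \v{s} \H{o}

% Standalone letters; "\ " forces a space after a command name.
\ae\ \oe\ \o\ \ss\ \l\ \dh\ \th\ \ng

% Quotation marks.
\guillemotleft quoted\guillemotright\ and
\quotedblbase quoted\textquotedblleft

% Characters that are otherwise special to TeX.
\textbackslash\ \textbraceleft\ \textbraceright\ \textunderscore\ \textdollar
\end{document}
```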
Babel
The babel package provides language-specific typesetting for one or more languages. The syntax is \usepackage[language]{babel}. For more information about this package, please consult the dedicated section.
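A minimal sketch of loading babel with the standard \usepackage command. Italian is chosen here purely as an example and is assumed to be installed with your distribution; any language supported by babel can be given instead.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% Load Italian hyphenation patterns and translated fixed strings.
\usepackage[italian]{babel}

\begin{document}
% The heading is printed as "Indice" instead of "Contents".
\tableofcontents

\section{Introduzione}
Questo documento è composto secondo le convenzioni tipografiche italiane.

% The date is also printed in Italian.
\today
\end{document}
```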
Language Specific Solution
Some languages, such as Japanese, use over 50,000 different characters. fontenc, with its fixed tables, cannot handle this without external support, so language-specific solutions have to be found.
The CJK package can typeset Chinese, Japanese and Korean:
% \documentclass[UTF8]{ctexart} % for a whole document
\documentclass{article}
\usepackage{CJKutf8}
\begin{document}
This package supports texts in Chinese, Japanese and Korean:\par
% UTF8 is the input encoding, min is the font family
\begin{CJK}{UTF8}{min}{
法84和リオカケ港側ト聞州つざドか購34打オナウカ聞載リ独止70援ざう実本短致召ゃ。前にきー芸多ドさッぐ月勧ぐひ問氏こま科大はよぞお触浜の向究らげて荒理ソコモ迎京改期ーちぞ詳面ヨホ録学輔チロ恵位セマレ訃傍剰えだン。26宿チソ間面アチウエ案抗ぎ力描ぽっば問件そんて屋85真チル文器ミスタネ初勝イテ件有こ稿勢りあ化志にッぱ販演危距告え。
投ホ代判へ都7入ぴま都出ク円出あ別社ユワ断載ノ意中軍ド詳損付へす選棄メカ親策ホス社予ば回供権か報合でぼふ登国うゃも討当都殺幌いゅ。終トカ立35無著リぐ堀無ざクす駒一ユエメヲ情察変ぶづけべ中能やたま十製ハホノネ今9団す禁食加牟ネヘレヤ操26殺上得腕や。外スクテ空内トラス提捕読イやだぜ員村れ優政もびし読65苦コ光季かいクわ送者取ツ者員タスヲモ担知せふリ。}
\end{CJK}
\end{document}