Designing Fonts for Two Billion People

Typotheque tackled the unprecedented task of designing a comprehensive set of fonts for South Asia.

The unknown number of languages in India

Counting the precise number of languages in a country as large and diverse as India is extremely complex. Not only is the evolution of languages dynamic and influenced by sociopolitical developments, but in India the lack of linguistic standardisation, the distinct local identities, and the regional loyalties render any estimate of the total nebulous at best. While the Constitution of India recognises 22 ‘scheduled’ languages, this does not reflect the actual number of spoken languages.

When the British colonial administrators conducted the first official survey of languages in the Indian subcontinent in 1898, they reported 179 in total.

Every ten years, the government of India conducts its own survey of languages as part of the Census of India. Unlike the British administration’s survey, the Census of India distinguishes between mother tongue, preferred second language, and third language. The most recent available Census of India data from 2011 reports 19,569 unique mother tongue names. These replies are then normalised to eliminate duplications and invalid responses, resulting in 1,369 rationalised mother tongues and an additional 1,474 unclassified or ‘other’ languages. After grouping the languages and removing those used by 10,000 or fewer speakers, the total number of languages is 121. This is a common normalisation process, but one that makes the minority languages invisible and vulnerable, with over 18 million people speaking ‘other’ languages.
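The narrowing described above can be followed as simple arithmetic. The sketch below uses only the figures quoted from the 2011 Census; the function names and the strict "more than 10,000" reading of the cut-off are assumptions made for illustration, not the Census's own procedure.

```python
# Figures quoted in the text for the 2011 Census of India.
RAW_MOTHER_TONGUE_NAMES = 19_569
RATIONALISED_MOTHER_TONGUES = 1_369
UNCLASSIFIED_OR_OTHER = 1_474
REPORTED_LANGUAGES = 121
MIN_SPEAKERS = 10_000  # languages with this many speakers or fewer are dropped


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


def is_reported(speakers: int) -> bool:
    """Assumed reading of the cut-off: strictly more than 10,000 speakers."""
    return speakers > MIN_SPEAKERS


# Names that survive normalisation, classified or not.
after_normalisation = RATIONALISED_MOTHER_TONGUES + UNCLASSIFIED_OR_OTHER

print(f"After normalisation: {after_normalisation} names "
      f"({share(after_normalisation, RAW_MOTHER_TONGUE_NAMES):.1f}% of raw replies)")
print(f"Reported languages: {REPORTED_LANGUAGES} "
      f"({share(REPORTED_LANGUAGES, RAW_MOTHER_TONGUE_NAMES):.2f}% of raw replies)")
```

Seen this way, well under one in a hundred of the names people gave for their mother tongue survives as a separately reported language.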

Languages of the Indian subcontinent
Official languages of the Indian subcontinent.

Ethnologue, the world’s most comprehensive catalogue of languages, currently lists 418 living languages in India. By comparison, it lists 92 in Nepal, 77 in Pakistan, 44 in Bangladesh, 27 in Bhutan, and 12 in Sri Lanka.

One country, many languages

Despite a common Western misconception that Hindi is India’s national language, only 27% of the country’s population, roughly 322 million people, cite it as their mother tongue. While the nationalist movement has attempted to promote Hindi as the nationwide language, it has never moved the country away from its current system of non-competing multilingualism, in which language use is contextual and varies from situation to situation.

Hindi language coverage
Hindi, the most spoken language of India, is the first language of 27% of the country’s population.

Technically, Hindi is both the most spoken first language and the most spoken second language, making it the most widely usable Indian language. After Hindi, the regional languages of Bengali and Marathi have the most speakers, with 97 million and 83 million respectively. English, which is used primarily in the context of business and education, is spoken by only 10% of the population. Consequently, documents need to be set in a number of languages and writing scripts. Indian bank notes, for example, use 17 languages, and almost all signage throughout the country is multilingual.

100 Indian rupees
The back of the 100-rupee Indian bank note, showing Assamese, Bangla, Gujarati, Kannada, Kashmiri, Konkani, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu, and Urdu in the language panel, next to English and Hindi.

The many languages spoken in India are a reflection of the country’s long and complex history, or as Indian writer and researcher Kamalpreet Singh Gill says: ‘Languages and scripts are more than mere media of communication; they are carriers of culture, history, and ethos of a people and their civilisation’.

When languages or writing scripts disappear, people lose access to information that is available only in those languages, a process comparable to the loss of biodiversity. It takes millennia to establish this diversity and only decades to lose it, a troubling fact and a reminder of the fragility of most regional languages.

A font for the Indian subcontinent

Typotheque’s mission is to support all living languages of the world. Therefore, we’re working to develop a type system for the Indian subcontinent, covering the official languages of India, Pakistan, Bangladesh, Nepal, and Sri Lanka. This requires designing fonts that support Bangla-Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Meitei Mayek, Odia, Ol Chiki, Sinhala, Tamil, and Telugu. We have already designed the Latin and Arabic scripts, which are also used in India and Pakistan, and have started developing the Tibetan script.

Unlike alphabetic writing scripts, which typically represent one sound with one letter, the Indic scripts used throughout the Indian subcontinent contain multiple vowel types: inherent vowels, which are part of the consonants, and independent and dependent vowel forms. Dependent forms can be placed to the left of, to the right of, above, or below the base letter, or on both its left and right sides. To represent these sounds, we needed to create hundreds of unique glyphs. November Devanagari, for example, includes over 2,000 glyphs to render the supported languages correctly. This is why metal typesetting was historically slow to develop for the Indic scripts, and why there are significantly more fonts for the Latin script today. In fact, fewer than a dozen fonts, many of them of questionable quality, could typeset all the languages found on the Indian bank note.
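The vowel types described above are visible in how Unicode encodes text. The following sketch uses Python's standard `unicodedata` module to list the code points behind a few Devanagari syllables and to split a Malayalam two-part vowel sign into its halves. The sample syllables and the `describe` helper are chosen here for illustration and are not taken from the November project.

```python
import unicodedata

KA = "\u0915"  # DEVANAGARI LETTER KA, which carries the inherent vowel "a"

# Each sample is stored consonant first, whatever side the vowel is drawn on.
SAMPLES = {
    "independent vowel I": "\u0907",
    "KA with its inherent vowel": KA,
    "KA + vowel sign I (drawn to the left)": KA + "\u093f",
    "KA + vowel sign II (drawn to the right)": KA + "\u0940",
    "KA + vowel sign U (drawn below)": KA + "\u0941",
    "KA + vowel sign E (drawn above)": KA + "\u0947",
}


def describe(text: str) -> list[str]:
    """List the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in SAMPLES.items():
    print(f"{label}: {text} -> {describe(text)}")

# A Malayalam two-part vowel sign: canonical decomposition yields the part
# drawn to the left of the base letter and the part drawn to its right.
MALAYALAM_VOWEL_SIGN_O = "\u0d4a"
two_parts = unicodedata.normalize("NFD", MALAYALAM_VOWEL_SIGN_O)
print(describe(two_parts))
```

The stored order is always base letter first, so it is the font and the shaping engine, not the text, that decide where each vowel sign is drawn.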

In addition to the 22 official languages of the Indian government, we wanted to support as many regional languages as possible. However, to make the project feasible, we needed to narrow down the pool of languages. South Asia is home to an estimated 2 billion people or more. By setting the threshold for inclusion of a writing system at 0.1% of the population, or 2 million users, we excluded (for now) scripts such as Warang Citi, Mundari Bani, Thaana, and Takri.
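The inclusion rule is a one-line calculation. This minimal sketch assumes a round population of 2 billion and treats the threshold as inclusive; the function names are hypothetical, and the 7 million figure is the Santali speaker count quoted elsewhere in this article, used only as a stand-in for the number of Ol Chiki users.

```python
SOUTH_ASIA_POPULATION = 2_000_000_000  # rounded estimate from the text
THRESHOLD_PER_MILLE = 1                # 0.1% expressed in thousandths


def minimum_users(population: int = SOUTH_ASIA_POPULATION,
                  per_mille: int = THRESHOLD_PER_MILLE) -> int:
    """Smallest user base that qualifies a writing system for inclusion."""
    return population * per_mille // 1000


def is_included(users: int) -> bool:
    """Assumption: a script with exactly the minimum number of users qualifies."""
    return users >= minimum_users()


print(minimum_users())         # the 2 million cut-off stated in the text
print(is_included(7_000_000))  # speaker count quoted for Santali (Ol Chiki)
```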

Understanding readers’ preferences

Designing fonts for so many languages is an extremely complex and ambitious task, requiring not only tens of thousands of hours of design development, but also research into every character form’s recognisability, readers’ preferences, and context of use. The same writing script can feature different glyph shapes based on various traditions. For example, a Devanagari font requires different graphic variants of many characters and numerals to accommodate demographic and regional nuances. We created a survey to evaluate and document different graphic variants of Devanagari characters, interviewing hundreds of people to gather data and recommendations. Only then could we deliver fonts informed by local readers’ needs, and offer specific versions of the fonts for the Hindi, Marathi, and Nepali languages.

This research is unprecedented, yet compared to the work required for less common writing scripts like Ol Chiki and Meitei Mayek, it was relatively easy.

Ol Chiki, the official writing system for the Santali language, was invented in the early 20th century by the scholar Pandit Raghunath Murmu. Santali is spoken by around 7 million people, mainly in West Bengal, Odisha, and Jharkhand. There are no books or websites thoroughly documenting the history of the script’s printed forms, nor are there instruction manuals on its design. To gather the information, we organised multiple field trips to Jharkhand to work with the indigenous communities, teachers, and Ol Chiki script proponents, and to understand their concerns about the language and the script. We relied on their feedback throughout the design process, leading up to their validation of the final fonts. In particular, we’d like to thank Rabindra Murmu, Baburam Soren and Sara Hansda from the Guru Gomkey Academy, Karandih, for their continuous support and guidance in the project, and Sudhir Horo from the Adivasi Design Lab.

Reviewing Ol Chiki fonts with Santali readers
Typotheque field trip to Jharkhand, reviewing Ol Chiki fonts with Santali readers.

We also included the indigenous Meitei script (Meitei Mayek), used in the state of Manipur in Northeast India. UNESCO’s Atlas of the World’s Languages in Danger lists the Meitei language, also known as Manipuri, as vulnerable. The Meitei script dates back to the 6th century, but in the 18th century it was replaced by the Bangla-Assamese script. However, owing to newfound political and technological support, Meitei has experienced a resurgence and is now one of the official scripts of the Government of Manipur. Earlier this year, for the first time, all the newspapers in the Meitei language began using the Meitei script. Read more about the Meitei Mayek and Ol Chiki scripts.

Designing a Design System

November is a type system consisting of three font families (a sans, a rounded typeface, and a stencil variant), each in nine weights, with the first two families coming in three widths. Rooted in the tradition of DIN industrial standards but exceeding the typical use case, November offers an unprecedented number of stylistic variations, providing superior legibility and handling every mode of communication, from signage to long text, with ease. Only Google’s Noto fonts come close to matching November’s versatility and accessibility, but without rounded or stencil variants or the same range of styles for marginal writing scripts such as Ol Chiki or Meitei Mayek, Noto falls short. November’s clear and unambiguous shapes and its extensive language support (besides the fonts for the Indian subcontinent, November supports dozens of other writing scripts, including Chinese, Japanese, and Korean) make it an extremely versatile design tool and a key to accessible design.
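The family structure described above can be enumerated as a small style matrix. This is a sketch under stated assumptions: the weights and widths are given placeholder labels because the text does not name them, the stencil family is assumed to have a single width, and no further axes such as italics are counted.

```python
from itertools import product

WEIGHTS = tuple(range(1, 10))               # nine weights, unnamed in the text
WIDTHS = ("width 1", "width 2", "width 3")  # placeholder labels for three widths

# Family names come from the text; the width assignments follow
# "the first two families coming in three widths".
FAMILIES = {
    "November": WIDTHS,                 # sans
    "October": WIDTHS,                  # rounded
    "November Stencil": WIDTHS[:1],     # stencil, assumed single width
}


def style_matrix() -> list[tuple[str, str, int]]:
    """Return every (family, width, weight) combination for one script."""
    return [
        (family, width, weight)
        for family, widths in FAMILIES.items()
        for width, weight in product(widths, WEIGHTS)
    ]


styles = style_matrix()
print(len(styles), "styles per script under these assumptions")
```

The text does not say how the project's total font count divides across scripts and styles, so the sketch stops at a single script.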

November, October, November Stencil

Since the design of November is simple and rational, it adapts seamlessly to other writing scripts. In particular, the round style of November (called October) mimics the look of a ballpoint pen; its consistent stroke thickness is particularly suitable for south Indian writing scripts, which were traditionally engraved into palm leaves. We also included the stencil construction, which is ubiquitous throughout the subcontinent: stencilled signage is popular because it is fast and inexpensive to produce. We are not aware of any type system in a stencil style that supports so many writing scripts, so it seemed worth the extra work.

The case of Malayalam

Designing fonts requires not just a command of form and design, but an understanding of a language’s past and present and of the ways in which we’ve created and shared text over time.

Malayalam, the official language of the state of Kerala, has relatively few speakers (33 million) compared to other major Indian languages, but boasts a robust written culture. After India’s Hindi-language papers, newspapers written in Malayalam have the largest readership.

On the graphical level, the Malayalam script is also infamous for having many complex combining forms, arguably more than any other major Indian script. This complexity posed challenges to printers working with metal type, which led to the Malayalam Script Reform of 1971 and, from then on, the existence of two parallel orthographic systems: Reformed Malayalam, intended for printing, and Traditional Malayalam, retained for handwriting. For a type foundry, this means that to fully support the Malayalam language, we had to develop two sets of fonts to accommodate both traditional and reformed Malayalam.

In 2022, the Government of Kerala acknowledged advancements in technology and issued an order to reform Malayalam script, reversing many of the changes proposed in the previous 1971 reform, bringing the official orthography closer to traditional-style Malayalam, while retaining elements of the reformed orthography.

Hitesh Malaviya, designer of November Malayalam, conducted extensive research into the reformed and traditional Malayalam orthography to develop both versions of the script. ‘November Malayalam is going to be the first typeface that comes with both traditional and reformed orthography with multiple widths, weights and styles’, says Malaviya, ‘and I am happy that I have designed something to preserve its heritage’. Karthik Malli, a writer and researcher of Indian languages, wrote an in-depth essay entitled Malayalam: Scripting Tradition & Modernity containing much previously unpublished material.

complete linearisation of Malayalam text
To better visualise and conceptualise variations in preferences, we propose a visual model of the spectrum, from more traditional to more linear – the orthography proposed in the 1971 Standard is on the more linear side of the spectrum.

Design Process

Coordinating a team of people to create a single coherent and unified system that respects each writing script presents many challenges. Learning from previous projects, we streamlined the process and developed a methodology for collaborative multiscript type design. Each script had a primary designer who served as the chief decision-maker, and to counter any resulting limitations and biases, we implemented regular critique sessions and peer reviews by fellow professionals. This allowed us to improve parts of the project as they grew in complexity.

In previous projects, we involved a font engineer during the final stages. They would code the OpenType Layout features to implement the font shaping behaviour and to make the fonts fully functional.

For this project, we worked from the beginning with Liang Hai, Typotheque’s multilingual font specialist, to ensure the rendering technologies worked in harmony with the design decisions. Indic characters change shape depending on their context, and the designer and font engineer need to coordinate which glyphs to draw and how to access them. Liang Hai was instrumental in setting up the project, defining the glyph sets, and deciding on strategies for complex text shaping. He also broke the project into smaller parts, based on our linguistic data and the frequency of characters.
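To illustrate what deciding on glyph sets by frequency might involve, the sketch below splits Devanagari words into rough orthographic clusters and counts how often each occurs. It is a deliberate simplification, not the project's actual method: real shaping engines apply far more detailed rules, and the `clusters` function, the sample words, and the tiny corpus are all hypothetical.

```python
import unicodedata
from collections import Counter

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA, which joins consonants into a conjunct


def clusters(word: str) -> list[str]:
    """Group a word's characters into rough orthographic clusters.

    A combining mark (category Mn or Mc) attaches to the current cluster,
    and a letter that follows a virama continues the conjunct.
    """
    result: list[str] = []
    for ch in word:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        continues_conjunct = (
            bool(result) and result[-1].endswith(VIRAMA) and category == "Lo"
        )
        if result and (is_mark or continues_conjunct):
            result[-1] += ch
        else:
            result.append(ch)
    return result


# Hypothetical sample corpus, written with escapes to keep the code points explicit.
NAMASTE = "\u0928\u092e\u0938\u094d\u0924\u0947"  # na, ma, s+ta+e
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"    # ha+i, n+da+ii
corpus = f"{NAMASTE} {HINDI} {NAMASTE}"

counts: Counter[str] = Counter()
for word in corpus.split():
    counts.update(clusters(word))

for cluster, n in counts.most_common():
    print(cluster, n)
```

A ranking like this, built from a large corpus, is one way a team could decide which conjuncts deserve dedicated glyphs first and which can be left to later stages.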

In addition to feedback sessions with other designers, we conducted a validation phase, calling on experienced text users, publishers, editors, and scholars to provide unbiased, implementable critiques of the fonts set in natural text. Their feedback was invaluable and allowed us to make informed adjustments.

design workflow

It took four years and tens of thousands of hours of work to complete this project, which produced 882 fonts and involved 40 designers, experts, language consultants, and reviewers. We hope that the project will be useful to communities of all sizes, across language barriers, and we can’t wait to see what people will create with these fonts.

Contributors to the November South Asia project: Aadarsh Rajan, Anand Naorem, Arun Pynadath, Arya Purohit, Athul Jayaraman, Babu Ram Soren, Fiona Ross, Hashim Padiyath, Héctor Mangas Afonso, Hitesh Malaviya, Jyotish Sonowal, Kalapi Gajjar-Bordawekar, Karthik Malli, Kosala Senevirathne, Liang Hai, Lucas Horn, Maithili Shingre, Namrata Goyal, Neelakash Kshetrimayum, Nina Botthof, Noopur Datye, Oscar Guerrero, Parimal Parmar, Pathum Egodawatta, Pratyush Das, Purushoth Kumar, Rabindranath Murmu, Ramakrishna Manda, Sanathoi Laishram, Santhosh Thottingal, Shashi Guduru, Shuchita Grover, Soniya Stella, Subhashish Panigrahi, Sudhir Horo, and Suman Bhandary.