
A Collaborative Research Project to Overcome Digital Language Support Barriers for Indigenous North American Typography

Typotheque is a Netherlands-based type design company. We develop modern fonts for languages spoken and written all over the world.

We work not only with the major languages that bring in most of our company revenue; we are also committed to supporting digitally under-resourced languages and to working with Indigenous communities on their goals for language revitalization and reclamation.

We are partnering with the Nattilik, Dakelh, and Haíɫzaqv communities to overcome technical challenges that these communities face at the level of the Unicode Standard. We work together with these and other communities to provide digital language tools and solutions without charge, so that they can access, write, and view the correct typographic representation of their languages on all of their digital devices with reduced, and ideally no, barriers. This strengthens the overall vitality of these languages both today and into the future.

Typotheque funds such work internally. In order to make this possible, we look for sustainable ways to support our community partnerships. Currently, most of the funding for our collaborative work with Indigenous communities comes from a combination of font licensing and custom type development for major international clients. When a customer pays for or licenses one of our fonts, we allocate part of the revenue to developing and providing contemporary and high quality fonts and keyboards to Indigenous communities.

We also continue to experiment with other ways of funding community-based projects for which there is no established economic model, and have successfully crowdfunded work on Cherokee font development through Typotheque Club, our community of clients and fans.

Typotheque Indigenous North American Type Project

The 293 First Nations, Inuit, and Métis communities across what is now Canada speak 84 languages and dialects in both traditional homelands and in urban centres. There is a great wealth of orthographic and typographic variation within the unique writing systems that these communities employ to represent their languages. Our project aims to strengthen the active language revitalization, preservation, and reclamation programmes within Indigenous communities by supporting easier and barrier-free access to utilising their writing systems in digital spaces.

Despite the high number of living languages and rich (ortho)graphic diversity, Indigenous communities continue to face widespread challenges in actively accessing and using their languages on digital platforms. These barriers stem both from questions around how a community’s orthography must perform for accurate reading and comprehension, and from questions around how the typography must appear and shape the orthography in text composition. Even if a community has a stable digital text encoding framework and supportive language tools, a lack of adequate keyboards, accurate font rendering support, and glyph representation can restrict a community’s ability to exercise self-determination over the appearance and functionality of their language. Digital texting on smartphones, computers and other platforms is an essential space for everyday language engagement, mobilisation and transmission. Challenges of any kind in these domains put at risk the success of community-driven language revitalization, preservation and reclamation programmes, and impede their overall sovereignty.

Map of Indigenous languages in Canada
Above, a map of the Indigenous languages in Canada and their current geographic distribution.

To overcome such challenges, we are working to comprehensively research and document the text encoding, keyboard sources, technical issues, and typographic preferences of each Indigenous language community in North America, with the goal of reducing barriers to access and strengthening the digital and overall vitality of these languages today and into the future. We will do this in active collaboration with local Indigenous language keepers in each community. Through direct and highly customised partnerships with each community, we will ensure that the information we gather on the completeness and stability of Unicode character sets, keyboard layouts, font support, and Common Locale Data Repository (CLDR) data is accurate and community-centred.

This work is timely and urgent, given not only UNESCO’s International Decade of Indigenous Languages (2022 – 2032) but also the latest Canadian Census data (2021), which reports 237,420 Indigenous language speakers in Canada. This marks a decline of 10,750 speakers from the 2016 Census, the first decline since comparable data on these languages began to be collected in 1991. On the other hand, the First Peoples’ Cultural Council’s 2022 Report ‘On the status of B.C. First Nations Languages’ documents an encouraging increase of 3,106 new language learners since 2018.

Our conversations and our six existing collaborative partnerships with Indigenous language communities across Canada – Sḵwx̱wú7mesh sníchim (Squamish), Haíɫzaqv (Heiltsuk), Secwepemctsín (Shuswap), ᑕᗸᒡ (Dakelh / Carrier), ᓇᑦᕠᓕᖕᒥᐅᑐᑦ (Nattilik), ᓀᐦᐃᓇᐍᐏᐣ (nêhinawêwin / Swampy Cree) – indicate that the success and stability of current language revitalization efforts hinge on how readily the language can be mobilised in digital spaces and on everyday devices. Barriers to using Indigenous languages in digital spaces significantly restrict the ability of community members to engage with all users, and with younger generations of learners in particular. Impediments to digital access can also harm a community’s ability to preserve texts and use their language reliably in day-to-day scenarios.

We anticipate that our project will begin in the Spring of 2024, with a focus on two Indigenous language communities in the first, pilot phase. From there, we hope to grow the project to include other communities.

About Language Support and Typography Issues for Indigenous Communities in North America

Below are some examples of common technical issues that Indigenous language communities in North America face when using their languages in digital spaces:

Unicode and Text Encoding

The Unicode Standard is the international standard for how all text is encoded on digital devices. This means that all characters in a language's orthography must be included in Unicode for that language to be usable for typing and exchanging text across devices such as desktop computers, smartphones, and tablets:

Unicode illustration
The above image shows the relationship between the Unicode Standard – how all digital text for the world's languages is encoded – and the tools on the devices which we use to enter text. Ideally, if all of the characters that our language needs are encoded in Unicode, and fonts and keyboards support all of these characters, and our devices support the rendering of the characters, then we can freely and reliably use, exchange, and store any text that we compose with any other device in the world.

For any language to be fully and accurately supported across all devices, the required characters must be in Unicode and our digital devices must follow the Unicode Standard, meaning that the language tools (keyboards and fonts) that we use on such devices must be fully Unicode-compliant. If these criteria are met, then we are able to accurately and consistently display text on all of our devices and share texts with anyone else's device.
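As a small illustration of what "encoded in Unicode" means in practice, the sketch below lists the code point and official Unicode name of every character in a piece of text, using Python's standard `unicodedata` module. The helper name `describe_code_points` is hypothetical, and the sample word is simply copied from this page; how it is actually encoded depends on the keyboard that typed it.

```python
import unicodedata


def describe_code_points(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]


# Sample word taken from this page; any text can be inspected the same way.
for code_point, name in describe_code_points("Haíɫzaqv"):
    print(code_point, name)
```

A character with no entry in the Unicode character database available to Python is reported as `<no name>`, which can be a first hint that a required character is missing or very recently added.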

Proposing Additions or Revisions to the Unicode Standard

Sometimes, language communities may be missing characters in the Unicode Standard that they require to use their language on digital text platforms. Similarly, sometimes there are mistakes in the Unicode Standard regarding how characters for some communities should appear visually. We have worked with several Indigenous communities in Canada to propose new characters and character representation revisions to the Unicode Standard. Below, we share links to these successful proposals:

It is important to note that once new characters are accepted for the Unicode Standard, it can take some time before those characters are published in a new version of the Standard, and before major platforms (such as Apple, Microsoft, and Android) support those characters on their devices.

Local Typographic Preferences

The Unicode Standard's character code charts show a general representation of how each character may appear in text. While all of the default, core fonts on our devices follow this visual model, different language communities may have a graphic preference that is distinct from the standard representation of a given Unicode character. It is possible to support these local preferences through fonts:

locl variants
The above two images show examples of locally-preferred typographic forms for different Indigenous language communities, in the Syllabics and Latin script writing systems, respectively, where the standard Unicode appearance of these characters differs from the graphic appearance that the local community prefers and identifies with. Because core fonts on common operating systems follow the Unicode Standard's graphic representation of a character, many commonly available fonts may not render the forms that some communities expect for their typography. It is possible to accommodate these preferences through font technology solutions on devices.

Diacritic Mark Appearance and Rendering

Some languages require many diacritic marks that many common fonts may not clearly distinguish visually:

diacritic representation
Above, the correct and required representation of the comma above glottalization mark is shown in examples one and two, in serif and sans serif font styles, where it is adequately distinct from the acute diacritic mark. In example three, the same sequence is shown in a core system font that renders the comma above glottalization mark in a form that is nearly graphically identical to the acute diacritic mark, making the two marks indistinguishable for readers.

In the above example, the first two fonts render the distinction between the comma above and acute diacritic marks adequately, while the font in the third line renders these two marks as almost visually identical.

Furthermore, even if a font provides the correct design for these shapes, it may not render the dynamic diacritic marks accurately, as in the second line of the example below:

broken diacritic shaping
The above example shows the correct rendering of the Haíɫzaqv name "H̓áust̓i", which requires the combining diacritical marks to "stack" above letters "H" and "t". For more information on this and the resulting barriers that inadequate support for this diacritic mark rendering causes, please see this article in CBC News.
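One reason this rendering depends so heavily on the font is that some letter-plus-mark combinations have no single precomposed character in Unicode, so the text stays a base letter followed by a combining mark and the font must position the mark itself. The sketch below checks this with Unicode normalization (NFC) from Python's standard library. It assumes the glottalization mark is encoded as U+0313 COMBINING COMMA ABOVE; the helper name `stays_decomposed` is hypothetical.

```python
import unicodedata

COMMA_ABOVE = "\u0313"  # COMBINING COMMA ABOVE (assumed glottalization mark)
ACUTE = "\u0301"        # COMBINING ACUTE ACCENT


def stays_decomposed(base: str, mark: str) -> bool:
    """True if base + mark has no single precomposed code point under NFC."""
    composed = unicodedata.normalize("NFC", base + mark)
    return len(composed) > 1


print(stays_decomposed("a", ACUTE))        # False: composes to U+00E1
print(stays_decomposed("H", COMMA_ABOVE))  # True: remains two code points
print(stays_decomposed("t", COMMA_ABOVE))  # True: remains two code points
```

For the sequences that stay decomposed, correct display relies entirely on the font's mark positioning and the application's text shaping, which is where the broken rendering shown above can arise.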

Even if fonts and application platforms can accurately represent the appearance and rendering of diacritical marks, third-party applications such as Microsoft Word may not allow enough "text box height" to accommodate the height of diacritical marks, which may then become "clipped" (please see "clipping" in Technical Terminology):

word mark clipping
The above image shows a common experience for many First Nations language users, where their diacritic marks are "clipped" in common word processing software such as Microsoft Word, rendering the marks ambiguous and unclear.

Font Knowledge for Improving Indigenous Font and Keyboard Support

A central aim of our project is to help communities identify issues in fonts and keyboards that present barriers to language use in digital text. We also aim to provide public font development documentation, data, and knowledge so that all software providers can ensure that their fonts meet the standards determined by local Indigenous communities, and that each community's language works and renders correctly from a typographical perspective. An example of this is the following public GitHub repository on font development knowledge for the Syllabics writing system and its typography:

Syllabics Knowledge GitHub Repository

Font designers and developers require language data: the Unicode characters that a community uses in its keyboard, information on how diacritic marks should appear and perform in digital text, and some short text examples in the language that match the list of Unicode characters, in order to ensure that the font harmoniously represents the natural language in text. Font designers also require knowledge of how each community prefers certain letter shapes to appear, which may differ from the standard way that Unicode represents each character.
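A rough sketch of how such language data can be put to use: given a short sample text and the set of code points a font supports, report which required characters the font is missing. The function name `missing_characters` and the "basic Latin only" coverage set are hypothetical and for illustration; in practice the supported set would be read from the font's character map (for example with the fontTools library's `TTFont.getBestCmap()`), and the sample string assumes the glottalization mark is U+0313.

```python
def missing_characters(sample_text: str, supported: set[int]) -> list[str]:
    """List code points used in sample_text that are absent from `supported`."""
    needed = {ord(ch) for ch in sample_text if not ch.isspace()}
    return [f"U+{cp:04X}" for cp in sorted(needed - supported)]


# Hypothetical font coverage: unaccented A-Z and a-z only.
basic_latin = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))

# Assumed encoding of the name discussed above: H + U+0313, á (U+00E1), t + U+0313.
sample = "H\u0313\u00E1ust\u0313i"

print(missing_characters(sample, basic_latin))  # ['U+00E1', 'U+0313']
```

Passing this check only shows that the characters are present in the font. It does not show that diacritic marks are positioned correctly or that letter shapes match a community's preferences, which still has to be reviewed visually with community members.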