Typotheque Indigenous North American Type

A Collaborative Research Project to Overcome Digital Language Support Barriers for Indigenous North American Typography
Typotheque is a Netherlands-based type design company. We develop modern fonts for languages spoken and written all over the world.
We work not only with the major languages that bring in most of our company revenue, but are also committed to supporting digitally under-resourced languages and to working with Indigenous communities towards their goals for language revitalization and reclamation.
We are partnering with the Nattilik, Dakelh, and Haíɫzaqv communities in order to overcome technical challenges faced by these communities at the level of the Unicode Standard. We work together with these and other communities to provide digital language tools and solutions without charge, so that they can access, write and view the correct typographic representation of their languages on all of their digital devices with reduced, and ideally no, barriers. This strengthens the overall vitality of these languages both today and into the future.
Typotheque funds such work internally. In order to make this possible, we look for sustainable ways to support our community partnerships. Currently, most of the funding for our collaborative work with Indigenous communities comes from a combination of font licensing and custom type development for major international clients. When a customer pays for or licenses one of our fonts, we allocate part of the revenue to developing and providing contemporary and high quality fonts and keyboards to Indigenous communities.
We also continue to experiment with other ways of funding community-based projects for which there is no established economic model, and have successfully crowdfunded work on Cherokee font development, using our community of clients and fans that we call Typotheque Club.
Typotheque Indigenous North American Type Project
The 293 First Nations, Inuit, and Métis communities across what is now Canada speak 84 languages and dialects in both traditional homelands and in urban centres. There is a great wealth of orthographic and typographic variation within the unique writing systems that these communities employ to represent their languages. Our project aims to strengthen the active language revitalization, preservation, and reclamation programmes within Indigenous communities by supporting easier and barrier-free access to utilising their writing systems in digital spaces.
Despite the high number of living languages and rich (ortho)graphic diversity, Indigenous communities continue to face widespread challenges in actively accessing and using their languages on digital platforms. These barriers stem both from questions around how a community’s orthography must perform for accurate reading and comprehension, and from how the typography must appear and shape the orthography in text composition. Even if a community has a stable digital text encoding framework and supportive language tools, issues presented by a lack of adequate keyboards, accurate font rendering support and glyph representation can restrict a community’s ability to exercise self-determination over the appearance and functionality of their language. Digital texting on smartphones, computers and other platforms is an essential space for everyday language engagement, mobilisation and transmission. Challenges of any kind in these domains put the success of community-driven language revitalization, preservation and reclamation programmes at risk, and impede their overall sovereignty.

In order to overcome such challenges, we are working to comprehensively research and document the text encoding, keyboard sources, technical issues, and typographic preferences for each individual and specific Indigenous language community in North America, with the goal of reducing barriers to access and strengthening the digital and overall vitality of these languages today and into the future. We will do this by working in active collaboration with local Indigenous language keepers in each community. Through direct and highly customised partnerships with each individual community, we will ensure that the information pertaining to the completeness of Unicode character sets and stability, keyboard layouts, font support, and Common Locale Data Repository (CLDR) data gathered is accurate and community-centred.
This work is timely and urgent, not only given UNESCO's International Decade of Indigenous Languages (2022 – 2032), but also in light of the latest Canadian Census data (2021), which reports 237,420 Indigenous language speakers in Canada, marking a decline of 10,750 speakers from the 2016 Census, the first decline since comparable data on these languages began to be collected in 1991. On the other hand, the First Peoples’ Cultural Council’s 2022 Report ‘On the status of B.C. First Nations Languages’ documents an encouraging increase of 3,106 new language learners since 2018. Based on our conversations and the six existing collaborative partnerships with Indigenous language communities across Canada in which we are already engaged – Sḵwx̱wú7mesh sníchim (Squamish), Haíɫzaqv (Heiltsuk), Secwepemctsín (Shuswap), ᑕᗸᒡ (Dakelh / Carrier), ᓇᑦᕠᓕᖕᒥᐅᑐᑦ (Nattilik), ᓀᐦᐃᓇᐍᐏᐣ (nêhinawêwin / Swampy Cree) – the success and stability of current language revitalization efforts hinge on how readily the language can be mobilised in digital spaces and on everyday devices. Barriers to using Indigenous languages in digital spaces significantly restrict the ability of community members to engage with all users, and younger generations of learners in particular. Impediments to digital access can further negatively impact a community’s ability to preserve texts and use their language reliably in day-to-day scenarios. We anticipate that our project will begin in the Spring of 2024, with a focus on two Indigenous language communities in the first, pilot phase. From here, we hope to grow the project to include other communities.
About Language Support and Typography Issues for Indigenous Communities in North America
Below are some examples of different, common technical issues that Indigenous language communities in North America face in using their language in digital spaces:
Unicode and Text Encoding
The Unicode Standard is the international standard for how all text is encoded on digital devices. This means that all characters in a language's orthography must be included in Unicode for it to be accessible for typing and exchanging text across devices such as desktop computers, smartphones, and tablets:

In order for any language to be fully and accurately supported across all devices, the required characters must be in Unicode and our digital devices must follow the Unicode Standard, meaning that the language tools (keyboard and fonts) that we use on such devices must be fully Unicode-compliant. If these criteria are met, then we are able to accurately and consistently display text on all of our devices and share texts with anyone else's device.
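As a minimal illustration of what it means for a character to be "in Unicode", the following Python sketch lists the code point and official character name for each character of a short string. The helper name `describe` and the sample word are chosen here for illustration only, and the names come from the Unicode database bundled with the Python version in use, which may lag behind the latest version of the Unicode Standard.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unknown to this Python version>"))
        for ch in text
    ]


# "Haíɫzaqv", written with escapes so the exact code points are unambiguous.
sample = "Ha\u00ed\u026bzaqv"

for code_point, name in describe(sample):
    print(code_point, name)
```

Two devices that both follow the Unicode Standard will agree on this list of code points, which is what allows the same text to be exchanged between them.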
Proposing Additions or Revisions to the Unicode Standard
Sometimes, language communities may be missing characters in the Unicode Standard that they require to use their language on digital text platforms. Similarly, sometimes there are mistakes in the Unicode Standard regarding how characters for some communities should appear visually. We have worked with several Indigenous communities in Canada to propose new characters and character representation revisions to the Unicode Standard. Below, we share links to these successful proposals:
- ᓇᑦᕠᓕᖕᒥᐅᑐᑦ (Nattilik) community: Proposal to encode 16 additional characters to the Unified Canadian Aboriginal Syllabics
 
- ᑕᗸᒡ (Dakelh) community: Proposed changes to the representative glyphs of the Unified Canadian Aboriginal Syllabics code charts
 
- Haíɫzaqv (Heiltsuk) community: Proposal to Encode 3 Additional Latin Characters for Wakashan and Salishan Languages to the Unicode Standard
 
It is important to note that once new characters are accepted into the Unicode Standard, it can take some time before those characters are published in a new version of the Standard, and before major operating systems (such as those from Apple and Microsoft, and Android) provide support for those characters on their devices.
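This delay can be observed directly in software. The sketch below, in Python, asks the Unicode database bundled with the running Python version whether it has a name for a given code point. The function name `is_known_here` is hypothetical, and the check is only approximate: a few assigned characters, such as control codes, have no name either.

```python
import unicodedata


def is_known_here(code_point):
    """True if this Python build's Unicode database has a name for the code point."""
    return unicodedata.name(chr(code_point), None) is not None


print("Unicode database version:", unicodedata.unidata_version)
# U+A7DA LATIN CAPITAL LETTER LAMBDA was published in Unicode 16.0,
# so environments with older Unicode data are expected to report False here.
print("U+A7DA known:", is_known_here(0xA7DA))
```

The same text can therefore be fully supported on one device and unrecognized on another until both have caught up with the relevant Unicode version.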
Local Typographic Preferences
The Unicode Standard specifies character code charts that show a general representation of how each character may appear in text. While all of the default, core fonts on our devices follow this visual model, different language communities may have a graphic preference that is distinct from the standard representation of a given Unicode character. It is possible to support these local preferences through fonts:

Diacritic Mark Appearance and Rendering
Some languages require many diacritic marks that many common fonts may not clearly distinguish visually:

In the above example, the first two fonts adequately render the distinction between the comma above and acute diacritic marks, while the font in the third line renders these two marks as almost visually identical.
Furthermore, even if a font provides the correct design for these shapes, the font may not render the dynamic diacritic marks accurately, as in the below example, second line:

Even if fonts and application platforms can accurately represent the appearance and rendering of diacritical marks, third-party applications such as Microsoft Word may not allow a large enough "text box height" in order to capture the height of diacritical marks, which may become "clipped" (please see "clipping" in Technical Terminology):

Font Knowledge for Improving Indigenous Font and Keyboard Support
A central aim of our project is to help communities identify issues in fonts and keyboards that present barriers to language use in digital text, and to provide public font development documentation, data, and knowledge that allows all software providers to ensure that their fonts meet the standards determined by local Indigenous communities, so that each community's language works and renders correctly from a typographical perspective. An example of this is the following public GitHub repository on font development knowledge for the Syllabics writing system and its typography:
https://github.com/typotheque/syllabics-knowledge

Font designers and developers require language data such as the Unicode characters that a community uses in their keyboard, to know how diacritic marks should appear and perform in digital text, and some short language text examples that match the list of Unicode characters in order to ensure that the font harmoniously represents the natural language in text. Font designers also require knowledge on how each community prefers certain letter shapes to appear, which may differ from the standard way that Unicode represents each character.
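One simple consistency check on such data is to confirm that the sample texts use only characters from the declared character list. The Python sketch below shows the idea; the function name and the data are hypothetical placeholders, not a real community character list.

```python
def characters_outside_set(sample_text, character_set):
    """Return the characters in sample_text that are missing from character_set.

    Whitespace is ignored, since it is not normally listed as a letter.
    """
    allowed = set(character_set)
    missing = {ch for ch in sample_text if not ch.isspace() and ch not in allowed}
    return sorted(missing)


# Hypothetical data for illustration only; not a real community character list.
character_set = "abcdeh\u0301"
sample_text = "bad cab x\u0331"

for ch in characters_outside_set(sample_text, character_set):
    print(f"U+{ord(ch):04X} is used in the sample but not in the character list")
```

Any character reported by a check like this points either to a gap in the character list or to a stray character in the sample, and both are worth resolving with the community before font work begins.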
Heiltsuk Revitalization (Haíɫzaqv Nation)
Through a memorandum of understanding (MoU) signed on 30 April 2024, we are proud and grateful to work in respectful collaboration with Heiltsuk Revitalization and the Heiltsuk Tribal Council towards strengthening the Haíɫzaqv community's ongoing efforts in language revitalization, preservation, and reclamation. This partnership continues and extends the work that Heiltsuk Revitalization and Typotheque began together in 2023, to successfully request the addition of three new characters to the Unicode Standard and ensure their publication in version 16.0 of the Unicode Standard, in September 2024. Below, we present the MoU which details our shared goals and guiding principles for the project:
FirstVoices (First Peoples’ Cultural Council)
Through a memorandum of understanding (MoU) signed on 8 October 2024, the First Peoples' Cultural Council (FPCC) and Typotheque entered into a collaborative partnership, sharing resources and pledging to work together to investigate issues faced in digital text input (keyboards) and graphic output (fonts). This collaboration will focus on a close relationship between the FPCC's FirstVoices team, the member Indigenous communities participating in the FirstVoices project, and Typotheque.
Below, we present the MoU which details our shared goals and guiding principles for this working partnership:
Text Encoding Refers to any means of preparing, storing, accessing, and exchanging digital text in digital software and hardware systems, and sharing text reliably between multiple devices. Today, the international standard for text encoding is the Unicode Standard, which all major operating system and device manufacturers (Apple, Google, Microsoft), modern web browsers, and third-party applications have adopted a mandate to follow. In order for text encoding to work correctly, all characters must be within the common standard (Unicode) and all devices, applications (Microsoft Word) and language tools (keyboards and fonts) must follow this standard.
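A small Python sketch can illustrate why both ends of an exchange must follow the same standard. It encodes a word as UTF-8 (one of the Unicode encoding forms), then decodes the bytes once correctly and once as a receiver would if it wrongly assumed a legacy single-byte encoding. The variable names are illustrative only.

```python
text = "Ha\u00ed\u026bzaqv"  # "Haíɫzaqv"

# Sender: turn the characters into bytes using a Unicode encoding.
data = text.encode("utf-8")

# A receiver that follows the same standard recovers the text exactly.
same_standard = data.decode("utf-8")

# A receiver that assumes a legacy single-byte encoding shows garbled text.
legacy_guess = data.decode("latin-1")

print(same_standard == text)
print(legacy_guess)
```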
Unicode Unicode is a non-profit organization that maintains several projects related to digital text encoding, including the Unicode Standard, which is the international standard for text encoding on digital systems and devices for text storage and interchange. Unicode also maintains several other projects under its organization including the CLDR project (please see "CLDR" for more information).
CLDR stands for "Common Locale Data Repository", and is a project of the Unicode Consortium. The CLDR is a repository of language data for use in systems in order to provide language-specific environments, on operating system platforms (Android, iOS (iPhone), macOS, Windows, etc.), in third-party applications (Microsoft Excel, Word), and in web browsers. The language data in the CLDR allows software and hardware manufacturers to adapt their software to the conventions of different languages for common software tasks and navigation, such as menu labels and file names, date and time, in order to show these conventions for a local language region.
ASCII Is the abbreviated form of American Standard Code for Information Interchange, which is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are printable characters, which severely limited its scope. Modern computer systems have evolved to use Unicode, which has room for over a million code points (1,114,112). For backward compatibility, the first 128 code points in the Unicode Standard are the same as the 128 ASCII characters.
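The figures in this entry can be checked with a few lines of Python. This is only a sketch using the standard library; it counts the printable ASCII characters, confirms that ASCII bytes decode identically under UTF-8, and prints the size of the Unicode code space as the Python runtime reports it.

```python
import sys

ascii_range = [chr(i) for i in range(128)]

# Of the 128 ASCII code points, count the printable ones (space through tilde).
printable = [ch for ch in ascii_range if ch.isprintable()]
print(len(ascii_range), len(printable))

# Every ASCII byte decodes to the same character under ASCII and UTF-8.
raw = bytes(range(128))
print(raw.decode("ascii") == raw.decode("utf-8"))

# Size of the Unicode code space: U+0000 through U+10FFFF.
print(sys.maxunicode + 1)
```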

Unicode Code Charts The Unicode Standard presents a visual definition of all characters encoded (that is, published and available) in the Unicode Standard, providing a general graphic representation of each character, its associated unique code point, and a character names list. An example of a Unicode code chart is shown here.

Text Shaping Is the process of taking Unicode character sequences (text) input by a keyboard and, in conjunction with a font, representing that text in the way that it must be composed for a specific orthography. In the context of Indigenous languages in North America, many Latin script-based writing systems require the shaping of diacritical marks that must "stack" above a given base letter to modify its sound. This is made possible through mark-to-mark attachment, a positioning feature of the OpenType font format, and only works if the font in use supports it. For more detailed information on shaping, please see this resource on HarfBuzz, a text shaping engine used by many modern web browsers.
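To see which letters depend on this kind of mark positioning, one rough check is to normalize text to Unicode Normalization Form C (NFC) and list the combining marks that remain as separate characters. The Python sketch below does this with the standard `unicodedata` module; the function name is hypothetical, and the result says nothing about whether a particular font positions those marks well.

```python
import unicodedata


def marks_left_to_the_font(text):
    """List combining marks that remain separate characters after NFC.

    These have no precomposed form with their base letter, so their
    placement depends on the font and the shaping engine.
    """
    composed = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in composed if unicodedata.combining(ch)]


# e + combining acute composes into the single character U+00E9.
print(marks_left_to_the_font("e\u0301"))
# x + combining macron below has no precomposed form.
print(marks_left_to_the_font("x\u0331"))
```

Letters such as the second example appear in several orthographies mentioned on this page, which is why correct mark attachment in fonts matters so much for them.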

Text Clipping Text clipping occurs when diacritic marks (or any letterform elements) exceed a pre-defined boundary that an application's developers have specified, which all text elements must be contained within in a given line of text:

ISO The International Organization for Standardization, which provides technical standardization specifications for many products, including language encoding. ISO is a non-governmental organization that is international in scope with over 170 member countries. Canada, for example, is a member of ISO, and is represented at ISO through the Standards Council of Canada (SCC).
ISO/IEC 10646, known as the "Universal Coded Character Set" (UCS), is a specification managed by ISO that is specifically concerned with character encoding standards. It is intentionally kept in sync with the Unicode Standard, and provides a way for national bodies to have a direct input path for character encoding requirements and concerns. ISO/IEC 10646 is concerned only with character encoding repertoires to support all language orthography requirements within its region and the identity of those characters.
Keyboard Layout Refers to the arrangement of keys on a keyboard, which determines how characters and functions are accessed. Many Indigenous languages in North America use characters that are not available on a standard English keyboard. To type these characters, a virtual keyboard (a program that reorganizes the characters on your physical keyboard) is needed, along with a font that can render them correctly.

The above example shows the keyboard layouts for Haíɫzaqvḷa (Heiltsuk) and ᓇᑦᕠᓕᖕᒥᐅᑐᑦ (Nattilik), and highlights a character that is typed by holding the shift key.
Unicode Casing In the Unicode Standard, individual characters are given unique code points in order to distinguish them from any other character in the Unicode Standard. In scripts such as the Latin script, many languages require case variation between uppercase and lowercase, for example for proper nouns. In order for applications to convert lowercase letters to capital letters and vice versa, these letter "pairs" must be encoded as members of the same script.
For example, the recent proposal to add new capital Latin script characters for Haíɫzaqvḷa required the addition of a new lowercase Latin script "lambda" in order to allow the lowercase λ character in Haíɫzaqvḷa to convert to the new capital letter (U+A7DA LATIN CAPITAL LETTER LAMBDA). Previously, Haíɫzaqvḷa orthography had made use of the lowercase Greek lambda character (U+03BB λ GREEK SMALL LETTER LAMDA), which can only convert to the Greek capital letter (U+039B Λ GREEK CAPITAL LETTER LAMDA), which is not the correct letter pairing required for Haíɫzaqvḷa:
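The pairing behaviour can be observed with Python's built-in case conversion, as in the sketch below. The Greek result is stable across versions, while the result for the new Latin capital depends on whether the Unicode data bundled with the running Python version already includes version 16.0, so no fixed output is assumed for it.

```python
import unicodedata

greek_small = "\u03bb"    # GREEK SMALL LETTER LAMDA
latin_capital = "\ua7da"  # LATIN CAPITAL LETTER LAMBDA (Unicode 16.0)

# The Greek letter can only pair with its Greek capital.
upper = greek_small.upper()
print(f"U+{ord(greek_small):04X} -> U+{ord(upper):04X}")

# The Latin capital maps to a Latin lowercase only where the runtime's
# Unicode data already includes version 16.0; otherwise it is left unchanged.
lower = latin_capital.lower()
print(f"U+{ord(latin_capital):04X} -> U+{ord(lower):04X}",
      f"(Unicode data {unicodedata.unidata_version})")
```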

Unicode Confusable Characters
In the Unicode Standard, there are distinct characters with very similar graphic representations that can create confusion for human readers of these characters, due to their similarity in common font representations:

The above example shows confusable characters that can be used to represent one's language in digital text. The characters marked in orange in lines (1) and (2) above, respectively, are distinct character codes with similar graphic representations. Computer devices read them as distinct characters, but human readers may be confused by their visual form.
There may be even greater confusability of some characters in Unicode depending on the script in question, for example, the Syllabics script (UCAS), which features some characters that are almost visually identical, if not identical:

Text Spoofing and Security Risks The act of intentionally using visually similar characters for malicious purposes in digital text. This means that a bad actor could create visually confusing web domain addresses, email subject lines and text, or other content in order to do harm.
To illustrate, the word “ᑭᐢᑫᔨᐦᑕᒼ” in ᓀᐦᐃᔭᐍᐏᐣ (nêhiyawêwin / Plains Cree) is encoded using U+14BC ᒼ CANADIAN SYLLABICS WEST-CREE M for the final character ᒼ. However, one could also type this same word with U+1466 ᑦ CANADIAN SYLLABICS T as “ᑭᐢᑫᔨᐦᑕᑦ”. We can observe the difference here, but it would be graphically very hard to tell the difference in everyday situations, which could lead to the creation of "fake" words and labels that can result in security risks.
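Although the two spellings are hard to tell apart by eye, software can distinguish them easily by comparing code points. The following Python sketch shows one way to do so; the helper name `differing_positions` is hypothetical, and it only compares strings position by position up to the length of the shorter one.

```python
def differing_positions(a, b):
    """Return (index, code point in a, code point in b) wherever a and b differ."""
    return [
        (i, f"U+{ord(x):04X}", f"U+{ord(y):04X}")
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


stem = "ᑭᐢᑫᔨᐦᑕ"
word_with_m = stem + "\u14bc"  # CANADIAN SYLLABICS WEST-CREE M
word_with_t = stem + "\u1466"  # CANADIAN SYLLABICS T

print(word_with_m == word_with_t)
print(differing_positions(word_with_m, word_with_t))
```

A check of this kind only flags that two strings differ. Deciding whether a difference is a deliberate spoof requires knowledge of the language and its orthography.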
Language data is any type of data related to a language that makes it possible for computer software to represent a language in digital text on devices. Operating system and device manufacturers (Apple, Microsoft, Google) and application developers require certain – but not all – aspects of a given language's data in order to accurately render and represent that language on their devices and in applications. In order to make the rendering and representation possible, device and application developers require keyboard and font tools that are capable of allowing the input of the required characters (keyboards) and rendering their correct visual appearance (fonts). As such, keyboard and font developers also require certain elements of a language's data and typographic knowledge in order to accurately represent the language's entry on the device and its graphic representation.
This section presents the language data principles to which the Typotheque Indigenous North American Type research project adheres in order to facilitate self-determination for all Indigenous languages we collaborate with in this process, as well as to protect Indigenous language data and each language community's data sovereignty.
Principles
In our work in partnership with Indigenous communities, we adhere to the First Nations principles of OCAP as well as the CARE principles for Indigenous data governance in how Indigenous language data and information will be collected, stored, used, and made available to the public for the purpose of supporting language support in digital systems and overall sovereignty. Our project maintains, above all else, that each individual Indigenous community must always retain the right to full ownership of all aspects of their language data and self-determination over how their language data may be accessed and used, and whether it may or may not be made publicly available. We ensure that the Indigenous communities that we work in partnership with have full access to all of their language data at all times, during and after the project.
For more information about the First Nations principles of OCAP and the CARE principles for Indigenous data governance, please follow the above links.
Before beginning any language software work, and in order to establish outcomes for a project, it is advisable to first assess the current digital language support situation for your community, and to identify the goals and steps needed for a given project. The First Peoples' Cultural Council (FPCC) provides the wonderful resource "Check Before you Tech", which offers information and a list of questions to consult first to help establish goals for a prospective project and partnership.
Purposes
Following our project's guiding principles for ensuring Indigenous language data sovereignty outlined above, the section below presents purposes for supporting one's language on digital devices, along with the language data required to achieve each purpose. Alongside the language data elements listed under each purpose, you will find the minimum licensing type that developers and designers would need in order to incorporate the language data into language tools:
1. Font Support
- Required Unicode character set for your language: public, reference-only
- Required rendering of orthography: public, reference-only
- Corpora example of language (5,000 words): public, reference-only
- Character / Kerning pairs: public, reference-only
- Knowledge of typographic conventions: public, reference-only
- Preferred typographic forms: public, reference-only
- ISO and OpenType LangSys language tags: public domain
 
The above language data and knowledge is required in order to allow all fonts (those on Apple, Google, and Microsoft devices as well as third-party fonts) to support your language accurately and as is expected by readers in your community. By making this knowledge and data publicly available, device manufacturers can ensure that their core fonts (which are used on desktop computers, tablets and smartphones) display your language and its required rendering and typography correctly. It also allows other font companies to meet the same standards of rendering and typography for your language community.
2. Default Keyboard on Devices
- Required Unicode character set: public, reference-only
- Required rendering of orthography (shaping): public, reference-only
- Character occurrence frequencies: public, reference-only
- Keyboard source file made available on GitHub: open source CC0
 
In order for major operating system manufacturers (Apple, Google, Microsoft) to add your language's keyboard to their platform, they require that the keyboard source file is available under an open source license so they can implement it legally on their devices. An example of this can be seen on the Nattilik community's GitHub.
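Character occurrence frequencies, listed above as one of the data elements for a keyboard, can be computed from any body of text. The Python sketch below is one minimal way to do this with the standard library; the placeholder corpus is not real language data, and the count is per code point, so a base letter and a combining mark are counted separately unless the text is processed further.

```python
from collections import Counter


def character_frequencies(corpus):
    """Count how often each non-whitespace character occurs, most common first."""
    counts = Counter(ch for ch in corpus if not ch.isspace())
    return counts.most_common()


# Placeholder text for illustration only; a real count would use
# texts provided by the language community.
corpus = "banana band"

for ch, count in character_frequencies(corpus):
    print(f"U+{ord(ch):04X} {ch} {count}")
```

Frequencies like these help a keyboard designer decide which characters deserve the most easily reached keys.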
3. Operating System Language Environment
- Contribution to Unicode's CLDR locale data set for label and menu translations: Unicode CLA license
- Knowledge of typographic conventions and expected behaviours: public, reference-only
- Preferred typographic forms: public, reference-only
- ISO and OpenType LangSys language tags: public domain
 
Unicode's CLDR project is a collection of language data and translations that allows for all operating system menu labels and date & time to be displayed on your computer, tablet, or smartphone, and therefore allows for a language environment on your device in your language. In order to contribute to CLDR, your community must register an organization account with CLDR, and agree to Unicode's CLA license agreement. "The Unicode CLAs are license agreements that ensure that a contributor retains ownership of any intellectual property rights in their contribution while granting the Unicode Consortium the necessary legal rights to use and redistribute that contribution in the various Consortium products."
4. Map Place Names and Locations
- List of correct community names, streets, rivers, lakes, etc. in your language: open source CC0
- The geo locale data for each respective community for accurate location: open source CC0
- Required Unicode character set: public, reference-only
- Required rendering of orthography: public, reference-only
- Knowledge of typographic conventions and expected behaviours: public, reference-only
- Preferred typographic forms: public, reference-only
- ISO and OpenType LangSys language tags: public domain
 
Making map place names and locations available for your community is important not only so that place names are correctly represented in your community's geographic region and traditional lands, but also because it creates a very strong requirement for device manufacturers (Apple, Microsoft, Google) to adopt and implement full support for how your writing system must render and appear graphically in digital text, in order for their Maps software to display names correctly and accurately. This in turn also pushes these companies to provide a default keyboard on their system to ensure that users can input text for their language when using the Maps application.
Available typographic variants for Indigenous languages
The following graphics present the typographic glyph variants currently available in the Typotheque Lava, November, and Zed typeface families that can be substituted for the standard, common glyph form of certain letters. These variants can be requested by organizations and local communities in order to tailor their typography to match their preferences for how their language should appear graphically. Note that for all of these characters, the Unicode input values remain intact; only the drawing of each character's glyph changes.
British Columbia First Nations
Lava
The following glyph variants are available for Lava, with corresponding italic counterparts. See and try the whole Lava font family here:

November
The following glyph variants can be requested for implementation in the November font family, for all styles. See the full set of styles available for November here:

Zed Display
The following glyph variants can be requested for implementation in the Zed Display font family, for all styles. See the full set of styles available for Zed Display here:

* Note that the default positioning of the ogonek diacritic mark in First Nations font builds is in the centered position. All characters that use the ogonek would have these variants applied.
** The lowercase ascender letters h and b would also take this mark positioning variant to follow the example of k, above.
Typographic and Orthographic Concerns for Indigenous Languages
- 2022. Julia Schillo and Mark Turin. “Type right: Examining the underlying causes of common typeface and font errors for Indigenous orthographies, and a possible path forward.” Language Documentation and Conservation, 16: 364-398.
 - 2020. Julia Schillo and Mark Turin. “Applications and Innovations in Typeface Design for North American Indigenous Languages.” Book 2.0, 10 (1): 71–98.
 - Kevin King, "Syllabics typographic guidelines and local typographic preferences", From Typotheque, 24 January 2022
 - Kevin King, "On developing a secondary style for the Canadian Syllabics", From Typotheque, 24 January 2022
Related Reading of Indigenous Language Support Barriers
 - Hilary Bird, ‘Baby named Sahaiʔa prompts changes to Vital Statistics Act’. From CBC News, 13 June 2016, accessed 4 April 2022, https://www.cbc.ca/amp/1.3630353
 - Betsy Trumpener, ‘Heiltsuk woman unable to restore Indigenous surname on ID because system can't handle its spelling’, CBC News, Posted: Jul 08, 2021 6:00 AM PT | Last Updated: July 8, 2021. Accessed 8 March 2022, https://www.cbc.ca/news/canada/british-columbia/heiltsuk-nation-indigenous-name-bc-government-identification-1.6093186
 - Yvette Brend, ‘Indigenous parents push for birth registries to allow their languages' special characters, accents’, CBC News, British Columbia, Posted: Apr 22, 2022 1:00 AM PT Accessed 22 April 2022, https://www.cbc.ca/news/canada/british-columbia/indigenous-names-vital-stats-1.6426239
 - “Province's Vital Statistics Act restricts what accents and symbols can be used” https://www.cbc.ca/news/indigenous/first-nations-baby-name-manitoba-1.6356017
Press
 - CBC News, "Special syllabics developed in Nunavut mean Nattilingmiutut can be read anywhere in the world".
 - CBC News, "Dakelh language to get standardized writing system for keyboard use".