Pronunciation Engine · Part 4 of 5

The Pronunciation Engine — Why Accurate Phoneme Extraction Is Everything

The entire scoring system — Strategic Fit, Technical Quality, collision detection — depends on one thing: correctly extracting the phonemes from a name. A standard dictionary handles common words. Brand names are not common words. Here is how the three-tier engine handles both.


The previous post laid out the four-step scoring system in full — how a name's Sound Print is compared against a Brand Profile to produce a Strategic Fit Score, and how four usability pillars independently produce a Technical Quality score. Both rely entirely on a single upstream input: the correct phoneme sequence for the name being evaluated.

If that extraction is wrong, everything downstream is wrong. A misread phoneme changes the Composition Score. A misclassified vowel changes the flow calculation. An inaccurate pronunciation changes the Bar Test result. Accuracy at the phoneme extraction layer is not an engineering detail — it is the critical dependency the entire framework rests on.

"Brand names are often invented words that don't exist in any dictionary. The system had to handle both."

The Three-Tier Pronunciation Engine

The engine resolves every name through a strict hierarchy. Each tier is only reached if the previous tier cannot produce a result. The logic is deliberate: use the most authoritative source available, fall back only when necessary, and never guess when a definitive answer exists.
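The fallback order can be sketched as a chain of lookups in which the first tier to return a result wins. This is a minimal illustration, not the app's code. The tier functions, their names, and the tiny in-memory tables are hypothetical stand-ins for the real CMU dictionary, brand database, and neural model.

```python
from typing import Callable, Optional

# A tier maps a name to a phoneme list, or None when it has no answer.
Tier = Callable[[str], Optional[list[str]]]

# Hypothetical stand-in data; the real tiers hold 133,854 and 2,800+ entries.
CMU_STUB = {"apple": ["AE", "P", "AH", "L"], "slack": ["S", "L", "AE", "K"]}
BRAND_STUB = {"rolex": ["R", "OW", "L", "EH", "K", "S"]}
MODEL_STUB = {"baksal": ["B", "AE", "K", "S", "AH", "L"]}


def cmu_tier(name: str) -> Optional[list[str]]:
    return CMU_STUB.get(name.lower())


def brand_tier(name: str) -> Optional[list[str]]:
    return BRAND_STUB.get(name.lower())


def neural_tier(name: str) -> Optional[list[str]]:
    # Placeholder for the on-device model. The real tier is expected to
    # predict a pronunciation for any input; this stub only knows one name.
    return MODEL_STUB.get(name.lower())


def resolve(name: str, tiers: list[tuple[str, Tier]]) -> tuple[str, list[str]]:
    """Return (tier label, phonemes) from the first tier that has an answer."""
    for label, tier in tiers:
        phonemes = tier(name)
        if phonemes is not None:
            return label, phonemes  # stop at the most authoritative hit
    raise LookupError(f"no tier could pronounce {name!r}")


TIERS = [("Tier 1", cmu_tier), ("Tier 2", brand_tier), ("Tier 3", neural_tier)]
```

With these stubs, `resolve("Rolex", TIERS)` returns `("Tier 2", ["R", "OW", "L", "EH", "K", "S"])`: the CMU stub has no entry, so the brand tier answers and the model is never consulted.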

Tier 1 · CMU Pronouncing Dictionary · 133,854 entries · queried first

The Carnegie Mellon University Pronouncing Dictionary is the primary source — a comprehensive, linguistically validated database of English word pronunciations in ARPAbet phoneme notation. For any name that is a real English word or close derivative, this tier produces an authoritative result. If the name is found here, the engine stops and returns that pronunciation.
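Tier 1 reduces to a dictionary lookup. A minimal sketch of loading it, assuming the plain-text layout of the public cmudict-0.7b release (one `WORD  PHONEMES` line per pronunciation, `;;;` comment lines, `WORD(1)` for alternates). The function names are hypothetical, and returning the first listed pronunciation is an assumption about how alternates are handled.

```python
import re
from typing import Iterable, Optional


def parse_cmudict(lines: Iterable[str]) -> dict[str, list[list[str]]]:
    """Build {word: [pronunciation, ...]} from CMU dictionary lines.

    Stress digits (0/1/2) are stripped so 'AE1' becomes 'AE', matching the
    unstressed ARPAbet notation used in the examples in this post.
    """
    entries: dict[str, list[list[str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment/licence lines
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()  # 'READ(1)' -> 'read'
        entries.setdefault(word, []).append([p.rstrip("012") for p in phones])
    return entries


def cmu_lookup(entries: dict[str, list[list[str]]], name: str) -> Optional[list[str]]:
    """Return the first listed pronunciation, or None if the word is absent."""
    prons = entries.get(name.lower())
    return prons[0] if prons else None
```

A `None` result is the signal for the engine to move on to Tier 2.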

Tier 2 · Brand Name Database · 2,800+ curated entries · queried if Tier 1 has no match

A curated database of real brand name pronunciations — covering established global brands whose names are not standard dictionary words. Names like Google, Zara, Häagen-Dazs, or Rolex exist here with their correct phoneme sequences, not general linguistic approximations. This tier handles the large class of names that are real brands but not real English words. If found, the engine stops here.
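Brand names carry capitals, diacritics, and punctuation, so a lookup like Tier 2 needs a normalised key; otherwise "Häagen-Dazs", "haagen dazs", and "HAAGEN-DAZS" would miss each other. A sketch, assuming the database behaves like a mapping from normalised name to phoneme list (its real storage format isn't described). `brand_key` and `brand_lookup` are hypothetical helpers, and the two entries are the Tier 2 examples from this post.

```python
import unicodedata
from typing import Optional


def brand_key(name: str) -> str:
    """Normalise a brand name to a lookup key, e.g. 'Häagen-Dazs' -> 'haagendazs'."""
    # NFKD splits 'ä' into 'a' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    # Keep letters and digits only, dropping combining marks and punctuation.
    return "".join(
        c for c in decomposed if c.isalnum() and not unicodedata.combining(c)
    )


BRANDS = {
    brand_key(name): phonemes
    for name, phonemes in [
        ("Google", ["G", "UW", "G", "AH", "L"]),
        ("Rolex", ["R", "OW", "L", "EH", "K", "S"]),
    ]
}


def brand_lookup(name: str) -> Optional[list[str]]:
    return BRANDS.get(brand_key(name))
```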

Tier 3 · On-Device Neural Network · 6MB, runs offline · used if neither Tier 1 nor Tier 2 has a match

For invented names — neologisms that exist in neither the CMU dictionary nor the brand database — a specialised 6MB neural network model predicts pronunciation entirely on-device. The model was trained on both the CMU dictionary and the brand name database, specifically to optimise accuracy for brand-like invented words rather than general English text. It runs entirely offline. No name is sent to any server. The model handles the full range of invented brand constructions: syllable combinations, phonetic morphs, portmanteau blends, and pure neologisms.
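The model itself can't be reproduced in a few lines, but whatever it predicts has to use the same notation as Tiers 1 and 2, or downstream scoring would compare unlike symbols. A hedged sketch of a guard that could sit between the model and the scorer. Whether the app performs such a check is not stated, and `validate_prediction` is a hypothetical name; the phoneme inventory is the 39-symbol ARPAbet set used by the CMU dictionary.

```python
# The 39 ARPAbet phonemes of the CMU Pronouncing Dictionary, without stress digits.
ARPABET = frozenset(
    """
    AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW
    B CH D DH F G HH JH K L M N NG P R S SH T TH V W Y Z ZH
    """.split()
)


def validate_prediction(phonemes: list[str]) -> list[str]:
    """Strip stress digits and reject any symbol outside the ARPAbet set."""
    if not phonemes:
        raise ValueError("model returned an empty pronunciation")
    cleaned = [p.rstrip("012") for p in phonemes]
    unknown = [p for p in cleaned if p not in ARPABET]
    if unknown:
        raise ValueError(f"non-ARPAbet symbols in prediction: {unknown}")
    return cleaned
```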

What This Looks Like Across Name Types

The practical effect of the three-tier hierarchy is that the engine handles the full spectrum of names a naming process produces — from real words used as brand names to entirely invented constructions — without manual input from the user.

| Name | Type | Phoneme Sequence | Tier Used |
|---|---|---|---|
| Apple | Real word | AE · P · AH · L | Tier 1 (CMU Dict) |
| Slack | Real word | S · L · AE · K | Tier 1 (CMU Dict) |
| Google | Invented word / brand | G · UW · G · AH · L | Tier 2 (Brand DB) |
| Rolex | Invented word / brand | R · OW · L · EH · K · S | Tier 2 (Brand DB) |
| Baksal | Pure neologism | B · AE · K · S · AH · L | Tier 3 (Neural Net) |
| Klexaro | Pure neologism | K · L · EH · K · S · AH · R · OW | Tier 3 (Neural Net) |

On Privacy

All three tiers of the pronunciation engine run entirely on-device. The CMU dictionary, the brand name database, and the neural network model are all stored locally. When you type a candidate name, no query leaves your phone or computer at any point in the phoneme extraction process. Your naming ideas stay private.

Collision Detection

A phonetically strong name, one that scores well against its chosen brand profile, can still fail if it sounds or looks too close to an existing brand. Legal risk and perceptual confusion both stem from proximity, not just identity. The collision detection layer addresses this independently of the phonetic scoring.

2,800+ global brands cross-referenced for phonetic and visual similarity

Every candidate name is automatically compared against the curated brand database (the same database used in Tier 2 of the pronunciation engine), checking both how the name sounds and how it looks in text form. Phonetic similarity catches names that sound like established brands even when spelled differently. Visual similarity catches names that look like established brands even when pronounced differently.
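The post doesn't say which similarity measure is used. One common choice, assumed here, is normalised edit distance: applied to phoneme lists it approximates "sounds like", and applied to the lowercase spelling it approximates "looks like". The 0.75 thresholds, the function names, and the "Rolux" pronunciation are hypothetical.

```python
def levenshtein(a, b) -> int:
    """Edit distance between two sequences (strings or phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a, b) -> float:
    """1.0 for identical sequences, falling towards 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


PHONETIC_THRESHOLD = 0.75  # hypothetical cut-off
VISUAL_THRESHOLD = 0.75    # hypothetical cut-off

BRANDS = {
    "Google": ["G", "UW", "G", "AH", "L"],
    "Rolex": ["R", "OW", "L", "EH", "K", "S"],
}


def collision_flags(name: str, phonemes: list[str], brands: dict) -> list[tuple[str, str]]:
    """Return (kind, brand) pairs for every brand the candidate is too close to."""
    flags = []
    for brand, brand_phonemes in brands.items():
        if similarity(phonemes, brand_phonemes) >= PHONETIC_THRESHOLD:
            flags.append(("phonetic", brand))
        if similarity(name.lower(), brand.lower()) >= VISUAL_THRESHOLD:
            flags.append(("visual", brand))
    return flags
```

"Gugle" is spelled differently from Google but shares its phonemes, so it trips only the phonetic check; "Rolux" (assumed here to be said R · AA · L · AH · K · S) is one letter from Rolex but two phonemes away, so it trips only the visual check.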

9,800+ checks across 27 languages for profanity, sensitive terms, and negative sentiment

A separate safety layer runs independently of the brand collision check. It screens the candidate name's phonetic form across 27 languages for profanity, culturally sensitive terms, and negative sentiment associations. A name that is clean in English may carry unwanted connotations in a target market. This check surfaces those conflicts before a name reaches any external audience.
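Screening the phonetic form rather than the spelling means a flagged word is caught however the candidate happens to spell it. One simple way to do that, assumed here rather than taken from the app, is to test whether any flagged term's phoneme sequence appears as a contiguous run inside the candidate's phonemes. The `SafetyTerm` data model, the helper names, and the single example entry (Spanish "malo", "bad", with an approximate ARPAbet rendering) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyTerm:
    language: str              # ISO 639-1 code
    term: str                  # the word as written in that language
    phonemes: tuple[str, ...]  # approximate ARPAbet rendering
    category: str              # e.g. "profanity", "sensitive", "negative sentiment"


def contains_run(haystack: list[str], needle: tuple[str, ...]) -> bool:
    """True if `needle` appears as a contiguous run inside `haystack`."""
    n = len(needle)
    return any(tuple(haystack[i:i + n]) == needle for i in range(len(haystack) - n + 1))


def safety_hits(candidate: list[str], terms: list[SafetyTerm]) -> list[SafetyTerm]:
    return [t for t in terms if contains_run(candidate, t.phonemes)]


# Hypothetical one-entry list standing in for the 9,800+ checks.
TERMS = [SafetyTerm("es", "malo", ("M", "AA", "L", "OW"), "negative sentiment")]
```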

Both layers run entirely on-device. The output is immediate and binary:

Clean pass: no phonetic similarity, visual similarity, or cross-language conflicts detected against the 2,800+ brand database or across the 9,800+ language checks.

Specific flag with conflict identified: the flag names the precise nature of the conflict (phonetic similarity to a named brand, visual similarity to a named brand, or a specific cross-language profanity or negative sentiment detection), so the issue is actionable, not just a warning.
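Because every flag must name its conflict, the result is better modelled as a list of specific findings than as a bare boolean: an empty list is the clean pass. A minimal sketch of such a result type; the `Flag` and `Verdict` classes and their fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flag:
    kind: str    # "phonetic", "visual", or "language"
    detail: str  # the matched brand, or the language and term detected


@dataclass
class Verdict:
    flags: list[Flag] = field(default_factory=list)

    @property
    def clean(self) -> bool:
        """The binary outcome: clean only when no flag was raised."""
        return not self.flags

    def summary(self) -> str:
        if self.clean:
            return "Clean pass"
        return "; ".join(f"{f.kind} conflict: {f.detail}" for f in self.flags)
```

`Verdict([Flag("phonetic", "Google")]).summary()` yields `"phonetic conflict: Google"`, which tells the user what to change rather than only that something is wrong.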

Pro Users

Pro users can extend the collision database with their own custom brand data — adding proprietary brand portfolios, client competitor sets, or category-specific brand lists to the cross-reference check. The collision system is user-extensible by design.
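One way to make the brand set user-extensible is to merge the built-in list with any number of named custom lists, tagging each entry with its source so a flag can say which list it matched. A sketch under stated assumptions: the function name, the data shapes, the fictional brand "Acmeo", and the rule that later lists override earlier ones on a name clash are all hypothetical.

```python
def merge_brand_sources(
    base: dict[str, list[str]],
    custom: dict[str, dict[str, list[str]]],
) -> dict[str, tuple[str, list[str], str]]:
    """Combine built-in and user brand lists into {key: (name, phonemes, source)}.

    Keys are case-folded names; a later source replaces an earlier entry
    with the same key (an assumed policy).
    """
    merged: dict[str, tuple[str, list[str], str]] = {}
    for source, brands in [("built-in", base), *custom.items()]:
        for name, phonemes in brands.items():
            merged[name.casefold()] = (name, phonemes, source)
    return merged


BUILT_IN = {"Rolex": ["R", "OW", "L", "EH", "K", "S"]}
CUSTOM = {"client competitors": {"Acmeo": ["AE", "K", "M", "IY", "OW"]}}
```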

Why This Architecture Matters

The pronunciation engine and collision detection layer share a common design principle: every check that can be run locally is run locally. The three-tier engine resolves pronunciations on-device. The collision database is stored on-device. The neural network runs on-device. The 27-language safety check runs on-device.

This is not a privacy marketing claim — it is an architectural consequence of how the system was built. A naming process involves candidate ideas that have not been disclosed publicly. The framework was designed so that those ideas never need to leave the device in order to be evaluated.

The final post in this series covers what happens after a name clears evaluation and collision detection — how the Sound Print becomes the foundation for a complete visual identity brief, and why Klexaro functions as an orchestration layer rather than an execution layer for identity design.

Test Your Name Against 2,800+ Brands — Free

Phoneme extraction, scoring, and collision detection all run on-device. Your candidate names stay private throughout the entire process.

Download on App Store →
Previous: The Scoring System · Next in series: From Sound Print to Brand Identity