v0.1.0  · early access
§engineering / phonetic matching speech recognition swift

Phonetic match fixes name detection

·7 min read·by meetping team

The first version of MeetPing's keyword watcher used a single regex with word boundaries. It was 30 lines of Swift, it was fast, and it failed in the most predictable way possible: it missed the founder's name approximately every other meeting. The founder's name is Ogtay. Parakeet hears it and writes down "octaye." The regex shrugs.

This post is what we did about it: a worked example of why straight string match fails for proper nouns through a streaming ASR, why Soundex fixes most of it, why Levenshtein catches the rest, and the actual Swift loop that ties the three together.

The mutations are predictable

ASR mutations on uncommon names are not random; they're weighted by the language-model prior the decoder was trained on. "Ogtay" pronounced /ɒɡˈtaɪ/ has no high-probability completions in the model's training data, so the decoder reaches for nearby phonetic neighbours that do. We collected the actual Parakeet outputs over six weeks. Top variants:

  • octaye — most common (rhymes with "okay")
  • october — when context primes a month
  • aug-tay — when the speaker stresses the first syllable
  • october (again) — about a quarter of all hits

Three of those four contain no substring of the literal name. A regex with /\bogtay\b/i fires zero times.

Soundex maps "Ogtay" and "octaye" to the same code

Soundex is a 1918 algorithm for indexing names by sound. It runs in linear time, encodes any English-ish word as a one-letter-plus-three-digits code, and groups phonetically similar names together. The rules: keep the first letter, drop vowels (and h, w, y) after the first position, then map consonants to digits — b/f/p/v → 1, c/g/j/k/q/s/x/z → 2, d/t → 3, l → 4, m/n → 5, r → 6 — collapsing adjacent duplicates. Pad with zeros to length four.

Worked example for our test case:

"Ogtay"  → O (keep) g→2 t→3 a(drop) y(drop)
         → "O23"  → pad → "O230"

"octaye" → O (keep) c→2 t→3 a(drop) y(drop) e(drop)
         → "O23"  → pad → "O230"   ← match

Same code. So if we Soundex both the watchword and every confirmed Parakeet token, we catch the variant for free. "October" Soundexes to O236 — different code, doesn't fire — which is a feature, not a bug, when you're trying to avoid pinging on the literal month.

↳ pull quote

Soundex is a 108-year-old algorithm and it solves about two-thirds of our ASR-on-proper-nouns problem in twelve lines of Swift.

Levenshtein catches the long tail

Some mutations beat Soundex. "Aug-tay" with the hyphen tokenises into two tokens. "October" (e instead of o) Soundexes the same as "October" but the literal-vs-phonetic tie-break could go wrong. For these we run a second pass: edit-distance, with a budget of 2 for tokens up to 6 characters and 3 for longer. Levenshtein is O(n·m) but the inputs are tiny (single words), and we only run it on the 0.5% of tokens that didn't match regex or Soundex.

Edit distance has a downside: too generous a budget and you'll match every short word against every other short word. We score: regex hit = 1.0, Soundex equivalence = 0.6, edit distance ≤ 2 = 0.4. Anything below 0.4 is ignored. Multiple watchwords matching the same token: highest score wins.

The Swift loop

func match(token: String, against words: [Watchword]) -> Match? {
    var best: Match?
    for w in words {
        // 1. word-boundary regex (cheap, exact)
        if w.regex.firstMatch(in: token) != nil {
            return Match(word: w, score: 1.0, kind: .literal)
        }
        // 2. soundex equivalence
        if soundex(token) == w.soundex {
            best = better(best, Match(word: w, score: 0.6, kind: .phonetic))
            continue
        }
        // 3. levenshtein fallback (budget = 2 or 3)
        let budget = token.count > 6 ? 3 : 2
        if levenshtein(token, w.text, max: budget) <= budget {
            best = better(best, Match(word: w, score: 0.4, kind: .fuzzy))
        }
    }
    return best
}

The whole matcher is about 90 lines once you include the Soundex and bounded Levenshtein implementations. It runs per-token on the confirmed Parakeet output stream — never on partials, because partials change. The 8-second per-watchword cooldown is on top of this; even if a noun fires three times in a sentence we ping once.

Does it work

The regex-only version misses most proper-noun mentions the moment Parakeet renders them differently than the watchword spelling — which is most of the time for non-English names. Adding Soundex catches the bulk of phonetic neighbors. Adding bounded Levenshtein picks up the typo-style remainder. We'll publish concrete recall numbers once we've run the full corpus post-launch; this post will get an addendum with the table.

Read the keyword-watch feature page for what this looks like from the user side, or the Parakeet writeup for the layer below.

Phonetic + fuzzy matching, in your menubar.

MeetPing's keyword watcher runs three matchers per token. Add unusual names to your list — they'll fire.