v0.1.2  · early access
§explainer / zoom meeting keyword alert

Zoom meeting keyword alerts that actually fire on time

·6 min read·by meetping team

Zoom does many things. Firing a live alert when a specific word is spoken in a call is not one of them. The closest built-in feature is the Smart Recording highlight reel, which happens after the call, and the live captions, which scroll regardless of whether you are reading them. This post is about what an actual keyword alert needs to look like and how the latency math works out.

What Zoom ships out of the box

Live captions are decent. They lag the audio by 1-2 seconds on a normal connection, they cover most major languages, and they overlay the call without joining as a bot. They are not a notification. There is no Mac notification, no sound, no on-screen interrupt — they are visual context for people who are already watching the call.

The post-meeting transcript and Smart Recording features surface keywords too late to act on. You can search the transcript after the call, you can ctrl-F your own name, but the moment has passed and the question has been re-routed.

Zoom Apps technically allow third-party panels inside the client, but the API does not expose the live audio stream to those apps. So any actual keyword alert has to live outside the Zoom process, at the OS level.

The latency budget

The whole reason a keyword alert exists is to fire while the speaker is still on the topic. The realistic budget, measured from utterance to on-screen alert, is two seconds. Above that the speaker has moved on; you answer the wrong question or you ask them to repeat.

That budget spends itself on four things: the audio capture path, the ASR first-partial, the keyword match, and the notification round-trip to the macOS notification daemon. On MeetPing's pipeline the split is roughly: 30 ms capture, 1.4 s Parakeet first-partial, sub-millisecond match, 50 ms notification — call it 1.5 seconds at the median. Read the Parakeet engineering writeup for the chunk-sizing detail behind that 1.4 s.

↳ pull quote

The two-second budget evaporates the moment you add a cloud round-trip. Local ASR on the ANE is not the differentiator — it is the only way the budget closes.

What a keyword alert actually needs

The matching layer is harder than it looks on paper. A naive substring search produces two failure modes — false positives (the keyword rob firing on robust) and false negatives (the name Ogtay never firing because Parakeet heard octaye). Both are unacceptable in a tool that is meant to interrupt you.

MeetPing's matcher runs in three passes per token:

  • Word-boundary regex — cheap, exact. Handles the 80% case and rejects substrings.
  • Soundex equivalence — maps Ogtay and octaye to the same four-character code, so phonetic mutations still trigger.
  • Levenshtein with a small edit budget — catches the long-tail mutations Soundex misses (typo-level differences).

There is an 8-second per-keyword cooldown so a single mention pings once, not seven times. Full algorithm is on the keyword watch page.

The auto-arm requirement

A listener you have to remember to switch on does not last a week. Every meeting alert tool eventually faces this — if the user has to open the app and click "start listening" before each call, the workflow falls apart on the busy days. MeetPing's auto-arm watches Core Audio for mic activation and the foreground app; when both conditions point at Zoom the listener arms itself, and when the mic releases it disarms. There is no always-on recording.

Zoom-specific gotchas

A few things to know if you set up keyword alerts on Zoom specifically:

  • System audio capture requires screen recording permission on macOS 13+. Zoom's participant audio comes through that path on most installs.
  • Krisp upstream changes the audio characteristic of the mic feed (noise floor, EQ); the listener handles this fine, but the first calibration run can be slightly different.
  • Zoom breakout rooms work because the mic device does not change; the watcher keeps streaming. Switching between two Zoom calls in different windows does the right thing automatically.

The take

Zoom does not ship keyword alerts. The third-party shape that works is a macOS menubar app with auto-arm, an on-device streaming ASR, and a three-pass matcher. Anything missing one of those three slips the latency budget or the accuracy budget and stops being useful.

MeetPing is one such app. If you want it on a non-Zoom stack — Google Meet, Microsoft Teams, Slack huddles — it works the same way on all of them. The auto-arm detector knows about each.

Zoom alerts that actually fire.

MeetPing pings the moment a watchword is spoken — typically inside 1.5 seconds. On-device, $24.90 lifetime.