Speaker ID Configuration

File: configs/speaker.yaml
Commands: kenzy-speaker, kenzy-enroll, kenzy-setup

The speaker identification service uses a SpeechBrain ECAPA-TDNN model to compare incoming audio against enrolled speaker profiles and return the closest match.

Full reference

Service

Key	Default	Description
`host`	`"127.0.0.1"`	Bind address
`port`	`8768`	HTTP port
`log_level`	`"info"`	Log verbosity

Model

Key	Default	Description
`model_source`	`"speechbrain/spkrec-ecapa-voxceleb"`	HuggingFace model ID. Downloaded once by `kenzy-setup`.
`model_save_dir`	`"models/speaker"`	Local cache directory for the downloaded model

Speaker profiles

Key	Default	Description
`embeddings_dir`	`"data/speakers"`	Directory containing per-speaker `.npy` embedding files. Each file is named `<speaker_name>.npy`.
`identify_threshold`	`0.25`	Cosine similarity threshold in the range 0.0–1.0. If the best-matching profile scores below this value, the utterance is attributed to `unknown_speaker`.
`unknown_speaker`	`"unknown"`	Name returned when no enrolled speaker exceeds the threshold.
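
The matching rule described in this table can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the service's actual code: `cosine_similarity` and `identify` are hypothetical names, profiles are passed in as plain lists rather than loaded from the `.npy` files, and a score exactly equal to the threshold is treated as a match (the source does not say which way the boundary falls).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def identify(embedding, profiles, identify_threshold=0.25, unknown_speaker="unknown"):
    """Return (name, score) for the closest profile, or the unknown name.

    `profiles` maps a speaker name to its enrolled embedding (hypothetical
    in-memory stand-in for the per-speaker files in `embeddings_dir`).
    """
    best_name, best_score = unknown_speaker, float("-inf")
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score > best_score:
            best_name, best_score = name, score
    # Assumption: scores equal to the threshold count as a match.
    if best_score < identify_threshold:
        return unknown_speaker, best_score
    return best_name, best_score
```

With toy two-dimensional profiles such as `{"alice": [1.0, 0.0], "bob": [0.0, 1.0]}`, an embedding of `[0.9, 0.1]` resolves to `alice`, while one pointing away from both profiles falls back to the unknown name.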

Enrollment (`kenzy-enroll`)

Key	Default	Description
`enroll_sample_rate`	`16000`	Microphone sample rate during enrollment
`enroll_silence_rms`	`300`	RMS threshold above which a frame is considered speech
`enroll_silence_ms`	`800`	Consecutive silence (ms) that ends a recording
`enroll_min_speech_ms`	`1500`	Minimum speech (ms) required for a valid sample
`enroll_prompts`	(built-in list)	Sentences read aloud by the user during enrollment. Phonetically diverse sentences produce better embeddings.
`tts.url`	—	TTS service used to read enrollment prompts aloud
`tts.timeout`	`30.0`	TTS HTTP timeout
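
To show how the three `enroll_*` timing settings could interact, here is a minimal sketch of an end-of-recording check. It is not taken from `kenzy-enroll`: the function names, the fixed frame duration, and the choice to ignore silence that occurs before any speech are all assumptions made for illustration.

```python
import math


def frame_rms(samples):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_until_silence(frames, frame_ms, silence_rms=300,
                         silence_ms=800, min_speech_ms=1500):
    """Consume frames until enough consecutive silence follows speech.

    Returns (speech_ms, is_valid), where is_valid reports whether the
    accumulated speech reached min_speech_ms.
    """
    speech_ms = 0
    trailing_silence_ms = 0
    for frame in frames:
        if frame_rms(frame) > silence_rms:
            # Frame is above the RMS threshold, so count it as speech.
            speech_ms += frame_ms
            trailing_silence_ms = 0
        else:
            trailing_silence_ms += frame_ms
            # Assumption: leading silence never ends the recording.
            if speech_ms > 0 and trailing_silence_ms >= silence_ms:
                break
    return speech_ms, speech_ms >= min_speech_ms
```

At a 16000 Hz sample rate, a 100 ms frame holds 1600 samples, so eight consecutive quiet frames reach the default 800 ms cutoff.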

Threshold tuning

The default threshold of 0.25 is permissive. In a quiet environment with good microphone placement, raising it to 0.30–0.35 reduces false matches. Lower it if enrolled speakers are being returned as unknown.
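
One way to choose a value is to collect similarity scores from known speakers and from non-enrolled voices, then compare error rates at a few candidate thresholds. The sketch below assumes you have gathered such scores yourself; `error_rates` and `sweep` are hypothetical helpers, and the boundary rule (a score equal to the threshold is accepted) is an assumption.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at one threshold.

    genuine_scores: scores for enrolled speakers against their own profile.
    impostor_scores: best scores for voices that are not enrolled.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


def sweep(genuine_scores, impostor_scores, thresholds):
    """Evaluate several thresholds; returns (threshold, frr, far) tuples."""
    return [(t, *error_rates(genuine_scores, impostor_scores, t))
            for t in thresholds]
```

Raising the threshold can only lower the false-accept rate and raise the false-reject rate, which is the trade-off described above.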

Security

Speaker identification gates sensitive operations (locking/unlocking doors, opening covers), so a misidentified speaker could bypass that gate. Set the threshold to a value you are comfortable with for your environment.

Example

host: "127.0.0.1"
port: 8768

model_source: "speechbrain/spkrec-ecapa-voxceleb"
model_save_dir: "models/speaker"

embeddings_dir: "data/speakers"
identify_threshold: 0.28
unknown_speaker: "unknown"
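
The example omits `log_level`, which then takes its default. As a rough sketch of that behaviour, the snippet below merges a partial configuration over the documented defaults and checks two value ranges. `DEFAULTS` copies the tables above, but `resolve_config` and its validation rules are hypothetical and are not how the service necessarily loads its file.

```python
# Defaults copied from the Service, Model, and Speaker profiles tables.
DEFAULTS = {
    "host": "127.0.0.1",
    "port": 8768,
    "log_level": "info",
    "model_source": "speechbrain/spkrec-ecapa-voxceleb",
    "model_save_dir": "models/speaker",
    "embeddings_dir": "data/speakers",
    "identify_threshold": 0.25,
    "unknown_speaker": "unknown",
}


def resolve_config(overrides):
    """Merge user overrides over DEFAULTS and sanity-check the result."""
    config = {**DEFAULTS, **overrides}
    threshold = config["identify_threshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"identify_threshold out of range: {threshold}")
    if not 1 <= config["port"] <= 65535:
        raise ValueError(f"port out of range: {config['port']}")
    return config
```

Passing the parsed contents of the example file as `overrides` would yield a threshold of 0.28 with every unspecified key left at its default.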