STT Configuration

File: configs/stt.yaml
Command: kenzy-stt [config_path]

The STT service accepts POST requests with base64-encoded PCM audio and returns a transcript. It is built on faster-whisper, a CTranslate2-optimized implementation of OpenAI Whisper.

Full reference

Key	Default	Description
`host`	`"127.0.0.1"`	Bind address
`port`	`8767`	HTTP port
`log_level`	`"info"`	Log verbosity

Whisper model

Key	Default	Description
`whisper.model`	`"tiny"`	Model size: `tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`. Larger models are more accurate but slower and need more RAM.
`whisper.device`	`"cpu"`	Inference device: `cpu` or `cuda`
`whisper.compute_type`	`"int8"`	Compute precision: `int8` (quantised, fastest on CPU), `float16` (for GPU), `float32` (full precision, highest quality)
`whisper.language`	`"en"`	Language code (e.g. `"en"`, `"fr"`), or `null` for auto-detect
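
As an illustration of how these keys combine, a machine with a CUDA-capable GPU might use a block like the one below. This is a sketch built only from the values listed in the table above; the choice of `medium` is an example, not a tested recommendation.

```yaml
whisper:
  model: "medium"           # larger model; more accurate but needs more RAM
  device: "cuda"            # run inference on the GPU instead of the CPU
  compute_type: "float16"   # the option listed above for GPU use
  language: null            # auto-detect the spoken language
```

Setting `language` to a fixed code such as `"en"` instead of `null` skips auto-detection when every request is in the same language.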

Model size guide

Model	Size	Relative speed	Notes
`tiny`	~75 MB	Fastest	Good for low-power hardware or simple commands
`base`	~145 MB	Fast	Better accuracy, still CPU-friendly
`small`	~460 MB	Moderate	Good balance for a dedicated CPU server
`medium`	~1.5 GB	Slow on CPU	Recommended with a GPU
`large-v3`	~3 GB	Slowest	Best accuracy; GPU strongly recommended

Raspberry Pi

When the rest of the system runs on a Pi Zero 2 W, run the STT service on a more powerful machine and point `stt.url` in `server.yaml` at it. On a modern x86 CPU, the `tiny` or `base` model gives acceptable latency.
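
A split deployment like this might look roughly as follows. Several details here are assumptions: the address `192.168.1.50` is hypothetical, the nesting of `stt.url` in `server.yaml` is inferred from the dotted key name, and the URL may need a path that this page does not specify. Because the default `host` of `127.0.0.1` accepts only local connections, the STT server has to bind to an address the Pi can reach.

```yaml
# configs/stt.yaml on the more powerful machine (hypothetical address 192.168.1.50)
host: "0.0.0.0"    # listen on all interfaces so the Pi can connect
port: 8767

whisper:
  model: "base"
  device: "cpu"
  compute_type: "int8"
---
# server.yaml on the Pi (structure assumed from the key name stt.url)
stt:
  url: "http://192.168.1.50:8767"   # hypothetical; append any path your deployment expects
```

Binding to `0.0.0.0` exposes the service to every network the machine is attached to, so a specific interface address is the safer choice where one is available.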

Example

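```yaml
# Bind to localhost only; requests from other machines are not accepted
host: "127.0.0.1"
port: 8767

whisper:
  model: "base"          # more accurate than the default "tiny", still CPU-friendly
  device: "cpu"
  compute_type: "int8"   # fastest option on CPU
  language: "en"         # set to null to auto-detect
```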