TTS Configuration
File: configs/tts.yaml
Command: kenzy-tts [config_path]
The TTS service accepts POST requests with text and returns raw int16 PCM audio at 24 kHz mono. Two providers are supported, selected via the provider key.
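Because the response body is headerless PCM, most players will not open it directly. The following is a minimal sketch, using only the Python standard library, of wrapping such a payload in a WAV container. The helper names `pcm_to_wav_bytes` and `duration_seconds` are hypothetical and not part of the service; the constants come from the format described above.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, per the service description
SAMPLE_WIDTH_BYTES = 2    # int16 samples
CHANNELS = 1              # mono


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw int16 mono PCM in a WAV container (hypothetical helper)."""
    if len(pcm) % SAMPLE_WIDTH_BYTES:
        raise ValueError("int16 PCM must contain an even number of bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Length of the clip implied by the byte count and the fixed format."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

The bytes returned by `pcm_to_wav_bytes` can be written to a `.wav` file or handed to any library that expects WAV input.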
Provider selection
| Key | Default | Description |
|---|---|---|
| provider | "openai" | TTS backend: openai or kokoro |
OpenAI provider
Requires: OPENAI_API_KEY in .env
Long responses are automatically split at sentence boundaries and concatenated, so there is no effective limit on response length.
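The splitting step might look roughly like the sketch below. This is not the service's actual implementation: the function name `split_into_chunks`, the sentence regex, and the `MAX_CHARS` budget are all assumptions made for illustration.

```python
import re

# Assumed per-request character budget; the real value is not stated here.
MAX_CHARS = 4096

# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the budget is kept whole here.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Since every chunk would come back as PCM in the same format, the per-chunk audio can be joined by plain byte concatenation, for example `b"".join(audio_parts)`.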
| Key | Default | Description |
|---|---|---|
| openai.model | "gpt-4o-mini-tts" | OpenAI TTS model |
| openai.voice | "sage" | Voice persona (see below) |
| openai.speed | 1.0 | Playback speed multiplier, 0.25–4.0 |
Available voices
alloy · ash · ballad · coral · echo · fable · nova · onyx · sage · shimmer
Example
provider: "openai"
openai:
  model: "gpt-4o-mini-tts"
  voice: "nova"
  speed: 1.1
Kokoro provider
Requires:
- pip install -e ".[kokoro]" (installs the kokoro package and PyTorch)
- sudo apt-get install espeak-ng (system phonemization library)
- Run kenzy-setup after install to pre-download model weights
Kokoro runs entirely locally with no API key or internet connection required at runtime. It produces high-quality speech and outputs at 24 kHz mono — the same format as the OpenAI provider, so nothing downstream changes.
Note
The voice_prompt style instruction generated by the LLM (e.g. "speak warmly at a conversational pace") is an OpenAI-specific feature. It is silently ignored when using Kokoro.
| Key | Default | Description |
|---|---|---|
| kokoro.voice | "af_heart" | Kokoro voice name (see below) |
| kokoro.device | "auto" | Inference device (see below) |
| kokoro.speed | 1.0 | Playback speed multiplier, 0.5–2.0 |
| kokoro.lang_code | (from voice) | Language code. Derived automatically from the first character of the voice name if omitted. |
Device options
| Value | Description |
|---|---|
| auto | Detects the best available device at startup: CUDA → MPS → CPU (recommended) |
| cpu | Always use CPU |
| cuda | NVIDIA GPU. Also covers AMD GPUs when using a ROCm-enabled PyTorch build. |
| mps | Apple Silicon GPU (M1/M2/M3/M4) |
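The auto fallback order can be sketched as a small pure function. The name `resolve_device` and its signature are hypothetical; the availability flags are passed in so the logic can be read and tested without PyTorch installed.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a kokoro.device value to a concrete device name (hypothetical helper)."""
    if requested != "auto":
        if requested not in {"cpu", "cuda", "mps"}:
            raise ValueError(f"unknown device: {requested!r}")
        # An explicit choice is returned unchanged.
        return requested
    # "auto": prefer CUDA, then MPS, then fall back to CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch available, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.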
Voice names and languages
The voice name prefix determines the language:
| Prefix | Language | Example voices |
|---|---|---|
| af_ | American English (female) | af_heart, af_bella, af_sky |
| am_ | American English (male) | am_adam, am_michael |
| bf_ | British English (female) | bf_emma, bf_isabella |
| bm_ | British English (male) | bm_lewis, bm_george |
The lang_code is derived from the first character of the voice name (af_heart → 'a', bf_emma → 'b'). Set it explicitly only if you need to override this.
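As a minimal sketch of that rule (the function name `derive_lang_code` is hypothetical, and the service's own code may differ), an explicit value wins and the first character of the voice name is the fallback:

```python
from typing import Optional


def derive_lang_code(voice: str, lang_code: Optional[str] = None) -> str:
    """Return the explicit lang_code if set, else the voice name's first character."""
    if lang_code:
        return lang_code
    if not voice:
        raise ValueError("voice name must not be empty")
    return voice[0]
```

For example, `derive_lang_code("af_heart")` gives `"a"`, while `derive_lang_code("af_heart", "b")` gives the override `"b"`.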
Example
provider: "kokoro"
kokoro:
  voice: "af_heart"
  device: "auto"
  speed: 1.0
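A config can be sanity-checked against the documented provider names and speed ranges before the service starts. The sketch below assumes the YAML has already been loaded into a plain dict; `validate_tts_config` is a hypothetical helper, and whether the service itself rejects or clamps out-of-range values is not stated here.

```python
# Speed ranges as documented in the provider tables.
SPEED_RANGES = {"openai": (0.25, 4.0), "kokoro": (0.5, 2.0)}


def validate_tts_config(cfg: dict) -> None:
    """Raise ValueError if provider or speed falls outside the documented values."""
    provider = cfg.get("provider", "openai")  # documented default
    if provider not in SPEED_RANGES:
        raise ValueError(f"provider must be one of {sorted(SPEED_RANGES)}, got {provider!r}")
    section = cfg.get(provider) or {}
    speed = section.get("speed", 1.0)  # documented default
    low, high = SPEED_RANGES[provider]
    if not low <= speed <= high:
        raise ValueError(f"{provider}.speed must be between {low} and {high}, got {speed}")
```

Calling it on the example config returns silently; a value such as `kokoro.speed: 3.0` raises `ValueError`.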
Installation
# System dependency
sudo apt-get install espeak-ng
# Python package
pip install -e ".[kokoro]"
# Pre-download model weights
kenzy-setup