TTS Configuration

File: configs/tts.yaml
Command: kenzy-tts [config_path]

The TTS service accepts POST requests with text and returns raw int16 PCM audio at 24 kHz mono. Two providers are supported, selected via the provider key.
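Because the response is headerless PCM, most players need it wrapped in a container before they can open it. The following is a minimal sketch of that step using only the Python standard library. It assumes the samples are little-endian, which this document does not state, and the function names are illustrative rather than part of the service.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, as stated above
CHANNELS = 1              # mono
SAMPLE_WIDTH_BYTES = 2    # int16

def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw int16 mono PCM in a WAV container and return the file bytes.

    Assumption: samples are little-endian, which is what WAV expects.
    """
    if len(pcm) % SAMPLE_WIDTH_BYTES != 0:
        raise ValueError("PCM length must be a multiple of 2 bytes (int16 samples)")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()

def duration_seconds(pcm: bytes) -> float:
    """Playback length of a raw PCM buffer in this format."""
    return len(pcm) / (SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES)
```

Writing the returned bytes to a .wav file gives something any standard audio player can open.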

Provider selection

Key       Default   Description
provider  "openai"  TTS backend: openai or kokoro

OpenAI provider

Requires: OPENAI_API_KEY in .env

Long responses are automatically split at sentence boundaries and concatenated, so there is no effective limit on response length.
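The split-and-concatenate idea can be sketched as below. This is not the service's actual implementation: the regular expression, the max_chars parameter, and the synthesize_chunk callback are all hypothetical names chosen for illustration.

```python
import re

# Naive boundary: whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries, packing sentences into chunks.

    A chunk grows until adding the next sentence would exceed max_chars.
    A single sentence longer than max_chars is kept whole.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def synthesize_long(text: str, synthesize_chunk, max_chars: int = 200) -> bytes:
    """Synthesize each chunk separately and concatenate the raw PCM.

    synthesize_chunk is a hypothetical callable: str -> bytes of PCM audio.
    """
    return b"".join(synthesize_chunk(c) for c in split_sentences(text, max_chars))
```

Concatenating the pieces is safe here only because every chunk comes back as headerless PCM in the same format.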

Key           Default            Description
openai.model  "gpt-4o-mini-tts"  OpenAI TTS model
openai.voice  "sage"             Voice persona (see below)
openai.speed  1.0                Playback speed multiplier, 0.25–4.0

Available voices

alloy · ash · ballad · coral · echo · fable · nova · onyx · sage · shimmer

Example

provider: "openai"

openai:
  model: "gpt-4o-mini-tts"
  voice: "nova"
  speed: 1.1
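The OpenAI settings can be checked before the service starts. The sketch below assumes the openai section has already been parsed into a plain dict. The function name is hypothetical, and the real service may validate differently or not at all. Defaults, voices, and the speed range are copied from this section.

```python
OPENAI_VOICES = frozenset({
    "alloy", "ash", "ballad", "coral", "echo",
    "fable", "nova", "onyx", "sage", "shimmer",
})
OPENAI_DEFAULTS = {"model": "gpt-4o-mini-tts", "voice": "sage", "speed": 1.0}

def resolve_openai_settings(section=None) -> dict:
    """Merge the openai section over the documented defaults and validate it."""
    settings = {**OPENAI_DEFAULTS, **(section or {})}
    if settings["voice"] not in OPENAI_VOICES:
        raise ValueError(f"unknown OpenAI voice: {settings['voice']!r}")
    speed = float(settings["speed"])
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"openai.speed must be within 0.25 to 4.0, got {speed}")
    settings["speed"] = speed
    return settings
```

Failing early on a misspelled voice or an out-of-range speed is usually easier to diagnose than an error returned by the API at request time.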

Kokoro provider

Requires:

- pip install -e ".[kokoro]" (installs the kokoro package and PyTorch)
- sudo apt-get install espeak-ng (system phonemization library)
- Run kenzy-setup after install to pre-download model weights

Kokoro runs entirely locally, with no API key or internet connection required at runtime. It produces high-quality speech at 24 kHz mono, the same output format as the OpenAI provider, so nothing downstream changes.

Note

The voice_prompt style instruction generated by the LLM (e.g. "speak warmly at a conversational pace") is an OpenAI-specific feature. It is silently ignored when using Kokoro.

Key               Default       Description
kokoro.voice      "af_heart"    Kokoro voice name (see below)
kokoro.device     "auto"        Inference device (see below)
kokoro.speed      1.0           Playback speed multiplier, 0.5–2.0
kokoro.lang_code  (from voice)  Language code. Derived automatically from the first character of the voice name if omitted.

Device options

Value  Description
auto   Detects the best available device at startup: CUDA → MPS → CPU (recommended)
cpu    Always use CPU
cuda   NVIDIA GPU. Also covers AMD GPUs when using a ROCm-enabled PyTorch build.
mps    Apple Silicon GPU (M1/M2/M3/M4)
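The auto fallback order can be sketched as follows. This illustrates the CUDA → MPS → CPU preference and is not the service's own code. pick_device and detect_device are hypothetical names, and falling back to cpu when PyTorch cannot be imported is an assumption made so the sketch runs anywhere.

```python
VALID_DEVICES = {"auto", "cpu", "cuda", "mps"}

def pick_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Resolve a kokoro.device value to a concrete device name."""
    if requested not in VALID_DEVICES:
        raise ValueError(f"unknown device: {requested!r}")
    if requested != "auto":
        return requested          # explicit choice is used as given
    if cuda_available:
        return "cuda"             # NVIDIA, or AMD with a ROCm build of PyTorch
    if mps_available:
        return "mps"              # Apple Silicon
    return "cpu"

def detect_device(requested: str = "auto") -> str:
    """Query PyTorch for available backends, then apply pick_device."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, only the CPU is usable.
        return pick_device(requested, False, False)
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(requested, torch.cuda.is_available(), mps_ok)
```

Keeping the decision in a pure function makes the preference order testable on machines without a GPU.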

Voice names and languages

The voice name prefix determines the language:

Prefix  Language                   Example voices
af_     American English (female)  af_heart, af_bella, af_sky
am_     American English (male)    am_adam, am_michael
bf_     British English (female)   bf_emma, bf_isabella
bm_     British English (male)     bm_lewis, bm_george

The lang_code is derived from the first character of the voice name (af_heart → 'a', bf_emma → 'b'). Set it explicitly only if you need to override this.
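The derivation rule is small enough to express directly. This is a sketch of the behaviour described in this section and not the service's code. resolve_lang_code is a hypothetical helper, and the language table covers only the two codes documented here.

```python
from typing import Optional

# Only the codes documented in this section.
LANGUAGES = {"a": "American English", "b": "British English"}

def resolve_lang_code(voice: str, lang_code: Optional[str] = None) -> str:
    """Return the explicit lang_code if set, else the voice's first character."""
    if lang_code:
        return lang_code          # explicit override wins
    if not voice:
        raise ValueError("voice name must not be empty")
    return voice[0]

def describe_voice(voice: str) -> str:
    """Human-readable language for a voice, if its prefix is one listed above."""
    return LANGUAGES.get(resolve_lang_code(voice), "unknown")
```

For example, resolve_lang_code("bm_george") yields "b", while passing lang_code="a" alongside the same voice forces the override.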

Example

provider: "kokoro"

kokoro:
  voice: "af_heart"
  device: "auto"
  speed: 1.0

Installation

# System dependency
sudo apt-get install espeak-ng

# Python package
pip install -e ".[kokoro]"

# Pre-download model weights
kenzy-setup