TTS Configuration

File: configs/tts.yaml
Command: kenzy-tts [config_path]

The TTS service accepts POST requests with text and returns raw int16 PCM audio at 24 kHz mono. Two providers are supported, selected via the provider key.
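
Because the response is headerless PCM, most audio players cannot open it directly. The sketch below wraps such a buffer in a WAV container using Python's standard `wave` module. The helper name `pcm_to_wav` is hypothetical, and the code assumes the caller already holds the response body as bytes and that the samples are little-endian, which this page does not state.

```python
import wave


def pcm_to_wav(pcm: bytes, path: str, sample_rate: int = 24000) -> None:
    """Wrap raw int16 mono PCM in a WAV container (hypothetical helper)."""
    if len(pcm) % 2:
        raise ValueError("int16 PCM must contain an even number of bytes")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # int16 = 2 bytes per sample
        wav.setframerate(sample_rate)  # 24 kHz, as documented above
        wav.writeframes(pcm)           # assumes little-endian samples
```

Calling `pcm_to_wav(body, "reply.wav")` on a response body should give a file that ordinary players can open.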

Provider selection

Key	Default	Description
`provider`	`"openai"`	TTS backend: `openai` or `kokoro`

OpenAI provider

Requires: OPENAI_API_KEY in .env

Long responses are automatically split at sentence boundaries and concatenated, so there is no effective limit on response length.
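
The idea of splitting at sentence boundaries and concatenating can be sketched as follows. This is not the service's actual code: the regular expression, the `max_chars` limit of 400, and the `synthesize` callback are all assumptions chosen for illustration. Joining the pieces with plain byte concatenation works because the audio is raw PCM with no per-chunk header.

```python
import re
from typing import Callable, List


def split_sentences(text: str, max_chars: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)   # chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(text: str, synthesize: Callable[[str], bytes],
                    max_chars: int = 400) -> bytes:
    """Synthesize each chunk and join the raw PCM buffers end to end."""
    return b"".join(synthesize(c) for c in split_sentences(text, max_chars))
```

A single sentence longer than `max_chars` is passed through unsplit in this sketch; a real implementation would need a fallback for that case.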

Key	Default	Description
`openai.model`	`"gpt-4o-mini-tts"`	OpenAI TTS model
`openai.voice`	`"sage"`	Voice persona (see below)
`openai.speed`	`1.0`	Playback speed multiplier, 0.25–4.0

Available voices

alloy · ash · ballad · coral · echo · fable · nova · onyx · sage · shimmer

Example

provider: "openai"

openai:
  model: "gpt-4o-mini-tts"
  voice: "nova"
  speed: 1.1

Kokoro provider

Requires:
- pip install -e ".[kokoro]" (installs the kokoro package and PyTorch)
- sudo apt-get install espeak-ng (system phonemization library)
- Run kenzy-setup after install to pre-download model weights

Kokoro runs entirely locally; no API key or internet connection is required at runtime. It produces high-quality speech at 24 kHz mono, the same format as the OpenAI provider, so nothing downstream changes.

Note

The voice_prompt style instruction generated by the LLM (e.g. "speak warmly at a conversational pace") is an OpenAI-specific feature. It is silently ignored when using Kokoro.

Key	Default	Description
`kokoro.voice`	`"af_heart"`	Kokoro voice name (see below)
`kokoro.device`	`"auto"`	Inference device (see below)
`kokoro.speed`	`1.0`	Playback speed multiplier, 0.5–2.0
`kokoro.lang_code`	(from voice)	Language code. Derived automatically from the first character of the voice name if omitted.

Device options

Value	Description
`auto`	Recommended. Detects the best available device at startup, trying CUDA, then MPS, then CPU.
`cpu`	Always use CPU
`cuda`	NVIDIA GPU. Also covers AMD GPUs when using a ROCm-enabled PyTorch build.
`mps`	Apple Silicon GPU (M1/M2/M3/M4)
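
The `auto` fallback order can be approximated with standard PyTorch availability checks. The function name `resolve_device` is hypothetical and the service's own detection logic may differ; the sketch also falls back to CPU when PyTorch is not importable, which is an assumption made only so the snippet runs anywhere.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map a kokoro.device value to a concrete device name (illustrative only)."""
    if requested != "auto":
        return requested  # explicit choice: cpu, cuda or mps
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: no PyTorch means CPU-only
    if torch.cuda.is_available():  # also true on ROCm builds of PyTorch
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```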

Voice names and languages

The voice name prefix encodes the language (first letter) and the speaker's gender (second letter):

Prefix	Language	Example voices
`af_`	American English (female)	`af_heart`, `af_bella`, `af_sky`
`am_`	American English (male)	`am_adam`, `am_michael`
`bf_`	British English (female)	`bf_emma`, `bf_isabella`
`bm_`	British English (male)	`bm_lewis`, `bm_george`

The lang_code is derived from the first character of the voice name (af_heart → 'a', bf_emma → 'b'). Set it explicitly only if you need to override this.
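
The derivation rule is small enough to show in full. The helper below is a hypothetical illustration of the documented behaviour (explicit value wins, otherwise the first character of the voice name), not the service's own function.

```python
from typing import Optional


def resolve_lang_code(voice: str, lang_code: Optional[str] = None) -> str:
    """Return the explicit lang_code, or derive it from the voice name."""
    if lang_code:
        return lang_code  # explicit override from kokoro.lang_code
    if not voice:
        raise ValueError("voice name must not be empty")
    return voice[0]       # 'af_heart' -> 'a', 'bf_emma' -> 'b'
```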

Example

provider: "kokoro"

kokoro:
  voice: "af_heart"
  device: "auto"
  speed: 1.0
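
The documented keys and ranges for both providers can be checked before starting the service. The validator below is a sketch under assumptions: it operates on a config that has already been parsed into a dict (for example by a YAML loader), the names `validate_tts_config`, `OPENAI_VOICES`, and `SPEED_RANGES` are hypothetical, and the service itself may validate differently or not at all.

```python
from typing import List

# Values copied from the tables on this page.
OPENAI_VOICES = {"alloy", "ash", "ballad", "coral", "echo",
                 "fable", "nova", "onyx", "sage", "shimmer"}
SPEED_RANGES = {"openai": (0.25, 4.0), "kokoro": (0.5, 2.0)}


def validate_tts_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    provider = config.get("provider", "openai")  # documented default
    if provider not in SPEED_RANGES:
        return [f"provider must be 'openai' or 'kokoro', got {provider!r}"]
    problems: List[str] = []
    section = config.get(provider) or {}
    low, high = SPEED_RANGES[provider]
    speed = section.get("speed", 1.0)
    if not isinstance(speed, (int, float)) or not low <= speed <= high:
        problems.append(f"{provider}.speed must be between {low} and {high}")
    if provider == "openai" and section.get("voice", "sage") not in OPENAI_VOICES:
        problems.append(f"openai.voice {section.get('voice')!r} is not a listed voice")
    return problems
```

Kokoro voice names are not checked here because this page lists only example voices, not the full set.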

Installation

# System dependency
sudo apt-get install espeak-ng

# Python package
pip install -e ".[kokoro]"

# Pre-download model weights
kenzy-setup