KENZY

Kenzy is a distributed home voice assistant built as six independently deployable microservices. Wake-word detection runs locally on low-power room nodes (Raspberry Pi Zero 2 W or similar). Audio streams over WebSocket to a central server that runs the full speech-to-text → LLM → text-to-speech pipeline and streams synthesized speech back to the room.
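The server-side round trip described above can be pictured as three awaited stages chained together. The following is only a minimal sketch under stated assumptions: `handle_utterance`, the stage type aliases, and the `fake_*` stand-ins are hypothetical names chosen for illustration, not Kenzy's actual interfaces, and the WebSocket transport is left out entirely.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stage signatures; Kenzy's real service interfaces may differ.
Transcribe = Callable[[bytes], Awaitable[str]]
Complete = Callable[[str], Awaitable[str]]
Synthesize = Callable[[str], Awaitable[bytes]]


async def handle_utterance(
    audio: bytes, stt: Transcribe, llm: Complete, tts: Synthesize
) -> bytes:
    """Run one utterance through speech-to-text, the LLM, and text-to-speech."""
    text = await stt(audio)      # audio captured by a room node -> transcript
    reply = await llm(text)      # transcript -> assistant reply
    return await tts(reply)      # reply -> audio to send back to the room


# Stand-in stages so the sketch runs without models or network access.
async def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def run_demo(audio: bytes) -> bytes:
    """Drive the pipeline once with the stand-in stages."""
    return asyncio.run(handle_utterance(audio, fake_stt, fake_llm, fake_tts))
```

Passing the stages in as callables mirrors the idea that each service is independently deployable: any stage can be swapped for a remote call without changing the orchestration.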

Key features

  • Distributed — each service runs independently; deploy only what you need on each host
  • Wake-word activation — always-on local detection with no cloud dependency for the trigger
  • Speaker identification — knows who is speaking; used for personalization and access control
  • Extensible skills — add new capabilities by dropping a Python file in skills/; no registration required
  • LLM-agnostic — works with OpenAI, Anthropic, Ollama, LM Studio, and any provider supported by LiteLLM
  • Conversation history — per-room rolling context window so follow-up questions resolve naturally
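To make the per-room rolling context window concrete, here is one way such a history could be kept. This is a sketch, not Kenzy's implementation: the `RoomHistory` class, its method names, and the default window size are all assumptions made for illustration. Messages use the role/content dictionary shape that chat-style LLM APIs commonly accept.

```python
from collections import defaultdict, deque

DEFAULT_MAX_MESSAGES = 10  # assumed window size; the real default is not stated


class RoomHistory:
    """Keep the most recent messages for each room (hypothetical helper)."""

    def __init__(self, max_messages: int = DEFAULT_MAX_MESSAGES) -> None:
        # Each room gets its own bounded deque; once it is full, appending
        # a new message silently drops the oldest one.
        self._rooms = defaultdict(lambda: deque(maxlen=max_messages))

    def add(self, room: str, role: str, content: str) -> None:
        self._rooms[room].append({"role": role, "content": content})

    def messages(self, room: str) -> list:
        # Use .get so that reading an unknown room does not create an entry.
        return list(self._rooms.get(room, ()))
```

Because each room has a separate window, a follow-up question asked in the kitchen is resolved against the kitchen's recent turns only, and older turns fall away without any explicit cleanup.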

Services at a glance

Service   Command         Port   Role
node      kenzy-node      -      Wake word, audio capture, TTS playback
server    kenzy-server    8765   WebSocket hub, pipeline orchestrator
stt       kenzy-stt       8767   Speech-to-text via faster-whisper
tts       kenzy-tts       8769   Text-to-speech via OpenAI TTS
llm       kenzy-llm       8766   LLM + skill tool-calling via LiteLLM
speaker   kenzy-speaker   8768   Speaker identification via SpeechBrain

  • Getting Started — install, configure, and run your first session
  • Architecture — how the pieces fit together
  • Skills — extend Kenzy with custom capabilities
  • Deployment — push to a fleet of remote hosts with kenzy-deploy