Files changed:
orpheus-tts/main.py
Alex · 25ed6625aa
True streaming TTS: AsyncLLMEngine + incremental token decoding
Replaced the synchronous vLLM LLM with AsyncLLMEngine for real
streaming. Tokens now flow incrementally: vLLM → async_generate_tokens →
orpheus_tts tokens_decoder → audio chunks → StreamingResponse.
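As a rough sketch of that flow (every name below is a stand-in, not the real vLLM or orpheus_tts API), the pipeline is just async generators chained end to end, with the HTTP layer consuming the last one:

```python
import asyncio
from typing import AsyncIterator

async def fake_engine_tokens() -> AsyncIterator[str]:
    # Stand-in for the AsyncLLMEngine token stream; real vLLM yields
    # RequestOutput deltas rather than bare strings.
    for tok in ["<custom_token_101>", "<custom_token_202>", "<custom_token_303>"]:
        await asyncio.sleep(0)  # yield to the event loop, as a real engine would
        yield tok

async def tokens_decoder(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Stand-in for the orpheus_tts decoder: one chunk per token here,
    # whereas the real SNAC decoder buffers a window of tokens first.
    async for tok in tokens:
        yield tok.encode()  # real code would yield decoded PCM bytes

async def collect(chunks: AsyncIterator[bytes]) -> list[bytes]:
    # In the server this consumer is a StreamingResponse; here we just
    # gather the chunks to show that nothing blocks until the end.
    return [c async for c in chunks]

audio_chunks = asyncio.run(collect(tokens_decoder(fake_engine_tokens())))
print(len(audio_chunks))  # → 3
```

Because each stage is an async generator, a chunk can reach the client as soon as the decoder emits it, without waiting for generation to finish.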

The first audio chunk now arrives after ~28 tokens (the SNAC codec's
warmup window) instead of only after the full ~2000+ token generation
completes.
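The warmup behavior can be illustrated with a toy decoder that buffers a fixed window before emitting anything; the 28-token window size and the framing below are assumptions for illustration, not SNAC's actual logic:

```python
import asyncio
from typing import AsyncIterator

WARMUP_TOKENS = 28  # assumed fixed window; real SNAC framing differs

async def buffered_decoder(tokens: AsyncIterator[int]) -> AsyncIterator[bytes]:
    # Nothing is emitted until a full window of tokens has arrived, which
    # is why first-byte latency tracks ~28 tokens, not the whole run.
    window: list[int] = []
    async for tok in tokens:
        window.append(tok)
        if len(window) == WARMUP_TOKENS:
            yield bytes(window)  # placeholder for one decoded PCM chunk
            window.clear()

async def token_stream(n: int) -> AsyncIterator[int]:
    # Hypothetical token source emitting n small integer "codec tokens".
    for i in range(n):
        yield i % 256

async def count_chunks(n: int) -> int:
    return sum([1 async for _ in buffered_decoder(token_stream(n))])
```

With this sketch, 60 tokens yield two chunks (after tokens 28 and 56), while 27 tokens yield none at all, matching the intuition that latency is bounded by the warmup window rather than total sequence length.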

Expected: first-byte latency drops from ~15s to ~1-2s.

Background jobs (submit/async) still work via a sync wrapper that
collects all tokens from the async engine.
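The background-job path can be sketched as a thin sync wrapper that drains the async generator into a list (the function names and token values are hypothetical):

```python
import asyncio
from typing import AsyncIterator

async def fake_engine_tokens() -> AsyncIterator[str]:
    # Stand-in for the async engine's token stream.
    for tok in ["tok1", "tok2", "tok3"]:
        yield tok

def generate_tokens_sync() -> list[str]:
    # Sync wrapper used by submit/async jobs: block until every token has
    # been produced, then hand the full list to the existing job code path.
    async def _drain() -> list[str]:
        return [tok async for tok in fake_engine_tokens()]
    return asyncio.run(_drain())

print(generate_tokens_sync())  # → ['tok1', 'tok2', 'tok3']
```

One caveat with this pattern: asyncio.run raises if called from a thread that already has a running event loop, so in a server the wrapper has to run in a worker thread, not on the event-loop thread.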

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:36:24 -05:00
