Replaced sync vLLM LLM with AsyncLLMEngine for real streaming

Tokens now flow incrementally: vLLM → async_generate_tokens → orpheus_tts tokens_decoder → audio chunks → StreamingResponse. The first audio chunk arrives after ~28 tokens (SNAC codec warmup) instead of waiting for all ~2000+ tokens to complete. Expected: first-byte latency drops from ~15s to ~1-2s.

Background jobs (submit/async) still work via a sync wrapper that collects all tokens from the async engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
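A minimal sketch of the new streaming path, assuming a FastAPI endpoint; the endpoint name, model id, and the tokens_decoder import path/signature are illustrative assumptions, not the repo's exact code:

```python
import uuid
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from orpheus_tts.decoder import tokens_decoder  # assumed import path

app = FastAPI()
# Model id is a placeholder; the repo's engine args may differ.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="canopylabs/orpheus-3b-0.1-ft")
)

async def async_generate_tokens(prompt: str, sampling_params: SamplingParams):
    """Yield newly generated token text as vLLM produces it, instead of
    blocking until the full ~2000+ token sequence is done."""
    previous_text = ""
    async for request_output in engine.generate(
        prompt, sampling_params, request_id=str(uuid.uuid4())
    ):
        text = request_output.outputs[0].text  # cumulative text so far
        new_text = text[len(previous_text):]
        previous_text = text
        if new_text:
            yield new_text

@app.get("/tts")
async def tts(prompt: str):
    sampling_params = SamplingParams(max_tokens=4096, temperature=0.6)

    async def audio_stream():
        # tokens_decoder buffers roughly 28 tokens for the SNAC codec before
        # emitting the first audio chunk, then streams chunks continuously.
        async for audio_chunk in tokens_decoder(
            async_generate_tokens(prompt, sampling_params)
        ):
            yield audio_chunk

    # WAV/PCM header handling omitted for brevity.
    return StreamingResponse(audio_stream(), media_type="audio/wav")
```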