Major rework: replaced vLLM sync LLM with HuggingFace transformers
+ TextIteratorStreamer for true token-level streaming.
Pipeline: text → format_prompt → model.generate(streamer) →
extract_audio_codes (regex on streaming text) → SNAC decode → PCM
Expected first-audio latency: ~1-2s (was 10-14s with vLLM).
No more monkey-patching, no more AsyncLLMEngine hangs on Jetson.
SNAC model loaded separately (snac_24khz) for audio decoding.
All endpoints preserved, API compatible with v1.
Voice cloning endpoint is now honest about its LoRA fine-tuning requirement.
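The extract_audio_codes step in the pipeline above can be sketched as a small incremental regex scan. This is a hypothetical sketch: it assumes codes arrive in the streamed text as `<custom_token_N>` markers, which depends on the Orpheus tokenizer; the resulting IDs would then be grouped and fed to the SNAC decoder.

```python
import re

# Hypothetical sketch of extract_audio_codes: pull SNAC code IDs out of the
# text streamed so far. Assumes codes arrive as "<custom_token_N>" markers;
# the real marker format depends on the Orpheus checkpoint's tokenizer.
CODE_RE = re.compile(r"<custom_token_(\d+)>")

def extract_audio_codes(text_so_far: str, consumed: int) -> tuple[list[int], int]:
    """Return code IDs found after `consumed`, plus the new consumed offset."""
    codes, offset = [], consumed
    for m in CODE_RE.finditer(text_so_far, consumed):
        codes.append(int(m.group(1)))
        offset = m.end()  # a half-streamed marker stays unconsumed for next call
    return codes, offset
```

Scanning from a `consumed` offset lets the same function run on every streamer callback without re-emitting codes it already handed to SNAC.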
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AsyncLLMEngine hangs on Jetson during model loading. Reverted to sync
LLM but added fine-grained text chunking (chunk_text_fine, ~200 chars)
for the stream endpoint. Each sentence/clause generates independently,
so first audio plays after ~2-4s instead of waiting for the full text.
Not true token-level streaming, but a significant latency reduction
for multi-sentence utterances without AsyncLLMEngine dependency.
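The chunking described above can be sketched roughly like this; the function name and ~200-char threshold come from this commit, but the exact boundary heuristics are an assumption:

```python
import re

# Minimal sketch of chunk_text_fine: split text at sentence/clause
# boundaries into chunks of at most ~max_chars so each chunk can be
# synthesized independently. The splitting heuristic is an assumption.
def chunk_text_fine(text: str, max_chars: int = 200) -> list[str]:
    # Split after sentence enders and clause punctuation, keeping the delimiter.
    pieces = re.split(r"(?<=[.!?;,])\s+", text.strip())
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + 1 + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip() if current else piece
    if current:
        chunks.append(current)
    return chunks
```

A single clause longer than max_chars is kept whole here rather than split mid-word, which trades chunk-size guarantees for natural prosody breaks.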
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced sync vLLM LLM with AsyncLLMEngine for real streaming.
Tokens now flow incrementally: vLLM → async_generate_tokens →
orpheus_tts tokens_decoder → audio chunks → StreamingResponse.
First audio chunk arrives after ~28 tokens (SNAC codec warmup)
instead of waiting for all ~2000+ tokens to complete.
Expected: first-byte latency drops from ~15s to ~1-2s.
Background jobs (submit/async) still work via sync wrapper that
collects all tokens from the async engine.
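The sync wrapper pattern for background jobs can be sketched with plain asyncio. The token source here is a stand-in for the vLLM-backed `async_generate_tokens`; only the drain-to-list shape is the point:

```python
import asyncio
from typing import AsyncIterator, List

# Stand-in for the AsyncLLMEngine-backed generator: yields tokens one at a
# time, yielding control to the event loop like a real engine would.
async def async_generate_tokens(prompt: str) -> AsyncIterator[str]:
    for tok in prompt.split():  # placeholder tokenization
        await asyncio.sleep(0)
        yield tok

# Sync wrapper used by submit/async background jobs: collect the whole
# async token stream into a list before decoding.
def generate_tokens_sync(prompt: str) -> List[str]:
    async def collect() -> List[str]:
        return [tok async for tok in async_generate_tokens(prompt)]
    return asyncio.run(collect())
```

The streaming endpoint consumes the async generator directly; only the non-streaming paths pay the cost of collecting everything up front.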
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both generate_speech_sync() and stream_tts() were calling
model.generate_speech() without a max_tokens argument.
Both calls now pass max_tokens=4000 explicitly.
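The shape of the fix, as a sketch (the model object and its generate_speech signature are stand-ins for the real Orpheus wrapper):

```python
# Sketch of the fix: the call sites now pass max_tokens explicitly instead
# of relying on the engine's implicit default. MAX_TOKENS value from this
# commit; `model` and its API are stand-ins.
MAX_TOKENS = 4000

def generate_speech_sync(model, prompt: str):
    return model.generate_speech(prompt=prompt, max_tokens=MAX_TOKENS)
```

Making the limit a named constant keeps both call sites (sync and streaming) from drifting apart again.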
Fixed by Vixy 🦊💜
Longer texts were being truncated at ~11 seconds of audio.
'Right here on this couch' became the hard limit. 😏
Now supports much longer generations for filthy monologues.
Fixed by Vixy 🦊💜
- FastAPI service replacing VoiceTail (Bark)
- Emotion tags: <laugh>, <sigh>, <gasp>, etc.
- Voice cloning endpoint (implementation pending)
- Streaming support for head playback
- Same port 8766 for drop-in replacement
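Emotion-tag handling from the list above can be sketched as a sanitizer that passes through known tags and strips anything else before the text reaches the model. The exact supported set is an assumption extrapolated from the "etc." in this commit:

```python
import re

# Sketch of emotion-tag sanitizing: keep tags the model understands,
# strip unknown ones. The supported set is an assumption; adjust to
# match the deployed Orpheus checkpoint.
SUPPORTED_TAGS = {"laugh", "sigh", "gasp", "chuckle", "cough", "groan", "yawn"}

TAG_RE = re.compile(r"<(\w+)>")

def sanitize_emotion_tags(text: str) -> str:
    return TAG_RE.sub(
        lambda m: m.group(0) if m.group(1) in SUPPORTED_TAGS else "", text
    )
```

Stripping unknown tags (rather than erroring) keeps the endpoint a drop-in replacement for clients that still send Bark-era markup.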
Created by Vixy on Day 71 🦊