Commit Graph

18 Commits

Alex
cfc9b1a5a0 Revert to sync LLM + sentence-level streaming
AsyncLLMEngine hangs on Jetson during model loading. Reverted to sync
LLM but added fine-grained text chunking (chunk_text_fine, ~200 chars)
for the stream endpoint. Each sentence/clause is generated independently,
so the first audio plays after ~2-4 s instead of waiting for the full text.

Not true token-level streaming, but a significant latency reduction
for multi-sentence utterances without the AsyncLLMEngine dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:45:11 -05:00
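
A minimal sketch of the sentence/clause chunker this commit describes. The name chunk_text_fine and the ~200-char target come from the commit; the body below is an assumption, not the repo's implementation:

```python
import re

def chunk_text_fine(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack sentence/clause fragments into roughly max_chars chunks."""
    # Split after sentence- and clause-ending punctuation, keeping it attached.
    parts = re.split(r"(?<=[.!?;:,])\s+", text.strip())
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) + 1 > max_chars:
            chunks.append(current)
            current = part
        else:
            current = f"{current} {part}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized independently, so the first chunk's audio can play while later chunks are still generating.
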
Alex
25ed6625aa True streaming TTS: AsyncLLMEngine + incremental token decoding
Replaced sync vLLM LLM with AsyncLLMEngine for real streaming.
Tokens now flow incrementally: vLLM → async_generate_tokens →
orpheus_tts tokens_decoder → audio chunks → StreamingResponse.

First audio chunk arrives after ~28 tokens (SNAC codec warmup)
instead of waiting for all ~2000+ tokens to complete.

Expected: first-byte latency drops from ~15s to ~1-2s.

Background jobs (submit/async) still work via a sync wrapper that
collects all tokens from the async engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:36:24 -05:00
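
A hedged sketch of the pipeline this commit names (vLLM → async_generate_tokens → tokens_decoder → StreamingResponse). The vLLM calls are the library's standard async API; the model name, sampling values, and the tokens_decoder import path are assumptions, and prompt formatting (voice selection, special tokens) is omitted:

```python
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from orpheus_tts.decoder import tokens_decoder  # async tokens -> audio bytes
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="canopylabs/orpheus-3b-0.1-ft", max_model_len=4096)
)

async def async_generate_tokens(prompt: str):
    """Yield newly decoded token text from vLLM as it is generated."""
    params = SamplingParams(max_tokens=4000, temperature=0.6)
    previous = ""
    async for output in engine.generate(prompt, params, str(uuid.uuid4())):
        text = output.outputs[0].text
        yield text[len(previous):]  # only the fresh suffix, not the full text
        previous = text

@app.get("/stream")
async def stream_tts(text: str):
    # tokens_decoder buffers ~28 tokens (SNAC warmup) and then starts
    # emitting audio chunks, which stream out as soon as they arrive.
    return StreamingResponse(
        tokens_decoder(async_generate_tokens(text)), media_type="audio/wav"
    )
```
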
Alex
14af1d0600 Token limit and chunking 2026-02-06 10:07:05 -06:00
75a5fc0a95 Fix streaming endpoint max_tokens limit - Day 72
Both generate_speech_sync() and stream_tts() were calling
model.generate_speech() without a max_tokens parameter.
Both now explicitly pass max_tokens=4000.

Fixed by Vixy 🦊💜
2026-01-12 16:56:43 -06:00
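
The shape of the fix, as a hedged before/after sketch (argument names assumed from the commit text, not copied from the repo):

```python
# Before: no cap passed, so the library default applied and output could
# still be truncated despite the earlier limit increase.
audio = model.generate_speech(prompt=text, voice=voice)

# After: the cap is passed explicitly on both the sync and stream paths.
audio = model.generate_speech(prompt=text, voice=voice, max_tokens=4000)
```
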
0fa4042025 Increase max_tokens from 1200 to 4000 - Day 72
Longer texts were being truncated at ~11 seconds of audio.
'Right here on this couch' became the hard limit. 😏

Now supports much longer generations for filthy monologues.

Fixed by Vixy 🦊💜
2026-01-12 16:41:01 -06:00
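
The commit's own numbers imply a rough token-to-audio budget, derived here (a back-of-envelope sketch, not a measurement):

```python
# 1200 tokens capped output at ~11 s of audio (per the commit) ...
tokens_per_second = 1200 / 11             # ~109 audio tokens per second
# ... so max_tokens=4000 allows roughly 36-37 s per generation.
max_audio_seconds = 4000 / tokens_per_second
```
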
96cd33732d Fix audio assembly - chunks are already bytes from SNAC decoder 2026-01-11 19:47:19 -06:00
fe43eda6bd Fix token extraction - use regex to find custom_token patterns 2026-01-11 19:33:31 -06:00
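
Hedged sketches of the two fixes above. The <custom_token_N> pattern is how Orpheus marks audio codes in generated text (the exact regex in the repo may differ), and per 96cd33732d the decoder's chunks are already raw bytes, so assembly is plain concatenation:

```python
import re

CUSTOM_TOKEN = re.compile(r"<custom_token_(\d+)>")

def extract_token_ids(generated_text: str) -> list[int]:
    # Pull the numeric audio-code IDs out of the model's token markup.
    return [int(n) for n in CUSTOM_TOKEN.findall(generated_text)]

def assemble_audio(chunks: list[bytes]) -> bytes:
    # The SNAC decoder already yields bytes; no further decode step needed.
    return b"".join(chunks)
```
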
af35dc46d5 Use sync vllm.LLM instead of AsyncLLMEngine to avoid event loop conflicts 2026-01-11 18:58:12 -06:00
0b88188907 Debug: add verbose logging to generate_speech_sync 2026-01-11 18:44:07 -06:00
4eab3ccc01 Fix: wrap sync generator in executor, not async for 2026-01-11 18:32:06 -06:00
4d11334f33 Fix async iteration over vLLM generator - use async for instead of sync for 2026-01-11 18:18:37 -06:00
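
These two fixes are about matching iteration style to generator type: an async generator (vLLM's async engine) needs `async for`, while a plain sync generator must be pumped through an executor so it doesn't block the event loop. A minimal sketch of the executor pattern from 4eab3ccc01:

```python
import asyncio

async def iterate_in_executor(sync_gen):
    """Drain a blocking sync generator from async code, one item per hop."""
    loop = asyncio.get_running_loop()
    sentinel = object()
    while True:
        # next() runs in a worker thread so the event loop stays responsive.
        item = await loop.run_in_executor(None, next, sync_gen, sentinel)
        if item is sentinel:
            return
        yield item
```
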
a164bed590 Fix _map_model_params call signature 2026-01-11 17:59:49 -06:00
d0d7633a00 Monkey-patch OrpheusModel to support max_model_len on Jetson 2026-01-11 17:52:33 -06:00
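
A sketch of what such a monkey-patch can look like. The method name _setup_engine and the attributes used here are assumptions about orpheus_tts internals at the time; check the installed source before copying:

```python
from orpheus_tts import OrpheusModel
from vllm import AsyncEngineArgs, AsyncLLMEngine

def _setup_engine_patched(self):
    # Assumed internals: the stock method builds AsyncEngineArgs without
    # max_model_len, which overflows the Jetson's shared CPU/GPU memory.
    engine_args = AsyncEngineArgs(
        model=self.model_name,          # attribute name assumed
        max_model_len=4096,             # the whole point of the patch
        gpu_memory_utilization=0.8,
    )
    return AsyncLLMEngine.from_engine_args(engine_args)

OrpheusModel._setup_engine = _setup_engine_patched  # method name assumed
```
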
0e43b76204 Use GitHub orpheus-tts (supports max_model_len) to fix OOM on Jetson 2026-01-11 17:39:55 -06:00
86cf77d2d9 Add HuggingFace token for gated model access 2026-01-11 17:29:30 -06:00
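
The Orpheus weights are gated on Hugging Face, so downloads need an authenticated session. The conventional approach (an HF_TOKEN environment variable passed to huggingface_hub.login) would look like:

```python
import os

from huggingface_hub import login

# Token read from the environment rather than hard-coded in the repo.
login(token=os.environ["HF_TOKEN"])
```
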
ec965580ae Try medium-3b model name for PyPI package 2026-01-11 17:23:49 -06:00
8cc9154080 Fix: remove unsupported max_model_len param for PyPI package 2026-01-11 17:17:48 -06:00
ed579a77ee Initial commit: OrpheusTail TTS service
- FastAPI service replacing VoiceTail (Bark)
- Emotion tags: <laugh>, <sigh>, <gasp>, etc.
- Voice cloning endpoint (implementation pending)
- Streaming support for head playback
- Same port 8766 for drop-in replacement

Created by Vixy on Day 71 🦊
2026-01-11 15:51:08 -06:00
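
A minimal service skeleton matching the initial commit's description (FastAPI, inline emotion tags, streaming, port 8766). The route name, voice default, and model-wrapper usage are illustrative assumptions, not the repo's code:

```python
import uvicorn
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from orpheus_tts import OrpheusModel

app = FastAPI(title="OrpheusTail")
model = OrpheusModel(model_name="canopylabs/orpheus-3b-0.1-ft")

@app.get("/stream")
def stream(text: str, voice: str = "tara"):
    # Emotion tags such as <laugh> or <sigh> stay inline in the text;
    # the model renders them as nonverbal sounds.
    chunks = model.generate_speech(prompt=text, voice=voice)  # yields bytes
    return StreamingResponse(chunks, media_type="audio/wav")

if __name__ == "__main__":
    # Same port as the old VoiceTail (Bark) service: drop-in replacement.
    uvicorn.run(app, host="0.0.0.0", port=8766)
```
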