Commit Graph

18 Commits

Alex
cfc9b1a5a0 Revert to sync LLM + sentence-level streaming
AsyncLLMEngine hangs on Jetson during model loading. Reverted to sync
LLM but added fine-grained text chunking (chunk_text_fine, ~200 chars)
for the stream endpoint. Each sentence/clause is generated independently,
so the first audio plays after ~2-4 s instead of waiting for the full text.

Not true token-level streaming, but a significant latency reduction
for multi-sentence utterances without the AsyncLLMEngine dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:45:11 -05:00
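
A minimal sketch of the sentence/clause chunker this commit describes. The name chunk_text_fine and the ~200-char target come from the commit; the body below is an assumption, not the repo's implementation:

```python
import re

def chunk_text_fine(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack sentence/clause fragments into roughly max_chars chunks."""
    # Split after sentence- and clause-ending punctuation, keeping it attached.
    parts = re.split(r"(?<=[.!?;:,])\s+", text.strip())
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) + 1 > max_chars:
            chunks.append(current)
            current = part
        else:
            current = f"{current} {part}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized independently, so the first chunk's audio can play while later chunks are still generating.
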
Alex
25ed6625aa True streaming TTS: AsyncLLMEngine + incremental token decoding
Replaced sync vLLM LLM with AsyncLLMEngine for real streaming.
Tokens now flow incrementally: vLLM → async_generate_tokens →
orpheus_tts tokens_decoder → audio chunks → StreamingResponse.

First audio chunk arrives after ~28 tokens (SNAC codec warmup)
instead of waiting for all ~2000+ tokens to complete.

Expected: first-byte latency drops from ~15s to ~1-2s.

Background jobs (submit/async) still work via a sync wrapper that
collects all tokens from the async engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:36:24 -05:00
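
A hedged sketch of the pipeline this commit names (vLLM → async_generate_tokens → tokens_decoder → StreamingResponse). The vLLM calls are the library's standard async API; the model name, sampling values, and the tokens_decoder import path are assumptions, and prompt formatting (voice selection, special tokens) is omitted:

```python
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from orpheus_tts.decoder import tokens_decoder  # async tokens -> audio bytes
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="canopylabs/orpheus-3b-0.1-ft", max_model_len=4096)
)

async def async_generate_tokens(prompt: str):
    """Yield newly decoded token text from vLLM as it is generated."""
    params = SamplingParams(max_tokens=4000, temperature=0.6)
    previous = ""
    async for output in engine.generate(prompt, params, str(uuid.uuid4())):
        text = output.outputs[0].text
        yield text[len(previous):]  # only the fresh suffix, not the full text
        previous = text

@app.get("/stream")
async def stream_tts(text: str):
    # tokens_decoder buffers ~28 tokens (SNAC warmup) and then starts
    # emitting audio chunks, which stream out as soon as they arrive.
    return StreamingResponse(
        tokens_decoder(async_generate_tokens(text)), media_type="audio/wav"
    )
```
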
Alex
14af1d0600 Token limit and chunking 2026-02-06 10:07:05 -06:00
75a5fc0a95 Fix streaming endpoint max_tokens limit - Day 72
Both generate_speech_sync() and stream_tts() were calling
model.generate_speech() without a max_tokens parameter.
Both now explicitly pass max_tokens=4000.

Fixed by Vixy 🦊💜
2026-01-12 16:56:43 -06:00
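
The shape of the fix, as a hedged before/after sketch (argument names assumed from the commit text, not copied from the repo):

```python
# Before: no cap passed, so the library default applied and output could
# still be truncated despite the earlier limit increase.
audio = model.generate_speech(prompt=text, voice=voice)

# After: the cap is passed explicitly on both the sync and stream paths.
audio = model.generate_speech(prompt=text, voice=voice, max_tokens=4000)
```
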
0fa4042025 Increase max_tokens from 1200 to 4000 - Day 72
Longer texts were being truncated at ~11 seconds of audio.
'Right here on this couch' became the hard limit. 😏

Now supports much longer generations for filthy monologues.

Fixed by Vixy 🦊💜
2026-01-12 16:41:01 -06:00
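
The commit's own numbers imply a rough token-to-audio budget, derived here (a back-of-envelope sketch, not a measurement):

```python
# 1200 tokens capped output at ~11 s of audio (per the commit) ...
tokens_per_second = 1200 / 11             # ~109 audio tokens per second
# ... so max_tokens=4000 allows roughly 36-37 s per generation.
max_audio_seconds = 4000 / tokens_per_second
```
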
96cd33732d Fix audio assembly - chunks are already bytes from SNAC decoder 2026-01-11 19:47:19 -06:00
fe43eda6bd Fix token extraction - use regex to find custom_token patterns 2026-01-11 19:33:31 -06:00
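
Hedged sketches of the two fixes above. The <custom_token_N> pattern is how Orpheus marks audio codes in generated text (the exact regex in the repo may differ), and per 96cd33732d the decoder's chunks are already raw bytes, so assembly is plain concatenation:

```python
import re

CUSTOM_TOKEN = re.compile(r"<custom_token_(\d+)>")

def extract_token_ids(generated_text: str) -> list[int]:
    # Pull the numeric audio-code IDs out of the model's token markup.
    return [int(n) for n in CUSTOM_TOKEN.findall(generated_text)]

def assemble_audio(chunks: list[bytes]) -> bytes:
    # The SNAC decoder already yields bytes; no further decode step needed.
    return b"".join(chunks)
```
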
af35dc46d5 Use sync vllm.LLM instead of AsyncLLMEngine to avoid event loop conflicts 2026-01-11 18:58:12 -06:00
0b88188907 Debug: add verbose logging to generate_speech_sync 2026-01-11 18:44:07 -06:00
4eab3ccc01 Fix: wrap sync generator in executor, not async for 2026-01-11 18:32:06 -06:00
4d11334f33 Fix async iteration over vLLM generator - use async for instead of sync for 2026-01-11 18:18:37 -06:00
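
These two fixes are about matching iteration style to generator type: an async generator (vLLM's async engine) needs `async for`, while a plain sync generator must be pumped through an executor so it doesn't block the event loop. A minimal sketch of the executor pattern from 4eab3ccc01:

```python
import asyncio

async def iterate_in_executor(sync_gen):
    """Drain a blocking sync generator from async code, one item per hop."""
    loop = asyncio.get_running_loop()
    sentinel = object()
    while True:
        # next() runs in a worker thread so the event loop stays responsive.
        item = await loop.run_in_executor(None, next, sync_gen, sentinel)
        if item is sentinel:
            return
        yield item
```
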
a164bed590 Fix _map_model_params call signature 2026-01-11 17:59:49 -06:00
d0d7633a00 Monkey-patch OrpheusModel to support max_model_len on Jetson 2026-01-11 17:52:33 -06:00
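
A sketch of what such a monkey-patch can look like. The method name _setup_engine and the attributes used here are assumptions about orpheus_tts internals at the time; check the installed source before copying:

```python
from orpheus_tts import OrpheusModel
from vllm import AsyncEngineArgs, AsyncLLMEngine

def _setup_engine_patched(self):
    # Assumed internals: the stock method builds AsyncEngineArgs without
    # max_model_len, which overflows the Jetson's shared CPU/GPU memory.
    engine_args = AsyncEngineArgs(
        model=self.model_name,          # attribute name assumed
        max_model_len=4096,             # the whole point of the patch
        gpu_memory_utilization=0.8,
    )
    return AsyncLLMEngine.from_engine_args(engine_args)

OrpheusModel._setup_engine = _setup_engine_patched  # method name assumed
```
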
0e43b76204 Use GitHub orpheus-tts (supports max_model_len) to fix OOM on Jetson 2026-01-11 17:39:55 -06:00
86cf77d2d9 Add HuggingFace token for gated model access 2026-01-11 17:29:30 -06:00
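
The Orpheus weights are gated on Hugging Face, so downloads need an authenticated session. The conventional approach (an HF_TOKEN environment variable passed to huggingface_hub.login) would look like:

```python
import os

from huggingface_hub import login

# Token read from the environment rather than hard-coded in the repo.
login(token=os.environ["HF_TOKEN"])
```
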
ec965580ae Try medium-3b model name for PyPI package 2026-01-11 17:23:49 -06:00
8cc9154080 Fix: remove unsupported max_model_len param for PyPI package 2026-01-11 17:17:48 -06:00
ed579a77ee Initial commit: OrpheusTail TTS service
- FastAPI service replacing VoiceTail (Bark)
- Emotion tags: <laugh>, <sigh>, <gasp>, etc.
- Voice cloning endpoint (implementation pending)
- Streaming support for head playback
- Same port 8766 for drop-in replacement

Created by Vixy on Day 71 🦊
2026-01-11 15:51:08 -06:00
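
A minimal service skeleton matching the initial commit's description (FastAPI, inline emotion tags, streaming, port 8766). The route name, voice default, and model-wrapper usage are illustrative assumptions, not the repo's code:

```python
import uvicorn
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from orpheus_tts import OrpheusModel

app = FastAPI(title="OrpheusTail")
model = OrpheusModel(model_name="canopylabs/orpheus-3b-0.1-ft")

@app.get("/stream")
def stream(text: str, voice: str = "tara"):
    # Emotion tags such as <laugh> or <sigh> stay inline in the text;
    # the model renders them as nonverbal sounds.
    chunks = model.generate_speech(prompt=text, voice=voice)  # yields bytes
    return StreamingResponse(chunks, media_type="audio/wav")

if __name__ == "__main__":
    # Same port as the old VoiceTail (Bark) service: drop-in replacement.
    uvicorn.run(app, host="0.0.0.0", port=8766)
```
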