orpheus-tts/main.py
Alex d650fd06b9 OrpheusTail v2: transformers streaming engine (replaces vLLM)
Major rework: replaced the synchronous vLLM LLM engine with Hugging Face
transformers + TextIteratorStreamer for true token-level streaming.

Pipeline: text → format_prompt → model.generate(streamer) →
extract_audio_codes (regex on streaming text) → SNAC decode → PCM
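The extract_audio_codes step above can be sketched as a regex pass over the growing streamer buffer. This is a minimal illustration, not the code in main.py: the `<custom_token_N>` text format and the function signature are assumptions, and a real consumer must handle tokens split across streamer chunks, as shown.

```python
import re

# Orpheus-style audio-code tokens embedded in generated text, e.g.
# "<custom_token_1234>". The exact token format is an assumption here,
# not something the commit message specifies.
CODE_RE = re.compile(r"<custom_token_(\d+)>")

def extract_audio_codes(buffer: str) -> tuple[list[int], str]:
    """Pull complete audio-code tokens out of a streaming text buffer.

    Returns the extracted code IDs plus the unconsumed tail: a token may
    arrive split across two streamer chunks, so the remainder is carried
    over into the next call.
    """
    codes, last = [], 0
    for m in CODE_RE.finditer(buffer):
        codes.append(int(m.group(1)))
        last = m.end()
    return codes, buffer[last:]

# Simulated streamer chunks: the second token arrives split in half.
chunks = ["<custom_token_42><custom_to", "ken_7>"]
buf, all_codes = "", []
for chunk in chunks:
    buf += chunk
    codes, buf = extract_audio_codes(buf)
    all_codes.extend(codes)
print(all_codes)  # [42, 7]
```

In the real pipeline the codes collected this way would be batched into frames and handed to the SNAC decoder.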

Expected first-audio latency: ~1-2s (was 10-14s with vLLM).
No more monkey-patching, no more AsyncLLMEngine hangs on Jetson.

SNAC model loaded separately (snac_24khz) for audio decoding.
All endpoints preserved, API compatible with v1.
The voice cloning endpoint is now honest about its LoRA requirement.
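The SNAC decode step implies mapping the flat code stream onto snac_24khz's three codebook layers. A sketch of that redistribution, assuming the layout common in Orpheus decoders (7 consecutive codes per audio frame, with a per-slot offset of 4096 that must be removed); the layout, offsets, and function name are assumptions, not taken from this commit:

```python
def redistribute_codes(codes: list[int]) -> list[list[int]]:
    """Split a flat Orpheus code stream into SNAC's three codebook layers.

    Assumed layout (common in Orpheus decoders, not stated in the commit):
    each audio frame is 7 consecutive codes, and slot k within a frame
    carries an offset of k * 4096 that is subtracted before SNAC decoding.
    """
    l1, l2, l3 = [], [], []
    for i in range(len(codes) // 7):
        f = codes[7 * i : 7 * i + 7]
        l1.append(f[0])                # coarse layer: 1 code per frame
        l2.append(f[1] - 1 * 4096)     # middle layer: 2 codes per frame
        l3.append(f[2] - 2 * 4096)     # fine layer: 4 codes per frame
        l3.append(f[3] - 3 * 4096)
        l2.append(f[4] - 4 * 4096)
        l3.append(f[5] - 5 * 4096)
        l3.append(f[6] - 6 * 4096)
    return [l1, l2, l3]

# One frame of raw codes with the per-slot offsets still applied:
frame = [10, 4096 + 20, 2 * 4096 + 30, 3 * 4096 + 40,
         4 * 4096 + 50, 5 * 4096 + 60, 6 * 4096 + 70]
print(redistribute_codes(frame))  # [[10], [20, 50], [30, 40, 60, 70]]
```

In the actual service these three lists would be wrapped as tensors and passed to the snac_24khz model's decode call to produce 24 kHz PCM.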

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:38:30 -05:00
