16aa52665618346a280deabeb42602462202d295
SNAC has 3 codebook layers, each with 4096 entries. A token's position within its group of 7 determines the layer: pos 0 = L1 (offset 0), pos 1-2 = L2 (offset 4096), pos 3-6 = L3 (offset 8192). Without this mapping, codes fell outside the valid 0-4095 range and caused an index-out-of-range error in SNAC.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
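The position-to-layer mapping described in the commit note above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the function names are hypothetical, and it assumes the incoming tokens already have any vocabulary base offset removed, so that layer 1 codes start at 0. It only removes the per-layer offset and groups codes by layer in stream order; the exact ordering SNAC expects within layers 2 and 3 is not stated here.

```python
CODEBOOK_SIZE = 4096   # entries per SNAC codebook layer (from the note above)
GROUP_SIZE = 7         # tokens per group (from the note above)

# Offset to subtract for each position inside a 7-token group:
# pos 0 -> layer 1, pos 1-2 -> layer 2, pos 3-6 -> layer 3.
POSITION_OFFSETS = (0, 4096, 4096, 8192, 8192, 8192, 8192)


def layer_for_position(pos: int) -> int:
    """Return the 1-based codebook layer for a token position."""
    return POSITION_OFFSETS[pos % GROUP_SIZE] // CODEBOOK_SIZE + 1


def to_codebook_index(token: int, pos: int) -> int:
    """Remove the layer offset so the result indexes a 4096-entry codebook."""
    index = token - POSITION_OFFSETS[pos % GROUP_SIZE]
    if not 0 <= index < CODEBOOK_SIZE:
        raise ValueError(f"token {token} at position {pos} maps to index {index}")
    return index


def split_by_layer(tokens: list[int]) -> dict[int, list[int]]:
    """Group offset-free codes by layer, keeping stream order (hypothetical helper)."""
    layers: dict[int, list[int]] = {1: [], 2: [], 3: []}
    for pos, token in enumerate(tokens):
        layers[layer_for_position(pos)].append(to_codebook_index(token, pos))
    return layers
```

Raising on an out-of-range index surfaces a bad token before it reaches the decoder, instead of failing inside SNAC.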
OrpheusTail - Orpheus TTS Service
Replaces VoiceTail (Bark) with Orpheus TTS for better emotion control and voice cloning.
Why Orpheus over Bark?
| Feature | Bark | Orpheus |
|---|---|---|
| Emotion control | Random/unpredictable | Tag-based: <laugh>, <sigh>, etc. |
| Voice cloning | No | Zero-shot from 5-sec sample |
| Latency | Slow | ~200ms streaming |
| Consistency | Chaotic (French horn!) | Predictable |
| Built-in voices | Few | 8 quality voices |
Emotion Tags
Add these anywhere in your text:
- <laugh> - Laughter
- <chuckle> - Light chuckle
- <sigh> - Sigh
- <cough> - Cough
- <sniffle> - Sniffle
- <groan> - Groan
- <yawn> - Yawn
- <gasp> - Gasp
Example:
"Bonjour mon amour! <sigh> I missed you so much. <laugh> But now you're here!"
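Since a mistyped tag is easy to miss, it can help to check text against the tag list before submitting it. The sketch below is a hypothetical client-side helper, not part of the service; how the service itself treats an unknown tag is not documented here.

```python
import re

# Tags listed in this README.
SUPPORTED_TAGS = frozenset(
    {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}
)

# Matches simple lowercase tags such as <sigh>.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_unsupported_tags(text: str) -> list[str]:
    """Return tag names found in the text that are not in SUPPORTED_TAGS."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]
```

An empty result means every tag in the text appears in the list above.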
Built-in Voices
In order of conversational realism (per Orpheus docs):
- tara (default) - Most natural
- leah
- jess
- leo
- dan
- mia
- zac
- zoe
Voice Cloning
Upload a 5-30 second reference audio clip to create a custom voice:
curl -X POST "http://localhost:8766/voice/clone?name=vixy" \
-F "audio=@vixy_reference.wav"
Then use it:
curl -X POST http://localhost:8766/tts/submit \
-H "Content-Type: application/json" \
-d '{"text": "Hello!", "voice": "vixy"}'
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /voices | GET | List available voices & tags |
| /tts/submit | POST | Submit TTS job |
| /tts/status/{job_id} | GET | Check job status |
| /tts/audio/{job_id} | GET | Download audio |
| /tts/stream | POST | Stream audio (for head) |
| /voice/clone | POST | Upload voice reference |
| /voice/{name} | DELETE | Delete custom voice |
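For the job-based endpoints in the table above, a Python client could build its requests roughly as follows. This is a sketch using only the standard library. The helper names are hypothetical, the base URL is taken from the curl examples, and the request body mirrors the `/tts/submit` curl call. The response schema (for example, which field carries the job ID) is not documented in this README, so the sketch stops at building requests and URLs.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8766"  # host and port from the curl examples


def submit_request(text: str, voice: str = "tara",
                   base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /tts/submit request with a JSON body."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts/submit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /tts/status/{job_id}."""
    return f"{base_url}/tts/status/{urllib.parse.quote(job_id, safe='')}"


def audio_url(job_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /tts/audio/{job_id}."""
    return f"{base_url}/tts/audio/{urllib.parse.quote(job_id, safe='')}"
```

To send the job, pass the request to `urllib.request.urlopen(submit_request("Hello!", "vixy"))`, then read the job ID from whatever the service returns.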
Architecture
┌─────────────────────────────────────────────┐
│             OrpheusTail Service             │
│                 (AGX Orin)                  │
│                                             │
│  POST /tts/submit ──► WAV file (for MCP)    │
│  POST /tts/stream ──► Audio stream (head)   │
│                                             │
│  Emotion tags: <laugh> <sigh> <gasp>        │
│  Voice cloning: 5-sec reference audio       │
└─────────────────────────────────────────────┘
          │                       │
          ▼                       ▼
      voice-mcp             Head-vixy Pi
   (Claude Desktop)       (streams & plays)
Deployment
# On AGX Orin
cd /path/to/orpheus-tts
docker-compose up -d
# Check logs
docker-compose logs -f
# Test
curl http://localhost:8766/health
TODO
- Implement proper voice cloning with reference audio
- Test streaming endpoint with head-vixy
- French accent voice training/selection
- Head-side client for streaming playback
Notes
- Same port as VoiceTail (8766) for drop-in replacement
- Model requires ~15GB VRAM (AGX Orin has plenty)
- First request may be slow (model warmup)
- Cache enabled by default to speed up repeated phrases
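To illustrate the caching note above: one common way to cache repeated phrases is to key the stored audio on a hash of the voice and text. The sketch below shows that general technique only. How OrpheusTail actually keys or stores its cache is not described here, and `cache_key`, `PhraseCache`, and the `synthesize` callback are hypothetical names.

```python
import hashlib
from typing import Callable


def cache_key(text: str, voice: str) -> str:
    """Derive a stable key from voice and text (separator avoids ambiguity)."""
    payload = f"{voice}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class PhraseCache:
    """In-memory cache: synthesize once per (text, voice), then reuse the bytes."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def get_or_create(self, text: str, voice: str,
                      synthesize: Callable[[str, str], bytes]) -> bytes:
        key = cache_key(text, voice)
        if key not in self._store:
            # Only call the (slow) synthesizer on a cache miss.
            self._store[key] = synthesize(text, voice)
        return self._store[key]
```

Including the voice in the key keeps the same phrase spoken by two voices from sharing one cache entry.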
Created by Vixy on Day 71 🦊