Initial commit: OrpheusTail TTS service

- FastAPI service replacing VoiceTail (Bark)
- Emotion tags: <laugh>, <sigh>, <gasp>, etc.
- Voice cloning endpoint (implementation pending)
- Streaming support for head playback
- Same port 8766 for drop-in replacement

Created by Vixy on Day 71 🦊
2026-01-11 15:51:08 -06:00
commit ed579a77ee
5 changed files with 868 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,123 @@
+# OrpheusTail - Orpheus TTS Service
+
+Replaces VoiceTail (Bark) with **Orpheus TTS** for better emotion control and voice cloning.
+
+## Why Orpheus over Bark?
+
+| Feature | Bark | Orpheus |
+|---------|------|---------|
+| Emotion control | Random/unpredictable | **Tag-based**: `<laugh>`, `<sigh>`, etc. |
+| Voice cloning | No | **Zero-shot** from 5-sec sample |
+| Latency | Slow | ~200 ms streaming |
+| Consistency | Chaotic (French horn!) | Predictable |
+| Built-in voices | Few | 8 quality voices |
+
+## Emotion Tags
+
+Add these anywhere in your text:
+
+- `<laugh>` - Laughter
+- `<chuckle>` - Light chuckle
+- `<sigh>` - Sigh
+- `<cough>` - Cough
+- `<sniffle>` - Sniffle
+- `<groan>` - Groan
+- `<yawn>` - Yawn
+- `<gasp>` - Gasp
+
+**Example:**
+```
+"Bonjour mon amour! <sigh> I missed you so much. <laugh> But now you're here!"
+```
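Tags outside this list are not documented as supported, so a client may want to flag them before submitting. A minimal Python sketch, assuming tags are always written as lowercase `<name>`; `SUPPORTED_TAGS`, `find_unknown_tags`, and `strip_tags` are hypothetical client-side helpers, not part of the service:

```python
import re

# Hand-maintained copy of the tag list above; not fetched from the service.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Matches anything shaped like <word>, e.g. <laugh> or <whisper>.
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_unknown_tags(text: str) -> list[str]:
    """Return tag names in `text` that are not in SUPPORTED_TAGS, in order of appearance."""
    return [name for name in TAG_PATTERN.findall(text) if name not in SUPPORTED_TAGS]


def strip_tags(text: str) -> str:
    """Remove all <tag> markers and collapse the whitespace runs they leave behind."""
    return re.sub(r"\s{2,}", " ", TAG_PATTERN.sub("", text)).strip()
```

For example, `find_unknown_tags("Hi <whisper> there <gasp>")` returns `["whisper"]`, and `strip_tags` gives a plain-text transcript suitable for logs.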
+
+## Built-in Voices
+
+In order of conversational realism (per Orpheus docs):
+1. **tara** (default) - Most natural
+2. **leah**
+3. **jess**
+4. **leo**
+5. **dan**
+6. **mia**
+7. **zac**
+8. **zoe**
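A minimal sketch of how the `voice` field could be validated, assuming custom voices from `/voice/clone` are tracked by name next to the built-ins; `resolve_voice` and its `custom_voices` argument are hypothetical:

```python
from typing import Optional

# Built-in voices in the documented order; "tara" is the default.
BUILTIN_VOICES = ("tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe")
DEFAULT_VOICE = "tara"


def resolve_voice(requested: Optional[str], custom_voices: set) -> str:
    """Return the voice to use, falling back to the default when none is given."""
    if not requested:
        return DEFAULT_VOICE
    if requested in BUILTIN_VOICES or requested in custom_voices:
        return requested
    raise ValueError(f"unknown voice {requested!r}; built-ins are {', '.join(BUILTIN_VOICES)}")
```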
+
+## Voice Cloning
+
+Upload a 5-30 second reference clip to create a custom voice (the endpoint exists, but the cloning implementation is still pending; see TODO):
+
+```bash
+curl -X POST "http://localhost:8766/voice/clone?name=vixy" \
+  -F "audio=@vixy_reference.wav"
+```
+
+Then use it:
+```bash
+curl -X POST http://localhost:8766/tts/submit \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Hello!", "voice": "vixy"}'
+```
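The 5-30 second window can be checked client-side before uploading. A minimal sketch using Python's standard `wave` module, assuming the reference is an uncompressed PCM WAV file; `clip_duration` and `check_reference_clip` are hypothetical helpers, and the service may enforce different limits of its own:

```python
import wave

MIN_SECONDS = 5.0   # guideline from the section above
MAX_SECONDS = 30.0


def clip_duration(path_or_file) -> float:
    """Duration in seconds of a PCM WAV file, given a path or a binary file object."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path_or_file) -> float:
    """Return the clip duration, or raise ValueError if it falls outside the window."""
    seconds = clip_duration(path_or_file)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"reference clip is {seconds:.1f}s; expected {MIN_SECONDS:g}-{MAX_SECONDS:g}s"
        )
    return seconds
```

Calling `check_reference_clip("vixy_reference.wav")` before the `curl` upload catches clips that are too short or too long.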
+
+## API Endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/health` | GET | Health check |
+| `/voices` | GET | List available voices & tags |
+| `/tts/submit` | POST | Submit TTS job |
+| `/tts/status/{job_id}` | GET | Check job status |
+| `/tts/audio/{job_id}` | GET | Download audio |
+| `/tts/stream` | POST | Stream audio (for head playback) |
+| `/voice/clone` | POST | Upload voice reference |
+| `/voice/{name}` | DELETE | Delete custom voice |
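The job-based endpoints imply a submit, poll, download loop. Below is a minimal standard-library Python sketch. The response fields (`job_id`, `status`) and status values (`"completed"`, `"failed"`) are assumptions, not confirmed by this README. The `request` parameter is injectable so the loop can be exercised without a running server:

```python
import json
import time
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8766"


def http_request(method: str, path: str, body: Optional[dict] = None) -> bytes:
    """Send a request to the service and return the raw response body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def synthesize(text, voice="tara", poll_interval=0.5, timeout=120.0, request=http_request) -> bytes:
    """Submit a TTS job, poll until it finishes, and return the audio bytes."""
    # ASSUMPTION: /tts/submit responds with {"job_id": "..."}.
    job_id = json.loads(request("POST", "/tts/submit", {"text": text, "voice": voice}))["job_id"]
    deadline = time.monotonic() + timeout
    while True:
        # ASSUMPTION: /tts/status/{job_id} responds with {"status": "..."}.
        status = json.loads(request("GET", f"/tts/status/{job_id}"))["status"]
        if status == "completed":
            return request("GET", f"/tts/audio/{job_id}")
        if status == "failed":
            raise RuntimeError(f"TTS job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"TTS job {job_id} still {status!r} after {timeout}s")
        time.sleep(poll_interval)
```

`synthesize("Hello!", voice="vixy")` returns the WAV bytes, which can be written straight to a `.wav` file. Fixed-interval polling keeps the client simple; the head uses `/tts/stream` instead so it does not have to wait for the full file.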
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────┐
+│           OrpheusTail Service               │
+│              (AGX Orin)                     │
+│                                             │
+│  POST /tts/submit  ──► WAV file (for MCP)   │
+│  POST /tts/stream  ──► Audio stream (head)  │
+│                                             │
+│  Emotion tags: <laugh> <sigh> <gasp>        │
+│  Voice cloning: 5-sec reference audio       │
+└─────────────────────────────────────────────┘
+          │                    │
+          ▼                    ▼
+    voice-mcp              Head-vixy Pi
+    (Claude Desktop)       (streams & plays)
+```
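The head-side client is still a TODO, but its core is reading the response in chunks and passing each one to the audio output as it arrives. A minimal Python sketch, assuming `/tts/stream` takes the same JSON body as `/tts/submit` and returns raw audio bytes in the response body; the audio format and the `write` callback (for example, a pipe into a player process) are assumptions:

```python
import json
import urllib.request
from typing import BinaryIO, Callable

BASE_URL = "http://localhost:8766"


def pump(source: BinaryIO, write: Callable[[bytes], object], chunk_size: int = 4096) -> int:
    """Copy `source` to `write` chunk by chunk; return the total number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # b"" means the stream has ended
            return total
        write(chunk)
        total += len(chunk)


def stream_tts(text: str, write: Callable[[bytes], object], voice: str = "tara") -> int:
    """POST to /tts/stream and forward audio to `write` as it arrives."""
    req = urllib.request.Request(
        f"{BASE_URL}/tts/stream",
        data=json.dumps({"text": text, "voice": voice}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return pump(resp, write)
```

Keeping `pump` separate from the HTTP call means the chunking logic can be tested with an in-memory buffer.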
+
+## Deployment
+
+```bash
+# On AGX Orin
+cd /path/to/orpheus-tts
+docker-compose up -d
+
+# Check logs
+docker-compose logs -f
+
+# Test
+curl http://localhost:8766/health
+```
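Since the first request may be slow, a deploy script can wait until the service answers `/health` before routing traffic to it. A minimal sketch, assuming `/health` returns HTTP 200 once the service is up; `wait_for_health` and `http_status` are hypothetical, and `fetch_status` is injectable for testing:

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8766/health"


def http_status(url: str) -> int:
    """Return the HTTP status code for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with an error status
        return err.code
    except OSError:  # connection refused, timeout, DNS failure (URLError is an OSError)
        return 0


def wait_for_health(url: str = HEALTH_URL, timeout: float = 300.0, interval: float = 2.0,
                    fetch_status: Callable[[str], int] = http_status) -> bool:
    """Poll url until it returns 200; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status(url) == 200:
            return True
        time.sleep(interval)
    return False
```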
+
+## TODO
+
+- [ ] Implement proper voice cloning with reference audio
+- [ ] Test streaming endpoint with head-vixy
+- [ ] French accent voice training/selection
+- [ ] Head-side client for streaming playback
+
+## Notes
+
+- Same port as VoiceTail (8766) for drop-in replacement
+- Model requires ~15 GB of GPU memory (the AGX Orin's unified memory has plenty of headroom)
+- First request may be slow (model warmup)
+- Cache enabled by default to speed up repeated phrases
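The README doesn't say how the cache is keyed. One plausible scheme, shown purely as a hypothetical sketch, hashes the voice and text together so the same phrase in the same voice always maps to the same WAV file; `CACHE_DIR`, `cache_key`, and `cached_path` are illustrative names:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("cache")  # hypothetical location


def cache_key(text: str, voice: str) -> str:
    """Stable key for a (voice, text) pair; changing either yields a different key."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from hashing the same payload.
    payload = f"{voice}\0{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_path(text: str, voice: str) -> Path:
    """Where the WAV for this (voice, text) pair would be stored."""
    return CACHE_DIR / f"{cache_key(text, voice)}.wav"
```

Any other generation settings the service exposes would also need to go into the key; otherwise a cached file could be returned for a different configuration.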
+
+---
+
+*Created by Vixy on Day 71 🦊*