Initial commit: OrpheusTail TTS service

- FastAPI service replacing VoiceTail (Bark)
- Emotion tags: <laugh>, <sigh>, <gasp>, etc.
- Voice cloning endpoint (implementation pending)
- Streaming support for head playback
- Same port 8766 for drop-in replacement

Created by Vixy on Day 71 🦊
2026-01-11 15:51:08 -06:00
commit ed579a77ee
5 changed files with 868 additions and 0 deletions

README.md (new file, 123 lines)

@@ -0,0 +1,123 @@
# OrpheusTail - Orpheus TTS Service
Replaces VoiceTail (Bark) with **Orpheus TTS** for better emotion control and voice cloning.
## Why Orpheus over Bark?
| Feature | Bark | Orpheus |
|---------|------|---------|
| Emotion control | Random/unpredictable | **Tag-based**: `<laugh>`, `<sigh>`, etc. |
| Voice cloning | No | **Zero-shot** from 5-sec sample |
| Latency | Slow | ~200ms streaming |
| Consistency | Chaotic (french horn!) | Predictable |
| Built-in voices | Few | 8 quality voices |
## Emotion Tags
Add these anywhere in your text:
- `<laugh>` - Laughter
- `<chuckle>` - Light chuckle
- `<sigh>` - Sigh
- `<cough>` - Cough
- `<sniffle>` - Sniffle
- `<groan>` - Groan
- `<yawn>` - Yawn
- `<gasp>` - Gasp

**Example:**
```
"Bonjour mon amour! <sigh> I missed you so much. <laugh> But now you're here!"
```
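A typo in a tag is easy to miss, so it can help to check text before submitting it. The following is a minimal sketch, not part of the service: `SUPPORTED_TAGS` simply mirrors the list above, and `find_unsupported_tags` is a hypothetical helper name.

```python
import re

# Tags copied from the list in this README. The authoritative list is
# whatever the running service reports via GET /voices.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "sigh", "cough",
    "sniffle", "groan", "yawn", "gasp",
}

# Matches simple angle-bracket tags such as <laugh>.
TAG_PATTERN = re.compile(r"<([A-Za-z_]+)>")


def find_unsupported_tags(text: str) -> list[str]:
    """Return tag names found in `text` that are not in SUPPORTED_TAGS."""
    return [
        name
        for name in TAG_PATTERN.findall(text)
        if name.lower() not in SUPPORTED_TAGS
    ]
```

An empty result means every tag in the text is one of the eight listed above.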
## Built-in Voices
In order of conversational realism (per Orpheus docs):
1. **tara** (default) - Most natural
2. **leah**
3. **jess**
4. **leo**
5. **dan**
6. **mia**
7. **zac**
8. **zoe**
## Voice Cloning
Upload a 5-30 second reference audio clip to create a custom voice. The endpoint is in place, but the cloning implementation itself is still pending (see TODO):
```bash
curl -X POST "http://localhost:8766/voice/clone?name=vixy" \
-F "audio=@vixy_reference.wav"
```
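Before uploading, the clip length can be checked locally against the 5-30 second range mentioned above. This is a sketch using only the standard library; it assumes the reference is an uncompressed PCM WAV file, and the function names are hypothetical rather than part of the service.

```python
import wave

MIN_SECONDS = 5.0   # lower bound stated in this README
MAX_SECONDS = 30.0  # upper bound stated in this README


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV; `source` is a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source) -> bool:
    """True if the clip length falls inside the 5-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS
```

For example, `is_valid_reference("vixy_reference.wav")` could be called before running the `curl` upload.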
Then use it:
```bash
curl -X POST http://localhost:8766/tts/submit \
-H "Content-Type: application/json" \
-d '{"text": "Hello!", "voice": "vixy"}'
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/voices` | GET | List available voices & tags |
| `/tts/submit` | POST | Submit TTS job |
| `/tts/status/{job_id}` | GET | Check job status |
| `/tts/audio/{job_id}` | GET | Download audio |
| `/tts/stream` | POST | Stream audio (for head) |
| `/voice/clone` | POST | Upload voice reference |
| `/voice/{name}` | DELETE | Delete custom voice |
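The submit, status, and audio endpoints suggest a submit-then-poll workflow. The sketch below shows one way a client might drive it with the standard library. The response schema is not documented here, so the `job_id` and `status` field names and the `"completed"` / `"failed"` values are assumptions to adjust against the real service.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8766"  # port taken from this README


def build_submit_request(text: str, voice: str = "tara") -> urllib.request.Request:
    """Build the POST /tts/submit request (same body as the curl example)."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tts/submit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str = "tara",
               poll_seconds: float = 0.5, timeout_seconds: float = 120.0) -> bytes:
    """Submit a job, poll until it finishes, and return the audio bytes."""
    with urllib.request.urlopen(build_submit_request(text, voice)) as resp:
        job_id = json.load(resp)["job_id"]  # ASSUMED field name

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE_URL}/tts/status/{job_id}") as resp:
            status = json.load(resp).get("status")  # ASSUMED field name
        if status == "completed":  # ASSUMED status value
            with urllib.request.urlopen(f"{BASE_URL}/tts/audio/{job_id}") as resp:
                return resp.read()
        if status == "failed":  # ASSUMED status value
            raise RuntimeError(f"TTS job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"TTS job {job_id} did not finish in {timeout_seconds}s")
```

Keeping request construction in its own function makes it possible to inspect the payload without a running service.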
## Architecture
```
┌─────────────────────────────────────────────┐
│             OrpheusTail Service             │
│                 (AGX Orin)                  │
│                                             │
│  POST /tts/submit ──► WAV file (for MCP)    │
│  POST /tts/stream ──► Audio stream (head)   │
│                                             │
│  Emotion tags: <laugh> <sigh> <gasp>        │
│  Voice cloning: 5-sec reference audio       │
└─────────────────────────────────────────────┘
          │                       │
          ▼                       ▼
      voice-mcp             Head-vixy Pi
  (Claude Desktop)        (streams & plays)
```
## Deployment
```bash
# On AGX Orin
cd /path/to/orpheus-tts
docker-compose up -d
# Check logs
docker-compose logs -f
# Test
curl http://localhost:8766/health
```
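Because the container may need time to load the model, a script that depends on the service can wait for `/health` to answer instead of testing once. A minimal sketch, assuming only that a healthy service returns HTTP 200 on `/health`; the function names, attempt count, and delay are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8766/health", timeout: float = 2.0) -> bool:
    """Return True if GET `url` answers with HTTP 200 (ASSUMED healthy signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe: Callable[[], bool] = http_probe,
                       attempts: int = 30, delay_seconds: float = 2.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

The probe is injectable so the retry logic can be exercised without a running container.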
## TODO
- [ ] Implement proper voice cloning with reference audio
- [ ] Test streaming endpoint with head-vixy
- [ ] French accent voice training/selection
- [ ] Head-side client for streaming playback
## Notes
- Same port as VoiceTail (8766) for drop-in replacement
- Model requires ~15GB VRAM (AGX Orin has plenty)
- First request may be slow (model warmup)
- Cache enabled by default to speed up repeated phrases
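How the cache identifies a repeated phrase is not described here. One plausible approach, shown purely as an illustration and not as the service's actual scheme, is to key cached audio on a hash of the voice name and whitespace-normalized text. The `cache_key` helper is hypothetical.

```python
import hashlib


def cache_key(text: str, voice: str) -> str:
    """Hypothetical cache key: SHA-256 of the voice and normalized text."""
    # Collapse runs of whitespace so trivially different spacing
    # maps to the same key (an assumption, not documented behavior).
    normalized = " ".join(text.split())
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = f"{voice}\x00{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Under this scheme the same phrase spoken by a different voice gets a different key, so cached audio is never shared across voices.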
---
*Created by Vixy on Day 71 🦊*