- FastAPI service replacing VoiceTail (Bark)
- Emotion tags: <laugh>, <sigh>, <gasp>, etc.
- Voice cloning endpoint (implementation pending)
- Streaming support for head playback
- Same port 8766 for drop-in replacement
Created by Vixy on Day 71 🦊
# OrpheusTail - Orpheus TTS Service

Replaces VoiceTail (Bark) with **Orpheus TTS** for better emotion control and voice cloning.

## Why Orpheus over Bark?

| Feature | Bark | Orpheus |
|---------|------|---------|
| Emotion control | Random/unpredictable | **Tag-based**: `<laugh>`, `<sigh>`, etc. |
| Voice cloning | No | **Zero-shot** from a 5-second sample (implementation pending) |
| Latency | Slow | ~200 ms streaming |
| Consistency | Chaotic (French horn!) | Predictable |
| Built-in voices | Few | 8 quality voices |

## Emotion Tags

Add these anywhere in your text:

- `<laugh>` - Laughter
- `<chuckle>` - Light chuckle
- `<sigh>` - Sigh
- `<cough>` - Cough
- `<sniffle>` - Sniffle
- `<groan>` - Groan
- `<yawn>` - Yawn
- `<gasp>` - Gasp

**Example:**

```
"Bonjour mon amour! <sigh> I missed you so much. <laugh> But now you're here!"
```

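Since a mistyped tag is easy to miss, a small client-side check can help before text is sent. The following is a minimal sketch, not part of the service: `KNOWN_TAGS` simply mirrors the list above, and `unknown_tags` is a hypothetical helper name. How the service itself treats unlisted tags is not documented here.

```python
import re

# Tags listed in this README. Whether the service rejects, ignores, or
# reads out other tags is not documented, so this check is client-side only.
KNOWN_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Matches simple lowercase tags such as <sigh>.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def unknown_tags(text: str) -> list[str]:
    """Return tag names found in `text` that are not in KNOWN_TAGS, in order."""
    return [name for name in TAG_PATTERN.findall(text) if name not in KNOWN_TAGS]


line = "Bonjour mon amour! <sigh> I missed you so much. <laugh> But now you're here!"
print(unknown_tags(line))                                   # []
print(unknown_tags("Psst <whisper> come closer <gasp>"))    # ['whisper']
```
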
## Built-in Voices

In order of conversational realism (per Orpheus docs):

1. **tara** (default) - Most natural
2. **leah**
3. **jess**
4. **leo**
5. **dan**
6. **mia**
7. **zac**
8. **zoe**

## Voice Cloning

Upload a 5-30 second reference audio clip to create a custom voice (cloning from the reference is not fully implemented yet; see TODO):

```bash
curl -X POST "http://localhost:8766/voice/clone?name=vixy" \
  -F "audio=@vixy_reference.wav"
```

Then use it:

```bash
curl -X POST http://localhost:8766/tts/submit \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice": "vixy"}'
```

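Because the reference clip is expected to be 5-30 seconds long, it may be worth checking the duration locally before uploading. This is a rough sketch using only the standard-library `wave` module; it assumes an uncompressed PCM WAV file, and the helper names (`wav_duration_seconds`, `is_usable_reference`, `make_silence`) are hypothetical. Whether the service enforces the range itself is not stated.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 5.0, 30.0  # range stated in this README


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV; `source` is a file path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(source) -> bool:
    """True if the clip length falls inside the documented 5-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silence(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


print(is_usable_reference(make_silence(6)))   # True
print(is_usable_reference(make_silence(2)))   # False
```

For a real clip, pass the path instead, for example `is_usable_reference("vixy_reference.wav")`.
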
## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/voices` | GET | List available voices & tags |
| `/tts/submit` | POST | Submit TTS job |
| `/tts/status/{job_id}` | GET | Check job status |
| `/tts/audio/{job_id}` | GET | Download audio |
| `/tts/stream` | POST | Stream audio (for head) |
| `/voice/clone` | POST | Upload voice reference |
| `/voice/{name}` | DELETE | Delete custom voice |

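The submit, status, and audio endpoints suggest a submit-then-poll flow. The sketch below shows one way a client might drive it. The response schema is not documented in this README, so the `job_id` and `status` fields and the `"done"` / `"failed"` values are assumptions, as are the helper names `http_json` and `synthesize`.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8766"


def http_json(method, path, payload=None):
    """Send a request to the service and decode a JSON response body."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def synthesize(text, voice="tara", call=http_json, poll_seconds=0.5, max_polls=120):
    """Submit a job, poll until it finishes, and return the audio path.

    ASSUMPTION: `job_id`, `status`, "done" and "failed" are hypothetical;
    adjust them to whatever the service actually returns.
    """
    job = call("POST", "/tts/submit", {"text": text, "voice": voice})
    job_id = job["job_id"]
    for _ in range(max_polls):
        status = call("GET", f"/tts/status/{job_id}")["status"]
        if status == "done":
            return f"/tts/audio/{job_id}"
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in time")
```

The `call` parameter is there so the flow can be exercised with a stub instead of a running service; the returned path can then be fetched with a plain GET against the same base URL.
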
## Architecture

```
┌─────────────────────────────────────────────┐
│             OrpheusTail Service             │
│                 (AGX Orin)                  │
│                                             │
│  POST /tts/submit ──► WAV file (for MCP)    │
│  POST /tts/stream ──► Audio stream (head)   │
│                                             │
│  Emotion tags: <laugh> <sigh> <gasp>        │
│  Voice cloning: 5-sec reference audio       │
└─────────────────────────────────────────────┘
          │                         │
          ▼                         ▼
      voice-mcp               Head-vixy Pi
   (Claude Desktop)         (streams & plays)
```

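On the head side, the streaming path amounts to reading the HTTP response in chunks and handing each chunk to a player as it arrives. The following is only a sketch of that loop: it assumes `/tts/stream` accepts the same JSON body as `/tts/submit`, and it leaves the audio format and the playback sink open, since neither is specified here. `open_stream`, `iter_chunks`, and `play` are hypothetical names.

```python
import io
import json
import urllib.request


def open_stream(text, voice="tara", base_url="http://localhost:8766"):
    """POST to /tts/stream and return the open response (a file-like object).

    ASSUMPTION: the request body mirrors the one used for /tts/submit.
    """
    body = json.dumps({"text": text, "voice": voice}).encode()
    req = urllib.request.Request(
        base_url + "/tts/stream",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)


def iter_chunks(stream, chunk_size=4096):
    """Yield byte chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def play(stream, sink):
    """Pass each chunk to `sink` as it arrives; return the total byte count."""
    total = 0
    for chunk in iter_chunks(stream):
        sink(chunk)
        total += len(chunk)
    return total


# Stand-in for a network stream, so the loop can be tried without the service.
received = []
print(play(io.BytesIO(b"x" * 10000), received.append))  # 10000
```

Against a running service, `play(open_stream("Hello! <laugh>"), sink)` would feed chunks to whatever `sink` the head uses for audio output.
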
## Deployment

```bash
# On AGX Orin
cd /path/to/orpheus-tts
docker-compose up -d

# Check logs
docker-compose logs -f

# Test
curl http://localhost:8766/health
```

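Right after `docker-compose up -d` the container may not answer immediately, so a script that depends on the service can poll `/health` first. This is a small sketch under the assumption that a healthy service answers with HTTP 200; the response body is not inspected because its shape is not documented. `health_ok` and `wait_until_healthy` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request


def health_ok(url="http://localhost:8766/health", timeout=2.0):
    """True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(probe=health_ok, attempts=30, delay=2.0):
    """Call `probe` up to `attempts` times; return the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        time.sleep(delay)
    raise TimeoutError(f"service not healthy after {attempts} attempts")
```

Calling `wait_until_healthy()` with the defaults waits up to about a minute before giving up.
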
## TODO

- [ ] Implement proper voice cloning with reference audio
- [ ] Test streaming endpoint with head-vixy
- [ ] French accent voice training/selection
- [ ] Head-side client for streaming playback

## Notes

- Same port as VoiceTail (8766) for drop-in replacement
- Model requires ~15GB VRAM (AGX Orin has plenty)
- First request may be slow (model warmup)
- Cache enabled by default to speed up repeated phrases

---

*Created by Vixy on Day 71 🦊*