# OrpheusTail - Orpheus TTS Service

OrpheusTail replaces VoiceTail (Bark) with **Orpheus TTS** for better emotion control and voice cloning.

## Why Orpheus over Bark?

| Feature | Bark | Orpheus |
|---------|------|---------|
| Emotion control | Random/unpredictable | **Tag-based**: `<laugh>`, `<sigh>`, etc. |
| Voice cloning | No | **Zero-shot** from 5-sec sample |
| Latency | Slow | ~200ms streaming |
| Consistency | Chaotic (French horn!) | Predictable |
| Built-in voices | Few | 8 quality voices |

## Emotion Tags

Add these anywhere in your text:

- `<laugh>` - Laughter
- `<chuckle>` - Light chuckle
- `<sigh>` - Sigh
- `<cough>` - Cough
- `<sniffle>` - Sniffle
- `<groan>` - Groan
- `<yawn>` - Yawn
- `<gasp>` - Gasp

**Example:**
```
"Bonjour mon amour! <sigh> I missed you so much. <laugh> But now you're here!"
```
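
This README does not say how the service treats a tag outside the list above, so it can be useful to check text before submitting it. The following is a minimal Python sketch; `find_unknown_tags` is a hypothetical helper, not part of the service, and it assumes tag names are matched case-sensitively.

```python
import re

# Tag names copied from the list in this README.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Matches anything shaped like <word>.
TAG_PATTERN = re.compile(r"<([A-Za-z_]+)>")


def find_unknown_tags(text: str) -> list[str]:
    """Return tag names found in `text` that are not in SUPPORTED_TAGS, in order of appearance."""
    # ASSUMPTION: tags are case-sensitive, so "<Laugh>" is reported as unknown.
    return [name for name in TAG_PATTERN.findall(text) if name not in SUPPORTED_TAGS]
```

An empty result means every tag in the text appears in the supported list.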

## Built-in Voices

In order of conversational realism (per Orpheus docs):
1. **tara** (default) - Most natural
2. **leah**
3. **jess**
4. **leo**
5. **dan**
6. **mia**
7. **zac**
8. **zoe**

## Voice Cloning

Upload a 5-30 second reference audio clip to create a custom voice:

```bash
curl -X POST "http://localhost:8766/voice/clone?name=vixy" \
  -F "audio=@vixy_reference.wav"
```

Then use it:
```bash
curl -X POST http://localhost:8766/tts/submit \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice": "vixy"}'
```
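
Before uploading, the clip length can be checked locally against the 5-30 second guidance. This is a small sketch using Python's standard `wave` module; the helper names are hypothetical, it only handles WAV files, and whether the service itself enforces these bounds is not stated here.

```python
import wave

# Bounds taken from the "5-30 second" guidance in this README.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file, given a path string or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source) -> bool:
    """Return True if the clip length falls inside the recommended range."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS
```

For example, `is_valid_reference("vixy_reference.wav")` could gate the upload call shown above.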

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/voices` | GET | List available voices & tags |
| `/tts/submit` | POST | Submit TTS job |
| `/tts/status/{job_id}` | GET | Check job status |
| `/tts/audio/{job_id}` | GET | Download audio |
| `/tts/stream` | POST | Stream audio (for the head-vixy Pi) |
| `/voice/clone` | POST | Upload voice reference |
| `/voice/{name}` | DELETE | Delete custom voice |
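
The submit, status, and audio endpoints suggest a submit-then-poll workflow. The sketch below shows one way a Python client might drive it. The response shapes are not documented in this README, so the `job_id` and `status` field names and the `"completed"` / `"failed"` values are assumptions; `http_json` and `run_job` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8766"  # port from this README


def http_json(url: str, payload: Optional[dict] = None) -> dict:
    """GET `url`, or POST `payload` as JSON when one is given, and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {} if payload is None else {"Content-Type": "application/json"}
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_job(
    text: str,
    voice: str = "tara",
    fetch: Callable[..., dict] = http_json,
    poll_seconds: float = 0.5,
    max_polls: int = 120,
) -> str:
    """Submit a TTS job, poll until it finishes, and return the audio download URL."""
    # ASSUMPTION: the submit reply contains a "job_id" field.
    job_id = fetch(f"{BASE_URL}/tts/submit", {"text": text, "voice": voice})["job_id"]
    for _ in range(max_polls):
        # ASSUMPTION: the status reply contains "status" set to "completed" or "failed" when done.
        status = fetch(f"{BASE_URL}/tts/status/{job_id}")["status"]
        if status == "completed":
            return f"{BASE_URL}/tts/audio/{job_id}"
        if status == "failed":
            raise RuntimeError(f"TTS job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"TTS job {job_id} did not finish after {max_polls} polls")
```

The returned URL can then be fetched with `urllib.request.urlopen` and written to disk. Passing `fetch` as a parameter keeps the polling logic testable without a running service.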

## Architecture

```
┌─────────────────────────────────────────────┐
│           OrpheusTail Service               │
│              (AGX Orin)                     │
│                                             │
│  POST /tts/submit  ──► WAV file (for MCP)   │
│  POST /tts/stream  ──► Audio stream (head)  │
│                                             │
│  Emotion tags: <laugh> <sigh> <gasp>        │
│  Voice cloning: 5-sec reference audio       │
└─────────────────────────────────────────────┘
          │                    │
          ▼                    ▼
    voice-mcp              Head-vixy Pi
    (Claude Desktop)       (streams & plays)
```
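
For the streaming path in the diagram, a head-side client would read audio incrementally instead of waiting for a finished file. The sketch below is a rough Python outline under two assumptions not confirmed by this README: that `/tts/stream` accepts the same JSON body as `/tts/submit`, and that the response body is raw audio bytes. `iter_chunks` and `stream_tts` are hypothetical names, and playback is left out.

```python
import json
import urllib.request
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_tts(text: str, voice: str = "tara", base_url: str = "http://localhost:8766") -> Iterator[bytes]:
    """Yield audio chunks from POST /tts/stream as they arrive."""
    # ASSUMPTION: the stream endpoint takes the same {"text", "voice"} body as /tts/submit.
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/tts/stream",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_chunks(response)
```

Each yielded chunk could be handed to an audio output buffer on the Pi as soon as it arrives.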

## Deployment

```bash
# On AGX Orin
cd /path/to/orpheus-tts
docker-compose up -d

# Check logs
docker-compose logs -f

# Test
curl http://localhost:8766/health
```
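
Since the container may take a while to become ready after `docker-compose up -d`, a script can poll `/health` instead of testing once. This is a minimal Python sketch; it assumes `/health` answers with HTTP 200 once the service is up (the response body is not documented here), and the helper names are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://localhost:8766/health", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


def wait_until_healthy(
    check: Callable[[], bool] = is_healthy,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Call `check` up to `attempts` times, sleeping between tries; return True on first success."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```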

## TODO

- [ ] Implement proper voice cloning with reference audio
- [ ] Test streaming endpoint with head-vixy
- [ ] French accent voice training/selection
- [ ] Head-side client for streaming playback

## Notes

- Same port as VoiceTail (8766) for drop-in replacement
- Model requires ~15GB VRAM (AGX Orin has plenty)
- First request may be slow (model warmup)
- Cache enabled by default to speed up repeated phrases

---

*Created by Vixy on Day 71 🦊*