
# Voice MCP

MCP server for text-to-speech generation using a Bark TTS service running on a Jetson AGX Orin.

## Overview

This MCP server provides tools to generate speech from text with the Bark TTS service and play it back. It follows the same pattern as dreamtail-mcp, but generates voice instead of images.

## Features

- **voice_generate()** - Convert text to speech using Bark TTS
- **voice_play()** - Play generated audio files on macOS
- **voice_get_last()** - Get info about last generated voice
- French voice preset (`v2/fr_speaker_1`) by default, overridable with `DEFAULT_VOICE`
- Automatic download and caching of audio files

## Requirements

- Python 3.8+
- macOS (for audio playback with `afplay`)
- Bark TTS service running on bigorin.local:8766

## Installation

```bash
cd ~/Projects/voice-mcp

# Install dependencies
pip install -r requirements.txt

# Test the server
python3 voice_mcp.py
```

## Configuration

Add the following to your Claude Desktop configuration file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
```

**Environment Variables:**
- `BARK_BASE_URL` - Bark TTS service URL (default: http://bigorin.local:8766)
- `VOICE_DOWNLOAD_DIR` - Where to save audio files (default: ~/voice_audio)
- `DEFAULT_VOICE` - Voice preset to use (default: v2/fr_speaker_1)
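A minimal sketch of how these variables could be resolved at startup, falling back to the documented defaults. `load_config` and `VoiceConfig` are hypothetical names, not necessarily what `voice_mcp.py` uses.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Fallbacks mirror the defaults documented above.
DEFAULT_BASE_URL = "http://bigorin.local:8766"
DEFAULT_DOWNLOAD_DIR = "~/voice_audio"
DEFAULT_VOICE_PRESET = "v2/fr_speaker_1"


@dataclass(frozen=True)
class VoiceConfig:
    base_url: str
    download_dir: Path
    voice: str


def load_config(env=None) -> VoiceConfig:
    """Read settings from the environment (or a given mapping), using defaults when unset."""
    env = os.environ if env is None else env
    # Strip a trailing slash so "/health" etc. can be appended safely.
    base_url = env.get("BARK_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    # expanduser() turns "~/voice_audio" into an absolute path under $HOME.
    download_dir = Path(env.get("VOICE_DOWNLOAD_DIR", DEFAULT_DOWNLOAD_DIR)).expanduser()
    voice = env.get("DEFAULT_VOICE", DEFAULT_VOICE_PRESET)
    return VoiceConfig(base_url=base_url, download_dir=download_dir, voice=voice)
```

Passing a plain dict instead of `os.environ` makes the fallback behaviour easy to check in isolation.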

## Usage

### Generate Speech

```python
# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"
```

The tool will:
1. Submit text to Bark TTS service
2. Poll for completion (up to 120 seconds)
3. Download the WAV file to `VOICE_DOWNLOAD_DIR` (default `~/voice_audio/`)
4. Return the filename

### Play Audio

```python
voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"
```

Plays the audio file using macOS's built-in `afplay` command.
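Playback with the two failure modes listed in the API reference (`FileNotFoundError` for a missing file, `RuntimeError` for a failed playback) might look like this. `play_audio` and its `runner` parameter are hypothetical; `runner` only exists so a fake can replace `subprocess.run` in tests. This version waits for `afplay` to finish, whereas the real tool may start it in the background.

```python
import subprocess
from pathlib import Path


def play_audio(filename: str, download_dir: Path, runner=subprocess.run) -> str:
    """Play a WAV file from download_dir with afplay (hypothetical helper)."""
    path = Path(download_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    try:
        # check=True turns a non-zero afplay exit status into CalledProcessError.
        runner(["afplay", str(path)], check=True)
    except FileNotFoundError as exc:
        # subprocess raises this when the afplay binary itself is missing (non-macOS).
        raise RuntimeError("afplay is not available on this system") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"Playback failed with exit code {exc.returncode}") from exc
    return f"Playing audio: {filename}"
```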

### Get Last Generation Info

```python
voice_get_last()
# Returns: {"job_id": "abc123-def456", "filename": "abc123-def456.wav", "text": "Bonjour, comment allez-vous?"}
```
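`voice_get_last()` only needs a record of the most recent job. A minimal sketch, assuming the server keeps that record in memory (so it resets when the server restarts) and returns an empty dict before anything has been generated; both behaviours and the helper names are assumptions.

```python
from typing import Optional

# Module-level record of the most recent successful generation (hypothetical).
_last_generation: Optional[dict] = None


def record_generation(job_id: str, filename: str, text: str) -> None:
    """Remember the latest job so voice_get_last() can report it."""
    global _last_generation
    _last_generation = {"job_id": job_id, "filename": filename, "text": text}


def get_last_generation() -> dict:
    """Return a copy of the latest record, or an empty dict if nothing was generated yet."""
    # Copy so callers cannot mutate the stored record.
    return dict(_last_generation) if _last_generation else {}
```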

## API Reference

### voice_generate(text: str) → str

Generate speech from text using Bark TTS.

**Args:**
- `text` (str): Text to convert to speech

**Returns:**
- Filename of the generated WAV file

**Raises:**
- `RuntimeError`: If generation fails or times out

**Example:**
```python
filename = voice_generate("Bonjour le monde!")
```

### voice_play(filename: str) → str

Play a WAV audio file on macOS.

**Args:**
- `filename` (str): Name of the WAV file to play

**Returns:**
- Confirmation message

**Raises:**
- `FileNotFoundError`: If audio file doesn't exist
- `RuntimeError`: If playback fails

**Example:**
```python
voice_play("abc123-def456.wav")
```

### voice_get_last() → dict

Get information about the last generated voice.

**Returns:**
- Dictionary with `job_id`, `filename`, and `text` keys

**Example:**
```python
info = voice_get_last()
```

## File Structure

```
voice-mcp/
├── voice_mcp.py                        # MCP server implementation
├── requirements.txt                    # Python dependencies
├── README.md                           # This file
└── claude_desktop_config.example.json  # Example config
```

Downloaded audio files are stored in `~/voice_audio/` by default.

## How It Works

1. **Submit Job**: `voice_generate()` sends text to Bark TTS service
2. **Poll Status**: Checks generation progress every 3 seconds
3. **Download Audio**: When complete, downloads WAV file
4. **Return Filename**: Returns filename for later playback
5. **Play Audio**: `voice_play()` uses macOS `afplay` to play the file
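The polling step is the only part with real control flow. Here it is sketched with the HTTP call abstracted into a callable, so the status strings (`"completed"`, `"failed"`) and the name `wait_for_job` are assumptions rather than the Bark service's documented API. The defaults match the 120-second timeout and 3-second interval given in this README.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_status() until it reports completion, failure, or the timeout expires.

    get_status is a hypothetical wrapper around the service's status endpoint,
    assumed to return "completed", "failed", or any other string while running.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("Bark TTS generation failed")
        if clock() >= deadline:
            raise RuntimeError(f"Bark TTS generation timed out after {timeout:.0f} s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting two minutes; the `RuntimeError` cases correspond to the failure and timeout conditions documented for `voice_generate()`.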

## Troubleshooting

### Connection Refused

If you get connection errors:
```bash
# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
```
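The same check from Python, using only the standard library. Like the `curl` command, it assumes the service exposes `/health` and answers with HTTP 200 when ready; `bark_is_up` is a hypothetical helper.

```python
import urllib.request


def bark_is_up(base_url: str = "http://bigorin.local:8766", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```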

### Audio File Not Found

Make sure you're using the exact filename returned by `voice_generate()`:
```python
filename = voice_generate("Test")
voice_play(filename)  # Use the returned filename
```

### afplay Not Found

The `afplay` command is macOS-only. If you're on Linux/Windows, you'll need to modify `voice_play()` to use a different audio player.
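One way to adapt `voice_play()` is to pick the player by operating system. This sketch assumes `paplay` (PulseAudio/PipeWire) or `aplay` (ALSA) on Linux; neither is part of this project, and `player_command` is a hypothetical name.

```python
import platform
import shutil
from typing import List, Optional


def player_command(path: str, system: Optional[str] = None) -> List[str]:
    """Choose a command-line WAV player for the given OS (hypothetical helper)."""
    system = system or platform.system()
    if system == "Darwin":
        return ["afplay", path]
    if system == "Linux":
        # Prefer paplay if installed, otherwise fall back to ALSA's aplay.
        if shutil.which("paplay"):
            return ["paplay", path]
        return ["aplay", path]
    raise RuntimeError(
        f"No command-line player configured for {system}; "
        "on Windows, winsound.PlaySound(path, winsound.SND_FILENAME) plays WAV files"
    )
```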

## Voice Preset

The default voice is `v2/fr_speaker_1` (French speaker #1). To use a different voice:

1. Set the `DEFAULT_VOICE` environment variable, either in `.env`, in the `env` block of your Claude Desktop configuration, or in your shell:
   ```bash
   export DEFAULT_VOICE=v2/en_speaker_6
   ```

2. See `VOICES.md` in the bark-tts project for the full list of available voices (130 in total).
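Preset names follow a fixed pattern, `v2/<language>_speaker_<0-9>`, and 13 languages with 10 speakers each accounts for the 130 total. A small validator, assuming the language codes from Bark's published speaker library (check them against `VOICES.md`); `is_v2_preset` is a hypothetical helper and only covers `v2/` presets.

```python
import re

# Language codes in Bark's v2 speaker library: 13 languages x 10 speakers = 130 presets.
BARK_V2_LANGS = ("en", "zh", "fr", "de", "hi", "it", "ja", "ko", "pl", "pt", "ru", "es", "tr")
_PRESET_RE = re.compile(r"v2/([a-z]{2})_speaker_([0-9])")

# Every v2 preset name, e.g. "v2/fr_speaker_1".
ALL_V2_PRESETS = [f"v2/{lang}_speaker_{n}" for lang in BARK_V2_LANGS for n in range(10)]


def is_v2_preset(name: str) -> bool:
    """Check that name looks like a Bark v2 preset such as 'v2/fr_speaker_1'."""
    match = _PRESET_RE.fullmatch(name)
    return bool(match) and match.group(1) in BARK_V2_LANGS
```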

## Performance

- **Generation time**: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
- **Cached results**: Instant if the same text was generated before
- **Timeout**: 120 seconds (configurable in code)
- **Poll interval**: 3 seconds
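This README does not say where the cache lives (it may be on the Bark service itself). If you want a client-side equivalent, one approach is to name files after a hash of the voice preset and the text, so a repeat request can be served from disk. `cache_key` and `cached_audio` are hypothetical, and voice-mcp may not name its files this way.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(text: str, voice: str) -> str:
    """Stable key for a (voice, text) pair; the voice matters because the audio differs."""
    return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()


def cached_audio(text: str, voice: str, cache_dir: Path) -> Optional[Path]:
    """Return the cached WAV for (text, voice) if it exists on disk, else None."""
    path = Path(cache_dir) / f"{cache_key(text, voice)}.wav"
    return path if path.is_file() else None
```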

## Related Projects

- **bark-tts** - The Bark TTS service this MCP connects to
- **dreamtail-mcp** - Similar MCP for image generation with DreamTail

## License

MIT