Alex Kazaiev 14b6fdcd96 Initial commit: Voice MCP (VoiceTail/Bark TTS)
🎤 MCP integration for VoiceTail (Bark TTS on Jetson Orin)
- voice_submit: Submit text for async TTS generation
- voice_status: Check generation progress
- voice_download: Download completed audio
- voice_generate: Blocking generation for short texts
- voice_play: Play audio via afplay
- voice_get_last: Get last generation info

My voice in the physical world 🦊
2025-12-16 20:56:32 -06:00


# Voice MCP
MCP server for text-to-speech generation using the Bark TTS service running on a Jetson AGX Orin.
## Overview
This MCP server provides tools to generate and play speech from text using the Bark TTS service. It follows the same pattern as dreamtail-mcp, but generates voice instead of images.
## Features
- **voice_generate()** - Convert text to speech using Bark TTS
- **voice_play()** - Play generated audio files on macOS
- **voice_get_last()** - Get info about last generated voice
- French voice preset (`v2/fr_speaker_1`) used by default (override with `DEFAULT_VOICE`)
- Automatic download and caching of audio files
## Requirements
- Python 3.8+
- macOS (for audio playback with `afplay`)
- Bark TTS service running on bigorin.local:8766
## Installation
```bash
cd ~/Projects/voice-mcp
# Install dependencies
pip install -r requirements.txt
# Test the server
python3 voice_mcp.py
```
## Configuration
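Add the `voice` server to your Claude Desktop configuration file (`claude_desktop_config.json`; an example ships as `claude_desktop_config.example.json`):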
```json
{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
```
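If your config file already lists other servers (for example dreamtail-mcp), merge the `voice` entry instead of replacing the file. A minimal standard-library sketch; `add_voice_server` is a hypothetical helper, not part of this repo, and the path is yours to pass in (on macOS, Claude Desktop keeps its config at `~/Library/Application Support/Claude/claude_desktop_config.json`).

```python
import json
from pathlib import Path


def add_voice_server(config_path: Path, script_path: str, env: dict) -> dict:
    """Insert or replace the "voice" entry under "mcpServers", keeping other servers."""
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["voice"] = {
        "command": "python3",
        "args": [script_path],
        "env": dict(env),
    }
    # Rewrite the whole file with readable indentation.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example (hypothetical paths):
# add_voice_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "/Users/yourname/Projects/voice-mcp/voice_mcp.py",
#     {"DEFAULT_VOICE": "v2/fr_speaker_1"},
# )
```

Restart Claude Desktop afterwards so it picks up the new server.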
**Environment Variables:**
- `BARK_BASE_URL` - Bark TTS service URL (default: `http://bigorin.local:8766`)
- `VOICE_DOWNLOAD_DIR` - Where to save audio files (default: `~/voice_audio`)
- `DEFAULT_VOICE` - Voice preset to use (default: `v2/fr_speaker_1`)
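A minimal sketch of how a server might resolve these variables against the documented defaults. `load_settings` is a hypothetical helper; the real `voice_mcp.py` may read them differently.

```python
import os
from pathlib import Path

# Defaults copied from the environment variable list above.
DEFAULTS = {
    "BARK_BASE_URL": "http://bigorin.local:8766",
    "VOICE_DOWNLOAD_DIR": "~/voice_audio",
    "DEFAULT_VOICE": "v2/fr_speaker_1",
}


def load_settings(environ=None) -> dict:
    """Resolve settings from the environment, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    # Strip a trailing slash so paths like "/health" can be appended safely.
    base_url = env.get("BARK_BASE_URL", DEFAULTS["BARK_BASE_URL"]).rstrip("/")
    # Expand "~" so the default works for any user.
    download_dir = Path(env.get("VOICE_DOWNLOAD_DIR", DEFAULTS["VOICE_DOWNLOAD_DIR"])).expanduser()
    voice = env.get("DEFAULT_VOICE", DEFAULTS["DEFAULT_VOICE"])
    return {"base_url": base_url, "download_dir": download_dir, "voice": voice}
```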
## Usage
### Generate Speech
```python
# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"
```
The tool will:
1. Submit text to Bark TTS service
2. Poll for completion (up to 120 seconds)
3. Download the WAV file to `VOICE_DOWNLOAD_DIR` (default `~/voice_audio/`)
4. Return the filename
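The submit, poll, and download steps above amount to a plain polling loop. The sketch below is an illustration, not the code in `voice_mcp.py`: the HTTP calls are passed in as callables because this README does not document the Bark service's endpoints, and the status strings `"completed"` and `"failed"` are assumptions.

```python
import time
from typing import Callable

POLL_INTERVAL = 3.0  # seconds between status checks (per the Performance section)
TIMEOUT = 120.0      # give up after this many seconds


def generate_blocking(
    text: str,
    submit: Callable[[str], str],      # hypothetical: sends text, returns a job id
    get_status: Callable[[str], str],  # hypothetical: returns the job's status string
    download: Callable[[str], str],    # hypothetical: saves the WAV, returns its filename
    poll_interval: float = POLL_INTERVAL,
    timeout: float = TIMEOUT,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Submit text, poll until the job finishes, then download and return the filename."""
    job_id = submit(text)
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status == "completed":
            return download(job_id)
        if status == "failed":
            raise RuntimeError(f"Bark job {job_id} failed")
        if clock() >= deadline:
            raise RuntimeError(f"Bark job {job_id} timed out after {timeout:.0f}s")
        sleep(poll_interval)
```

Passing `sleep` and `clock` in keeps the loop testable without actually waiting two minutes.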
### Play Audio
```python
voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"
```
Plays the audio file using macOS's built-in `afplay` command.
### Get Last Generation Info
```python
voice_get_last()
# Returns: {"job_id": "abc123-def456", "filename": "abc123-def456.wav", "text": "Bonjour..."}
```
## API Reference
### voice_generate(text: str) → str
Generate speech from text using Bark TTS.
**Args:**
- `text` (str): Text to convert to speech
**Returns:**
- Filename of the generated WAV file
**Raises:**
- `RuntimeError`: If generation fails or times out
**Example:**
```python
filename = voice_generate("Bonjour le monde!")
```
### voice_play(filename: str) → str
Play a WAV audio file on macOS.
**Args:**
- `filename` (str): Name of the WAV file to play
**Returns:**
- Confirmation message
**Raises:**
- `FileNotFoundError`: If audio file doesn't exist
- `RuntimeError`: If playback fails
**Example:**
```python
voice_play("abc123-def456.wav")
```
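A sketch of the playback behavior described above, assuming files live in the default `~/voice_audio/` directory; `play_wav` is a hypothetical stand-in for `voice_play()`. Note that `subprocess.run` waits for `afplay` to exit, so this call returns only after playback ends; the real tool may start playback in the background instead.

```python
import subprocess
from pathlib import Path


def play_wav(filename: str, download_dir: Path = Path.home() / "voice_audio",
             runner=subprocess.run) -> str:
    """Play a downloaded WAV file with macOS afplay (blocks until playback ends)."""
    path = Path(download_dir).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    try:
        runner(["afplay", str(path)], check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # OSError covers a missing afplay binary; CalledProcessError a non-zero exit.
        raise RuntimeError(f"Playback failed for {filename}: {exc}") from exc
    return f"Playing audio: {filename}"
```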
### voice_get_last() → dict
Get information about the last generated voice.
**Returns:**
- Dictionary with job_id, filename, and text
**Example:**
```python
info = voice_get_last()
```
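One plausible way to back `voice_get_last()` is a module-level record updated after each successful generation. This is an assumption about the implementation; `record_generation` and `get_last` are hypothetical names, and returning `{}` before any generation is an illustrative choice.

```python
from typing import Optional

# Hypothetical module-level record of the most recent generation.
_last: Optional[dict] = None


def record_generation(job_id: str, filename: str, text: str) -> None:
    """Remember the latest job so a get-last tool can report it."""
    global _last
    _last = {"job_id": job_id, "filename": filename, "text": text}


def get_last() -> dict:
    """Return a copy of the last generation info, or an empty dict if none yet."""
    return dict(_last) if _last else {}
```

Because the record lives in process memory, it would reset whenever Claude Desktop restarts the server.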
## File Structure
```
voice-mcp/
├── voice_mcp.py # MCP server implementation
├── requirements.txt # Python dependencies
├── README.md # This file
└── claude_desktop_config.example.json # Example config
```
Downloaded audio files are stored in `~/voice_audio/` by default.
## How It Works
1. **Submit Job**: `voice_generate()` sends text to Bark TTS service
2. **Poll Status**: Checks generation progress every 3 seconds, for up to 120 seconds
3. **Download Audio**: When complete, downloads WAV file
4. **Return Filename**: Returns filename for later playback
5. **Play Audio**: `voice_play()` uses macOS `afplay` to play the file
## Troubleshooting
### Connection Refused
If you get connection errors:
```bash
# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
```
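The same check from Python, using only the standard library. It assumes, as the curl command does, that the service exposes `/health`, and it treats any HTTP 200 as healthy; `bark_is_healthy` is a hypothetical helper.

```python
import urllib.request


def bark_is_healthy(base_url: str = "http://bigorin.local:8766", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # connection refused, bigorin.local not resolving, or a non-2xx reply.
        return False
```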
### Audio File Not Found
Make sure you're using the exact filename returned by `voice_generate()`:
```python
filename = voice_generate("Test")
voice_play(filename) # Use the returned filename
```
### afplay Not Found
The `afplay` command is macOS-only. If you're on Linux/Windows, you'll need to modify `voice_play()` to use a different audio player.
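One way to make playback portable is to pick the player by platform. A sketch, assuming `aplay` (from alsa-utils) on Linux and PowerShell's `System.Media.SoundPlayer` on Windows; `player_command` is a hypothetical helper whose result you would pass to `subprocess.run(..., check=True)` in place of the hardcoded `afplay` call.

```python
import sys


def player_command(path: str, platform: str = sys.platform) -> list:
    """Build an argv list that plays a WAV file with a stock player for the given platform."""
    if platform == "darwin":
        return ["afplay", path]  # ships with macOS
    if platform.startswith("linux"):
        return ["aplay", path]  # alsa-utils; swap for "paplay" on PulseAudio/PipeWire setups
    if platform == "win32":
        # PowerShell single-quoted strings escape ' as ''.
        quoted = path.replace("'", "''")
        # SoundPlayer handles WAV only; PlaySync blocks until playback ends.
        script = f"(New-Object Media.SoundPlayer '{quoted}').PlaySync()"
        return ["powershell", "-NoProfile", "-Command", script]
    raise RuntimeError(f"No audio player configured for platform {platform!r}")
```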
## Voice Preset
The default voice is `v2/fr_speaker_1` (French speaker #1). To use a different voice:
1. Set the `DEFAULT_VOICE` environment variable (in `.env`, in your shell, or in the `env` block of the Claude Desktop config):
   ```bash
   export DEFAULT_VOICE=v2/en_speaker_6
   ```
2. See `VOICES.md` in the bark-tts project for all 130 available voices.
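A typo in `DEFAULT_VOICE` only surfaces after a job is submitted, so it can help to validate the name up front. This sketch assumes `VOICES.md` lists Bark's v2 presets, which follow the `v2/<lang>_speaker_<n>` pattern for 13 language codes and speakers 0 to 9 (13 × 10 = 130, matching the count above); `is_valid_preset` and `all_v2_presets` are hypothetical helpers.

```python
import re

# Bark v2 speaker presets: 13 language codes x 10 speakers each (assumed to match VOICES.md).
BARK_LANGS = ("en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh")
PRESET_RE = re.compile(r"v2/(%s)_speaker_[0-9]" % "|".join(BARK_LANGS))


def all_v2_presets() -> list:
    """List every v2 preset name, e.g. v2/fr_speaker_1."""
    return [f"v2/{lang}_speaker_{n}" for lang in BARK_LANGS for n in range(10)]


def is_valid_preset(name: str) -> bool:
    """Check a DEFAULT_VOICE value against the v2 naming pattern before submitting a job."""
    return PRESET_RE.fullmatch(name) is not None
```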
## Performance
- **Generation time**: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
- **Cached results**: Instant if same text was generated before
- **Timeout**: 120 seconds (configurable in code)
- **Poll interval**: 3 seconds
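This README does not say where results are cached (in the Bark service or in this MCP server). If you want a client-side cache, a minimal sketch keys each file on a hash of the voice and the text; `cached_generate` and `cache_key` are hypothetical, and `generate` stands in for whatever call returns WAV bytes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice: str) -> str:
    """Stable key for a (voice, text) pair; the same input always maps to the same file."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]


def cached_generate(text: str, voice: str, cache_dir: Path,
                    generate: Callable[[str, str], bytes]) -> Path:
    """Return a cached WAV path, calling the (slow) generator only on a cache miss."""
    path = Path(cache_dir) / f"{cache_key(text, voice)}.wav"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(generate(text, voice))
    return path
```

Including the voice in the key matters: the same sentence spoken by `v2/fr_speaker_1` and `v2/en_speaker_6` must not share a file.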
## Related Projects
- **bark-tts** - The Bark TTS service this MCP connects to
- **dreamtail-mcp** - Similar MCP for image generation with DreamTail
## License
MIT