Initial commit: Voice MCP (VoiceTail/Bark TTS)

🎤 MCP integration for VoiceTail (Bark TTS on Jetson Orin)

- voice_submit: Submit text for async TTS generation
- voice_status: Check generation progress
- voice_download: Download completed audio
- voice_generate: Blocking generation for short texts
- voice_play: Play audio via afplay
- voice_get_last: Get last generation info

My voice in the physical world 🦊
2025-12-16 20:56:32 -06:00
commit 14b6fdcd96
4 changed files with 644 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,212 @@
+# Voice MCP
+
+An MCP (Model Context Protocol) server for text-to-speech generation, backed by a Bark TTS service running on a Jetson AGX Orin.
+
+## Overview
+
+This MCP server provides tools to generate speech from text with the Bark TTS service and to play it back. It follows a similar pattern to dreamtail-mcp, but generates speech instead of images.
+
+## Features
+
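+- **voice_generate()** - Convert text to speech using Bark TTS (blocking; intended for short texts)
+- **voice_submit()**, **voice_status()**, **voice_download()** - Submit text for async generation, check its progress, and download the finished audio
+- **voice_play()** - Play generated audio files on macOS (via `afplay`)
+- **voice_get_last()** - Get info about the last generated voice
+- French voice preset (`v2/fr_speaker_1`) by default, configurable via `DEFAULT_VOICE`
+- Automatic download and caching of audio files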
+
+## Requirements
+
+- Python 3.8+
+- macOS (for audio playback with `afplay`)
+- A Bark TTS service reachable at `http://bigorin.local:8766` (or wherever `BARK_BASE_URL` points)
+
+## Installation
+
+```bash
+cd ~/Projects/voice-mcp
+
+# Install dependencies
+pip install -r requirements.txt
+
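+# Start the server by hand to check that it launches
+# (it talks MCP over stdio, so it will sit waiting for input; stop it with Ctrl+C)
+python3 voice_mcp.py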
+```
+
+## Configuration
+
+Add the following to your Claude Desktop configuration file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
+
+```json
+{
+  "mcpServers": {
+    "voice": {
+      "command": "python3",
+      "args": [
+        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
+      ],
+      "env": {
+        "BARK_BASE_URL": "http://bigorin.local:8766",
+        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
+        "DEFAULT_VOICE": "v2/fr_speaker_1"
+      }
+    }
+  }
+}
+```
+
+**Environment Variables:**
+- `BARK_BASE_URL` - Bark TTS service URL (default: http://bigorin.local:8766)
+- `VOICE_DOWNLOAD_DIR` - Where to save audio files (default: ~/voice_audio)
+- `DEFAULT_VOICE` - Voice preset to use (default: v2/fr_speaker_1)
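+
+The lookup itself is simple. A minimal sketch, assuming the server reads these variables with `os.environ` and falls back to the defaults listed above (`VoiceConfig` and `load_config` are hypothetical names, not necessarily what `voice_mcp.py` uses):
+
+```python
+import os
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Mapping, Optional
+
+
+@dataclass(frozen=True)
+class VoiceConfig:
+    """Hypothetical container for the three settings documented above."""
+
+    bark_base_url: str
+    download_dir: Path
+    default_voice: str
+
+
+def load_config(env: Optional[Mapping[str, str]] = None) -> VoiceConfig:
+    """Read settings from env (default: os.environ), using the README defaults."""
+    env = os.environ if env is None else env
+    return VoiceConfig(
+        # Drop a trailing slash so paths like "/health" can be appended safely.
+        bark_base_url=env.get("BARK_BASE_URL", "http://bigorin.local:8766").rstrip("/"),
+        # Expand "~" so the default points into the user's home directory.
+        download_dir=Path(env.get("VOICE_DOWNLOAD_DIR", "~/voice_audio")).expanduser(),
+        default_voice=env.get("DEFAULT_VOICE", "v2/fr_speaker_1"),
+    )
+```
+
+Passing an explicit mapping, e.g. `load_config({"DEFAULT_VOICE": "v2/en_speaker_6"})`, makes the defaults easy to test without touching the real environment.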
+
+## Usage
+
+### Generate Speech
+
+```python
+# In Claude Desktop with MCP enabled
+voice_generate("Bonjour, comment allez-vous?")
+# Returns: "abc123-def456.wav"
+```
+
+The tool will:
+1. Submit text to Bark TTS service
+2. Poll for completion (up to 120 seconds)
+3. Download the WAV file to ~/voice_audio/
+4. Return the filename
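+
+The four steps above can be sketched with the standard library alone. This is an illustration, not the code in `voice_mcp.py`: the endpoint paths (`/submit`, `/status/<job_id>`, `/download/<job_id>`), the JSON fields (`job_id`, `status`), the status values `"done"`/`"failed"`, and the `generate_voice` name are hypothetical placeholders (this README only mentions `/health`), so check the bark-tts service for its real API:
+
+```python
+import json
+import time
+import urllib.request
+from pathlib import Path
+
+
+def _get_json(url: str) -> dict:
+    """GET a URL and decode its JSON body."""
+    with urllib.request.urlopen(url, timeout=10) as resp:
+        return json.load(resp)
+
+
+def generate_voice(
+    text: str,
+    base_url: str = "http://bigorin.local:8766",
+    voice: str = "v2/fr_speaker_1",
+    download_dir: Path = Path("~/voice_audio").expanduser(),
+    timeout: float = 120.0,
+    interval: float = 3.0,
+) -> str:
+    """Submit text, poll until done, download the WAV, return its filename.
+
+    Endpoint paths and JSON fields are hypothetical placeholders.
+    """
+    # 1. Submit the job (hypothetical POST /submit returning {"job_id": ...}).
+    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
+    req = urllib.request.Request(
+        f"{base_url}/submit", data=body, headers={"Content-Type": "application/json"}
+    )
+    with urllib.request.urlopen(req, timeout=10) as resp:
+        job_id = json.load(resp)["job_id"]
+
+    # 2. Poll until the job is done, has failed, or the time budget runs out.
+    deadline = time.monotonic() + timeout
+    while True:
+        status = _get_json(f"{base_url}/status/{job_id}")["status"]
+        if status == "done":
+            break
+        if status == "failed":
+            raise RuntimeError(f"Job {job_id} failed")
+        if time.monotonic() >= deadline:
+            raise RuntimeError(f"Job {job_id} timed out after {timeout:.0f} s")
+        time.sleep(interval)
+
+    # 3. Download the WAV into the local audio directory.
+    download_dir.mkdir(parents=True, exist_ok=True)
+    filename = f"{job_id}.wav"
+    with urllib.request.urlopen(f"{base_url}/download/{job_id}", timeout=60) as resp:
+        (download_dir / filename).write_bytes(resp.read())
+
+    # 4. Return the filename so it can be passed to voice_play().
+    return filename
+```
+
+Using `time.monotonic()` for the deadline keeps the 120-second budget immune to wall-clock changes; with a 3-second interval the loop makes at most 41 status requests (t = 0, 3, ..., 120 s) before raising `RuntimeError`.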
+
+### Play Audio
+
+```python
+voice_play("abc123-def456.wav")
+# Returns: "Playing audio: abc123-def456.wav"
+```
+
+Plays the audio file using macOS's built-in `afplay` command.
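+
+A minimal sketch of this playback step, assuming files are resolved relative to the download directory and played with `subprocess.run`. The `play_audio` name and its `player` parameter are hypothetical; `player` only exists so the command can be swapped out, for example in tests or on non-macOS systems:
+
+```python
+import subprocess
+from pathlib import Path
+from typing import Sequence
+
+
+def play_audio(
+    filename: str,
+    download_dir: Path = Path("~/voice_audio").expanduser(),
+    player: Sequence[str] = ("afplay",),
+) -> str:
+    """Play a WAV file from download_dir, raising the errors voice_play() documents."""
+    path = download_dir / filename
+    if not path.is_file():
+        raise FileNotFoundError(f"Audio file not found: {path}")
+    try:
+        # afplay blocks until playback ends; a non-zero exit code signals failure.
+        result = subprocess.run([*player, str(path)], capture_output=True, text=True)
+    except OSError as exc:  # e.g. the player binary is not installed
+        raise RuntimeError(f"Could not start {player[0]}: {exc}") from exc
+    if result.returncode != 0:
+        raise RuntimeError(f"Playback failed: {result.stderr.strip()}")
+    return f"Playing audio: {filename}"
+```
+
+Catching `OSError` separately keeps a missing player binary from being reported as a missing audio file, since `subprocess.run` raises `FileNotFoundError` when the command itself cannot be found.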
+
+### Get Last Generation Info
+
+```python
+voice_get_last()
+# Returns: {"job_id": "abc123", "filename": "abc123.wav", "text": "Bonjour..."}
+```
+
+## API Reference
+
+### voice_generate(text: str) → str
+
+Generate speech from text using Bark TTS.
+
+**Args:**
+- `text` (str): Text to convert to speech
+
+**Returns:**
+- Filename of the generated WAV file
+
+**Raises:**
+- `RuntimeError`: If generation fails or times out
+
+**Example:**
+```python
+filename = voice_generate("Bonjour le monde!")
+```
+
+### voice_play(filename: str) → str
+
+Play a WAV audio file on macOS.
+
+**Args:**
+- `filename` (str): Name of a WAV file in the download directory, as returned by `voice_generate()`
+
+**Returns:**
+- Confirmation message
+
+**Raises:**
+- `FileNotFoundError`: If audio file doesn't exist
+- `RuntimeError`: If playback fails
+
+**Example:**
+```python
+voice_play("abc123-def456.wav")
+```
+
+### voice_get_last() → dict
+
+Get information about the last generated voice.
+
+**Returns:**
+- Dictionary with `job_id`, `filename`, and `text` keys
+
+**Example:**
+```python
+info = voice_get_last()
+```
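+
+One way a server could keep this information is a module-level record updated after each successful download. The sketch below assumes that design; `LastGeneration`, `record_generation`, `get_last_generation`, and the choice to return an empty dict before the first generation are all hypothetical:
+
+```python
+from dataclasses import asdict, dataclass
+from typing import Optional
+
+
+@dataclass
+class LastGeneration:
+    job_id: str
+    filename: str
+    text: str
+
+
+# Module-level state: the most recent successful generation, if any.
+_last: Optional[LastGeneration] = None
+
+
+def record_generation(job_id: str, filename: str, text: str) -> None:
+    """Remember the latest generation (call after the WAV has been downloaded)."""
+    global _last
+    _last = LastGeneration(job_id, filename, text)
+
+
+def get_last_generation() -> dict:
+    """Return the last generation as a dict, or {} if nothing was generated yet."""
+    return asdict(_last) if _last is not None else {}
+```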
+
+## File Structure
+
+```
+voice-mcp/
+├── voice_mcp.py                        # MCP server implementation
+├── requirements.txt                     # Python dependencies
+├── README.md                           # This file
+└── claude_desktop_config.example.json  # Example config
+```
+
+Downloaded audio files are stored in `~/voice_audio/` by default.
+
+## How It Works
+
+1. **Submit Job**: `voice_generate()` sends text to Bark TTS service
+2. **Poll Status**: Checks generation progress every 3 seconds
+3. **Download Audio**: When complete, downloads WAV file
+4. **Return Filename**: Returns filename for later playback
+5. **Play Audio**: `voice_play()` uses macOS `afplay` to play the file
+
+## Troubleshooting
+
+### Connection Refused
+
+If you get connection errors:
+```bash
+# Check if Bark TTS service is running
+curl http://bigorin.local:8766/health
+```
+
+### Audio File Not Found
+
+Make sure you're using the exact filename returned by `voice_generate()`:
+```python
+filename = voice_generate("Test")
+voice_play(filename)  # Use the returned filename
+```
+
+### afplay Not Found
+
+The `afplay` command is macOS-only. If you're on Linux/Windows, you'll need to modify `voice_play()` to use a different audio player.
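+
+If you adapt `voice_play()`, one option is to pick a player per platform. This sketch is an assumption, not part of this project: it prefers `afplay` on macOS, tries `aplay` (ALSA) and then `paplay` (PulseAudio) on Linux, and returns `None` elsewhere; `choose_player` is a hypothetical helper. On Windows, the standard-library `winsound.PlaySound(path, winsound.SND_FILENAME)` is one alternative to an external command.
+
+```python
+import shutil
+import sys
+from typing import Optional, Sequence
+
+# Candidate command-line WAV players per platform (assumed; adjust as needed).
+_CANDIDATES = {
+    "darwin": [["afplay"]],
+    "linux": [["aplay", "-q"], ["paplay"]],
+}
+
+
+def choose_player(platform: str = sys.platform) -> Optional[Sequence[str]]:
+    """Return the first installed player command for this platform, or None."""
+    key = "linux" if platform.startswith("linux") else platform
+    for cmd in _CANDIDATES.get(key, []):
+        # shutil.which returns None when the binary is not on PATH.
+        if shutil.which(cmd[0]) is not None:
+            return cmd
+    return None
+```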
+
+## Voice Preset
+
+The default voice is `v2/fr_speaker_1` (French speaker #1). To use a different voice:
+
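+1. Set `DEFAULT_VOICE` in `.env`, in the `env` block of your Claude Desktop config, or in your shell before running the server manually (a shell `export` does not reach servers launched by Claude Desktop):
+   ```bash
+   export DEFAULT_VOICE=v2/en_speaker_6
+   ```
+
+2. See `VOICES.md` in the bark-tts project for the full list of available voices (130 in total).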
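+
+The preset names above follow the pattern `v2/<language>_speaker_<n>`. Assuming Bark's v2 presets cover 13 language codes with speakers 0 to 9 each (13 × 10 = 130, which matches the count above), a quick sanity check for a `DEFAULT_VOICE` value could look like this. The language list reflects Bark's published presets as far as we know, but confirm it against `VOICES.md`; `all_presets` and `is_valid_preset` are hypothetical helpers:
+
+```python
+import re
+from typing import List
+
+# Language codes assumed from Bark's v2 speaker presets; confirm against VOICES.md.
+BARK_LANGS = ("de", "en", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh")
+_PRESET_RE = re.compile(r"v2/([a-z]{2})_speaker_(\d)")
+
+
+def all_presets() -> List[str]:
+    """Enumerate every v2/<lang>_speaker_<n> name under the assumptions above."""
+    return [f"v2/{lang}_speaker_{n}" for lang in BARK_LANGS for n in range(10)]
+
+
+def is_valid_preset(name: str) -> bool:
+    """True if name matches the v2 pattern with a known language code."""
+    match = _PRESET_RE.fullmatch(name)
+    return match is not None and match.group(1) in BARK_LANGS
+```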
+
+## Performance
+
+- **Generation time**: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
+- **Cached results**: Instant if same text was generated before
+- **Timeout**: 120 seconds (configurable in code)
+- **Poll interval**: 3 seconds
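+
+The "Cached results" figure implies that repeated requests for the same text are answered from a cache, but this README does not say whether that cache lives in bark-tts or in this MCP. As an illustration only, a content-addressed key derived from the voice preset and the text could look like this (`cache_key` and the naming scheme are hypothetical):
+
+```python
+import hashlib
+
+
+def cache_key(text: str, voice: str) -> str:
+    """Derive a stable filename from (voice, text) so identical requests share a file."""
+    # Strip surrounding whitespace so "Bonjour" and " Bonjour " map to one entry;
+    # include the voice because the same text in another voice is different audio.
+    payload = f"{voice}\n{text.strip()}".encode("utf-8")
+    return hashlib.sha256(payload).hexdigest()[:16] + ".wav"
+```
+
+A client using this scheme would check whether `download_dir / cache_key(text, voice)` already exists before submitting a new job.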
+
+## Related Projects
+
+- **bark-tts** - The Bark TTS service this MCP connects to
+- **dreamtail-mcp** - Similar MCP for image generation with DreamTail
+
+## License
+
+MIT