Initial commit: Voice MCP (VoiceTail/Bark TTS)

🎤 MCP integration for VoiceTail (Bark TTS on Jetson Orin)

- voice_submit: Submit text for async TTS generation
- voice_status: Check generation progress
- voice_download: Download completed audio
- voice_generate: Blocking generation for short texts
- voice_play: Play audio via afplay
- voice_get_last: Get last generation info

My voice in the physical world 🦊

# Voice MCP

An MCP server for text-to-speech generation using the Bark TTS service running on a Jetson AGX Orin.

## Overview

This MCP server provides tools to generate speech from text with the Bark TTS service and play it back locally. It follows the same pattern as dreamtail-mcp, but generates speech instead of images.

## Features

- **voice_generate()** - Convert text to speech using Bark TTS
- **voice_play()** - Play generated audio files on macOS
- **voice_get_last()** - Get info about the last generated voice
- French voice preset (`v2/fr_speaker_1`) by default, configurable via `DEFAULT_VOICE`
- Automatic download and caching of audio files

## Requirements

- Python 3.8+
- macOS (for audio playback with `afplay`)
- Bark TTS service running on `bigorin.local:8766`

## Installation

```bash
cd ~/Projects/voice-mcp

# Install dependencies
pip install -r requirements.txt

# Test the server
python3 voice_mcp.py
```

## Configuration

Add the following to your Claude Desktop configuration file (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`). See also `claude_desktop_config.example.json` in this repository.

```json
{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
```

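If the configuration file already lists other MCP servers, you can merge the `voice` entry instead of overwriting the file. This is a minimal standard-library sketch; `add_voice_server` and its parameters are hypothetical helpers, not part of this project.

```python
import json
from pathlib import Path


def add_voice_server(config_path: Path, script_path: str, env: dict) -> dict:
    """Add or replace the "voice" entry under "mcpServers", keeping any other servers."""
    # Start from the existing config if there is one, otherwise from an empty object.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["voice"] = {
        "command": "python3",
        "args": [script_path],
        "env": dict(env),
    }
    # Write back with indentation so the file stays easy to edit by hand.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the existing file before running something like this against your real Claude Desktop configuration.
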
**Environment Variables:**

- `BARK_BASE_URL` - Bark TTS service URL (default: `http://bigorin.local:8766`)
- `VOICE_DOWNLOAD_DIR` - Where to save audio files (default: `~/voice_audio`)
- `DEFAULT_VOICE` - Voice preset to use (default: `v2/fr_speaker_1`)

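The server's own parsing code isn't shown here. As a minimal sketch, assuming the three variables are read once at startup, resolving them with the documented defaults could look like this (`load_settings` is a hypothetical name):

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Defaults as documented in the list above.
DEFAULTS = {
    "BARK_BASE_URL": "http://bigorin.local:8766",
    "VOICE_DOWNLOAD_DIR": "~/voice_audio",
    "DEFAULT_VOICE": "v2/fr_speaker_1",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the settings from env (os.environ by default), falling back to DEFAULTS."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Normalize: drop a trailing slash from the URL and expand "~" in the directory.
    settings["BARK_BASE_URL"] = settings["BARK_BASE_URL"].rstrip("/")
    settings["VOICE_DOWNLOAD_DIR"] = Path(settings["VOICE_DOWNLOAD_DIR"]).expanduser()
    return settings
```

Passing a plain dict instead of `os.environ` makes the function easy to test without touching the real environment.
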
## Usage

### Generate Speech

```python
# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"
```

The tool will:

1. Submit text to Bark TTS service
2. Poll for completion (up to 120 seconds)
3. Download the WAV file to `~/voice_audio/`
4. Return the filename

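The blocking behavior described above amounts to a poll-with-deadline loop. Here is a minimal sketch, assuming a callable that returns the job's status string. The Bark service's HTTP endpoints and status values aren't documented here, so `get_status`, `"completed"` and `"failed"` are placeholders.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 3.0,
) -> None:
    """Poll get_status() until it returns "completed"; raise RuntimeError on failure or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("Bark TTS generation failed")
        # Give up if the next check would fall past the deadline.
        if time.monotonic() + interval > deadline:
            raise RuntimeError(f"Bark TTS generation timed out after {timeout:g}s")
        time.sleep(interval)
```

`time.monotonic()` is used rather than `time.time()` so that a wall-clock change during a long generation cannot shorten or extend the timeout.
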
### Play Audio

```python
voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"
```

Plays the audio file using macOS's built-in `afplay` command.

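A minimal sketch of the playback step, assuming the files live in the download directory. `play_audio` and its `player` parameter are hypothetical. The parameter makes the external player swappable, which also helps on systems without `afplay`.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def play_audio(filename: str, audio_dir: Path, player: Sequence[str] = ("afplay",)) -> str:
    """Play audio_dir/filename with an external player command (afplay by default)."""
    path = Path(audio_dir).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    try:
        # run() waits for the player to exit; check=True turns a non-zero exit into an error.
        subprocess.run([*player, str(path)], check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # OSError also covers a player binary that is not installed.
        raise RuntimeError(f"Playback failed: {exc}") from exc
    return f"Playing audio: {filename}"
```

This version returns only after playback finishes. Use `subprocess.Popen` instead if the call should return immediately.
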
### Get Last Generation Info

```python
voice_get_last()
# Returns: {"job_id": "abc123-def456", "filename": "abc123-def456.wav", "text": "Bonjour..."}
```

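This README doesn't document how the server remembers the last result. One simple approach, assuming in-memory state is acceptable, is a module-level record that is updated after each successful generation. All names below are hypothetical, and returning an empty dict when nothing has been generated yet is a choice made for this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Generation:
    job_id: str
    filename: str
    text: str


# Only the most recent generation is kept, and it is lost when the server restarts.
_last: Optional[Generation] = None


def record_generation(job_id: str, filename: str, text: str) -> None:
    """Remember the most recent successful generation."""
    global _last
    _last = Generation(job_id, filename, text)


def get_last() -> dict:
    """Return the last generation as a dict, or {} if there is none yet."""
    return asdict(_last) if _last is not None else {}
```
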
## API Reference

### voice_generate(text: str) → str

Generate speech from text using Bark TTS.

**Args:**
- `text` (str): Text to convert to speech

**Returns:**
- Filename of the generated WAV file

**Raises:**
- `RuntimeError`: If generation fails or times out

**Example:**
```python
filename = voice_generate("Bonjour le monde!")
```

### voice_play(filename: str) → str

Play a WAV audio file on macOS.

**Args:**
- `filename` (str): Name of the WAV file to play

**Returns:**
- Confirmation message

**Raises:**
- `FileNotFoundError`: If audio file doesn't exist
- `RuntimeError`: If playback fails

**Example:**
```python
voice_play("abc123-def456.wav")
```

### voice_get_last() → dict

Get information about the last generated voice.

**Returns:**
- Dictionary with `job_id`, `filename`, and `text`

**Example:**
```python
info = voice_get_last()
```

## File Structure

```
voice-mcp/
├── voice_mcp.py                        # MCP server implementation
├── requirements.txt                    # Python dependencies
├── README.md                           # This file
└── claude_desktop_config.example.json  # Example config
```

Downloaded audio files are stored in `~/voice_audio/` by default.

## How It Works

1. **Submit Job**: `voice_generate()` sends text to Bark TTS service
2. **Poll Status**: Checks generation progress every 3 seconds
3. **Download Audio**: When complete, downloads WAV file
4. **Return Filename**: Returns filename for later playback
5. **Play Audio**: `voice_play()` uses macOS `afplay` to play the file

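Step 3, together with the local caching mentioned under Features, can be sketched as follows. `fetch` stands in for the actual HTTP download from the Bark service, whose endpoint isn't documented here. Keying the local cache on the filename is also an assumption of this sketch, and `download_once` is a hypothetical name.

```python
from pathlib import Path
from typing import Callable


def download_once(filename: str, download_dir: Path, fetch: Callable[[], bytes]) -> Path:
    """Save audio to download_dir/filename, skipping fetch() if the file already exists."""
    target = Path(download_dir).expanduser() / filename
    if target.is_file():
        return target  # already cached locally, no network round-trip
    target.parent.mkdir(parents=True, exist_ok=True)
    data = fetch()
    # Write under a temporary name first so an interrupted download never looks complete.
    partial = target.with_name(target.name + ".part")
    partial.write_bytes(data)
    partial.replace(target)
    return target
```
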
## Troubleshooting

### Connection Refused

If you get connection errors:
```bash
# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
```

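The same check can be run from Python, for example in a startup script. This standard-library sketch assumes only that `/health` returns HTTP 200 when the service is up, as the `curl` command above suggests. `bark_is_healthy` is a hypothetical helper.

```python
import http.client
import urllib.request


def bark_is_healthy(base_url: str = "http://bigorin.local:8766", timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200, else False."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so this also covers DNS
        # failures, refused connections, timeouts and 4xx/5xx responses.
        return False
```

Calling `bark_is_healthy()` with no arguments checks the default host.
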
### Audio File Not Found

Make sure you're using the exact filename returned by `voice_generate()`:
```python
filename = voice_generate("Test")
voice_play(filename)  # Use the returned filename
```

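If you no longer have the returned filename, `voice_get_last()` reports the most recent one. For older files, you can list the download directory instead. This sketch assumes the default `~/voice_audio` layout with one `.wav` file per generation, and `list_audio_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_audio_files(download_dir: Path) -> List[str]:
    """Return the WAV filenames in download_dir, newest first."""
    directory = Path(download_dir).expanduser()
    if not directory.is_dir():
        return []
    files = [p for p in directory.glob("*.wav") if p.is_file()]
    # Sort by modification time so the most recent generation comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files]
```
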
### afplay Not Found

The `afplay` command is macOS-only. If you're on Linux/Windows, you'll need to modify `voice_play()` to use a different audio player.

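One way to make that change is to choose a player command based on the platform. This is a sketch: `pick_player` is hypothetical, and the Linux candidates are common players (`aplay` from ALSA, `paplay` from PulseAudio, `ffplay` from FFmpeg) that may or may not be installed. On Windows, the standard library's `winsound.PlaySound(path, winsound.SND_FILENAME)` can play a WAV file without any external command.

```python
import platform
import shutil
from typing import Callable, List, Optional


def pick_player(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return a command prefix for playing a WAV file, or None if none is available."""
    system = system or platform.system()
    if system == "Darwin":
        candidates = [["afplay"]]
    elif system == "Linux":
        # Tried in this order; the first one found on PATH wins.
        candidates = [
            ["aplay", "-q"],
            ["paplay"],
            ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet"],
        ]
    else:
        candidates = []
    for command in candidates:
        if which(command[0]):
            return command
    return None
```

Append the audio file path to the returned list and pass it to `subprocess.run`.
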
## Voice Preset

The default voice is `v2/fr_speaker_1` (French speaker #1). To use a different voice:

1. Edit `.env` or set the `DEFAULT_VOICE` environment variable:

   ```bash
   export DEFAULT_VOICE=v2/en_speaker_6
   ```

   If you use the Claude Desktop configuration above, change `DEFAULT_VOICE` in its `env` block instead, because that block sets the variable explicitly for the server process.

2. See `VOICES.md` in the bark-tts project for all available voices (130 total).

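A typo in `DEFAULT_VOICE` is easy to miss. As a sketch, you can check the value before sending it to the service. This assumes Bark's v2 naming scheme of 13 languages × 10 speakers, which matches the 130 voices mentioned above. `VOICES.md` remains the authoritative list.

```python
import re

# Assumed naming scheme: "v2/<language>_speaker_<0-9>" for Bark's 13 v2 languages.
LANGUAGES = ("de", "en", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh")
PRESET_RE = re.compile(r"v2/(%s)_speaker_[0-9]" % "|".join(LANGUAGES))

# Every preset name the pattern accepts, for listing or autocompletion.
ALL_PRESETS = [f"v2/{lang}_speaker_{i}" for lang in LANGUAGES for i in range(10)]


def is_valid_preset(name: str) -> bool:
    """Check that a DEFAULT_VOICE value matches the assumed v2 preset pattern."""
    return PRESET_RE.fullmatch(name) is not None
```
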
## Performance

- **Generation time**: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
- **Cached results**: Instant if the same text was generated before
- **Timeout**: 120 seconds (configurable in code)
- **Poll interval**: 3 seconds

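These numbers determine how many status requests a single call makes. For a quick back-of-the-envelope check, assume one status request per poll interval, with the first one arriving one interval after submission. Under that assumption, a 45–70 s generation takes 15–24 checks, and the 120 s timeout caps a call at 40.

```python
import math

POLL_INTERVAL_S = 3
TIMEOUT_S = 120


def polls_needed(generation_s: float, interval_s: float = POLL_INTERVAL_S) -> int:
    """Status checks until a job is seen as done, if the first check is one interval after submission."""
    return math.ceil(generation_s / interval_s)


# Short texts (45-70 s) and the timeout ceiling.
for seconds in (45, 70, TIMEOUT_S):
    print(f"{seconds:>3} s -> {polls_needed(seconds)} status checks")
```
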
## Related Projects

- **bark-tts** - The Bark TTS service this MCP connects to
- **dreamtail-mcp** - Similar MCP for image generation with DreamTail

## License

MIT