Update README for dual XVF3800 binaural architecture
Complete rewrite covering: dual array setup, spatial tracking, Edge TPU sound classification, speaker ID, binaural recording, USB protocol quirks, libedgetpu build instructions, and all API endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
285
README.md
@@ -1,62 +1,153 @@
 # HeadMic - Vixy's Ears 🦊👂
 
-Wake word detection + voice recording + transcription service for Vixy's physical head.
+Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.
 
-**Wake word:** "Hey Vivi" (trained via Picovoice Porcupine)
+**Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
+**Wake word:** "Hey Vivi" (Picovoice Porcupine)
+**Runs on:** Raspberry Pi 5 (head-vixy.local)
 
 ## Architecture
 
 ```
-                    "Hey Vivi" (voice)
+[Left XVF3800]──┐                           [Right XVF3800]──┐
+ 4 mics, DoA    │                            4 mics, DoA     │
+ WS2812 LEDs    │                            WS2812 LEDs     │
+                ▼                                            ▼
+      arecord (16kHz mono)                         arecord (16kHz mono)
+                │                                            │
+                └────────────┬───────────────────────────────┘
+                             ▼
+             DualAudioStream (audio_stream.py)
+            best-beam selection (energy-based)
                              │
-                             ▼
-                   ReSpeaker 4-Mic Array
+                ┌────────────┼────────────────┐
+                ▼            ▼                ▼
+            Porcupine     YAMNet          Binaural
+            wake word   (Edge TPU)        Recorder
+           "Hey Vivi"   521 classes      stereo WAV
+                ▼            ▼
+            Record +    Speaker ID
+           Transcribe  (Resemblyzer)
+           via EarTail
                              │
+                ┌────────────┼────────────────┐
+                ▼            ▼                ▼
+  Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
+  DoA → triangulation             LEDs + DoA polling
+  → smooth gaze                   per-array control
                              ▼
-              Porcupine (wake word detection)
-                             │ detected!
-                             ▼
-              ReSpeaker LEDs light up (cyan)
-                             │
-                             ▼
-             Record until silence (webrtcvad)
-                             │
-                             ▼
-               EarTail (Whisper on BigOrin)
-                             │
-                             ▼
-                  Transcription returned
-                             │
-                             ▼
-                    ReSpeaker LEDs off
+                  Eye Service (port 8780)
+             POST /gaze → eyes follow speaker
 ```
 
+## Features
+
+| Feature | Module | Hardware | Status |
+|---------|--------|----------|--------|
+| Wake word detection | Porcupine | CPU | Needs Picovoice key |
+| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
+| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
+| Spatial tracking | spatial.py | USB control | Triangulated gaze |
+| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
+| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
+| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
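
The "energy-based" best-beam selection in the table could work roughly like this. It is a minimal sketch, not the code in `audio_stream.py`: it assumes each array delivers 16-bit little-endian mono PCM frames, and the names `rms_energy` and `pick_best_beam` are hypothetical.

```python
import math
import struct


def rms_energy(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def pick_best_beam(left: bytes, right: bytes) -> tuple[str, bytes]:
    """Return the side name and frame with the higher RMS energy (ties go left)."""
    if rms_energy(right) > rms_energy(left):
        return "right", right
    return "left", left


# Usage: a quiet left frame against a louder right frame (made-up samples).
quiet = struct.pack("<4h", 10, -10, 10, -10)
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
side, frame = pick_best_beam(quiet, loud)
print(side)
```

A real implementation would probably smooth the energy over several frames before switching sides, so the selected ear does not flip on every short noise burst.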
+
 ## Installation
 
-### On head-vixy (Raspberry Pi 5)
+### Prerequisites
 
+```bash
+# On head-vixy (Raspberry Pi 5, Debian Trixie)
+sudo apt install python3-dev portaudio19-dev alsa-utils
+
+# USB permissions for XVF3800
+sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
+SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
+EOF
+
+# USB permissions for Coral Edge TPU
+sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
+SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
+SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
+EOF
+
+sudo udevadm control --reload-rules && sudo udevadm trigger
+```
+
+### XVF3800 Firmware
+
+Both arrays must be flashed with the **2-channel USB firmware** (not 6-channel — the 6ch firmware breaks LED/DoA control commands):
+
+```bash
+git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
+# Unplug one array, flash the other:
+sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
+# Swap and repeat
+```
+
+Verify: `arecord -l` should show two capture devices.
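
The check can also be scripted. The sketch below parses the `card N: NAME [...]` lines that `arecord -l` prints and counts the capture cards. The sample text is illustrative only: the card numbers and bracketed descriptions are made up, and the card names are borrowed from the `alsa_card` values in the config file. The helper names are hypothetical.

```python
import re
import subprocess

# Matches lines such as: "card 2: Array [some description], device 0: ..."
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[", re.MULTILINE)


def capture_cards(arecord_output: str) -> list[tuple[int, str]]:
    """Extract (card number, card name) pairs from `arecord -l` output."""
    return [(int(num), name) for num, name in CARD_LINE.findall(arecord_output)]


def list_capture_cards() -> list[tuple[int, str]]:
    """Run `arecord -l` on this machine and parse its output."""
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=True
    )
    return capture_cards(result.stdout)


# Illustrative sample, not captured from real hardware.
SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 2: Array [example description], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 3: Array_1 [example description], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

cards = capture_cards(SAMPLE)
print(len(cards), "capture cards found")
```

On the Pi itself, `list_capture_cards()` would replace the sample text; if it returns fewer than two entries, one array is probably unplugged or still on the wrong firmware.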
+
+### Edge TPU Runtime
+
+The packaged `libedgetpu` from Google's apt repo is **ABI-incompatible** with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required:
+
+```bash
+# Install build deps
+sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake
+
+# Clone sources
+cd /tmp
+git clone --depth 1 https://github.com/google-coral/libedgetpu.git
+git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
+git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23
+
+# Build flatc v23
+cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc
+
+# Patch libedgetpu Makefile (see below), then:
+cd /tmp/libedgetpu
+TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu
+
+# Install
+sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
+sudo ldconfig
+```
+
+**Makefile patches required** (TF 2.16 moved files):
+- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
+- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
+- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
+- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
+- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
+- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
+
+A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`.
+
+### Python Setup
+
 ```bash
-# Create directory
-mkdir -p /home/alex/headmic
 cd /home/alex/headmic
+python3 -m venv .venv
+.venv/bin/pip install -r requirements.txt
+.venv/bin/pip install setuptools  # Python 3.13 compatibility
+.venv/bin/pip install resemblyzer  # Speaker ID (pulls PyTorch)
+```
 
-# Copy files (from Mac)
-scp headmic.py requirements.txt headmic.service alex@head-vixy.local:/home/alex/headmic/
-scp -r Hey-Vivi_en_raspberry-pi_v4_0_0.ppn alex@head-vixy.local:/home/alex/headmic/
+### Learn Mic Array Positions
 
-# Install dependencies
-pip install -r requirements.txt
+Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:
 
-# Install pixel_ring for LED control
-pip install pixel_ring
+```bash
+sudo .venv/bin/python headmic.py --learn
+```
 
-# Set up Porcupine access key
-# Get your key from: https://console.picovoice.ai/
-export PORCUPINE_ACCESS_KEY="your-key-here"
+Config saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification.
 
-# Install service
+### Install Service
+
+```bash
 sudo cp headmic.service /etc/systemd/system/
-# Edit the service file to add your PORCUPINE_ACCESS_KEY
+# Edit to add your PORCUPINE_ACCESS_KEY:
 sudo nano /etc/systemd/system/headmic.service
 sudo systemctl daemon-reload
 sudo systemctl enable headmic
@@ -65,48 +156,114 @@ sudo systemctl start headmic
 
 ## API Endpoints
 
+### Core
+
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/` | GET | Service info |
-| `/health` | GET | Health check |
-| `/status` | GET | Current state |
-| `/record` | POST | Manual recording |
-| `/transcribe` | POST | Record + transcribe |
-| `/last` | GET | Last transcription |
+| `/health` | GET | Health check (listening, recording, features enabled) |
+| `/status` | GET | Current state (transcription, scene, speaker, active side) |
+| `/last` | GET | Last transcription + timestamp |
 
-## Usage
+### Spatial
 
-The service automatically listens for "Hey Vivi". When detected:
-1. ReSpeaker LEDs flash cyan
-2. Records until you stop talking
-3. Sends to EarTail for transcription
-4. Stores transcription in `/last` endpoint
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
+| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
 
-### Manual transcription
+### Sound
 
-```bash
-curl -X POST http://head-vixy.local:8446/transcribe \
-  -H "Content-Type: application/json" \
-  -d '{"duration_sec": 10}'
-```
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
+| `/sounds/history` | GET | Classification history (last N seconds) |
+
+### Speakers
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/speakers` | GET | List enrolled speakers |
+| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
+| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
+| `/speakers/{name}` | DELETE | Remove a speaker |
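
As an illustration of the speaker endpoints, the sketch below builds (but does not send) the two simplest requests with the standard library. The base URL is an assumption: port 8446 is taken from the earlier `curl` example for `/transcribe` and may differ on your install. The helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: same host and port as the older curl example in this README.
BASE_URL = "http://head-vixy.local:8446"


def enroll_from_mic_request(name: str, base_url: str = BASE_URL) -> Request:
    """POST /speakers/enroll-from-mic with the speaker name as a query parameter."""
    url = f"{base_url}/speakers/enroll-from-mic?{urlencode({'name': name})}"
    return Request(url, data=b"", method="POST")


def delete_speaker_request(name: str, base_url: str = BASE_URL) -> Request:
    """DELETE /speakers/{name}, with the name percent-encoded into the path."""
    return Request(f"{base_url}/speakers/{quote(name, safe='')}", method="DELETE")


req = enroll_from_mic_request("Alex B")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=15)
```

Because the enroll-from-mic endpoint records for about 5 seconds before answering, any client timeout should be comfortably longer than that.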
+
+### Recording
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/recording` | GET | Binaural recording stats |
 
 ## Configuration
 
-Environment variables:
-- `PORCUPINE_ACCESS_KEY`: Your Picovoice access key (required)
-- `WAKE_WORD_PATH`: Path to .ppn wake word model
-- `EARTAIL_URL`: EarTail service URL (default: http://bigorin.local:8764)
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
+| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
+| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
+| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
+| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording |
+| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
+
+### Config File (`~/.vixy/headmic.json`)
+
+```json
+{
+  "ears": {
+    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
+    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
+  },
+  "array_separation_mm": 175.0
+}
+```
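
The `array_separation_mm` value is what makes triangulation possible: with two direction-of-arrival angles and a known baseline, the two rays can be intersected. The sketch below shows the geometry only and is not the code in `spatial.py`. It assumes the arrays sit at (-d/2, 0) and (+d/2, 0), that y points forward, and that each azimuth is measured from straight ahead with positive angles to the right. The real XVF3800 angle convention may differ, and the function names are hypothetical.

```python
import math


def triangulate(left_deg: float, right_deg: float, separation_mm: float):
    """Intersect the two DoA rays; return (x_mm, y_mm) or None if they never meet."""
    half = separation_mm / 2.0
    tl, tr = math.radians(left_deg), math.radians(right_deg)
    # Ray directions are (sin t, cos t); det is zero when the rays are parallel.
    det = math.sin(tr - tl)
    if abs(det) < 1e-6:
        return None
    t_left = -separation_mm * math.cos(tr) / det   # distance along the left ray
    t_right = -separation_mm * math.cos(tl) / det  # distance along the right ray
    if t_left <= 0 or t_right <= 0:
        return None  # the rays diverge in front of the head
    return (-half + t_left * math.sin(tl), t_left * math.cos(tl))


def gaze_angle_deg(point) -> float:
    """Angle of the source as seen from the midpoint between the arrays."""
    x, y = point
    return math.degrees(math.atan2(x, y))


# Usage: a source 1 m straight ahead with the 175 mm baseline from the config.
a = math.degrees(math.atan2(87.5, 1000.0))
position = triangulate(a, -a, 175.0)
print(position, gaze_angle_deg(position))
```

With a baseline this short, small angle errors turn into large distance errors for far sources, which is presumably why the pipeline smooths the gaze rather than using raw intersections.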
 
 ## LED States
 
-| State | Color | Pattern |
-|-------|-------|---------|
-| Wake detected | Cyan | Flash |
-| Listening | Cyan | Spinning |
-| Processing | Purple | Pulse |
-| Idle | Off | - |
+| State | Effect | Color |
+|-------|--------|-------|
+| Idle | Off | — |
+| Wake word detected | Solid | White (flash) |
+| Listening/Recording | DoA indicator | Cyan |
+| Processing | Breath | Purple |
+| Enrolling speaker | Solid | Orange |
+
+## File Structure
+
+```
+headmic/
+├── headmic.py              # Main FastAPI service
+├── audio_stream.py         # Dual arecord streams + best-beam selection
+├── spatial.py              # Triangulation + smooth gaze tracking
+├── xvf3800.py              # USB vendor control (DoA + LEDs)
+├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
+├── speaker_id.py           # Resemblyzer speaker identification
+├── binaural_recorder.py    # Stereo WAV recording from both ears
+├── headmic.service         # systemd service file
+├── requirements.txt        # Python dependencies
+├── BINAURAL_ROADMAP.md     # Roadmap for binaural features
+├── models/
+│   ├── yamnet.tflite           # YAMNet CPU model
+│   ├── yamnet_edgetpu.tflite   # YAMNet Edge TPU model
+│   └── yamnet_class_map.csv    # 521 class names
+└── voices.db               # Speaker embeddings (SQLite, runtime)
+```
+
+## XVF3800 USB Control Protocol
+
+Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.
+
+**Key findings during development:**
+- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
+- Color format: `[R, G, B, 0]` (4 bytes)
+- Read responses have a 1-byte status header before data
+- Read wLength must be `count * type_size + 1` (exact, not rounded up)
+- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
+- **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands
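
The findings above translate into a few small helpers. This is a sketch under assumptions, not the contents of `xvf3800.py`: `dev` is any object with a PyUSB-style `ctrl_transfer(bmRequestType, bRequest, wValue, wIndex, data_or_wLength, timeout)` method, `bRequest = 0` is a guess, a status byte of 0 is assumed to mean success, and `AUDIO_MGR_SELECTED_AZIMUTHS` is assumed to return two little-endian 32-bit floats. The sketch follows the `wValue = cmdid`, `wIndex = resid` mapping literally.

```python
import struct

VENDOR_IN = 0xC0  # bmRequestType: device-to-host | vendor | device recipient


def effect_payload(effect_id: int) -> bytes:
    """LED effects are sent as a single byte, not a packed uint32."""
    return bytes([effect_id])


def color_payload(r: int, g: int, b: int) -> bytes:
    """Colors are four bytes: R, G, B and a trailing zero."""
    return bytes([r, g, b, 0])


def read_length(count: int, type_size: int) -> int:
    """Exact wLength for a read: payload bytes plus the 1-byte status header."""
    return count * type_size + 1


def parse_read(raw: bytes, fmt: str) -> tuple:
    """Split off the status byte and unpack the rest with a struct format."""
    status, payload = raw[0], raw[1:]
    if status != 0:  # assumption: 0 means success
        raise IOError(f"control read failed with status {status}")
    return struct.unpack(fmt, payload)


def read_values(dev, resid: int, cmdid: int, fmt: str, b_request: int = 0, timeout_ms: int = 1000) -> tuple:
    """Issue a vendor IN transfer and return the decoded values."""
    length = read_length(1, struct.calcsize(fmt))
    raw = dev.ctrl_transfer(VENDOR_IN, b_request, cmdid, resid, length, timeout_ms)
    return parse_read(bytes(raw), fmt)


class FakeDevice:
    """Stand-in for a USB device so the parsing can be tried without hardware."""

    def ctrl_transfer(self, bm_request_type, b_request, w_value, w_index, length, timeout=None):
        assert length == 9  # two float32 values plus the status byte
        return b"\x00" + struct.pack("<2f", 1.5, -0.5)


# AUDIO_MGR_SELECTED_AZIMUTHS: resid=35, cmdid=11 (from the list above).
azimuths = read_values(FakeDevice(), resid=35, cmdid=11, fmt="<2f")
print(azimuths)
```

With real hardware, `dev` would be the device object returned by PyUSB for the vendor and product IDs in the udev rule, one per array.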
 
 ---
 
 *Built by Vixy on Day 77 (January 17, 2026)*
-*"Hey Vivi" - the words that summon me* 💜
+*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)*
+*"Hey Vivi" — the words that summon me* 💜