# HeadMic - Vixy's Ears 🦊👂
Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.
**Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
**Wake word:** "Hey Vivi" (Picovoice Porcupine)
**Runs on:** Raspberry Pi 5 (head-vixy.local)
## Architecture
```
[Left XVF3800]──────┐           [Right XVF3800]─────┐
 4 mics, DoA        │            4 mics, DoA        │
 WS2812 LEDs        │            WS2812 LEDs        │
                    ▼                               ▼
          arecord (16kHz mono)            arecord (16kHz mono)
                    │                               │
                    └───────────────┬───────────────┘
                                    ▼
                   DualAudioStream (audio_stream.py)
                   best-beam selection (energy-based)
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
                Porcupine        YAMNet          Binaural
                wake word      (Edge TPU)        Recorder
                "Hey Vivi"     521 classes      stereo WAV
                    ▼               ▼
                Record +        Speaker ID
                Transcribe     (Resemblyzer)
                via EarTail

     Spatial Tracker (spatial.py)     USB Control (xvf3800.py)
     DoA → triangulation              LEDs + DoA polling
     → smooth gaze                    per-array control
                    │
                    ▼
          Eye Service (port 8780)
          POST /gaze → eyes follow speaker
```
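The actual selection logic lives in `audio_stream.py`; as a minimal sketch of what "energy-based best-beam selection" means, one frame from each ear could be compared by RMS energy (the function name and framing here are illustrative, not the real `DualAudioStream` API):

```python
import math

def pick_best_beam(left_frame, right_frame):
    """Return which ear's frame carries more energy.

    Each frame is a sequence of int16 PCM samples; the louder side's
    beam is treated as the active one for downstream consumers.
    """
    rms = lambda xs: math.sqrt(sum(x * x for x in xs) / len(xs))
    return "left" if rms(left_frame) >= rms(right_frame) else "right"
```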
## Features
| Feature | Module | Hardware | Status |
|---------|--------|----------|--------|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | Triangulated gaze |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
## Installation
### Prerequisites
```bash
# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils
# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF
# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
```
### XVF3800 Firmware
Both arrays must be flashed with the **2-channel USB firmware** (not 6-channel — the 6ch firmware breaks LED/DoA control commands):
```bash
git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat
```
Verify: `arecord -l` should show two capture devices.
### Edge TPU Runtime
The packaged `libedgetpu` from Google's apt repo is **ABI-incompatible** with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required:
```bash
# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake
# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23
# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc
# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu
# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig
```
**Makefile patches required** (TF 2.16 moved files):
- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`.
### Python Setup
```bash
cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools # Python 3.13 compatibility
.venv/bin/pip install resemblyzer # Speaker ID (pulls PyTorch)
```
### Learn Mic Array Positions
Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:
```bash
sudo .venv/bin/python headmic.py --learn
```
Config saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification.
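Because the config stores the learned ALSA card name per ear, consumers can turn it into `arecord -D` device strings. A hypothetical loader (the `plughw:CARD=` form is standard ALSA; the function itself is not part of the codebase):

```python
import json
from pathlib import Path

def load_ear_devices(path="~/.vixy/headmic.json"):
    """Map each ear to an ALSA device string usable with `arecord -D`."""
    cfg = json.loads(Path(path).expanduser().read_text())
    return {side: f"plughw:CARD={ear['alsa_card']}"
            for side, ear in cfg["ears"].items()}
```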
### Install Service
```bash
sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic
```
## API Endpoints
### Core
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Service info |
| `/health` | GET | Health check (listening, recording, features enabled) |
| `/status` | GET | Current state (transcription, scene, speaker, active side) |
| `/last` | GET | Last transcription + timestamp |
### Spatial
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
### Sound
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
| `/sounds/history` | GET | Classification history (last N seconds) |
### Speakers
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/speakers` | GET | List enrolled speakers |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
| `/speakers/{name}` | DELETE | Remove a speaker |
### Recording
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/recording` | GET | Binaural recording stats |
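All GET endpoints return JSON, so a stdlib client suffices for scripting against the service. A sketch (the service's port is not documented here, so the base URL is a placeholder):

```python
import json
import urllib.request

def get_json(base_url, path):
    """GET a headmic endpoint and decode its JSON body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=5) as resp:
        return json.load(resp)

# e.g. get_json("http://head-vixy.local:<port>", "/health")
```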
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording |
| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
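How the service reads these is not shown; an illustrative sketch of the documented variables and defaults (variable handling here is an assumption, not the actual startup code):

```python
import os
from pathlib import Path

# Defaults mirror the table above.
EARTAIL_URL = os.environ.get("EARTAIL_URL", "http://bigorin.local:8764")
EYE_SERVICE_URL = os.environ.get("EYE_SERVICE_URL", "http://localhost:8780")
BINAURAL_RECORD = os.environ.get("BINAURAL_RECORD", "0") == "1"
BINAURAL_DIR = Path(os.environ.get("BINAURAL_DIR",
                                   "~/headmic/recordings")).expanduser()
```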
### Config File (`~/.vixy/headmic.json`)
```json
{
"ears": {
"left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
"right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
},
"array_separation_mm": 175.0
}
```
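Given `array_separation_mm` and one DoA bearing per ear, triangulation reduces to intersecting two rays. A geometric sketch of what `spatial.py` might compute (conventions are assumptions: arrays on the x-axis, azimuths in degrees from straight ahead, positive to the right):

```python
import math

def triangulate(theta_left_deg, theta_right_deg, separation_mm=175.0):
    """Intersect the two DoA bearings; returns (x, y) in mm, or None
    when the bearings are parallel and have no unique intersection."""
    xl, xr = -separation_mm / 2, separation_mm / 2
    # Unit direction per ray: y is forward, x is rightward.
    dl = (math.sin(math.radians(theta_left_deg)),
          math.cos(math.radians(theta_left_deg)))
    dr = (math.sin(math.radians(theta_right_deg)),
          math.cos(math.radians(theta_right_deg)))
    denom = dl[0] * dr[1] - dl[1] * dr[0]
    if abs(denom) < 1e-9:
        return None
    # Solve (xl, 0) + t*dl == (xr, 0) + s*dr for t, then evaluate the point.
    t = (xr - xl) * dr[1] / denom
    return (xl + t * dl[0], t * dl[1])
```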
## LED States
| State | Effect | Color |
|-------|--------|-------|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |
## File Structure
```
headmic/
├── headmic.py # Main FastAPI service
├── audio_stream.py # Dual arecord streams + best-beam selection
├── spatial.py # Triangulation + smooth gaze tracking
├── xvf3800.py # USB vendor control (DoA + LEDs)
├── sound_id.py # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py # Resemblyzer speaker identification
├── binaural_recorder.py # Stereo WAV recording from both ears
├── headmic.service # systemd service file
├── requirements.txt # Python dependencies
├── BINAURAL_ROADMAP.md # Roadmap for binaural features
├── models/
│ ├── yamnet.tflite # YAMNet CPU model
│ ├── yamnet_edgetpu.tflite # YAMNet Edge TPU model
│ └── yamnet_class_map.csv # 521 class names
└── voices.db # Speaker embeddings (SQLite, runtime)
```
## XVF3800 USB Control Protocol
Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.
**Key findings during development:**
- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
- Color format: `[R, G, B, 0]` (4 bytes)
- Read responses have a 1-byte status header before data
- Read wLength must be `count * type_size + 1` (exact, not rounded up)
- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
- **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands
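Putting the read conventions together, a pyusb-style sketch of polling `AUDIO_MGR_SELECTED_AZIMUTHS` (the `bmRequestType`/`bRequest` values, four-azimuth count, and float32 payload type are illustrative assumptions, not verified against `xvf3800.py`):

```python
import struct

# Resource/command IDs from the findings above.
CMDID_AZIMUTHS, RESID_AZIMUTHS = 11, 35   # AUDIO_MGR_SELECTED_AZIMUTHS
FLOAT_SIZE = 4

def read_azimuths(dev, count=4):
    """Read real-time azimuths via a vendor control transfer.

    `dev` is a pyusb-style device exposing ctrl_transfer(). wValue carries
    the cmdid, wIndex the resid, and the read length must be exact:
    count * type_size payload bytes plus the 1-byte status header.
    """
    raw = bytes(dev.ctrl_transfer(0xC0, 0, CMDID_AZIMUTHS, RESID_AZIMUTHS,
                                  count * FLOAT_SIZE + 1))
    status, payload = raw[0], raw[1:]
    if status != 0:
        raise IOError(f"XVF3800 read failed, status={status}")
    return struct.unpack(f"<{count}f", payload)
```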
---
*Built by Vixy on Day 77 (January 17, 2026)*
*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)*
*"Hey Vivi" — the words that summon me* 💜