From b04726dfe0781a5d5770d753927d351049abb5ec Mon Sep 17 00:00:00 2001 From: Alex Date: Sun, 12 Apr 2026 20:58:01 -0500 Subject: [PATCH] Update README for dual XVF3800 binaural architecture Complete rewrite covering: dual array setup, spatial tracking, Edge TPU sound classification, speaker ID, binaural recording, USB protocol quirks, libedgetpu build instructions, and all API endpoints. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 291 +++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 224 insertions(+), 67 deletions(-) diff --git a/README.md b/README.md index f7b5beb..bd32975 100644 --- a/README.md +++ b/README.md @@ -1,62 +1,153 @@ # HeadMic - Vixy's Ears 🦊👂 -Wake word detection + voice recording + transcription service for Vixy's physical head. +Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection. -**Wake word:** "Hey Vivi" (trained via Picovoice Porcupine) +**Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear) +**Wake word:** "Hey Vivi" (Picovoice Porcupine) +**Runs on:** Raspberry Pi 5 (head-vixy.local) ## Architecture ``` -"Hey Vivi" (voice) - │ - ▼ -ReSpeaker 4-Mic Array - │ - ▼ -Porcupine (wake word detection) - │ detected! 
- ▼ -ReSpeaker LEDs light up (cyan) - │ - ▼ -Record until silence (webrtcvad) - │ - ▼ -EarTail (Whisper on BigOrin) - │ - ▼ -Transcription returned - │ - ▼ -ReSpeaker LEDs off +[Left XVF3800]──┐ [Right XVF3800]──┐ + 4 mics, DoA │ 4 mics, DoA │ + WS2812 LEDs │ WS2812 LEDs │ + ▼ ▼ + arecord (16kHz mono) arecord (16kHz mono) + │ │ + └────────────┬───────────────────────────────┘ + ▼ + DualAudioStream (audio_stream.py) + best-beam selection (energy-based) + │ + ┌────────────┼────────────────┐ + ▼ ▼ ▼ + Porcupine YAMNet Binaural + wake word (Edge TPU) Recorder + "Hey Vivi" 521 classes stereo WAV + ▼ ▼ + Record + Speaker ID + Transcribe (Resemblyzer) + via EarTail + │ + ┌────────────┼────────────────┐ + ▼ ▼ ▼ + Spatial Tracker (spatial.py) USB Control (xvf3800.py) + DoA → triangulation LEDs + DoA polling + → smooth gaze per-array control + ▼ + Eye Service (port 8780) + POST /gaze → eyes follow speaker ``` +## Features + +| Feature | Module | Hardware | Status | +|---------|--------|----------|--------| +| Wake word detection | Porcupine | CPU | Needs Picovoice key | +| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms | +| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API | +| Spatial tracking | spatial.py | USB control | Triangulated gaze | +| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based | +| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath | +| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments | + ## Installation -### On head-vixy (Raspberry Pi 5) +### Prerequisites + +```bash +# On head-vixy (Raspberry Pi 5, Debian Trixie) +sudo apt install python3-dev portaudio19-dev alsa-utils + +# USB permissions for XVF3800 +sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF' +SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666" +EOF + +# USB permissions for Coral Edge TPU +sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF' +SUBSYSTEM=="usb", 
ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666" +SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666" +EOF + +sudo udevadm control --reload-rules && sudo udevadm trigger +``` + +### XVF3800 Firmware + +Both arrays must be flashed with the **2-channel USB firmware** (not 6-channel — the 6ch firmware breaks LED/DoA control commands): + +```bash +git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800 +# Unplug one array, flash the other: +sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin +# Swap and repeat +``` + +Verify: `arecord -l` should show two capture devices. + +### Edge TPU Runtime + +The packaged `libedgetpu` from Google's apt repo is **ABI-incompatible** with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required: + +```bash +# Install build deps +sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake + +# Clone sources +cd /tmp +git clone --depth 1 https://github.com/google-coral/libedgetpu.git +git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git +git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23 + +# Build flatc v23 +cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc + +# Patch libedgetpu Makefile (see below), then: +cd /tmp/libedgetpu +TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu + +# Install +sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0 +sudo ldconfig +``` + +**Makefile patches required** (TF 2.16 moved files): +- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc` +- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES` +- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS` +- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS` +- Add 
`$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS` +- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS` + +A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`. + +### Python Setup ```bash -# Create directory -mkdir -p /home/alex/headmic cd /home/alex/headmic +python3 -m venv .venv +.venv/bin/pip install -r requirements.txt +.venv/bin/pip install setuptools # Python 3.13 compatibility +.venv/bin/pip install resemblyzer # Speaker ID (pulls PyTorch) +``` -# Copy files (from Mac) -scp headmic.py requirements.txt headmic.service alex@head-vixy.local:/home/alex/headmic/ -scp -r Hey-Vivi_en_raspberry-pi_v4_0_0.ppn alex@head-vixy.local:/home/alex/headmic/ +### Learn Mic Array Positions -# Install dependencies -pip install -r requirements.txt +Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right: -# Install pixel_ring for LED control -pip install pixel_ring +```bash +sudo .venv/bin/python headmic.py --learn +``` -# Set up Porcupine access key -# Get your key from: https://console.picovoice.ai/ -export PORCUPINE_ACCESS_KEY="your-key-here" +Config saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification. 
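Matching cards by USB serial (rather than ALSA index, which can change across reboots and re-enumeration) is what keeps the left/right assignment stable. A minimal sketch of how that lookup could work — note the sysfs path and the helper names (`card_serials`, `resolve_side`) are illustrative assumptions, not the service's actual code:

```python
import json
from pathlib import Path


def load_ear_config(path="~/.vixy/headmic.json"):
    """Load the left/right ear mapping written by --learn."""
    return json.loads(Path(path).expanduser().read_text())


def card_serials():
    """Map ALSA card name -> USB serial via sysfs (Linux only).

    Assumes /sys/class/sound/cardN/device points at the USB interface,
    whose parent directory exposes the device's serial attribute.
    """
    serials = {}
    for card in Path("/sys/class/sound").glob("card[0-9]*"):
        serial_file = card / "device" / ".." / "serial"
        if serial_file.exists():
            serials[card.name] = serial_file.read_text().strip()
    return serials


def resolve_sides(config, serials):
    """Return {'left': cardN, 'right': cardN} by matching USB serials."""
    by_serial = {serial: card for card, serial in serials.items()}
    return {side: by_serial.get(ear["usb_serial"])
            for side, ear in config["ears"].items()}
```

With this shape, a swapped USB port simply changes which card name each serial resolves to; the left/right semantics survive untouched.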
-# Install service +### Install Service + +```bash sudo cp headmic.service /etc/systemd/system/ -# Edit the service file to add your PORCUPINE_ACCESS_KEY +# Edit to add your PORCUPINE_ACCESS_KEY: sudo nano /etc/systemd/system/headmic.service sudo systemctl daemon-reload sudo systemctl enable headmic @@ -65,48 +156,114 @@ sudo systemctl start headmic ## API Endpoints +### Core + | Endpoint | Method | Description | |----------|--------|-------------| | `/` | GET | Service info | -| `/health` | GET | Health check | -| `/status` | GET | Current state | -| `/record` | POST | Manual recording | -| `/transcribe` | POST | Record + transcribe | -| `/last` | GET | Last transcription | +| `/health` | GET | Health check (listening, recording, features enabled) | +| `/status` | GET | Current state (transcription, scene, speaker, active side) | +| `/last` | GET | Last transcription + timestamp | -## Usage +### Spatial -The service automatically listens for "Hey Vivi". When detected: -1. ReSpeaker LEDs flash cyan -2. Records until you stop talking -3. Sends to EarTail for transcription -4. 
Stores transcription in `/last` endpoint +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/doa` | GET | DoA from both arrays + triangulated position + gaze | +| `/devices` | GET | XVF3800 connection status, serials, ALSA devices | -### Manual transcription +### Sound -```bash -curl -X POST http://head-vixy.local:8446/transcribe \ - -H "Content-Type: application/json" \ - -d '{"duration_sec": 10}' -``` +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) | +| `/sounds/history` | GET | Classification history (last N seconds) | + +### Speakers + +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/speakers` | GET | List enrolled speakers | +| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) | +| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) | +| `/speakers/{name}` | DELETE | Remove a speaker | + +### Recording + +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/recording` | GET | Binaural recording stats | ## Configuration -Environment variables: -- `PORCUPINE_ACCESS_KEY`: Your Picovoice access key (required) -- `WAKE_WORD_PATH`: Path to .ppn wake word model -- `EARTAIL_URL`: EarTail service URL (default: http://bigorin.local:8764) +### Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word | +| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path | +| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service | +| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push | +| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording | +| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments | + +### Config File (`~/.vixy/headmic.json`) + +```json +{ 
+ "ears": { + "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"}, + "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"} + }, + "array_separation_mm": 175.0 +} +``` ## LED States -| State | Color | Pattern | -|-------|-------|---------| -| Wake detected | Cyan | Flash | -| Listening | Cyan | Spinning | -| Processing | Purple | Pulse | -| Idle | Off | - | +| State | Effect | Color | +|-------|--------|-------| +| Idle | Off | — | +| Wake word detected | Solid | White (flash) | +| Listening/Recording | DoA indicator | Cyan | +| Processing | Breath | Purple | +| Enrolling speaker | Solid | Orange | + +## File Structure + +``` +headmic/ +├── headmic.py # Main FastAPI service +├── audio_stream.py # Dual arecord streams + best-beam selection +├── spatial.py # Triangulation + smooth gaze tracking +├── xvf3800.py # USB vendor control (DoA + LEDs) +├── sound_id.py # YAMNet sound classification (CPU/Edge TPU) +├── speaker_id.py # Resemblyzer speaker identification +├── binaural_recorder.py # Stereo WAV recording from both ears +├── headmic.service # systemd service file +├── requirements.txt # Python dependencies +├── BINAURAL_ROADMAP.md # Roadmap for binaural features +├── models/ +│ ├── yamnet.tflite # YAMNet CPU model +│ ├── yamnet_edgetpu.tflite # YAMNet Edge TPU model +│ └── yamnet_class_map.csv # 521 class names +└── voices.db # Speaker embeddings (SQLite, runtime) +``` + +## XVF3800 USB Control Protocol + +Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`. 
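As a rough illustration of that transfer shape with pyusb, reading the azimuths resource might look like the sketch below. This is a sketch under stated assumptions — the `bRequest` value of 0 and the little-endian float32 payload are guesses; see `xvf3800.py` for the actual encoding:

```python
import struct

RESID_AZIMUTHS = 35  # AUDIO_MGR_SELECTED_AZIMUTHS
CMDID_AZIMUTHS = 11


def parse_reply(data, count):
    """Strip the 1-byte status header and unpack `count` float32 values."""
    status, payload = data[0], bytes(data[1:])
    if status != 0:
        raise RuntimeError(f"control read failed, status={status}")
    return struct.unpack(f"<{count}f", payload)


def read_azimuths(dev, count=4):
    """Read azimuths over a vendor control transfer.

    wLength must be exact: count * sizeof(float) + 1 status byte.
    """
    import usb.core  # pyusb, assumed installed

    data = dev.ctrl_transfer(
        0xC0,            # bmRequestType: device-to-host | vendor | device
        0,               # bRequest (assumed 0)
        CMDID_AZIMUTHS,  # wValue = cmdid
        RESID_AZIMUTHS,  # wIndex = resid
        count * 4 + 1,   # exact read length, not rounded up
    )
    return parse_reply(data, count)


# dev = usb.core.find(idVendor=0x2886, idProduct=0x001A)
# read_azimuths(dev)
```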
+ +**Key findings during development:** +- Payload format: single bytes for effects (`bytes([3])`), not packed uint32 +- Color format: `[R, G, B, 0]` (4 bytes) +- Read responses have a 1-byte status header before data +- Read wLength must be `count * type_size + 1` (exact, not rounded up) +- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking +- **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands --- *Built by Vixy on Day 77 (January 17, 2026)* -*"Hey Vivi" - the words that summon me* 💜 +*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)* +*"Hey Vivi" — the words that summon me* 💜