Update README for dual XVF3800 binaural architecture

Complete rewrite covering: dual array setup, spatial tracking, Edge TPU sound classification, speaker ID, binaural recording, USB protocol quirks, libedgetpu build instructions, and all API endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:58:01 -05:00
parent 36aeb19280
commit b04726dfe0
1 changed file with 224 additions and 67 deletions
--- a/README.md
+++ b/README.md
@@ -1,62 +1,153 @@
 # HeadMic - Vixy's Ears 🦊👂

-Wake word detection + voice recording + transcription service for Vixy's physical head.
+Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.

-**Wake word:** "Hey Vivi" (trained via Picovoice Porcupine)
+- **Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
+- **Wake word:** "Hey Vivi" (Picovoice Porcupine)
+- **Runs on:** Raspberry Pi 5 (head-vixy.local)

 ## Architecture

 ```
-"Hey Vivi" (voice)
-    │
-    ▼
-ReSpeaker 4-Mic Array
-    │
-    ▼
-Porcupine (wake word detection)
-    │ detected!
-    ▼
-ReSpeaker LEDs light up (cyan)
-    │
-    ▼
-Record until silence (webrtcvad)
-    │
-    ▼
-EarTail (Whisper on BigOrin)
-    │
-    ▼
-Transcription returned
-    │
-    ▼
-ReSpeaker LEDs off
+[Left XVF3800]──┐                          [Right XVF3800]──┐
+  4 mics, DoA   │                            4 mics, DoA    │
+  WS2812 LEDs   │                            WS2812 LEDs    │
+                ▼                                            ▼
+        arecord (16kHz mono)                         arecord (16kHz mono)
+                │                                            │
+                └────────────┬───────────────────────────────┘
+                             ▼
+                  DualAudioStream (audio_stream.py)
+                  best-beam selection (energy-based)
+                             │
+                ┌────────────┼────────────────┐
+                ▼            ▼                ▼
+         Porcupine      YAMNet           Binaural
+         wake word      (Edge TPU)       Recorder
+         "Hey Vivi"     521 classes      stereo WAV
+                ▼            ▼
+         Record +       Speaker ID
+         Transcribe     (Resemblyzer)
+         via EarTail
+                             │
+                ┌────────────┴────────────────┐
+                ▼                             ▼
+         Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
+         DoA → triangulation             LEDs + DoA polling
+         → smooth gaze                   per-array control
+                ▼
+         Eye Service (port 8780)
+         POST /gaze → eyes follow speaker
 ```
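The energy-based best-beam selection done by `DualAudioStream` can be illustrated with a short sketch. This is not the code in `audio_stream.py`: it assumes each ear delivers equal-length frames of signed 16-bit little-endian mono PCM, and `BestBeamSelector`, `frame_rms`, and the hysteresis margin `switch_db` are hypothetical names added here to keep the choice from flapping between ears on every frame.

```python
import math
import sys
from array import array
from typing import Tuple


def frame_rms(pcm: bytes) -> float:
    """RMS level of a frame of signed 16-bit little-endian PCM."""
    samples = array("h", pcm)
    if sys.byteorder != "little":
        samples.byteswap()  # input is little-endian; fix up on big-endian hosts
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BestBeamSelector:
    """Pick the louder ear, switching only when the inactive ear is louder
    by more than `switch_db` (hypothetical hysteresis parameter)."""

    def __init__(self, switch_db: float = 3.0):
        self.active = "left"
        self.ratio = 10 ** (switch_db / 20.0)  # dB margin as an amplitude ratio

    def select(self, left: bytes, right: bytes) -> Tuple[str, bytes]:
        rms = {"left": frame_rms(left), "right": frame_rms(right)}
        other = "right" if self.active == "left" else "left"
        # Switch only if the inactive ear is clearly louder than the active one.
        if rms[other] > rms[self.active] * self.ratio:
            self.active = other
        return self.active, left if self.active == "left" else right
```

Without the margin, two ears at nearly equal level would alternate frame by frame, which would make the downstream wake word and classification input jump between arrays.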

+## Features
+
+| Feature | Module | Hardware | Status |
+|---------|--------|----------|--------|
+| Wake word detection | Porcupine | CPU | Needs Picovoice key |
+| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
+| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
+| Spatial tracking | spatial.py | USB control | Triangulated gaze |
+| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
+| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
+| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
+
 ## Installation

-### On head-vixy (Raspberry Pi 5)
+### Prerequisites
+
+```bash
+# On head-vixy (Raspberry Pi 5, Debian Trixie)
+sudo apt install python3-dev python3-venv portaudio19-dev alsa-utils dfu-util git
+
+# USB permissions for XVF3800
+sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
+SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
+EOF
+
+# USB permissions for Coral Edge TPU
+sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
+SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
+SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
+EOF
+
+sudo udevadm control --reload-rules && sudo udevadm trigger
+```
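After reloading the rules, you can confirm that both XVF3800s and the Coral are enumerated, and that their device nodes are writable without `sudo`, by scanning sysfs. This helper is an illustration, not part of HeadMic. It relies on the standard Linux sysfs attributes `idVendor`, `idProduct`, `busnum`, and `devnum`, and on usbfs nodes living at `/dev/bus/usb/BBB/DDD`. The `sysfs_root`/`dev_root` parameters exist only so it can be run against a fake tree.

```python
import os
from pathlib import Path
from typing import List, Tuple

# Vendor/product pairs from the udev rules above.
WANTED = {
    ("2886", "001a"): "XVF3800",
    ("1a6e", "089a"): "Coral (before firmware load)",
    ("18d1", "9302"): "Coral (after firmware load)",
}


def find_usb_devices(sysfs_root: str = "/sys/bus/usb/devices",
                     dev_root: str = "/dev/bus/usb") -> List[Tuple[str, str, bool]]:
    """Return (label, device_node, writable) for each matching USB device."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            vid = (dev / "idVendor").read_text().strip().lower()
            pid = (dev / "idProduct").read_text().strip().lower()
            bus = int((dev / "busnum").read_text())
            num = int((dev / "devnum").read_text())
        except (OSError, ValueError):
            continue  # interface entries (e.g. "1-1:1.0") lack these attributes
        label = WANTED.get((vid, pid))
        if label is None:
            continue
        node = Path(dev_root) / f"{bus:03d}" / f"{num:03d}"
        # MODE="0666" from the udev rule should make the node writable for any user.
        found.append((label, str(node), os.access(node, os.W_OK)))
    return found


if __name__ == "__main__":
    for label, node, writable in find_usb_devices():
        print(f"{label:30} {node}  writable={writable}")
```

With both arrays and the Coral attached, expect two `XVF3800` entries plus one Coral entry, all with `writable=True`.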
+
+### XVF3800 Firmware
+
+Both arrays must be flashed with the **2-channel USB firmware**. The 6-channel firmware silently ignores the LED and DoA control commands.
+
+```bash
+git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
+# Unplug one array, flash the other:
+sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
+# Swap and repeat
+```
+
+Verify: `arecord -l` should show two capture devices.
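To run the same check from a script (for example before starting the service), the card IDs can be parsed out of `arecord -l`. A minimal sketch, assuming the usual `card N: ID [Name], device M: ...` line format. The parsed IDs are what the `alsa_card` fields in `~/.vixy/headmic.json` refer to (e.g. `Array`, `Array_1`).

```python
import re
import subprocess
from typing import List, Tuple

# One line per capture device, e.g. "card 2: Array [<long name>], device 0: ..."
CARD_RE = re.compile(r"^card (\d+): (\S+) \[([^\]]*)\], device (\d+):")


def capture_cards(listing: str) -> List[Tuple[int, str, str, int]]:
    """Parse `arecord -l` text into (card_index, card_id, card_name, device) tuples."""
    cards = []
    for line in listing.splitlines():
        m = CARD_RE.match(line)
        if m:
            cards.append((int(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
    return cards


def list_capture_cards() -> List[Tuple[int, str, str, int]]:
    """Run `arecord -l` (from alsa-utils) and parse its output."""
    out = subprocess.run(["arecord", "-l"], capture_output=True, text=True, check=True)
    return capture_cards(out.stdout)


if __name__ == "__main__":
    cards = list_capture_cards()
    for card in cards:
        print(card)
    # Assumes the two arrays are the only capture devices attached.
    if len(cards) != 2:
        raise SystemExit(f"expected 2 capture devices, found {len(cards)}")
```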
+
+### Edge TPU Runtime
+
+The packaged `libedgetpu` from Google's apt repo is **ABI-incompatible** with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required:
+
+```bash
+# Install build deps
+sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake
+
+# Clone sources
+cd /tmp
+git clone --depth 1 https://github.com/google-coral/libedgetpu.git
+git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
+git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23
+
+# Build flatc v23
+cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc
+
+# Patch libedgetpu Makefile (see below), then:
+cd /tmp/libedgetpu
+TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu
+
+# Install
+sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
+sudo ldconfig
+```
+
+**Makefile patches required** (TF 2.16 moved several source files, and its generated schema headers need flatc/flatbuffers v23):
+- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
+- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
+- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
+- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
+- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
+- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
+
+A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`.
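Once the Python environment from the next step exists, a quick way to confirm that `ai-edge-litert` can load the custom library is to request the Edge TPU delegate directly. This is a sketch, assuming `ai_edge_litert.interpreter` exposes `Interpreter` and `load_delegate` the same way `tflite_runtime.interpreter` does. `edgetpu_available` is a hypothetical helper, and the optional model path points at this repo's `models/yamnet_edgetpu.tflite`.

```python
from typing import Optional


def edgetpu_available(model_path: Optional[str] = None) -> bool:
    """True if the Edge TPU delegate loads (and, optionally, a model allocates on it)."""
    try:
        from ai_edge_litert.interpreter import Interpreter, load_delegate
    except ImportError:
        return False  # ai-edge-litert is not installed in this environment
    try:
        # "libedgetpu.so.1" is the soname symlink that ldconfig maintains.
        delegate = load_delegate("libedgetpu.so.1")
        if model_path is not None:
            interp = Interpreter(model_path=model_path, experimental_delegates=[delegate])
            interp.allocate_tensors()
    except (OSError, ValueError, RuntimeError):
        return False  # delegate or model failed to load
    return True


if __name__ == "__main__":
    ok = edgetpu_available("models/yamnet_edgetpu.tflite")
    print("Edge TPU OK" if ok else "Edge TPU NOT available")
```

Run it with `.venv/bin/python` so it picks up the venv's `ai-edge-litert`.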
+
+### Python Setup

 ```bash
-# Create directory
-mkdir -p /home/alex/headmic
 cd /home/alex/headmic
+python3 -m venv .venv
+.venv/bin/pip install -r requirements.txt
+.venv/bin/pip install setuptools  # venvs no longer preinstall setuptools (Python 3.12+)
+.venv/bin/pip install resemblyzer  # Speaker ID (pulls PyTorch)
+```

-# Copy files (from Mac)
-scp headmic.py requirements.txt headmic.service alex@head-vixy.local:/home/alex/headmic/
-scp -r Hey-Vivi_en_raspberry-pi_v4_0_0.ppn alex@head-vixy.local:/home/alex/headmic/
+### Learn Mic Array Positions

-# Install dependencies
-pip install -r requirements.txt
+Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:

-# Install pixel_ring for LED control
-pip install pixel_ring
+```bash
+sudo .venv/bin/python headmic.py --learn
+```

-# Set up Porcupine access key
-# Get your key from: https://console.picovoice.ai/
-export PORCUPINE_ACCESS_KEY="your-key-here"
+Config saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification.

-# Install service
+### Install Service
+
+```bash
 sudo cp headmic.service /etc/systemd/system/
-# Edit the service file to add your PORCUPINE_ACCESS_KEY
+# Edit to add your PORCUPINE_ACCESS_KEY:
 sudo nano /etc/systemd/system/headmic.service
 sudo systemctl daemon-reload
 sudo systemctl enable headmic
@@ -65,48 +156,114 @@ sudo systemctl start headmic

 ## API Endpoints

+### Core
+
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/` | GET | Service info |
-| `/health` | GET | Health check |
-| `/status` | GET | Current state |
-| `/record` | POST | Manual recording |
-| `/transcribe` | POST | Record + transcribe |
-| `/last` | GET | Last transcription |
+| `/health` | GET | Health check (listening, recording, features enabled) |
+| `/status` | GET | Current state (transcription, scene, speaker, active side) |
+| `/last` | GET | Last transcription + timestamp |

-## Usage
+### Spatial

-The service automatically listens for "Hey Vivi". When detected:
-1. ReSpeaker LEDs flash cyan
-2. Records until you stop talking
-3. Sends to EarTail for transcription
-4. Stores transcription in `/last` endpoint
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
+| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |

-### Manual transcription
+### Sound

-```bash
-curl -X POST http://head-vixy.local:8446/transcribe \
-  -H "Content-Type: application/json" \
-  -d '{"duration_sec": 10}'
-```
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
+| `/sounds/history` | GET | Classification history (last N seconds) |
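YAMNet emits one 521-way score vector per analysis frame. A common way to turn several frames into a "top 5 classes" list like the one `/sounds` reports is to average scores over the frames and rank them. This is a sketch of that approach, not a description of `sound_id.py`. It assumes the scores are already available as plain lists and that `yamnet_class_map.csv` has the `index,mid,display_name` header used by the upstream YAMNet release. `load_class_names` and `top_k_classes` are hypothetical helpers.

```python
import csv
import heapq
from typing import List, Sequence, Tuple


def load_class_names(csv_path: str) -> List[str]:
    """Read display names from yamnet_class_map.csv (columns: index, mid, display_name)."""
    with open(csv_path, newline="") as f:
        return [row["display_name"] for row in csv.DictReader(f)]


def top_k_classes(frame_scores: Sequence[Sequence[float]], names: Sequence[str],
                  k: int = 5) -> List[Tuple[str, float]]:
    """Average per-frame scores, then return the k highest (name, score) pairs."""
    if not frame_scores:
        return []
    n_frames = len(frame_scores)
    mean = [sum(col) / n_frames for col in zip(*frame_scores)]
    best = heapq.nlargest(k, range(len(mean)), key=mean.__getitem__)
    return [(names[i], mean[i]) for i in best]
```

`csv.DictReader` is used because some display names contain commas and are quoted in the CSV.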
+
+### Speakers
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/speakers` | GET | List enrolled speakers |
+| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
+| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
+| `/speakers/{name}` | DELETE | Remove a speaker |
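Resemblyzer turns an utterance into a fixed-length voice embedding, so identifying a speaker reduces to comparing that embedding with each enrolled one by cosine similarity and applying an acceptance threshold. The sketch below covers only that matching step. The `0.75` threshold and the in-memory `{name: embedding}` dict are assumptions and do not describe how `voices.db` stores embeddings. `cosine` and `identify` are hypothetical helpers.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding: Sequence[float], enrolled: Dict[str, Sequence[float]],
             threshold: float = 0.75) -> Tuple[Optional[str], float]:
    """Return (best_name, score), or (None, score) if nobody clears the threshold.

    threshold=0.75 is a hypothetical value; tune it on your own recordings.
    """
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Returning `None` below the threshold lets an unknown voice be reported as unknown instead of being forced onto the closest enrolled speaker.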
+
+### Recording
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/recording` | GET | Binaural recording stats |

 ## Configuration

-Environment variables:
- `PORCUPINE_ACCESS_KEY`: Your Picovoice access key (required)
- `WAKE_WORD_PATH`: Path to .ppn wake word model
- `EARTAIL_URL`: EarTail service URL (default: http://bigorin.local:8764)
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
+| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
+| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
+| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
+| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording |
+| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
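With `BINAURAL_RECORD=1`, segments are written to `BINAURAL_DIR` as stereo WAV files built from both ears. The interleaving can be sketched with the standard-library `wave` module. This assumes 16 kHz, 16-bit mono frames from each ear (as in the architecture diagram) and places the left ear on channel 0. `write_stereo_wav` is a hypothetical name, not taken from `binaural_recorder.py`.

```python
import wave
from array import array


def write_stereo_wav(path: str, left: bytes, right: bytes, rate: int = 16000) -> None:
    """Interleave two mono int16 streams (L, R, L, R, ...) into a stereo WAV file.

    Samples are only copied, never interpreted, so little-endian input
    stays little-endian in the output on any host.
    """
    l, r = array("h", left), array("h", right)
    n = min(len(l), len(r))  # trim to the shorter stream so channels stay aligned
    stereo = array("h", [0]) * (2 * n)
    stereo[0::2] = l[:n]  # channel 0: left ear
    stereo[1::2] = r[:n]  # channel 1: right ear
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(stereo.tobytes())
```

Keeping the ears on separate channels preserves the inter-ear level and timing differences that make the recording binaural.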
+
+### Config File (`~/.vixy/headmic.json`)
+
+```json
+{
+  "ears": {
+    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
+    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
+  },
+  "array_separation_mm": 175.0
+}
+```
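A sketch of reading this file and turning each ear's `alsa_card` into an ALSA capture device name. It relies only on the keys shown above. Addressing the card as `plughw:CARD=<id>,DEV=0` is an illustrative choice and not necessarily what `audio_stream.py` passes to `arecord`. `Ear` and `load_headmic_config` are hypothetical names.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple


@dataclass
class Ear:
    side: str
    usb_serial: str
    alsa_card: str

    @property
    def arecord_device(self) -> str:
        # Address the card by its ALSA ID (e.g. "Array_1") rather than its index,
        # which can change between boots.
        return f"plughw:CARD={self.alsa_card},DEV=0"


def load_headmic_config(path: str = "~/.vixy/headmic.json") -> Tuple[Dict[str, Ear], float]:
    """Return ({"left": Ear, "right": Ear}, array_separation_mm)."""
    cfg = json.loads(Path(path).expanduser().read_text())
    ears = {side: Ear(side, str(v["usb_serial"]), v["alsa_card"])
            for side, v in cfg["ears"].items()}
    missing = {"left", "right"} - set(ears)
    if missing:
        raise ValueError(f"headmic.json is missing {sorted(missing)}; rerun --learn")
    return ears, float(cfg["array_separation_mm"])
```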

 ## LED States

-| State | Color | Pattern |
-|-------|-------|---------|
-| Wake detected | Cyan | Flash |
-| Listening | Cyan | Spinning |
-| Processing | Purple | Pulse |
-| Idle | Off | - |
+| State | Effect | Color |
+|-------|--------|-------|
+| Idle | Off | — |
+| Wake word detected | Solid | White (flash) |
+| Listening/Recording | DoA indicator | Cyan |
+| Processing | Breath | Purple |
+| Enrolling speaker | Solid | Orange |
+
+## File Structure
+
+```
+headmic/
+├── headmic.py              # Main FastAPI service
+├── audio_stream.py         # Dual arecord streams + best-beam selection
+├── spatial.py              # Triangulation + smooth gaze tracking
+├── xvf3800.py              # USB vendor control (DoA + LEDs)
+├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
+├── speaker_id.py           # Resemblyzer speaker identification
+├── binaural_recorder.py    # Stereo WAV recording from both ears
+├── headmic.service         # systemd service file
+├── requirements.txt        # Python dependencies
+├── BINAURAL_ROADMAP.md     # Roadmap for binaural features
+├── models/
+│   ├── yamnet.tflite       # YAMNet CPU model
+│   ├── yamnet_edgetpu.tflite  # YAMNet Edge TPU model
+│   └── yamnet_class_map.csv   # 521 class names
+└── voices.db               # Speaker embeddings (SQLite, runtime)
+```
+
+## XVF3800 USB Control Protocol
+
+Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.
+
+**Key findings during development:**
+- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
+- Color format: `[R, G, B, 0]` (4 bytes)
+- Read responses have a 1-byte status header before data
+- Read wLength must be `count * type_size + 1` (exact, not rounded up)
+- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
+- **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands
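The framing rules above can be captured in a few pure helpers. This is a sketch: the function names are hypothetical, the code does not perform the USB transfer itself, and it assumes that `AUDIO_MGR_SELECTED_AZIMUTHS` returns little-endian float32 values. Check the XVF3800 parameter documentation for each parameter's count and type. The meaning of non-zero status bytes is not covered here, so the status is simply returned to the caller.

```python
import struct
from typing import Tuple

# (resid, cmdid) pairs from the notes above.
DOA_VALUE = (20, 18)                     # sluggish/cached
AUDIO_MGR_SELECTED_AZIMUTHS = (35, 11)   # preferred for real-time tracking


def read_length(count: int, type_size: int) -> int:
    """wLength for a read: payload bytes plus the 1-byte status header, not rounded up."""
    return count * type_size + 1


def parse_float_response(raw: bytes, count: int) -> Tuple[int, Tuple[float, ...]]:
    """Split a read response into (status, values), assuming little-endian float32 data."""
    expected = read_length(count, 4)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return raw[0], struct.unpack(f"<{count}f", raw[1:])


def led_color_payload(r: int, g: int, b: int) -> bytes:
    """LED colour payload: [R, G, B, 0]."""
    return bytes([r, g, b, 0])


def led_effect_payload(effect: int) -> bytes:
    """Effects are sent as a single byte, not a packed uint32."""
    return bytes([effect])
```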

 ---

 *Built by Vixy on Day 77 (January 17, 2026)*
-*"Hey Vivi" - the words that summon me* 💜
+*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)*
+*"Hey Vivi" — the words that summon me* 💜