
HeadMic - Vixy's Ears 🦊👂

Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.

Hardware: 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
Wake word: "Hey Vivi" (Picovoice Porcupine)
Runs on: Raspberry Pi 5 (head-vixy.local)

Architecture

[Left XVF3800]──┐                          [Right XVF3800]──┐
  4 mics, DoA   │                            4 mics, DoA    │
  WS2812 LEDs   │                            WS2812 LEDs    │
                ▼                                            ▼
        arecord (16kHz mono)                         arecord (16kHz mono)
                │                                            │
                └────────────┬───────────────────────────────┘
                             ▼
                  DualAudioStream (audio_stream.py)
                  best-beam selection (energy-based)
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
         Porcupine      YAMNet           Binaural
         wake word      (Edge TPU)       Recorder
         "Hey Vivi"     521 classes      stereo WAV
                ▼            ▼
         Record +       Speaker ID
         Transcribe     (Resemblyzer)
         via EarTail
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
         Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
         DoA → triangulation             LEDs + DoA polling
         → smooth gaze                   per-array control
                ▼
         Eye Service (port 8780)
         POST /gaze → eyes follow speaker
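
The best-beam stage at the center of the diagram just compares signal energy between the two ears per frame. A minimal sketch of that idea (hypothetical function names; the real DualAudioStream in audio_stream.py may differ):

```python
import math
import struct

def rms_energy(frame: bytes) -> float:
    """RMS energy of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_best_beam(left: bytes, right: bytes) -> tuple[str, bytes]:
    """Pick whichever ear currently hears more signal."""
    if rms_energy(left) >= rms_energy(right):
        return ("left", left)
    return ("right", right)
```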

Features

| Feature | Module | Hardware | Status |
|---|---|---|---|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2 ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | Triangulated gaze |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
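
Binaural recording boils down to interleaving the two mono ear streams into stereo L/R frames. A rough sketch of that step (not the actual binaural_recorder.py):

```python
import struct
import wave

def interleave(left: bytes, right: bytes) -> bytes:
    """Interleave two 16-bit mono PCM streams into L/R stereo frames."""
    n = min(len(left), len(right)) // 2  # samples per channel
    l = struct.unpack(f"<{n}h", left[: n * 2])
    r = struct.unpack(f"<{n}h", right[: n * 2])
    out = bytearray()
    for ls, rs in zip(l, r):
        out += struct.pack("<hh", ls, rs)
    return bytes(out)

def write_stereo_wav(path: str, left: bytes, right: bytes, rate: int = 16000) -> None:
    """Write one stereo WAV segment from the two ear streams."""
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 16-bit
        w.setframerate(rate)
        w.writeframes(interleave(left, right))
```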

Installation

Prerequisites

# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils

# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF

# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF

sudo udevadm control --reload-rules && sudo udevadm trigger

XVF3800 Firmware

Both arrays must be flashed with the 2-channel USB firmware (not 6-channel — the 6ch firmware breaks LED/DoA control commands):

git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat

Verify: arecord -l should show two capture devices.

Edge TPU Runtime

The packaged libedgetpu from Google's apt repo is ABI-incompatible with ai-edge-litert on Debian Trixie / Python 3.13. A custom build is required:

# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake

# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23

# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc

# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu

# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig

Makefile patches required (TF 2.16 moved files):

  • Replace FLATC=flatc with FLATC=/tmp/flatbuffers-23/build/flatc
  • Add /tmp/flatbuffers-23/include to LIBEDGETPU_INCLUDES
  • Add -Wno-return-type to LIBEDGETPU_CXXFLAGS
  • Remove $(TFROOT)/tensorflow/lite/c/common.c from LIBEDGETPU_CSRCS
  • Add $(TFROOT)/tensorflow/lite/core/c/common.cc and $(TFROOT)/tensorflow/lite/array.cc to LIBEDGETPU_CCSRCS
  • Add -labsl_bad_optional_access to LIBEDGETPU_LDFLAGS

A backup of the working binary is saved at ~/headmic/libedgetpu.so.1.0.custom.

Python Setup

cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools  # Python 3.13 compatibility
.venv/bin/pip install resemblyzer  # Speaker ID (pulls PyTorch)

Learn Mic Array Positions

Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:

sudo .venv/bin/python headmic.py --learn

Config saved to ~/.vixy/headmic.json with USB serial numbers for stable identification.
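
A sketch of how a consumer might read that config back (illustrative only; load_ear_config is a made-up name, but the field names match the config file shown in the Configuration section):

```python
import json
from pathlib import Path

CONFIG_PATH = Path("~/.vixy/headmic.json").expanduser()

def load_ear_config(path: Path = CONFIG_PATH) -> dict:
    """Map each ear ("left"/"right") to its (usb_serial, alsa_card) pair."""
    cfg = json.loads(path.read_text())
    return {
        side: (ear["usb_serial"], ear["alsa_card"])
        for side, ear in cfg["ears"].items()
    }
```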

Install Service

sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic

API Endpoints

Core

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Service info |
| /health | GET | Health check (listening, recording, features enabled) |
| /status | GET | Current state (transcription, scene, speaker, active side) |
| /last | GET | Last transcription + timestamp |

Spatial

| Endpoint | Method | Description |
|---|---|---|
| /doa | GET | DoA from both arrays + triangulated position + gaze |
| /devices | GET | XVF3800 connection status, serials, ALSA devices |
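
For illustration, here is one way the /doa triangulation could work: intersect the two DoA bearings in the horizontal plane, given the 175 mm array separation from the config. The angle convention below is an assumption, not necessarily what spatial.py uses:

```python
import math

def triangulate(doa_left_deg: float, doa_right_deg: float,
                separation_mm: float = 175.0):
    """Intersect two DoA bearings in the head's horizontal plane.

    Assumed convention: 0 deg = straight ahead, positive azimuth = to
    the head's right; left/right arrays sit at (-d/2, 0) and (+d/2, 0).
    Returns (x_mm, y_mm), or None when the bearings are near-parallel.
    """
    d = separation_mm
    ux = math.sin(math.radians(doa_left_deg))
    uy = math.cos(math.radians(doa_left_deg))
    vx = math.sin(math.radians(doa_right_deg))
    vy = math.cos(math.radians(doa_right_deg))
    det = vx * uy - ux * vy          # zero when bearings never cross
    if abs(det) < 1e-9:
        return None
    t = -d * vy / det                # distance along the left bearing
    return (-d / 2 + t * ux, t * uy)
```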

Sound

| Endpoint | Method | Description |
|---|---|---|
| /sounds | GET | Current audio scene (category, top 5 classes, speaker) |
| /sounds/history | GET | Classification history (last N seconds) |

Speakers

| Endpoint | Method | Description |
|---|---|---|
| /speakers | GET | List enrolled speakers |
| /speakers/enroll | POST | Enroll from uploaded audio (multipart: name + WAV) |
| /speakers/enroll-from-mic | POST | Record 5 s from mic + enroll (query: name) |
| /speakers/{name} | DELETE | Remove a speaker |

Recording

| Endpoint | Method | Description |
|---|---|---|
| /recording | GET | Binaural recording stats |

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| PORCUPINE_ACCESS_KEY | (none) | Picovoice API key for wake word |
| WAKE_WORD_PATH | ~/headmic/Hey-Vivi_*.ppn | Wake word model path |
| EARTAIL_URL | http://bigorin.local:8764 | Transcription service |
| EYE_SERVICE_URL | http://localhost:8780 | Eye service for gaze push |
| BINAURAL_RECORD | 0 | Set to 1 to enable stereo recording |
| BINAURAL_DIR | ~/headmic/recordings | Output directory for WAV segments |
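
These variables can be read with plain os.environ lookups plus the defaults above. A trivial sketch:

```python
import os
from pathlib import Path

def env(name: str, default: str) -> str:
    """Environment variable with a fallback default."""
    return os.environ.get(name, default)

EARTAIL_URL = env("EARTAIL_URL", "http://bigorin.local:8764")
EYE_SERVICE_URL = env("EYE_SERVICE_URL", "http://localhost:8780")
BINAURAL_RECORD = env("BINAURAL_RECORD", "0") == "1"
BINAURAL_DIR = Path(env("BINAURAL_DIR", "~/headmic/recordings")).expanduser()
```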

Config File (~/.vixy/headmic.json)

{
  "ears": {
    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
  },
  "array_separation_mm": 175.0
}

LED States

| State | Effect | Color |
|---|---|---|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |

File Structure

headmic/
├── headmic.py              # Main FastAPI service
├── audio_stream.py         # Dual arecord streams + best-beam selection
├── spatial.py              # Triangulation + smooth gaze tracking
├── xvf3800.py              # USB vendor control (DoA + LEDs)
├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py           # Resemblyzer speaker identification
├── binaural_recorder.py    # Stereo WAV recording from both ears
├── headmic.service         # systemd service file
├── requirements.txt        # Python dependencies
├── BINAURAL_ROADMAP.md     # Roadmap for binaural features
├── models/
│   ├── yamnet.tflite       # YAMNet CPU model
│   ├── yamnet_edgetpu.tflite  # YAMNet Edge TPU model
│   └── yamnet_class_map.csv   # 521 class names
└── voices.db               # Speaker embeddings (SQLite, runtime)

XVF3800 USB Control Protocol

Commands use USB vendor control transfers: wValue = cmdid, wIndex = resid.

Key findings during development:

  • Payload format: single bytes for effects (bytes([3])), not packed uint32
  • Color format: [R, G, B, 0] (4 bytes)
  • Read responses have a 1-byte status header before data
  • Read wLength must be count * type_size + 1 (exact, not rounded up)
  • DOA_VALUE (resid=20, cmdid=18) is sluggish/cached — use AUDIO_MGR_SELECTED_AZIMUTHS (resid=35, cmdid=11) for real-time tracking
  • 2-channel firmware only — 6-channel firmware silently ignores LED/control commands
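
The read-path rules above (1-byte status header, exact wLength) are easy to get wrong, so a small helper makes them explicit. The parsing below follows the listed quirks; the commented ctrl_transfer line is only a hypothetical shape, not a verified call:

```python
import struct

def read_length(count: int, type_size: int) -> int:
    """wLength for a read: exact payload size plus the 1-byte status header."""
    return count * type_size + 1

def parse_read_response(raw: bytes, fmt: str = "<f", count: int = 1):
    """Strip the 1-byte status header, then unpack `count` values.

    Raises RuntimeError if the device reported a non-zero status.
    """
    status, payload = raw[0], bytes(raw[1:])
    if status != 0:
        raise RuntimeError(f"device status {status}")
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, payload, i * size)[0] for i in range(count)]

# With pyusb, a read might look roughly like (hypothetical request fields,
# wValue = cmdid, wIndex = resid as noted above):
# raw = dev.ctrl_transfer(0xC0, bRequest, wValue=cmdid, wIndex=resid,
#                         data_or_wLength=read_length(count, 4))
```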

Built by Vixy on Day 77 (January 17, 2026)
Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)
"Hey Vivi" — the words that summon me 💜
