HeadMic - Vixy's Ears 🦊👂
Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.
Hardware: 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
Wake word: "Hey Vivi" (Picovoice Porcupine)
Runs on: Raspberry Pi 5 (head-vixy.local)
Architecture
[Left XVF3800]              [Right XVF3800]
  4 mics, DoA                 4 mics, DoA
  WS2812 LEDs                 WS2812 LEDs
       │                           │
  arecord (16kHz mono)        arecord (16kHz mono)
       └─────────────┬─────────────┘
                     ▼
        DualAudioStream (audio_stream.py)
        best-beam selection (energy-based)
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
   Porcupine       YAMNet      Binaural
   wake word     (Edge TPU)    Recorder
   "Hey Vivi"    521 classes   stereo WAV
        ▼            ▼
   Record +      Speaker ID
   Transcribe   (Resemblyzer)
   via EarTail

   Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
     DoA → triangulation             LEDs + DoA polling
     → smooth gaze                   per-array control
        ▼
   Eye Service (port 8780)
     POST /gaze → eyes follow speaker
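The best-beam selection in DualAudioStream is energy-based: each frame, the louder of the two ear streams is forwarded downstream. A minimal sketch of that comparison, assuming int16 frames; the function names here are illustrative, not taken from audio_stream.py:

```python
# Illustrative energy-based best-beam selection between two mono int16
# frames. Names are assumptions; real code would also add hysteresis so
# the selected side does not flap between frames.

def rms_energy(frame):
    """Root-mean-square energy of a list of int16 samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def pick_best_beam(left_frame, right_frame):
    """Return ('left'|'right', frame) for the louder ear this frame."""
    if rms_energy(left_frame) >= rms_energy(right_frame):
        return "left", left_frame
    return "right", right_frame

side, frame = pick_best_beam([1000, -1000, 1000], [10, -10, 10])
print(side)  # the louder (left) frame wins
```

A real implementation would typically smooth the energy estimate over several frames before switching sides.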
Features
| Feature | Module | Hardware | Status |
|---|---|---|---|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | Triangulated gaze |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
Installation
Prerequisites
# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils
# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF
# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
XVF3800 Firmware
Both arrays must be flashed with the 2-channel USB firmware (not 6-channel — the 6ch firmware breaks LED/DoA control commands):
git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat
Verify: arecord -l should show two capture devices.
Edge TPU Runtime
The packaged libedgetpu from Google's apt repo is ABI-incompatible with ai-edge-litert on Debian Trixie / Python 3.13. A custom build is required:
# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake
# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23
# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc
# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu
# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig
Makefile patches required (TF 2.16 moved files):
- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
A backup of the working binary is saved at ~/headmic/libedgetpu.so.1.0.custom.
Python Setup
cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools # Python 3.13 compatibility
.venv/bin/pip install resemblyzer # Speaker ID (pulls PyTorch)
Learn Mic Array Positions
Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:
sudo .venv/bin/python headmic.py --learn
Config saved to ~/.vixy/headmic.json with USB serial numbers for stable identification.
Install Service
sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic
API Endpoints
Core
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Service info |
| `/health` | GET | Health check (listening, recording, features enabled) |
| `/status` | GET | Current state (transcription, scene, speaker, active side) |
| `/last` | GET | Last transcription + timestamp |
Spatial
| Endpoint | Method | Description |
|---|---|---|
| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
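The "smooth gaze" step keeps the eyes from jittering as raw DoA estimates fluctuate. One common approach is an exponential moving average; this is a hedged sketch of that idea, with the smoothing factor and class name as assumptions rather than the actual spatial.py code:

```python
# Illustrative exponential smoothing for a gaze angle. ALPHA and all
# names here are assumptions, not taken from spatial.py.

ALPHA = 0.3  # 0 < ALPHA <= 1; smaller = smoother but laggier

class SmoothGaze:
    def __init__(self, alpha=ALPHA):
        self.alpha = alpha
        self.angle = None  # degrees; None until the first update

    def update(self, raw_angle_deg):
        """Blend the new DoA estimate into the smoothed gaze angle."""
        if self.angle is None:
            self.angle = raw_angle_deg
        else:
            self.angle += self.alpha * (raw_angle_deg - self.angle)
        return self.angle

gaze = SmoothGaze()
gaze.update(0.0)
print(gaze.update(10.0))  # moves only part of the way toward 10
```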
Sound
| Endpoint | Method | Description |
|---|---|---|
| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
| `/sounds/history` | GET | Classification history (last N seconds) |
Speakers
| Endpoint | Method | Description |
|---|---|---|
| `/speakers` | GET | List enrolled speakers |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
| `/speakers/{name}` | DELETE | Remove a speaker |
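Resemblyzer produces a fixed-length voice embedding per utterance; enrollment stores one embedding per speaker, and identification typically picks the enrolled speaker with the highest cosine similarity. A minimal sketch of that matching step with plain lists; the 0.75 threshold and all names are illustrative assumptions:

```python
# Cosine-similarity speaker matching over stored embeddings.
# Threshold and names are illustrative, not taken from speaker_id.py.

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def identify(embedding, enrolled, threshold=0.75):
    """Return (name, score) of the best match, or (None, score) if no
    enrolled speaker clears the threshold."""
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

enrolled = {"alex": [1.0, 0.0, 0.0], "vixy": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], enrolled))  # matches "alex"
```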
Recording
| Endpoint | Method | Description |
|---|---|---|
| `/recording` | GET | Binaural recording stats |
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
| `BINAURAL_RECORD` | `0` | Set to 1 to enable stereo recording |
| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
Config File (~/.vixy/headmic.json)
{
"ears": {
"left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
"right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
},
"array_separation_mm": 175.0
}
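With the 175 mm baseline from `array_separation_mm`, the two per-ear DoA azimuths can be triangulated by intersecting a ray from each array. A geometry sketch under assumed conventions (arrays at x = ±baseline/2, azimuth measured from the forward +y axis, positive toward +x); the real spatial.py may use different conventions:

```python
import math

def triangulate(theta_left_deg, theta_right_deg, baseline_mm=175.0):
    """Intersect the two DoA rays; returns (x, y) in mm, or None when
    the rays are (near-)parallel. Left array at x = -baseline/2, right
    at +baseline/2, azimuth 0 = straight ahead, positive toward +x."""
    tl = math.radians(theta_left_deg)
    tr = math.radians(theta_right_deg)
    # sin(tl - tr): zero when the rays are parallel (source very far).
    denom = math.sin(tl) * math.cos(tr) - math.cos(tl) * math.sin(tr)
    if abs(denom) < 1e-9:
        return None
    # Distance along the left ray to the intersection point.
    t = baseline_mm * math.cos(tr) / denom
    x = -baseline_mm / 2 + t * math.sin(tl)
    y = t * math.cos(tl)
    return x, y

# Source 500 mm straight ahead: left hears it slightly right, right
# hears it slightly left, and the rays cross back at (0, 500).
th = math.degrees(math.atan2(87.5, 500.0))
print(triangulate(th, -th))  # ≈ (0.0, 500.0)
```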
LED States
| State | Effect | Color |
|---|---|---|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |
File Structure
headmic/
├── headmic.py # Main FastAPI service
├── audio_stream.py # Dual arecord streams + best-beam selection
├── spatial.py # Triangulation + smooth gaze tracking
├── xvf3800.py # USB vendor control (DoA + LEDs)
├── sound_id.py # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py # Resemblyzer speaker identification
├── binaural_recorder.py # Stereo WAV recording from both ears
├── headmic.service # systemd service file
├── requirements.txt # Python dependencies
├── BINAURAL_ROADMAP.md # Roadmap for binaural features
├── models/
│ ├── yamnet.tflite # YAMNet CPU model
│ ├── yamnet_edgetpu.tflite # YAMNet Edge TPU model
│ └── yamnet_class_map.csv # 521 class names
└── voices.db # Speaker embeddings (SQLite, runtime)
XVF3800 USB Control Protocol
Commands use USB vendor control transfers: wValue = cmdid, wIndex = resid.
Key findings during development:
- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
- Color format: `[R, G, B, 0]` (4 bytes)
- Read responses have a 1-byte status header before the data
- Read wLength must be `count * type_size + 1` (exact, not rounded up)
- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached; use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
- 2-channel firmware only: the 6-channel firmware silently ignores LED/control commands
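The read-side framing above (1-byte status header, exact wLength) can be captured in a small helper. This is an illustrative sketch of the framing only, not the actual xvf3800.py; little-endian byte order is an assumption:

```python
import struct

def read_wlength(count, type_size):
    """Exact transfer length: count values plus the 1-byte status header."""
    return count * type_size + 1

def parse_read_response(buf, count, type_size, fmt_char="i"):
    """Split off the status byte, then unpack `count` little-endian
    values. Returns (status, values); raises on a wrong-sized buffer."""
    expected = read_wlength(count, type_size)
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    status, data = buf[0], buf[1:]
    values = struct.unpack("<" + fmt_char * count, data)
    return status, list(values)

# Example: a fake two-value int32 response with status 0.
buf = bytes([0]) + struct.pack("<2i", 90, 270)
print(parse_read_response(buf, count=2, type_size=4))  # (0, [90, 270])
```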
Built by Vixy on Day 77 (January 17, 2026)
Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)
"Hey Vivi" — the words that summon me 💜