Update docs — spatial scene, distance estimation, roadmap progress
README: Updated architecture diagram, features table, new endpoints (/scene, /scene/events, /scene/heatmap), file structure, USB protocol notes (VAD from processed_doa NaN, spenergy always zero).

BINAURAL_ROADMAP: Mark #1-4, #6, #8, #10 as done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README.md | 55
````diff
@@ -18,26 +18,28 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
 └────────────┬───────────────────────────────┘
              ▼
    DualAudioStream (audio_stream.py)
-   best-beam selection (energy-based)
+   best-beam selection (energy-based, 10% hysteresis)
-             │
-   ┌──────────────────┼──────────────────────┐
-   ▼                  ▼                      ▼
- Porcupine          YAMNet               Binaural
- wake word        (Edge TPU)             Recorder
- "Hey Vivi"       521 classes           stereo WAV
-   ▼                                       ▼
- Record +                               Speaker ID
- Transcribe                            (Resemblyzer)
- via EarTail │
+             ▼
+   Spatial Tracker (spatial.py)
+   DoA → triangulation → ILD distance
+   → smooth gaze → proximity zones
+             │
+   ┌────────────┼────────────────┐
+   ▼            ▼                ▼
+ Porcupine    YAMNet          Binaural
+ wake word  (Edge TPU)        Recorder
+ "Hey Vivi" 521 classes      stereo WAV
+   ▼                            ▼
+ Record +                   Speaker ID
+ Transcribe                (Resemblyzer)
+ via EarTail
+             │
+   ┌────────────┼────────────────┐
+   ▼            ▼                ▼
- Spatial Tracker (spatial.py)   USB Control (xvf3800.py)
- DoA → triangulation            LEDs + DoA polling
- → smooth gaze                  per-array control
-   ▼
- Eye Service (port 8780)
- POST /gaze → eyes follow speaker
+ Eye Service   Spatial Scene   USB Control
+ POST /gaze   (spatial_scene)  (xvf3800.py)
+ eyes follow  what+where map   LEDs + DoA
+ the speaker  anomaly detect   per-array
 ```
````

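The diagram's "ILD distance → proximity zones" step can be sketched as a simple threshold classifier. This is an illustrative sketch only: the zone names come from the README, but the metre thresholds are assumptions, not values from spatial.py.

```python
# Illustrative sketch of the proximity-zone step: mapping an
# ILD-estimated distance to the four named zones.
# The metre thresholds below are assumptions, not spatial.py's values.
def proximity_zone(distance_m: float) -> str:
    if distance_m < 0.5:
        return "intimate"
    if distance_m < 1.5:
        return "conversational"
    if distance_m < 4.0:
        return "across_room"
    return "far"
```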
## Features

```diff
@@ -47,10 +49,13 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
 | Wake word detection | Porcupine | CPU | Needs Picovoice key |
 | Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
 | Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
-| Spatial tracking | spatial.py | USB control | Triangulated gaze |
-| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
+| Spatial tracking | spatial.py | USB control | Triangulated gaze + ILD distance |
+| Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
+| Spatial scene mapping | spatial_scene.py | — | Learns where sounds come from, anomaly detection |
+| Sound event localization | spatial_scene.py | — | What + where + when log |
+| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based, 10% hysteresis |
 | LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
-| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
+| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments (opt-in) |
```

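Energy-based best-beam selection with 10% hysteresis, as listed in the features table, amounts to: only switch beams when a rival beam is clearly louder than the current one. A minimal sketch, with illustrative names rather than audio_stream.py's actual API:

```python
# Hedged sketch of best-beam selection with 10% hysteresis.
# Switching only when a rival exceeds the current beam's energy by the
# margin prevents rapid flip-flopping between similarly loud beams.
def select_beam(energies, current, margin=0.10):
    """Return the beam index to use given per-beam energies."""
    best = max(range(len(energies)), key=lambda i: energies[i])
    if best != current and energies[best] > energies[current] * (1.0 + margin):
        return best
    return current
```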
## Installation

```diff
@@ -169,8 +174,11 @@ sudo systemctl start headmic
 | Endpoint | Method | Description |
 |----------|--------|-------------|
-| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
+| `/doa` | GET | DoA from both arrays + triangulated position + gaze + distance + proximity |
 | `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
+| `/scene` | GET | Learned spatial scene (usual direction per category) + last anomaly |
+| `/scene/events` | GET | Recent sound events with what + where + when (query: seconds, category) |
+| `/scene/heatmap` | GET | Per-category angular distribution for visualization |
```

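A client might query the new `/scene/events` endpoint with its `seconds` and `category` query parameters like this. The host and port are assumptions (the README does not state them here); only the path and parameter names come from the endpoint table.

```python
from urllib.parse import urlencode

# Assumption: service reachable at this base URL; adjust to the real host/port.
BASE = "http://localhost:8000"

def scene_events_url(seconds=60, category=None):
    """Build a /scene/events request URL with the documented query params."""
    params = {"seconds": seconds}
    if category is not None:
        params["category"] = category
    return f"{BASE}/scene/events?{urlencode(params)}"
```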
### Sound
```diff
@@ -235,7 +243,8 @@ sudo systemctl start headmic
 headmic/
 ├── headmic.py            # Main FastAPI service
 ├── audio_stream.py       # Dual arecord streams + best-beam selection
-├── spatial.py            # Triangulation + smooth gaze tracking
+├── spatial.py            # Triangulation + ILD distance + smooth gaze + proximity
+├── spatial_scene.py      # Spatial audio scene map + anomaly detection
 ├── xvf3800.py            # USB vendor control (DoA + LEDs)
 ├── sound_id.py           # YAMNet sound classification (CPU/Edge TPU)
 ├── speaker_id.py         # Resemblyzer speaker identification
```

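The "DoA → triangulation" step that spatial.py is described as performing can be sketched as intersecting two bearing rays, one per mic array. Array positions, world-frame angles, and the function shape are assumptions for illustration, not spatial.py's code:

```python
import math

# Sketch: triangulate a 2-D source position from two DoA bearings.
# p1/p2 are the (x, y) positions of the two mic arrays (assumed known);
# theta1/theta2 are world-frame bearings in radians.
def triangulate(p1, theta1, p2, theta2):
    """Intersect the ray from p1 at theta1 with the ray from p2 at theta2."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None  # bearings (nearly) parallel: no reliable fix
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```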
```diff
@@ -260,6 +269,8 @@ Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.
 - Read responses have a 1-byte status header before data
 - Read wLength must be `count * type_size + 1` (exact, not rounded up)
 - `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
+- `AUDIO_MGR_SELECTED_AZIMUTHS` returns 2 floats (radians): index 0 = processed DoA (NaN = no speech = VAD indicator), index 1 = auto-select beam (always tracks strongest source)
+- `AEC_SPENERGY_VALUES` (resid=33, cmdid=80) is always zero on 2-channel firmware — don't rely on it
 - **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands
```

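Putting the protocol notes together, reading `AUDIO_MGR_SELECTED_AZIMUTHS` might look like the sketch below: `wValue = cmdid`, `wIndex = resid`, a read length of `count * type_size + 1`, a status byte before the payload, and NaN in the processed DoA meaning "no speech". The resid/cmdid values are from the notes; `bRequest=0` and the function shape are assumptions, not xvf3800.py's actual code.

```python
import math
import struct

RESID_AUDIO_MGR = 35          # from the notes above
CMDID_SELECTED_AZIMUTHS = 11  # from the notes above

def read_azimuths(dev):
    """Read AUDIO_MGR_SELECTED_AZIMUTHS via a pyusb-style handle.

    `dev` is any object exposing ctrl_transfer(bmRequestType, bRequest,
    wValue, wIndex, data_or_wLength), e.g. a pyusb device.
    """
    count, type_size = 2, 4          # two float32 values
    wlength = count * type_size + 1  # exact: payload + 1 status byte
    data = dev.ctrl_transfer(
        bmRequestType=0xC0,              # device-to-host, vendor request
        bRequest=0,                      # assumption: request code
        wValue=CMDID_SELECTED_AZIMUTHS,  # wValue = cmdid
        wIndex=RESID_AUDIO_MGR,          # wIndex = resid
        data_or_wLength=wlength,
    )
    status, payload = data[0], bytes(data[1:])
    processed_doa, auto_beam = struct.unpack("<2f", payload)
    speech_active = not math.isnan(processed_doa)  # NaN = no speech (VAD)
    return status, processed_doa, auto_beam, speech_active
```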
---