Update docs — spatial scene, distance estimation, roadmap progress

README: Updated architecture diagram, features table, new endpoints (/scene, /scene/events, /scene/heatmap), file structure, USB protocol notes (VAD from processed_doa NaN, spenergy always zero).

BINAURAL_ROADMAP: Mark #1-4, #6, #8, #10 as done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:35:02 -05:00
parent 8caa9ee57e
commit 02d3ac3816
2 changed files with 172 additions and 22 deletions
--- a/README.md
+++ b/README.md
@@ -18,26 +18,28 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
                └────────────┬───────────────────────────────┘
                             ▼
                  DualAudioStream (audio_stream.py)
-                  best-beam selection (energy-based)
+                  best-beam selection (energy-based, 10% hysteresis)
+                             │
+          ┌──────────────────┼──────────────────────┐
+          ▼                  ▼                      ▼
+   Porcupine            YAMNet                 Binaural
+   wake word            (Edge TPU)             Recorder
+   "Hey Vivi"           521 classes            stereo WAV
+          ▼                  ▼
+   Record +             Speaker ID
+   Transcribe           (Resemblyzer)
+   via EarTail               │
+                             ▼
+                  Spatial Tracker (spatial.py)
+                  DoA → triangulation → ILD distance
+                  → smooth gaze → proximity zones
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
-         Porcupine      YAMNet           Binaural
-         wake word      (Edge TPU)       Recorder
-         "Hey Vivi"     521 classes      stereo WAV
-                ▼            ▼
-         Record +       Speaker ID
-         Transcribe     (Resemblyzer)
-         via EarTail
-                             │
-                ┌────────────┼────────────────┐
-                ▼            ▼                ▼
-         Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
-         DoA → triangulation             LEDs + DoA polling
-         → smooth gaze                   per-array control
-                ▼
-         Eye Service (port 8780)
-         POST /gaze → eyes follow speaker
+         Eye Service    Spatial Scene    USB Control
+         POST /gaze     (spatial_scene)  (xvf3800.py)
+         eyes follow    what+where map   LEDs + DoA
+         the speaker    anomaly detect   per-array
 ```

 ## Features
@@ -47,10 +49,13 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
 | Wake word detection | Porcupine | CPU | Needs Picovoice key |
 | Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
 | Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
-| Spatial tracking | spatial.py | USB control | Triangulated gaze |
-| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
+| Spatial tracking | spatial.py | USB control | Triangulated gaze + ILD distance |
+| Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
+| Spatial scene mapping | spatial_scene.py | — | Learns where sounds come from, anomaly detection |
+| Sound event localization | spatial_scene.py | — | What + where + when log |
+| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based, 10% hysteresis |
 | LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
-| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
+| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments (opt-in) |

 ## Installation

@@ -169,8 +174,11 @@ sudo systemctl start headmic

 | Endpoint | Method | Description |
 |----------|--------|-------------|
-| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
+| `/doa` | GET | DoA from both arrays + triangulated position + gaze + distance + proximity |
 | `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
+| `/scene` | GET | Learned spatial scene (usual direction per category) + last anomaly |
+| `/scene/events` | GET | Recent sound events with what + where + when (query: seconds, category) |
+| `/scene/heatmap` | GET | Per-category angular distribution for visualization |

 ### Sound

@@ -235,7 +243,8 @@ sudo systemctl start headmic
 headmic/
 ├── headmic.py              # Main FastAPI service
 ├── audio_stream.py         # Dual arecord streams + best-beam selection
-├── spatial.py              # Triangulation + smooth gaze tracking
+├── spatial.py              # Triangulation + ILD distance + smooth gaze + proximity
+├── spatial_scene.py        # Spatial audio scene map + anomaly detection
 ├── xvf3800.py              # USB vendor control (DoA + LEDs)
 ├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
 ├── speaker_id.py           # Resemblyzer speaker identification
@@ -260,6 +269,8 @@ Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.
 - Read responses have a 1-byte status header before data
 - Read wLength must be `count * type_size + 1` (exact, not rounded up)
 - `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
+- `AUDIO_MGR_SELECTED_AZIMUTHS` returns 2 floats (radians): index 0 = processed DoA (NaN = no speech = VAD indicator), index 1 = auto-select beam (always tracks strongest source)
+- `AEC_SPENERGY_VALUES` (resid=33, cmdid=80) is always zero on 2-channel firmware — don't rely on it
 - **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands

 ---
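
The USB protocol notes in the hunk above can be illustrated with a minimal Python sketch of decoding an `AUDIO_MGR_SELECTED_AZIMUTHS` read. It is not taken from `xvf3800.py`: the function names and the returned dictionary are hypothetical, and little-endian float32 packing is an assumption. The resid/cmdid values, the 1-byte status header, the exact `wLength` rule, and the NaN-as-VAD convention come from the notes.

```python
import math
import struct

# Identifiers quoted in the README protocol notes.
AUDIO_MGR_RESID = 35          # sent as wIndex
SELECTED_AZIMUTHS_CMDID = 11  # sent as wValue
FLOAT_SIZE = 4                # bytes per float32


def read_length(count: int, type_size: int = FLOAT_SIZE) -> int:
    """wLength for a read: exact payload size plus the 1-byte status header."""
    return count * type_size + 1


def parse_selected_azimuths(response: bytes) -> dict:
    """Decode a raw read response (hypothetical helper, not the project's API).

    Layout per the notes: [status byte][processed DoA][auto-select beam],
    both values in radians. Little-endian packing is an assumption.
    """
    expected = read_length(2)
    if len(response) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(response)}")

    status = response[0]
    processed, auto_select = struct.unpack("<2f", response[1:])

    # NaN in the processed DoA means no speech, so it doubles as a VAD flag.
    speech = not math.isnan(processed)
    return {
        "status": status,
        "speech": speech,
        "processed_deg": math.degrees(processed) if speech else None,
        "auto_select_deg": math.degrees(auto_select),
    }
```

For example, a 9-byte response whose first float is NaN yields `speech == False` and `processed_deg is None`, while `auto_select_deg` still reports the strongest source.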