Document anonymous speaker tracking + promote workflow

Added speaker identification section explaining the three-tier system (enrolled/anonymous/unidentified), the promote workflow, and enrollment options. Updated speakers API table with /speakers/promote endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 22:01:49 -05:00
parent 05034acd27
commit fde3b98554
1 changed file with 32 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
 |---------|--------|----------|--------|
 | Wake word detection | Porcupine | CPU | Needs Picovoice key |
 | Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
-| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
+| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrolled + anonymous tracking |
 | Spatial tracking | spatial.py | USB control | 3-signal fusion: DoA + ILD + ITD |
 | Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
 | ITD processing | spatial.py | audio cross-correlation | Sub-ms delay → bearing angle |
@@ -196,9 +196,10 @@ sudo systemctl start headmic
 | Endpoint | Method | Description |
 |----------|--------|-------------|
-| `/speakers` | GET | List enrolled speakers |
+| `/speakers` | GET | List all speakers (enrolled + anonymous) |
 | `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
 | `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
 | `/speakers/promote` | POST | Promote anonymous → enrolled (query: anon_id, name) |
 | `/speakers/{name}` | DELETE | Remove a speaker |
 ### Recording
@@ -232,6 +233,35 @@ sudo systemctl start headmic
 }
 ```
 ## Speaker Identification
 Three-tier recognition using Resemblyzer 256-dim GE2E embeddings:
 | Tier | Name format | How it works |
 |------|-------------|-------------|
 | Enrolled | `"Alex"` | Matched against stored embeddings (cosine ≥ 0.75) |
 | Anonymous | `"unknown_bfa1"` | Clustered online from unrecognized voices (cosine ≥ 0.70) |
 | Unidentified | `null` | Audio too short or no speech detected |
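The tier table can be read as a two-stage threshold check. The following is a minimal sketch of that reading, not the code in `speaker_id.py`: the function names, the `(tier, name)` return shape, and the dictionaries of reference embeddings are all hypothetical, and only the two cosine thresholds come from the table.

```python
import math

ENROLLED_THRESHOLD = 0.75   # from the tier table
ANONYMOUS_THRESHOLD = 0.70  # from the tier table


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(embedding, references, threshold):
    """Return the name of the most similar reference at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, ref in references.items():
        score = cosine(embedding, ref)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def classify(embedding, enrolled, anonymous):
    """Hypothetical tier decision; returns (tier, name)."""
    if embedding is None:
        # Audio too short or no speech detected: no embedding was produced.
        return ("unidentified", None)
    name = best_match(embedding, enrolled, ENROLLED_THRESHOLD)
    if name is not None:
        return ("enrolled", name)
    name = best_match(embedding, anonymous, ANONYMOUS_THRESHOLD)
    # name is None here when the voice matches nothing known; the caller
    # would then start a new anonymous cluster (assumed behavior).
    return ("anonymous", name)
```

Real embeddings are 256-dimensional; short vectors behave the same way for illustration, e.g. `classify([1.0, 0.1, 0.0], {"Alex": [1.0, 0.0, 0.0]}, {})`.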
 Anonymous speakers get a stable 4-character hex ID derived from their voice embedding. The same person consistently gets the same ID across observations. An ID expires after 1 hour of silence, and at most 10 anonymous speakers are tracked simultaneously.
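The lifetime rules for anonymous IDs could be sketched as below. This is an illustration under stated assumptions, not the actual tracker: the README does not say how the ID is derived from the embedding or which speaker is dropped when the limit is reached, so the SHA-1 based `anon_id` and the evict-least-recently-heard policy are both guesses. Only the 1-hour expiry, the limit of 10, and the `unknown_` plus 4 hex characters format come from the text.

```python
import hashlib
import struct

EXPIRY_SECONDS = 3600  # 1 hour of silence, from the README
MAX_ANONYMOUS = 10     # simultaneous anonymous speakers, from the README


def anon_id(embedding):
    """Hypothetical ID scheme: first 4 hex digits of a hash of the embedding."""
    raw = struct.pack(f"{len(embedding)}f", *embedding)
    return "unknown_" + hashlib.sha1(raw).hexdigest()[:4]


class AnonymousTracker:
    """Tracks when each anonymous ID was last heard (timestamps in seconds)."""

    def __init__(self):
        self.last_heard = {}  # anon ID -> time of last observation

    def prune(self, now):
        """Drop IDs that have been silent for EXPIRY_SECONDS or longer."""
        expired = [sid for sid, t in self.last_heard.items()
                   if now - t >= EXPIRY_SECONDS]
        for sid in expired:
            del self.last_heard[sid]

    def touch(self, speaker_id, now):
        """Record an observation, then enforce the cap (assumed: evict oldest)."""
        self.prune(now)
        self.last_heard[speaker_id] = now
        while len(self.last_heard) > MAX_ANONYMOUS:
            oldest = min(self.last_heard, key=self.last_heard.get)
            del self.last_heard[oldest]
```

Passing `now` explicitly (for example `time.monotonic()`) keeps the sketch deterministic and easy to test.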
 **Workflow:**
 ```
 Unknown person speaks → "unknown_bfa1" (auto-created)
    ↓
 You ask "who's that?" → check /speakers
    ↓
 curl -X POST "http://head:8446/speakers/promote?anon_id=unknown_bfa1&name=Bob"
    ↓
 Now recognized as "Bob" going forward (embedding saved to voices.db)
 ```
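The promote step can also be scripted. The sketch below builds the same request as the `curl` call in the workflow using only the Python standard library. The helper names are hypothetical, and the response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def promote_url(base_url, anon_id, name):
    """Build the /speakers/promote URL with anon_id and name as query parameters."""
    query = urlencode({"anon_id": anon_id, "name": name})
    return f"{base_url.rstrip('/')}/speakers/promote?{query}"


def promote(base_url, anon_id, name, timeout=10):
    """POST the promote request and return the response body as text.

    The response schema is an unknown here, so no parsing is attempted.
    """
    request = Request(promote_url(base_url, anon_id, name), method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `promote("http://head:8446", "unknown_bfa1", "Bob")` sends the request shown in the workflow; `urlencode` also takes care of escaping names that contain spaces or other reserved characters.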
 Alternatively, enroll directly from mic:
 ```bash
 curl -X POST "http://head:8446/speakers/enroll-from-mic?name=Alex"
 # Speak for 5 seconds
 ```
 ## LED States
 | State | Effect | Color |