Document anonymous speaker tracking + promote workflow

Added speaker identification section explaining the three-tier system
(enrolled/anonymous/unidentified), the promote workflow, and enrollment
options. Updated speakers API table with /speakers/promote endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alex
2026-04-12 22:01:49 -05:00
parent 05034acd27
commit fde3b98554

@@ -48,7 +48,7 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
|---------|--------|----------|--------|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
-| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
+| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrolled + anonymous tracking |
| Spatial tracking | spatial.py | USB control | 3-signal fusion: DoA + ILD + ITD |
| Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
| ITD processing | spatial.py | audio cross-correlation | Sub-ms delay → bearing angle |
@@ -196,9 +196,10 @@ sudo systemctl start headmic
| Endpoint | Method | Description |
|----------|--------|-------------|
-| `/speakers` | GET | List enrolled speakers |
+| `/speakers` | GET | List all speakers (enrolled + anonymous) |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
+| `/speakers/promote` | POST | Promote anonymous → enrolled (query: anon_id, name) |
| `/speakers/{name}` | DELETE | Remove a speaker |
### Recording
@@ -232,6 +233,35 @@ sudo systemctl start headmic
} }
``` ```
## Speaker Identification
Three-tier recognition using Resemblyzer 256-dim GE2E embeddings:
| Tier | Name format | How it works |
|------|-------------|-------------|
| Enrolled | `"Alex"` | Matched against stored embeddings (cosine ≥ 0.75) |
| Anonymous | `"unknown_bfa1"` | Clustered online from unrecognized voices (cosine ≥ 0.70) |
| Unidentified | `null` | Audio too short or no speech detected |
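The tier table can be read as a two-stage threshold check. The sketch below is only an illustration of that reading, not the code in `speaker_id.py`: the function names, the `(tier, name)` return shape, and the use of plain Python lists instead of Resemblyzer arrays are all assumptions. Only the two cosine thresholds come from the table.

```python
import math

ENROLLED_THRESHOLD = 0.75   # cosine cutoff for enrolled speakers (from the table)
ANONYMOUS_THRESHOLD = 0.70  # cosine cutoff for anonymous clusters (from the table)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(embedding, references, threshold):
    """Return the name of the most similar reference at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, ref in references.items():
        score = cosine(embedding, ref)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

def classify(embedding, enrolled, anonymous):
    """Hypothetical helper: return (tier, name) for one utterance embedding."""
    if embedding is None:
        # Audio too short or no speech detected
        return ("unidentified", None)
    name = best_match(embedding, enrolled, ENROLLED_THRESHOLD)
    if name is not None:
        return ("enrolled", name)
    # name may be None here, in which case the caller would start a new cluster
    return ("anonymous", best_match(embedding, anonymous, ANONYMOUS_THRESHOLD))
```

Checking enrolled speakers first means a promoted voice always wins over any anonymous cluster that happens to resemble it.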
Each anonymous speaker gets a stable 4-character hex ID derived from their voice embedding, so the same person receives the same ID across observations. An ID expires after 1 hour of silence, and at most 10 anonymous speakers are tracked at once.
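One way to picture the ID and expiry rules is the sketch below. It is a guess at the bookkeeping, not the service's implementation: hashing the cluster's first embedding with SHA-1, the `AnonymousTracker` class, and evicting the least recently heard speaker when the cap is exceeded are all assumptions. The 1-hour timeout and the limit of 10 come from the text.

```python
import hashlib
import struct

TTL_SECONDS = 3600   # IDs expire after 1 hour of silence
MAX_ANONYMOUS = 10   # at most 10 anonymous speakers tracked at once

def anon_id(embedding):
    """Derive an ID like 'unknown_bfa1' from an embedding (hash choice is assumed)."""
    raw = struct.pack(f"{len(embedding)}f", *embedding)
    return "unknown_" + hashlib.sha1(raw).hexdigest()[:4]

class AnonymousTracker:
    """Hypothetical bookkeeping: remembers when each anonymous ID was last heard."""

    def __init__(self):
        self.last_heard = {}  # anon_id -> timestamp in seconds

    def prune(self, now):
        """Drop IDs that have been silent for longer than TTL_SECONDS."""
        expired = [k for k, t in self.last_heard.items() if now - t > TTL_SECONDS]
        for key in expired:
            del self.last_heard[key]

    def observe(self, speaker_id, now):
        """Record that speaker_id was heard at time now, enforcing TTL and cap."""
        self.prune(now)
        self.last_heard[speaker_id] = now
        while len(self.last_heard) > MAX_ANONYMOUS:
            # Assumed eviction policy: forget the least recently heard speaker
            oldest = min(self.last_heard, key=self.last_heard.get)
            del self.last_heard[oldest]
```

In this sketch the hash would be computed once, when a cluster is created. Later utterances from the same person produce slightly different embeddings, so they would keep the ID by matching the existing cluster rather than by being hashed again.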
**Workflow:**
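```bash
# 1. An unknown person speaks; "unknown_bfa1" is created automatically.
# 2. You ask "who's that?" and check the speaker list:
curl "http://head:8446/speakers"
# 3. Promote the anonymous ID to a named speaker:
curl -X POST "http://head:8446/speakers/promote?anon_id=unknown_bfa1&name=Bob"
# 4. The voice is now recognized as "Bob" going forward (embedding saved to voices.db).
```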
Alternatively, enroll directly from mic:
```bash
curl -X POST "http://head:8446/speakers/enroll-from-mic?name=Alex"
# Speak for 5 seconds
```
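The same two calls can be scripted from Python with only the standard library. This is a minimal sketch: the helper names are made up for illustration, the base URL is copied from the curl examples, and the response is returned as raw text because the response format is not documented here.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://head:8446"  # host and port taken from the curl examples

def speakers_url(action, **params):
    """Build a /speakers/<action> URL with URL-encoded query parameters."""
    return f"{BASE_URL}/speakers/{action}?{urllib.parse.urlencode(params)}"

def post(url, timeout=10.0):
    """Send an empty POST and return the response body as text."""
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

def promote(anon_id, name):
    """Promote an anonymous speaker to an enrolled one."""
    return post(speakers_url("promote", anon_id=anon_id, name=name))

def enroll_from_mic(name):
    """Ask the service to record from the mic and enroll the speaker."""
    return post(speakers_url("enroll-from-mic", name=name))
```

For example, `promote("unknown_bfa1", "Bob")` sends the same request as the curl command in the workflow. Building the query with `urlencode` also keeps names containing spaces or symbols safe to pass.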
## LED States
| State | Effect | Color |