Document anonymous speaker tracking + promote workflow
Added speaker identification section explaining the three-tier system (enrolled/anonymous/unidentified), the promote workflow, and enrollment options. Updated speakers API table with /speakers/promote endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README.md (34 lines changed)
@@ -48,7 +48,7 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
 |---------|--------|----------|--------|
 | Wake word detection | Porcupine | CPU | Needs Picovoice key |
 | Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
-| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
+| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrolled + anonymous tracking |
 | Spatial tracking | spatial.py | USB control | 3-signal fusion: DoA + ILD + ITD |
 | Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
 | ITD processing | spatial.py | audio cross-correlation | Sub-ms delay → bearing angle |
@@ -196,9 +196,10 @@ sudo systemctl start headmic
 
 | Endpoint | Method | Description |
 |----------|--------|-------------|
-| `/speakers` | GET | List enrolled speakers |
+| `/speakers` | GET | List all speakers (enrolled + anonymous) |
 | `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
 | `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
+| `/speakers/promote` | POST | Promote anonymous → enrolled (query: anon_id, name) |
 | `/speakers/{name}` | DELETE | Remove a speaker |
 
 ### Recording
@@ -232,6 +233,35 @@ sudo systemctl start headmic
 }
 ```
 
+## Speaker Identification
+
+Three-tier recognition using Resemblyzer 256-dim GE2E embeddings:
+
+| Tier | Name format | How it works |
+|------|-------------|-------------|
+| Enrolled | `"Alex"` | Matched against stored embeddings (cosine ≥ 0.75) |
+| Anonymous | `"unknown_bfa1"` | Clustered online from unrecognized voices (cosine ≥ 0.70) |
+| Unidentified | `null` | Audio too short or no speech detected |
+
+Anonymous speakers get a stable 4-character hex ID derived from their voice embedding. The same person consistently gets the same ID across observations. IDs expire after 1 hour of silence, max 10 tracked simultaneously.
+
+**Workflow:**
+```
+Unknown person speaks → "unknown_bfa1" (auto-created)
+↓
+You ask "who's that?" → check /speakers
+↓
+curl -X POST "http://head:8446/speakers/promote?anon_id=unknown_bfa1&name=Bob"
+↓
+Now recognized as "Bob" going forward (embedding saved to voices.db)
+```
+
+Alternatively, enroll directly from mic:
+```bash
+curl -X POST "http://head:8446/speakers/enroll-from-mic?name=Alex"
+# Speak for 5 seconds
+```
+
 ## LED States
 
 | State | Effect | Color |
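
Review note: the tier table added to the README can be read as a two-stage threshold check on cosine similarity. The following is a minimal Python sketch of that reading, not the code in speaker_id.py. The function names (`cosine`, `best_match`, `identify`), the dictionary layout, and the pure-Python cosine are assumptions made for illustration; only the two thresholds and the tier names come from the README.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

ENROLLED_THRESHOLD = 0.75   # from the README tier table
ANONYMOUS_THRESHOLD = 0.70  # from the README tier table

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding: Embedding,
               candidates: Dict[str, Embedding],
               threshold: float) -> Optional[str]:
    """Name of the most similar candidate scoring at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, reference in candidates.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def identify(embedding: Optional[Embedding],
             enrolled: Dict[str, Embedding],
             anonymous: Dict[str, Embedding]) -> Tuple[str, Optional[str]]:
    """Return (tier, name). A name of None in the anonymous tier means
    no existing cluster matched and a new anonymous ID would be needed."""
    if embedding is None:
        # Hypothetical convention: None stands for "too short / no speech".
        return ("unidentified", None)
    name = best_match(embedding, enrolled, ENROLLED_THRESHOLD)
    if name is not None:
        return ("enrolled", name)
    return ("anonymous", best_match(embedding, anonymous, ANONYMOUS_THRESHOLD))
```

Checking enrolled speakers first, with the stricter threshold, means a voice that scores between 0.70 and 0.75 against an enrolled embedding is not labeled with that person's name. Whether the real service orders the checks this way is not stated in the diff.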
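
Review note: the new section states that anonymous IDs expire after 1 hour of silence and that at most 10 are tracked at once. A rough sketch of that bookkeeping follows, under stated assumptions: the class name `AnonymousTracker` and its methods are hypothetical, and the README does not say what happens when an eleventh voice appears, so evicting the least recently heard ID is a guess rather than documented behavior. How the 4-character hex ID is derived from the embedding is also not specified, so IDs are simply passed in.

```python
import time
from typing import Dict, List, Optional

TTL_SECONDS = 3600  # "expire after 1 hour of silence" (from the README)
MAX_TRACKED = 10    # "max 10 tracked simultaneously" (from the README)


class AnonymousTracker:
    """Tracks when each anonymous speaker ID was last heard."""

    def __init__(self, ttl: float = TTL_SECONDS, capacity: int = MAX_TRACKED) -> None:
        self.ttl = ttl
        self.capacity = capacity
        self.last_heard: Dict[str, float] = {}

    def prune(self, now: float) -> None:
        """Drop every ID that has been silent for at least ttl seconds."""
        expired = [sid for sid, seen in self.last_heard.items() if now - seen >= self.ttl]
        for sid in expired:
            del self.last_heard[sid]

    def observe(self, speaker_id: str, now: Optional[float] = None) -> None:
        """Record that speaker_id was just heard."""
        now = time.time() if now is None else now
        self.prune(now)
        if speaker_id not in self.last_heard and len(self.last_heard) >= self.capacity:
            # Assumed policy: make room by evicting the least recently heard ID.
            oldest = min(self.last_heard, key=self.last_heard.get)
            del self.last_heard[oldest]
        self.last_heard[speaker_id] = now

    def active(self, now: Optional[float] = None) -> List[str]:
        """Sorted list of IDs that have not yet expired."""
        now = time.time() if now is None else now
        self.prune(now)
        return sorted(self.last_heard)
```

Passing `now` explicitly keeps the sketch deterministic in tests; a running service would rely on the default wall-clock value.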
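
Review note: the promote step in the workflow is shown as a `curl` call. For callers scripting it from Python, a small standard-library sketch might look like the following. The helper names `promote_url` and `promote` are hypothetical; the host, port, path, and the `anon_id` and `name` query parameters are copied from the README example. The response format is not documented in this diff, so the body is returned as raw text.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://head:8446"  # host and port as used in the README examples


def promote_url(anon_id: str, name: str, base_url: str = BASE_URL) -> str:
    """Build the /speakers/promote URL with properly escaped query parameters."""
    query = urllib.parse.urlencode({"anon_id": anon_id, "name": name})
    return f"{base_url}/speakers/promote?{query}"


def promote(anon_id: str, name: str, base_url: str = BASE_URL, timeout: float = 5.0) -> str:
    """POST to the promote endpoint and return the raw response body.

    Raises urllib.error.HTTPError or URLError if the request fails.
    """
    request = urllib.request.Request(promote_url(anon_id, name, base_url), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the query with `urlencode` matters once names contain spaces or other reserved characters, which the hand-written `curl` URL would not escape.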