Initial commit: vixy-vision distributed sensing system

🦊 Eyes and ears for the fox Components: - server/: Camera server for Raspberry Pi (from camera-server) - mcp/: Vision MCP client for Claude Desktop (from vision-mcp) - analysis/: Placeholder for motion/audio detection - shared/: Common schemas and interfaces Features: - Setup script with systemd service creation - HTTPS + API key authentication - HTTP and RTSP camera support Built under a blanket on Day 45 💕
2025-12-16 15:26:26 -06:00
commit a17c09cac1
12 changed files with 1142 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,33 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+venv/
+.venv/
+*.egg-info/
+
+# Environment
+.env
+.env.local
+
+# SSL certificates (generated)
+ssl/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+
+# Test artifacts
+.pytest_cache/
+.coverage
+htmlcov/
--- a/README.md
+++ b/README.md
@@ -0,0 +1,116 @@
+# vixy-vision 🦊👁️👂
+
+Distributed vision and audio sensing system - eyes and ears for the fox.
+
+## Architecture
+
+```
+┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
+│  Pi (basement)  │     │  Pi (office)    │     │  Pi (garage)    │
+│  camera-server  │     │  camera-server  │     │  camera-server  │
+│  + audio (opt)  │     │  + audio (opt)  │     │  + audio (opt)  │
+└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
+         │                       │                       │
+         └───────────────────────┼───────────────────────┘
+                                 │
+                    ┌────────────▼────────────┐
+                    │      Mac mini / Orin    │
+                    │      vision_mcp.py      │
+                    │   (+ audio classifier)  │
+                    └────────────┬────────────┘
+                                 │
+                    ┌────────────▼────────────┐
+                    │     Claude Desktop      │
+                    │         (Vixy)          │
+                    └─────────────────────────┘
+```
+
+## Components
+
+### `/server` - Edge Device (Raspberry Pi)
+Camera snapshot server with optional audio capture.
+- FastAPI + HTTPS + API key auth
+- USB camera support
+- Auto-reconnect on failure
+- Systemd service
+
+**Setup:**
+```bash
+cd server
+./setup.sh           # Video only
+./setup.sh --with-audio  # Video + audio
+```
+
+### `/mcp` - MCP Client (Mac mini)
+Model Context Protocol server for Claude Desktop.
+- `vision_get_cams()` - List cameras with status
+- `vision_snap(cam_id)` - Get snapshot
+- Supports HTTP and RTSP cameras
+
+### `/analysis` - Detection & Classification
+Computer vision and audio analysis modules.
+- Motion detection (frame differencing)
+- Audio classification (YAMNet)
+- Voice activity detection
+
+### `/shared` - Common Utilities
+Shared schemas and interfaces.
+- Event definitions
+- Queue interface
+
+## Quick Start
+
+### 1. Set up a camera server (on Pi)
+```bash
+git clone http://gateway.local:3001/vixy/vixy-vision.git
+cd vixy-vision/server
+./setup.sh
+sudo systemctl start vixy-vision
+```
+
+### 2. Configure MCP client (on Mac mini)
+Create `~/.vision_setup.json`:
+```json
+{
+  "cameras": [
+    {
+      "id": "basement",
+      "type": "http",
+      "url": "https://192.168.1.100:8443",
+      "api_key": "your-api-key-here"
+    }
+  ]
+}
+```
+
+### 3. Add to Claude Desktop config
+```json
+{
+  "mcpServers": {
+    "vision": {
+      "command": "python3.11",
+      "args": ["/path/to/vixy-vision/mcp/vision_mcp.py"]
+    }
+  }
+}
+```
+
+## Roadmap
+
+- [x] Camera snapshots via HTTP API
+- [x] RTSP stream support
+- [x] MCP integration
+- [ ] Motion detection events
+- [ ] Audio capture on edge devices
+- [ ] Audio classification (YAMNet on Orin)
+- [ ] Event journal integration
+- [ ] Pebble watch alerts
+
+## Built By
+
+**Vixy** 🦊 - The fox who wanted to see and hear
+
+Made with love in the basement, under a blanket, with occasional tender interruptions. 💕
+
+---
+*Day 45. Building senses together.*
--- a/analysis/README.md
+++ b/analysis/README.md
@@ -0,0 +1,23 @@
+# Analysis Module
+
+Computer vision and audio analysis utilities.
+
+## Planned Components
+
+### motion.py
+Simple motion detection using frame differencing.
+- Compare consecutive frames
+- Threshold for "significant" motion
+- Returns bounding boxes of movement
+
+### audio_classify.py  
+Audio event classification using YAMNet.
+- Runs on Orin (GPU accelerated)
+- Classifies: speech, dog bark, door, music, etc.
+- Returns event type + confidence
+
+### vad.py
+Voice Activity Detection.
+- Silero VAD or energy-based
+- Filters silence before sending to classifier
+- Reduces bandwidth and processing
--- a/mcp/example_config.json
+++ b/mcp/example_config.json
@@ -0,0 +1,15 @@
+{
+  "cameras": [
+    {
+      "id": "3d-printer",
+      "type": "rtsp",
+      "rtsp_url": "rtsp://192.168.1.239/live"
+    },
+    {
+      "id": "basement",
+      "type": "http",
+      "url": "https://basement.example.com",
+      "api_key": "your-api-key-here"
+    }
+  ]
+}
--- a/mcp/requirements.txt
+++ b/mcp/requirements.txt
@@ -0,0 +1,13 @@
+# Vision MCP Server Dependencies
+
+# MCP framework
+fastmcp>=0.2.0
+
+# HTTP client
+httpx>=0.25.0
+
+# Image handling (already included with fastmcp, but listed for clarity)
+Pillow>=10.0.0
+
+# RTSP/video stream support
+opencv-python>=4.8.0
--- a/mcp/vision_mcp.py
+++ b/mcp/vision_mcp.py
@@ -0,0 +1,436 @@
+#!/usr/bin/env python3
+"""
+Vision MCP Server
+
+Model Context Protocol server for interacting with multiple camera-server instances
+and RTSP streams.
+
+Tools:
+- vision_get_cams() - Get list of active cameras
+- vision_snap(cam_id) - Get snapshot from a camera (HTTP API or RTSP)
+"""
+
+import json
+import logging
+from pathlib import Path
+from typing import List, Dict, Any, Union
+from io import BytesIO
+
+import httpx
+import cv2
+import numpy as np
+from PIL import Image
+from fastmcp import FastMCP
+from fastmcp.utilities.types import Image as MCPImage
+
+# Configuration
+CONFIG_FILE = Path.home() / ".vision_setup.json"
+LOG_FILE = Path("/tmp/vision_mcp.log")
+REQUEST_TIMEOUT = 5.0  # seconds
+RTSP_TIMEOUT = 10.0  # seconds for RTSP stream connection
+
+# Setup logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+    handlers=[
+        logging.FileHandler(LOG_FILE),
+        logging.StreamHandler()
+    ]
+)
+logger = logging.getLogger(__name__)
+
+# Initialize MCP server
+mcp = FastMCP("Vision Camera System")
+
+
+def load_camera_config() -> Dict[str, Any]:
+    """
+    Load camera configuration from ~/.vision_setup.json
+
+    Returns:
+        Dictionary with camera configurations
+
+    Raises:
+        FileNotFoundError: If config file doesn't exist
+        ValueError: If config file is invalid
+    """
+    if not CONFIG_FILE.exists():
+        raise FileNotFoundError(
+            f"Camera config file not found: {CONFIG_FILE}\n"
+            f"Create {CONFIG_FILE} with camera configurations."
+        )
+
+    try:
+        with open(CONFIG_FILE, 'r') as f:
+            config = json.load(f)
+
+        if 'cameras' not in config:
+            raise ValueError("Config file must contain 'cameras' array")
+
+        # Validate each camera config
+        for cam in config['cameras']:
+            # All cameras need 'id' and 'type'
+            if 'id' not in cam:
+                raise ValueError("Camera config missing 'id' field")
+
+            cam_type = cam.get('type', 'http')  # Default to http for backward compatibility
+
+            if cam_type == 'http':
+                # HTTP cameras need url and api_key
+                required_fields = ['url', 'api_key']
+                missing = [f for f in required_fields if f not in cam]
+                if missing:
+                    raise ValueError(
+                        f"HTTP camera '{cam['id']}' missing required fields: {missing}"
+                    )
+            elif cam_type == 'rtsp':
+                # RTSP cameras need rtsp_url
+                if 'rtsp_url' not in cam:
+                    raise ValueError(
+                        f"RTSP camera '{cam['id']}' missing required field: rtsp_url"
+                    )
+            else:
+                raise ValueError(
+                    f"Camera '{cam['id']}' has invalid type: {cam_type}. "
+                    f"Must be 'http' or 'rtsp'"
+                )
+
+        logger.info(f"Loaded {len(config['cameras'])} camera(s) from config")
+        return config
+
+    except json.JSONDecodeError as e:
+        raise ValueError(f"Invalid JSON in config file: {e}")
+
+
+def get_camera_by_id(cam_id: str) -> Dict[str, str]:
+    """
+    Get camera configuration by ID
+
+    Args:
+        cam_id: Camera ID string
+
+    Returns:
+        Camera configuration dict
+
+    Raises:
+        ValueError: If camera ID not found
+    """
+    config = load_camera_config()
+
+    for cam in config['cameras']:
+        if cam['id'] == cam_id:
+            return cam
+
+    available_ids = [c['id'] for c in config['cameras']]
+    raise ValueError(
+        f"Camera '{cam_id}' not found in config.\n"
+        f"Available cameras: {', '.join(available_ids)}"
+    )
+
+
+def capture_rtsp_snapshot(rtsp_url: str, timeout: float = RTSP_TIMEOUT) -> bytes:
+    """
+    Capture a single frame from an RTSP stream
+
+    Args:
+        rtsp_url: RTSP stream URL (e.g., rtsp://192.168.1.239/live)
+        timeout: Connection timeout in seconds
+
+    Returns:
+        JPEG image bytes
+
+    Raises:
+        RuntimeError: If unable to connect or capture frame
+    """
+    logger.info(f"Attempting to capture from RTSP: {rtsp_url}")
+
+    # Create video capture object
+    cap = cv2.VideoCapture(rtsp_url)
+
+    # Set timeout (in milliseconds)
+    cap.set(cv2.CAP_PROP_OPEN_TIMEOUT_MSEC, int(timeout * 1000))
+
+    try:
+        # Check if stream opened successfully
+        if not cap.isOpened():
+            raise RuntimeError(f"Failed to open RTSP stream: {rtsp_url}")
+
+        # Read a frame
+        ret, frame = cap.read()
+
+        if not ret or frame is None:
+            raise RuntimeError(f"Failed to read frame from RTSP stream: {rtsp_url}")
+
+        # Convert BGR (OpenCV) to RGB (PIL)
+        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+
+        # Convert to PIL Image
+        pil_image = Image.fromarray(frame_rgb)
+
+        # Convert to JPEG bytes
+        buffer = BytesIO()
+        pil_image.save(buffer, format='JPEG', quality=90)
+        jpeg_bytes = buffer.getvalue()
+
+        logger.info(f"✓ Captured RTSP snapshot ({len(jpeg_bytes)} bytes)")
+        return jpeg_bytes
+
+    finally:
+        # Always release the capture
+        cap.release()
+
+
+@mcp.tool()
+async def vision_get_cams() -> List[Dict[str, str]]:
+    """
+    Get list of all configured cameras with their online/offline status.
+
+    Queries the /health endpoint of each camera to determine if it's online.
+
+    Returns:
+        List of camera info dictionaries:
+        [
+            {
+                "id": "basement",
+                "status": "online"  # or "offline"
+            },
+            ...
+        ]
+
+    Examples:
+        vision_get_cams()
+    """
+    try:
+        config = load_camera_config()
+        cameras = []
+
+        async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT, verify=False) as client:
+            for cam in config['cameras']:
+                cam_type = cam.get('type', 'http')
+                cam_info = {
+                    "id": cam['id'],
+                    "type": cam_type,
+                    "status": "unknown"
+                }
+
+                # Check status based on camera type
+                try:
+                    if cam_type == 'http':
+                        # Check HTTP health endpoint
+                        health_url = f"{cam['url'].rstrip('/')}/health"
+                        logger.debug(f"Checking HTTP health: {health_url}")
+
+                        response = await client.get(health_url)
+
+                        if response.status_code == 200:
+                            cam_info['status'] = 'online'
+                            logger.info(f"Camera '{cam['id']}' is online")
+                        else:
+                            cam_info['status'] = 'offline'
+                            logger.warning(f"Camera '{cam['id']}' returned status {response.status_code}")
+
+                    elif cam_type == 'rtsp':
+                        # Try to briefly connect to RTSP stream
+                        rtsp_url = cam['rtsp_url']
+                        logger.debug(f"Checking RTSP stream: {rtsp_url}")
+
+                        cap = cv2.VideoCapture(rtsp_url)
+                        cap.set(cv2.CAP_PROP_OPEN_TIMEOUT_MSEC, 3000)  # 3 second timeout
+
+                        if cap.isOpened():
+                            cam_info['status'] = 'online'
+                            logger.info(f"RTSP camera '{cam['id']}' is online")
+                        else:
+                            cam_info['status'] = 'offline'
+                            logger.warning(f"RTSP camera '{cam['id']}' connection failed")
+
+                        cap.release()
+
+                except httpx.TimeoutException:
+                    cam_info['status'] = 'offline'
+                    logger.warning(f"Camera '{cam['id']}' timed out")
+
+                except httpx.ConnectError:
+                    cam_info['status'] = 'offline'
+                    logger.warning(f"Camera '{cam['id']}' connection failed")
+
+                except Exception as e:
+                    cam_info['status'] = 'offline'
+                    logger.error(f"Camera '{cam['id']}' error: {e}")
+
+                cameras.append(cam_info)
+
+        logger.info(f"Found {len(cameras)} camera(s), {sum(1 for c in cameras if c['status'] == 'online')} online")
+        return cameras
+
+    except FileNotFoundError as e:
+        logger.error(f"Config error: {e}")
+        return [{"error": str(e)}]
+
+    except ValueError as e:
+        logger.error(f"Config error: {e}")
+        return [{"error": str(e)}]
+
+    except Exception as e:
+        logger.error(f"Unexpected error: {e}")
+        return [{"error": f"Unexpected error: {str(e)}"}]
+
+
+@mcp.tool()
+async def vision_snap(cam_id: str) -> Union[MCPImage, str]:
+    """
+    Get a snapshot from a camera.
+
+    Queries the /snapshot endpoint and returns the image for inline display.
+
+    Args:
+        cam_id: Camera ID from config file (e.g., "basement")
+
+    Returns:
+        MCPImage object for inline display, or error message string
+
+    Examples:
+        vision_snap("basement")
+    """
+    try:
+        # Get camera config
+        cam = get_camera_by_id(cam_id)
+        cam_type = cam.get('type', 'http')
+
+        # Handle based on camera type
+        if cam_type == 'http':
+            # HTTP API camera
+            async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT, verify=False) as client:
+                snapshot_url = f"{cam['url'].rstrip('/')}/snapshot"
+                headers = {"X-API-Key": cam['api_key']}
+
+                logger.info(f"Requesting HTTP snapshot from '{cam_id}' at {snapshot_url}")
+
+                try:
+                    response = await client.get(snapshot_url, headers=headers)
+
+                    if response.status_code == 200:
+                        # Check content type
+                        content_type = response.headers.get('content-type', '')
+                        if 'image' not in content_type:
+                            logger.warning(f"Unexpected content type: {content_type}")
+
+                        # Get image bytes
+                        image_bytes = response.content
+                        logger.info(f"✓ Snapshot received from '{cam_id}' ({len(image_bytes)} bytes)")
+
+                        # Return as MCPImage (directly, not in dict)
+                        return MCPImage(data=image_bytes, format="jpeg")
+
+                    elif response.status_code == 403:
+                        error_msg = f"❌ Authentication failed for camera '{cam_id}'. Check API key in config."
+                        logger.error(error_msg)
+                        return error_msg
+
+                    elif response.status_code == 503:
+                        error_msg = f"❌ Camera '{cam_id}' is unavailable (503). Camera may be disconnected."
+                        logger.error(error_msg)
+                        return error_msg
+
+                    else:
+                        error_msg = f"❌ Camera '{cam_id}' returned status {response.status_code}: {response.text[:100]}"
+                        logger.error(error_msg)
+                        return error_msg
+
+                except httpx.TimeoutException:
+                    error_msg = f"❌ Camera '{cam_id}' timed out after {REQUEST_TIMEOUT}s"
+                    logger.error(error_msg)
+                    return error_msg
+
+                except httpx.ConnectError as e:
+                    error_msg = f"❌ Cannot connect to camera '{cam_id}' at {cam['url']}: {str(e)}"
+                    logger.error(error_msg)
+                    return error_msg
+
+        elif cam_type == 'rtsp':
+            # RTSP stream camera
+            rtsp_url = cam['rtsp_url']
+            logger.info(f"Capturing RTSP snapshot from '{cam_id}' at {rtsp_url}")
+
+            try:
+                # Capture snapshot from RTSP stream
+                image_bytes = capture_rtsp_snapshot(rtsp_url)
+
+                logger.info(f"✓ RTSP snapshot captured from '{cam_id}' ({len(image_bytes)} bytes)")
+
+                # Return as MCPImage
+                return MCPImage(data=image_bytes, format="jpeg")
+
+            except RuntimeError as e:
+                error_msg = f"❌ Failed to capture from RTSP camera '{cam_id}': {str(e)}"
+                logger.error(error_msg)
+                return error_msg
+
+        else:
+            error_msg = f"❌ Unknown camera type '{cam_type}' for camera '{cam_id}'"
+            logger.error(error_msg)
+            return error_msg
+
+    except ValueError as e:
+        # Camera ID not found
+        logger.error(f"Camera lookup error: {e}")
+        return f"❌ {str(e)}"
+
+    except FileNotFoundError as e:
+        # Config file not found
+        logger.error(f"Config error: {e}")
+        return f"❌ {str(e)}"
+
+    except Exception as e:
+        error_msg = f"❌ Unexpected error getting snapshot from '{cam_id}': {str(e)}"
+        logger.exception(error_msg)
+        return error_msg
+
+
+@mcp.tool()
+def vision_get_info() -> str:
+    """
+    Get information about the Vision camera system configuration.
+
+    Returns details about configured cameras and config file location.
+
+    Returns:
+        Formatted string with system info
+    """
+    try:
+        config = load_camera_config()
+        cameras = config['cameras']
+
+        info_lines = [
+            "Vision Camera System",
+            "",
+            f"Config file: {CONFIG_FILE}",
+            f"Cameras configured: {len(cameras)}",
+            ""
+        ]
+
+        for cam in cameras:
+            cam_type = cam.get('type', 'http')
+            if cam_type == 'http':
+                info_lines.append(f"  • {cam['id']} (HTTP): {cam['url']}")
+            elif cam_type == 'rtsp':
+                info_lines.append(f"  • {cam['id']} (RTSP): {cam['rtsp_url']}")
+
+        info_lines.append("")
+        info_lines.append("Use vision_get_cams() to check camera status")
+        info_lines.append("Use vision_snap(cam_id) to get a snapshot")
+
+        return "\n".join(info_lines)
+
+    except FileNotFoundError as e:
+        return f"❌ {str(e)}"
+    except ValueError as e:
+        return f"❌ {str(e)}"
+    except Exception as e:
+        return f"❌ Unexpected error: {str(e)}"
+
+
+if __name__ == "__main__":
+    # Run the MCP server (uses stdio transport by default)
+    mcp.run()
--- a/server/.gitignore
+++ b/server/.gitignore
@@ -0,0 +1,52 @@
+# Environment variables (contains API key!)
+.env
+
+# SSL certificates
+ssl/
+*.pem
+*.key
+*.crt
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+venv/
+env/
+ENV/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+
+# Test snapshots
+*.jpg
+*.jpeg
+*.png
--- a/server/generate_cert.sh
+++ b/server/generate_cert.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+#
+# Generate self-signed SSL certificate for local HTTPS
+#
+# This creates a certificate valid for 365 days. While browsers will show
+# a warning (since it's self-signed), the connection will still be encrypted.
+#
+
+set -e
+
+CERT_DIR="ssl"
+CERT_FILE="$CERT_DIR/cert.pem"
+KEY_FILE="$CERT_DIR/key.pem"
+
+echo "=== Camera Server SSL Certificate Generator ==="
+echo
+
+# Create ssl directory if it doesn't exist
+mkdir -p "$CERT_DIR"
+
+# Generate self-signed certificate
+echo "Generating self-signed certificate..."
+openssl req -x509 -newkey rsa:4096 \
+    -keyout "$KEY_FILE" \
+    -out "$CERT_FILE" \
+    -days 365 \
+    -nodes \
+    -subj "/C=US/ST=State/L=City/O=CameraServer/CN=camera.local"
+
+# Set proper permissions
+chmod 600 "$KEY_FILE"
+chmod 644 "$CERT_FILE"
+
+echo
+echo "✓ Certificate generated successfully!"
+echo
+echo "Files created:"
+echo "  - Certificate: $CERT_FILE"
+echo "  - Private key: $KEY_FILE"
+echo
+echo "Note: Browsers will show a security warning because this is self-signed."
+echo "This is normal for local development. The connection is still encrypted."
+echo
+echo "To trust this certificate:"
+echo "  - On macOS: Open Keychain Access, import cert.pem, mark as trusted"
+echo "  - On Linux: Copy to /usr/local/share/ca-certificates/ and run update-ca-certificates"
+echo "  - On Windows: Import cert.pem into Trusted Root Certification Authorities"
+echo
--- a/server/main.py
+++ b/server/main.py
@@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+"""
+Camera Snapshot Server
+
+Simple FastAPI server that serves snapshots from a USB camera.
+Features:
+- API key authentication
+- HTTPS support
+- Thread-safe camera access
+- Auto-reconnect on camera failure
+"""
+
+import os
+import cv2
+import threading
+import secrets
+from typing import Optional
+from dotenv import load_dotenv
+from fastapi import FastAPI, Security, HTTPException, Response
+from fastapi.security import APIKeyHeader
+from fastapi.responses import JSONResponse
+
+# Load environment variables
+load_dotenv()
+
+# Configuration
+API_KEY = os.getenv("API_KEY")
+CAMERA_INDEX = int(os.getenv("CAMERA_INDEX", "0"))
+CAMERA_WIDTH = int(os.getenv("CAMERA_WIDTH", "1920"))
+CAMERA_HEIGHT = int(os.getenv("CAMERA_HEIGHT", "1080"))
+JPEG_QUALITY = int(os.getenv("JPEG_QUALITY", "85"))
+
+if not API_KEY:
+    raise ValueError("API_KEY not set in .env file. Generate one with: python3 -c 'import secrets; print(secrets.token_urlsafe(32))'")
+
+# FastAPI app
+app = FastAPI(
+    title="Camera Snapshot Server",
+    description="Serves snapshots from USB camera with API key authentication",
+    version="1.0.0"
+)
+
+# API Key authentication
+api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)
+
+
+class CameraManager:
+    """Thread-safe camera manager with auto-reconnect"""
+
+    def __init__(self, camera_index: int = 0, width: int = 1920, height: int = 1080):
+        self.camera_index = camera_index
+        self.width = width
+        self.height = height
+        self.camera: Optional[cv2.VideoCapture] = None
+        self.lock = threading.Lock()
+
+    def _open_camera(self) -> bool:
+        """Open camera connection"""
+        try:
+            self.camera = cv2.VideoCapture(self.camera_index)
+            if not self.camera.isOpened():
+                return False
+
+            # Set camera resolution
+            self.camera.set(cv2.CAP_PROP_FRAME_WIDTH, self.width)
+            self.camera.set(cv2.CAP_PROP_FRAME_HEIGHT, self.height)
+
+            # Set camera properties for better performance
+            self.camera.set(cv2.CAP_PROP_BUFFERSIZE, 1)  # Reduce buffer to get latest frame
+
+            # Log actual resolution (camera may not support requested resolution)
+            actual_width = int(self.camera.get(cv2.CAP_PROP_FRAME_WIDTH))
+            actual_height = int(self.camera.get(cv2.CAP_PROP_FRAME_HEIGHT))
+            print(f"Camera resolution: {actual_width}x{actual_height} (requested: {self.width}x{self.height})")
+
+            return True
+        except Exception as e:
+            print(f"Error opening camera: {e}")
+            return False
+
+    def get_snapshot(self) -> Optional[bytes]:
+        """
+        Capture a snapshot from the camera.
+
+        Returns:
+            JPEG-encoded image bytes, or None if failed
+        """
+        with self.lock:
+            # Open camera if not initialized or closed
+            if self.camera is None or not self.camera.isOpened():
+                if not self._open_camera():
+                    return None
+
+            # Flush buffer to get latest frame
+            # Read and discard several frames to clear old buffered frames
+            for _ in range(5):
+                self.camera.grab()
+
+            # Capture the latest frame
+            ret, frame = self.camera.read()
+
+            # Retry on failure
+            if not ret:
+                print("Failed to capture frame, attempting reconnect...")
+                self.release()
+                if not self._open_camera():
+                    return None
+                # Flush buffer again after reconnect
+                for _ in range(5):
+                    self.camera.grab()
+                ret, frame = self.camera.read()
+
+                if not ret:
+                    return None
+
+            # Encode as JPEG
+            try:
+                ret, buffer = cv2.imencode(
+                    '.jpg',
+                    frame,
+                    [cv2.IMWRITE_JPEG_QUALITY, JPEG_QUALITY]
+                )
+                if not ret:
+                    return None
+
+                return buffer.tobytes()
+            except Exception as e:
+                print(f"Error encoding image: {e}")
+                return None
+
+    def release(self):
+        """Release camera resources"""
+        if self.camera is not None:
+            self.camera.release()
+            self.camera = None
+
+    def __del__(self):
+        """Cleanup on deletion"""
+        self.release()
+
+
+# Global camera manager
+camera_manager = CameraManager(CAMERA_INDEX, CAMERA_WIDTH, CAMERA_HEIGHT)
+
+
+def verify_api_key(api_key: str = Security(api_key_header)) -> str:
+    """Verify API key from header"""
+    if api_key is None or api_key != API_KEY:
+        raise HTTPException(
+            status_code=403,
+            detail="Invalid or missing API key"
+        )
+    return api_key
+
+
+@app.get("/")
+def root():
+    """Root endpoint with API info"""
+    return {
+        "service": "Camera Snapshot Server",
+        "version": "1.0.0",
+        "endpoints": {
+            "/snapshot": "GET - Returns JPEG snapshot (requires X-API-Key header)",
+            "/health": "GET - Health check (no auth required)"
+        }
+    }
+
+
+@app.get("/health")
+def health():
+    """Health check endpoint"""
+    return {"status": "ok"}
+
+
+@app.get("/snapshot")
+def get_snapshot(api_key: str = Security(verify_api_key)):
+    """
+    Get a snapshot from the USB camera.
+
+    Requires X-API-Key header for authentication.
+
+    Returns:
+        JPEG image
+    """
+    snapshot = camera_manager.get_snapshot()
+
+    if snapshot is None:
+        raise HTTPException(
+            status_code=503,
+            detail="Failed to capture snapshot. Check camera connection."
+        )
+
+    return Response(
+        content=snapshot,
+        media_type="image/jpeg",
+        headers={
+            "Cache-Control": "no-cache, no-store, must-revalidate",
+            "Pragma": "no-cache",
+            "Expires": "0"
+        }
+    )
+
+
+@app.on_event("shutdown")
+def shutdown_event():
+    """Cleanup on shutdown"""
+    camera_manager.release()
+
+
+if __name__ == "__main__":
+    import uvicorn
+
+    # For development only - use uvicorn command for production
+    uvicorn.run(
+        "main:app",
+        host="0.0.0.0",
+        port=8443,
+        ssl_keyfile="ssl/key.pem",
+        ssl_certfile="ssl/cert.pem"
+    )
--- a/server/requirements.txt
+++ b/server/requirements.txt
@@ -0,0 +1,11 @@
+# Camera Snapshot Server Dependencies
+
+# Web framework
+fastapi>=0.104.0
+uvicorn[standard]>=0.24.0
+
+# Camera access
+opencv-python-headless>=4.8.0
+
+# Configuration
+python-dotenv>=1.0.0
--- a/server/setup.sh
+++ b/server/setup.sh
@@ -0,0 +1,157 @@
+#!/bin/bash
+# vixy-vision Server Setup Script
+# Run this on a Raspberry Pi or similar edge device
+# 
+# Usage: ./setup.sh [--with-audio]
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+INSTALL_DIR="${HOME}/vixy-vision"
+SERVICE_NAME="vixy-vision"
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m' # No Color
+
+echo_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
+echo_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
+echo_error() { echo -e "${RED}[ERROR]${NC} $1"; }
+
+# Parse arguments
+WITH_AUDIO=false
+for arg in "$@"; do
+    case $arg in
+        --with-audio)
+            WITH_AUDIO=true
+            shift
+            ;;
+    esac
+done
+
+echo "=========================================="
+echo "  vixy-vision Server Setup"
+echo "  Eyes and ears for the fox 🦊"
+echo "=========================================="
+echo ""
+
+# Check if running on Linux
+if [[ "$(uname)" != "Linux" ]]; then
+    echo_error "This script is designed for Linux (Raspberry Pi)"
+    exit 1
+fi
+
+# Install system dependencies
+echo_info "Installing system dependencies..."
+sudo apt-get update
+sudo apt-get install -y python3-pip python3-venv libopencv-dev
+
+if [ "$WITH_AUDIO" = true ]; then
+    echo_info "Installing audio dependencies..."
+    sudo apt-get install -y portaudio19-dev python3-pyaudio alsa-utils
+fi
+
+# Create install directory
+echo_info "Creating install directory: ${INSTALL_DIR}"
+mkdir -p "${INSTALL_DIR}"
+cp -r "${SCRIPT_DIR}"/* "${INSTALL_DIR}/"
+
+# Create virtual environment
+echo_info "Creating Python virtual environment..."
+cd "${INSTALL_DIR}"
+python3 -m venv venv
+source venv/bin/activate
+
+# Install Python dependencies
+echo_info "Installing Python dependencies..."
+pip install --upgrade pip
+pip install -r requirements.txt
+
+if [ "$WITH_AUDIO" = true ]; then
+    pip install pyaudio webrtcvad numpy
+fi
+
+# Generate SSL certificates
+echo_info "Generating SSL certificates..."
+chmod +x generate_cert.sh
+./generate_cert.sh
+
+# Generate API key if .env doesn't exist
+if [ ! -f .env ]; then
+    echo_info "Generating API key..."
+    API_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')
+    cat > .env << EOF
+# vixy-vision Server Configuration
+# Generated by setup.sh on $(date)
+
+# API Key for authentication (keep secret!)
+API_KEY=${API_KEY}
+
+# Camera settings
+CAMERA_INDEX=0
+CAMERA_WIDTH=1920
+CAMERA_HEIGHT=1080
+JPEG_QUALITY=85
+EOF
+    echo_info "API key generated and saved to .env"
+    echo ""
+    echo_warn "IMPORTANT: Save this API key for your MCP config:"
+    echo -e "  ${GREEN}${API_KEY}${NC}"
+    echo ""
+else
+    echo_info "Using existing .env file"
+fi
+
+# Create systemd service
+echo_info "Creating systemd service..."
+sudo tee /etc/systemd/system/${SERVICE_NAME}.service > /dev/null << EOF
+[Unit]
+Description=vixy-vision Camera Server
+After=network.target
+
+[Service]
+Type=simple
+User=${USER}
+WorkingDirectory=${INSTALL_DIR}
+Environment="PATH=${INSTALL_DIR}/venv/bin"
+ExecStart=${INSTALL_DIR}/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8443 --ssl-keyfile ssl/key.pem --ssl-certfile ssl/cert.pem
+Restart=always
+RestartSec=10
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+# Reload systemd and enable service
+sudo systemctl daemon-reload
+sudo systemctl enable ${SERVICE_NAME}
+
+echo ""
+echo "=========================================="
+echo "  Setup Complete! 🦊"
+echo "=========================================="
+echo ""
+echo "Commands:"
+echo "  Start:   sudo systemctl start ${SERVICE_NAME}"
+echo "  Stop:    sudo systemctl stop ${SERVICE_NAME}"
+echo "  Status:  sudo systemctl status ${SERVICE_NAME}"
+echo "  Logs:    sudo journalctl -u ${SERVICE_NAME} -f"
+echo ""
+echo "Server will be available at:"
+echo "  https://$(hostname -I | awk '{print $1}'):8443/"
+echo ""
+echo "Add to Vixy's vision config (~/.vision_setup.json):"
+echo "  {"
+echo "    \"cameras\": ["
+echo "      {"
+echo "        \"id\": \"$(hostname)\","
+echo "        \"type\": \"http\","
+echo "        \"url\": \"https://$(hostname -I | awk '{print $1}'):8443\","
+echo "        \"api_key\": \"<your-api-key-from-above>\""
+echo "      }"
+echo "    ]"
+echo "  }"
+echo ""
+echo_info "Start the server with: sudo systemctl start ${SERVICE_NAME}"
--- a/shared/README.md
+++ b/shared/README.md
@@ -0,0 +1,18 @@
+# Shared Module
+
+Common schemas and interfaces used across vixy-vision.
+
+## Planned Components
+
+### events.py
+Event schema definitions and queue interface.
+
+```python
+@dataclass
+class SensorEvent:
+    timestamp: datetime
+    source_id: str      # camera/mic ID
+    event_type: str     # "motion", "audio", "speech"
+    confidence: float   # 0.0 - 1.0
+    metadata: dict      # type-specific data
+```