Exploring techniques to detect faces in real-time using MediaPipe and TypeScript.

Real-time face detection in the browser using MediaPipe


In this note, we are going to explore how to detect faces in real-time using MediaPipe and TypeScript. MediaPipe is a powerful framework developed by Google that provides pre-built solutions for various computer vision tasks, including face detection.

It is designed to run efficiently on mobile and desktop devices, including web browsers, which makes it a great choice for real-time web applications. We will set up a simple TypeScript project that uses MediaPipe to detect faces from the webcam feed.

MediaPipe Library

MediaPipe Face Detection is a high-performance solution that detects faces in images and video. It uses a machine learning model optimized for real-time performance, capable of detecting multiple faces with facial landmarks (eyes, nose, mouth, and ears).

The library provides two model options: “short-range” for faces within 2 meters and “full-range” for faces up to 5 meters. For web applications, MediaPipe runs entirely in the browser using WebAssembly and GPU acceleration, ensuring privacy and eliminating server round-trips.

Installation and Setup

First, install the MediaPipe Face Detection package in your React/TypeScript project:

npm install @mediapipe/face_detection
# or
bun add @mediapipe/face_detection

MediaPipe loads multiple files at runtime: WebAssembly binaries (.wasm), JavaScript files (.js), model files (.tflite), and configuration files (.binarypb). The npm package includes all these files, but they must be served over HTTP due to how MediaPipe dynamically loads them based on browser capabilities.

Important: Modern bundlers like Vite don’t automatically handle these files because:

  1. MediaPipe uses dynamic imports with runtime-constructed URLs
  2. WASM files require special handling (served with correct MIME type)
  3. The library selects files based on browser features (SIMD support, etc.)

You have two options for serving these files:

Option 1: Use a CDN (Simplest)

const faceDetection = new FaceDetection({
  locateFile: (file) => {
    // Resolve MediaPipe's runtime assets from the jsDelivr CDN
    return `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection/${file}`;
  }
});

Option 2: Serve Files Locally

For Vite, install vite-plugin-static-copy:

npm install -D vite-plugin-static-copy

Then configure Vite to copy MediaPipe files:

// vite.config.js
import { defineConfig } from 'vite'
import { viteStaticCopy } from 'vite-plugin-static-copy'

export default defineConfig({
  plugins: [
    viteStaticCopy({
      targets: [{
        src: 'node_modules/@mediapipe/face_detection/*',
        dest: 'mediapipe/face_detection'
      }]
    })
  ]
})

For webpack, use copy-webpack-plugin:

// webpack.config.js
const CopyPlugin = require("copy-webpack-plugin");

module.exports = {
  plugins: [
    new CopyPlugin({
      patterns: [{
        from: 'node_modules/@mediapipe/face_detection',
        to: 'mediapipe/face_detection',
        globOptions: {
          ignore: ['**/README.md', '**/package.json', '**/*.d.ts']
        }
      }]
    })
  ]
};

Then update your locateFile:

const faceDetection = new FaceDetection({
  locateFile: (file) => `/mediapipe/face_detection/${file}`
});

Creating a Webcam Hook

Let’s start by creating a custom React hook to manage webcam access. This hook handles permissions, device enumeration, and stream management:

// useCamera.ts
import { useState, useCallback, useRef } from 'react';

export const useCamera = () => {
  const [hasPermission, setHasPermission] = useState<boolean | null>(null);
  const [cameras, setCameras] = useState<MediaDeviceInfo[]>([]);
  const [selectedCamera, setSelectedCamera] = useState('');
  const streamRef = useRef<MediaStream | null>(null);

  const loadDevices = useCallback(async () => {
    try {
      const devices = await navigator.mediaDevices.enumerateDevices();
      const videoDevices = devices.filter(d => d.kind === 'videoinput');
      setCameras(videoDevices);
      // Default to the first camera unless one is already selected; the
      // functional update keeps this callback stable across renders
      setSelectedCamera(prev => prev || (videoDevices[0]?.deviceId ?? ''));
    } catch (err) {
      console.error('Failed to enumerate devices:', err);
    }
  }, []);

  const requestCameraStream = useCallback(async (deviceId: string) => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({
        // Fall back to any available camera when no device has been selected yet
        video: deviceId ? { deviceId: { exact: deviceId } } : true,
        audio: false
      });
      streamRef.current = stream;
      setHasPermission(true);
      return stream;
    } catch (err) {
      setHasPermission(false);
      throw err;
    }
  }, []);

  const stopStream = useCallback(() => {
    if (streamRef.current) {
      streamRef.current.getTracks().forEach(track => track.stop());
      streamRef.current = null;
    }
  }, []);

  return {
    cameras,
    selectedCamera,
    setSelectedCamera,
    hasPermission,
    loadDevices,
    requestCameraStream,
    stopStream
  };
};

This hook provides a clean interface for camera management: loadDevices enumerates the available cameras, requestCameraStream handles permissions and starts a stream for the selected device, and stopStream tears it down again.
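
As a quick illustration of how the hook might be consumed, here is a minimal sketch of a camera picker. The CameraPicker component is not part of the project above; it simply shows cameras and setSelectedCamera wired to a select element:

// CameraPicker.tsx - illustrative usage of the useCamera hook
import React, { useEffect } from 'react';
import { useCamera } from './useCamera';

const CameraPicker: React.FC = () => {
  const { cameras, selectedCamera, setSelectedCamera, loadDevices } = useCamera();

  // Enumerate available cameras once on mount
  useEffect(() => {
    loadDevices();
  }, [loadDevices]);

  return (
    <select
      value={selectedCamera}
      onChange={(e) => setSelectedCamera(e.target.value)}
    >
      {cameras.map((camera) => (
        <option key={camera.deviceId} value={camera.deviceId}>
          {camera.label || 'Unnamed camera'}
        </option>
      ))}
    </select>
  );
};

export default CameraPicker;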

Creating a MediaPipe Hook

The MediaPipe hook manages the face detection pipeline with the following behavior:

  1. Initialization: Loads the ML model and WebAssembly runtime
  2. Frame Processing: Receives video frames and sends them to MediaPipe
  3. Results Handling: Receives detection results with face locations and landmarks

Understanding the Detection Results

MediaPipe returns detection results with the following shape (simplified here for clarity):

interface Detection {
  boundingBox: {
    xCenter: number;  // 0-1 normalized coordinates
    yCenter: number;
    width: number;
    height: number;
  };
  landmarks?: Array<{
    x: number;      // 0-1 normalized coordinates
    y: number;
    z?: number;     // depth (not used in 2D detection)
  }>;
  score: number[];  // confidence scores (0-1)
}

interface FaceDetectionResults {
  image: HTMLCanvasElement | HTMLVideoElement;
  detections: Detection[];
}

The landmarks represent 6 key facial points:

  • Index 0: Right eye
  • Index 1: Left eye
  • Index 2: Nose tip
  • Index 3: Mouth center
  • Index 4: Right ear
  • Index 5: Left ear
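
Because every coordinate is normalized to the 0-1 range, it usually has to be scaled to pixel space before drawing. A small helper along these lines keeps the conversion in one place (the FACE_LANDMARKS map and the toPixelBox name are illustrative, not part of the MediaPipe API):

// detectionUtils.ts - illustrative helpers, not part of the MediaPipe API
export const FACE_LANDMARKS = {
  rightEye: 0,
  leftEye: 1,
  noseTip: 2,
  mouthCenter: 3,
  rightEar: 4,
  leftEar: 5,
} as const;

// Convert a normalized bounding box to pixel coordinates for a given canvas size
export const toPixelBox = (
  box: { xCenter: number; yCenter: number; width: number; height: number },
  canvasWidth: number,
  canvasHeight: number
) => ({
  x: (box.xCenter - box.width / 2) * canvasWidth,   // top-left corner x
  y: (box.yCenter - box.height / 2) * canvasHeight, // top-left corner y
  width: box.width * canvasWidth,
  height: box.height * canvasHeight,
});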

Hook Implementation

Our hook encapsulates this complexity and provides a simple interface:

// useMediaPipe.ts
import { useState, useCallback, useRef } from 'react';
import { FaceDetection, Results } from '@mediapipe/face_detection';

// Let TypeScript know about the global results handler used below
declare global {
  interface Window {
    onFaceDetectionResults?: (results: Results) => void;
  }
}

export const useMediaPipe = () => {
  const [isReady, setIsReady] = useState(false);
  const [facesDetected, setFacesDetected] = useState(0);
  const faceDetectionRef = useRef<FaceDetection | null>(null);

  const initialize = useCallback(async () => {
    // Create FaceDetection instance with file loading configuration
    const faceDetection = new FaceDetection({
      locateFile: (file) => {
        return `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection/${file}`;
      }
    });

    // Load the model and WebAssembly runtime
    await faceDetection.initialize();
    
    // Configure detection parameters
    faceDetection.setOptions({
      model: 'short',              // 'short' for faces within ~2 m, 'full' for faces up to ~5 m
      minDetectionConfidence: 0.5  // 0-1 confidence threshold
    });

    // Set up the results callback - called for every processed frame
    faceDetection.onResults((results) => {
      // Results contain the original image and array of detections
      const detections = results.detections || [];
      setFacesDetected(detections.length);
      
      // Pass results to a global handler for canvas drawing
      // (In production, use a proper callback or state management)
      if (window.onFaceDetectionResults) {
        window.onFaceDetectionResults(results);
      }
    });

    faceDetectionRef.current = faceDetection;
    setIsReady(true);
  }, []);

  // Process a video frame - MediaPipe handles this asynchronously
  const processFrame = useCallback((video: HTMLVideoElement) => {
    if (faceDetectionRef.current && isReady) {
      // Send frame to MediaPipe - results arrive via onResults callback
      faceDetectionRef.current.send({ image: video });
    }
  }, [isReady]);

  return {
    isReady,         // Model loaded and ready to process
    facesDetected,   // Current count of detected faces
    initialize,      // Call once to set up MediaPipe
    processFrame     // Call repeatedly with video frames
  };
};

This hook encapsulates MediaPipe initialization and configuration. It handles model loading, sets detection parameters, and provides a simple interface for processing video frames.
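
One thing the hook does not cover is tearing the detector down again. MediaPipe's solution objects expose a close() method for releasing their WebAssembly and GPU resources, so a small disposal helper, sketched below with an illustrative name, could be called from a cleanup function returned by the hook or from a component unmount effect:

// disposeFaceDetection.ts - illustrative cleanup helper
import { FaceDetection } from '@mediapipe/face_detection';

// Release the WebAssembly/GPU resources held by a FaceDetection instance
export const disposeFaceDetection = (faceDetection: FaceDetection | null): void => {
  faceDetection?.close();
};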

Processing Flow

The face detection pipeline follows an asynchronous pattern:

1. Video Frame → processFrame(video)
2. MediaPipe processes frame in WebAssembly
3. Results arrive via onResults callback
4. Callback updates state and triggers canvas drawing

Key points:

  • Asynchronous Processing: send() returns immediately; results arrive later via callback
  • Normalized Coordinates: All positions are 0-1 values; multiply by canvas dimensions for pixel coordinates
  • Frame Rate: MediaPipe processes as fast as possible; consider throttling for performance
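
Building on the last point, the detection loop can be throttled by timestamp so frames are only sent to MediaPipe at a target rate. A minimal sketch, with an illustrative createThrottledLoop helper and a target of 15 fps chosen arbitrarily:

// throttledLoop.ts - illustrative throttling for the detection loop
export const createThrottledLoop = (processFrame: () => void, targetFps = 15) => {
  const minInterval = 1000 / targetFps;
  let lastProcessed = 0;
  let rafId = 0;

  const loop = (timestamp: number) => {
    // Only send a frame to MediaPipe when enough time has passed
    if (timestamp - lastProcessed >= minInterval) {
      lastProcessed = timestamp;
      processFrame();
    }
    rafId = requestAnimationFrame(loop);
  };

  return {
    start: () => { rafId = requestAnimationFrame(loop); },
    stop: () => cancelAnimationFrame(rafId),
  };
};

In the component below, the requestAnimationFrame calls in processVideo could be replaced with such a loop, trading a little latency for lower CPU and GPU load.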

Building the Face Detection Component

Now let’s create a simple React component that ties everything together:

// FaceDetection.tsx
import React, { useRef, useEffect, useCallback } from 'react';
import { useCamera } from './useCamera';
import { useMediaPipe } from './useMediaPipe';

const FaceDetection: React.FC = () => {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const animationRef = useRef<number>();
  
  const { 
    cameras, 
    selectedCamera, 
    loadDevices,
    requestCameraStream, 
    stopStream 
  } = useCamera();
  
  const { 
    isReady, 
    facesDetected, 
    initialize, 
    processFrame 
  } = useMediaPipe();

  // Initialize MediaPipe and enumerate cameras on mount
  useEffect(() => {
    initialize();
    loadDevices();
    return () => {
      stopStream();
      if (animationRef.current) cancelAnimationFrame(animationRef.current);
    };
  }, [initialize, loadDevices, stopStream]);

  // Handle face detection results
  useEffect(() => {
    window.onFaceDetectionResults = (results) => {
      const canvas = canvasRef.current;
      const ctx = canvas?.getContext('2d');
      if (!ctx || !canvas) return;

      // Clear canvas and draw video frame
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      ctx.drawImage(results.image, 0, 0);

      // Draw detection boxes
      for (const detection of results.detections) {
        const box = detection.boundingBox;
        const x = box.xCenter - box.width / 2;
        const y = box.yCenter - box.height / 2;

        ctx.strokeStyle = '#00FF00';
        ctx.lineWidth = 3;
        ctx.strokeRect(
          x * canvas.width,
          y * canvas.height,
          box.width * canvas.width,
          box.height * canvas.height
        );
      }
    };
  }, []);

  // Process video frames
  const processVideo = useCallback(() => {
    if (videoRef.current && isReady) {
      processFrame(videoRef.current);
    }
    animationRef.current = requestAnimationFrame(processVideo);
  }, [processFrame, isReady]);

  // Start camera
  const startCamera = async () => {
    try {
      const stream = await requestCameraStream(selectedCamera);
      if (videoRef.current) {
        videoRef.current.srcObject = stream;
        await videoRef.current.play();
        processVideo();
      }
    } catch (err) {
      console.error('Failed to start camera:', err);
    }
  };

  return (
    <div>
      <button onClick={startCamera} disabled={!isReady}>Start Detection</button>
      <p>Faces detected: {facesDetected}</p>
      
      <div style={{ position: 'relative' }}>
        <video 
          ref={videoRef} 
          style={{ display: 'none' }}
          playsInline 
        />
        <canvas 
          ref={canvasRef}
          width={640}
          height={480}
          style={{ maxWidth: '100%' }}
        />
      </div>
    </div>
  );
};

This component combines the camera and MediaPipe hooks into a functional face detection interface. The hidden video element captures the webcam feed, while the canvas displays the processed output with detection boxes.
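
The drawing callback above only renders bounding boxes. The six landmarks listed earlier can be rendered the same way; a small illustrative helper (drawLandmarks is not part of the code above) scales each normalized point to pixels and draws a dot:

// drawLandmarks.ts - illustrative helper for rendering facial landmarks
export const drawLandmarks = (
  ctx: CanvasRenderingContext2D,
  landmarks: Array<{ x: number; y: number }>,
  canvasWidth: number,
  canvasHeight: number
): void => {
  for (const landmark of landmarks) {
    ctx.beginPath();
    // Landmark coordinates are normalized, so scale them to pixel space
    ctx.arc(landmark.x * canvasWidth, landmark.y * canvasHeight, 4, 0, 2 * Math.PI);
    ctx.fillStyle = '#FF0000';
    ctx.fill();
  }
};

Calling drawLandmarks(ctx, detection.landmarks ?? [], canvas.width, canvas.height) inside the detection loop renders each landmark as a red dot on top of the bounding box.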

Implementation Pipeline

The face detection pipeline in the browser proceeds as follows:

User clicks Start → Request camera permission → Permission granted?

  • Yes → Get video stream → Initialize MediaPipe → Load ML models → Start video playback → enter the animation loop
  • No → Show an error

Inside the animation loop, each iteration captures a video frame, sends it to MediaPipe, runs ML processing to detect faces, returns the results, draws them on the canvas, and updates the UI before the next frame.

Implementation Summary

Our face detection implementation follows a modular approach with clear separation of concerns:

  1. Camera Management: The useCamera hook handles all webcam-related operations, including permission requests, device enumeration, and stream management. This abstraction makes it easy to switch cameras or handle permission errors.

  2. MediaPipe Integration: The useMediaPipe hook encapsulates the MediaPipe library initialization and configuration. It provides a simple interface for processing frames and handles the asynchronous model loading process.

  3. Rendering Pipeline: The main component orchestrates the video capture and canvas rendering. It uses requestAnimationFrame for smooth real-time processing and draws detection results directly on the canvas for immediate visual feedback.

This architecture ensures clean code organization, reusable components, and efficient real-time processing, making it a solid starting point for production applications.