
Realtime face detection in the browser using MediaPipe
In this note, we are going to explore how to detect faces in real-time using MediaPipe and TypeScript. MediaPipe is a powerful framework developed by Google that provides pre-built solutions for various computer vision tasks, including face detection.
It is designed to run efficiently on both mobile and desktop devices, including web browsers, making it a great choice for real-time web applications. We will set up a simple TypeScript project that uses MediaPipe to detect faces in the webcam feed.
MediaPipe Library
MediaPipe Face Detection is a high-performance solution that detects faces in images and video. It uses a machine learning model optimized for real-time performance, capable of detecting multiple faces with facial landmarks (eyes, nose, mouth, and ears).
The library provides two model options: “short-range” for faces within 2 meters and “full-range” for faces up to 5 meters. For web applications, MediaPipe runs entirely in the browser using WebAssembly and GPU acceleration, ensuring privacy and eliminating server round-trips.
Installation and Setup
First, install the MediaPipe Face Detection package in your React/TypeScript project:
npm install @mediapipe/face_detection
# or
bun add @mediapipe/face_detection
MediaPipe loads multiple files at runtime: WebAssembly binaries (.wasm), JavaScript files (.js), model files (.tflite), and configuration files (.binarypb). The npm package includes all these files, but they must be served over HTTP due to how MediaPipe dynamically loads them based on browser capabilities.
Important: Modern bundlers like Vite don’t automatically handle these files because:
- MediaPipe uses dynamic imports with runtime-constructed URLs
- WASM files require special handling (they must be served with the correct MIME type)
- The library selects files based on browser features (SIMD support, etc.)
You have two options for serving these files:
Option 1: Use a CDN (Simplest)
const faceDetection = new FaceDetection({
  locateFile: (file) => {
    return `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection/${file}`;
  }
});
Option 2: Serve Files Locally
For Vite, install vite-plugin-static-copy:
npm install -D vite-plugin-static-copy
Then configure Vite to copy MediaPipe files:
// vite.config.js
import { defineConfig } from 'vite'
import { viteStaticCopy } from 'vite-plugin-static-copy'

export default defineConfig({
  plugins: [
    viteStaticCopy({
      targets: [{
        src: 'node_modules/@mediapipe/face_detection/*',
        dest: 'mediapipe/face_detection'
      }]
    })
  ]
})
For webpack, use copy-webpack-plugin:
// webpack.config.js
const CopyPlugin = require("copy-webpack-plugin");

module.exports = {
  plugins: [
    new CopyPlugin({
      patterns: [{
        from: 'node_modules/@mediapipe/face_detection',
        to: 'mediapipe/face_detection',
        globOptions: {
          ignore: ['**/README.md', '**/package.json', '**/*.d.ts']
        }
      }]
    })
  ]
};
Then update your locateFile:
const faceDetection = new FaceDetection({
  locateFile: (file) => `/mediapipe/face_detection/${file}`
});
Creating a Webcam Hook
Let’s start by creating a custom React hook to manage webcam access. This hook handles permissions, device enumeration, and stream management:
// useCamera.ts
import { useState, useCallback, useRef } from 'react';

export const useCamera = () => {
  const [hasPermission, setHasPermission] = useState<boolean | null>(null);
  const [cameras, setCameras] = useState<MediaDeviceInfo[]>([]);
  const [selectedCamera, setSelectedCamera] = useState('');
  const streamRef = useRef<MediaStream | null>(null);

  const loadDevices = useCallback(async () => {
    try {
      const devices = await navigator.mediaDevices.enumerateDevices();
      const videoDevices = devices.filter(d => d.kind === 'videoinput');
      setCameras(videoDevices);
      if (videoDevices.length > 0) {
        // Keep any existing selection; otherwise default to the first camera
        setSelectedCamera(prev => prev || videoDevices[0].deviceId);
      }
    } catch (err) {
      console.error('Failed to enumerate devices:', err);
    }
  }, []);

  const requestCameraStream = useCallback(async (deviceId: string) => {
    try {
      // Fall back to any available camera if no specific device is selected yet
      const stream = await navigator.mediaDevices.getUserMedia({
        video: deviceId ? { deviceId: { exact: deviceId } } : true,
        audio: false
      });
      streamRef.current = stream;
      setHasPermission(true);
      return stream;
    } catch (err) {
      setHasPermission(false);
      throw err;
    }
  }, []);

  const stopStream = useCallback(() => {
    if (streamRef.current) {
      streamRef.current.getTracks().forEach(track => track.stop());
      streamRef.current = null;
    }
  }, []);

  return {
    cameras,
    selectedCamera,
    setSelectedCamera,
    hasPermission,
    loadDevices,
    requestCameraStream,
    stopStream
  };
};
This hook provides a clean interface for camera management, handling device selection and stream lifecycle. It automatically enumerates available cameras and manages permissions state.
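For instance, a component could consume the hook like this. This is a minimal sketch; the CameraPreview name and its markup are illustrative and not part of the project:

// CameraPreview.tsx (illustrative usage of useCamera)
import React, { useEffect, useRef } from 'react';
import { useCamera } from './useCamera';

const CameraPreview: React.FC = () => {
  const videoRef = useRef<HTMLVideoElement>(null);
  const {
    cameras,
    selectedCamera,
    setSelectedCamera,
    loadDevices,
    requestCameraStream,
    stopStream
  } = useCamera();

  // Enumerate cameras on mount and release the stream on unmount
  useEffect(() => {
    loadDevices();
    return () => stopStream();
  }, [loadDevices, stopStream]);

  const start = async () => {
    try {
      const stream = await requestCameraStream(selectedCamera);
      if (videoRef.current) {
        videoRef.current.srcObject = stream;
        await videoRef.current.play();
      }
    } catch (err) {
      console.error('Camera permission denied or unavailable:', err);
    }
  };

  return (
    <div>
      <select value={selectedCamera} onChange={e => setSelectedCamera(e.target.value)}>
        {cameras.map(c => (
          <option key={c.deviceId} value={c.deviceId}>{c.label || 'Camera'}</option>
        ))}
      </select>
      <button onClick={start}>Start</button>
      <video ref={videoRef} playsInline muted />
    </div>
  );
};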
Creating a MediaPipe Hook
The MediaPipe hook manages the face detection pipeline with the following behavior:
- Initialization: Loads the ML model and WebAssembly runtime
- Frame Processing: Receives video frames and sends them to MediaPipe
- Results Handling: Receives detection results with face locations and landmarks
Understanding the Detection Results
MediaPipe returns results in this structure:
interface Detection {
  boundingBox: {
    xCenter: number; // 0-1 normalized coordinates
    yCenter: number;
    width: number;
    height: number;
  };
  landmarks?: Array<{
    x: number; // 0-1 normalized coordinates
    y: number;
    z?: number; // depth (not used in 2D detection)
  }>;
  score: number[]; // confidence scores (0-1)
}

interface FaceDetectionResults {
  image: HTMLCanvasElement | HTMLVideoElement;
  detections: Detection[];
}
The landmarks represent 6 key facial points:
- Index 0: Right eye
- Index 1: Left eye
- Index 2: Nose tip
- Index 3: Mouth center
- Index 4: Right ear
- Index 5: Left ear
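Because both the bounding box and the landmarks are normalized, converting them to pixels is a simple multiplication by the canvas size. The helper below is a small sketch built on the Detection interface above; drawLandmarks and LANDMARK_NAMES are illustrative names, not part of the MediaPipe API:

// Illustrative helper: draw and label the six landmark points on a canvas
const LANDMARK_NAMES = [
  'Right eye', 'Left eye', 'Nose tip', 'Mouth center', 'Right ear', 'Left ear'
];

function drawLandmarks(
  ctx: CanvasRenderingContext2D,
  detection: Detection,
  width: number,
  height: number
) {
  (detection.landmarks ?? []).forEach((point, i) => {
    // Scale 0-1 normalized coordinates up to pixel coordinates
    const px = point.x * width;
    const py = point.y * height;
    ctx.fillStyle = '#FF0000';
    ctx.beginPath();
    ctx.arc(px, py, 3, 0, Math.PI * 2);
    ctx.fill();
    ctx.fillText(LANDMARK_NAMES[i] ?? `Point ${i}`, px + 5, py - 5);
  });
}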
Hook Implementation
Our hook encapsulates this complexity and provides a simple interface:
// useMediaPipe.ts
import { useState, useCallback, useRef } from 'react';
import { FaceDetection } from '@mediapipe/face_detection';

// Allow the global results handler used below to type-check
declare global {
  interface Window {
    onFaceDetectionResults?: (results: any) => void;
  }
}

export const useMediaPipe = () => {
  const [isReady, setIsReady] = useState(false);
  const [facesDetected, setFacesDetected] = useState(0);
  const faceDetectionRef = useRef<FaceDetection | null>(null);

  const initialize = useCallback(async () => {
    // Create FaceDetection instance with file loading configuration
    const faceDetection = new FaceDetection({
      locateFile: (file) => {
        return `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection/${file}`;
      }
    });

    // Load the model and WebAssembly runtime
    await faceDetection.initialize();

    // Configure detection parameters
    faceDetection.setOptions({
      model: 'short', // 'short' (within ~2 m) or 'full' (up to ~5 m)
      minDetectionConfidence: 0.5 // 0-1 confidence threshold
    });

    // Set up the results callback - called for every processed frame
    faceDetection.onResults((results) => {
      // Results contain the original image and array of detections
      const detections = results.detections || [];
      setFacesDetected(detections.length);

      // Pass results to a global handler for canvas drawing
      // (In production, use a proper callback or state management)
      if (window.onFaceDetectionResults) {
        window.onFaceDetectionResults(results);
      }
    });

    faceDetectionRef.current = faceDetection;
    setIsReady(true);
  }, []);

  // Process a video frame - MediaPipe handles this asynchronously
  const processFrame = useCallback((video: HTMLVideoElement) => {
    if (faceDetectionRef.current && isReady) {
      // Send frame to MediaPipe - results arrive via onResults callback
      faceDetectionRef.current.send({ image: video });
    }
  }, [isReady]);

  return {
    isReady, // Model loaded and ready to process
    facesDetected, // Current count of detected faces
    initialize, // Call once to set up MediaPipe
    processFrame // Call repeatedly with video frames
  };
};
This hook encapsulates MediaPipe initialization and configuration. It handles model loading, sets detection parameters, and provides a simple interface for processing video frames.
Processing Flow
The face detection pipeline follows an asynchronous pattern:
1. Video Frame → processFrame(video)
2. MediaPipe processes frame in WebAssembly
3. Results arrive via onResults callback
4. Callback updates state and triggers canvas drawing
Key points:
- Asynchronous Processing: send() returns immediately; results arrive later via the onResults callback
- Normalized Coordinates: all positions are 0-1 values; multiply by canvas dimensions to get pixel coordinates
- Frame Rate: MediaPipe processes as fast as possible; consider throttling for performance (see the sketch after this list)
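If full frame-rate processing is more than the app needs, a simple time-based throttle can skip frames before they reach MediaPipe. This is a hedged sketch; the ~30 fps target and the maybeProcess wrapper are assumptions, not part of the MediaPipe API:

// Illustrative throttle: forward a frame to processFrame at most ~30 times per second
const TARGET_FPS = 30;
let lastProcessed = 0;

function maybeProcess(
  video: HTMLVideoElement,
  processFrame: (v: HTMLVideoElement) => void
) {
  const now = performance.now();
  if (now - lastProcessed >= 1000 / TARGET_FPS) {
    lastProcessed = now;
    processFrame(video); // results still arrive asynchronously via onResults
  }
}

Calling maybeProcess from the requestAnimationFrame loop, instead of calling processFrame directly, keeps rendering smooth while reducing the detection workload.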
Building the Face Detection Component
Now let’s create a simple React component that ties everything together:
// FaceDetection.tsx
import React, { useRef, useEffect, useCallback } from 'react';
import { useCamera } from './useCamera';
import { useMediaPipe } from './useMediaPipe';

const FaceDetection: React.FC = () => {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const animationRef = useRef<number>();

  const {
    cameras,
    selectedCamera,
    loadDevices,
    requestCameraStream,
    stopStream
  } = useCamera();

  const {
    isReady,
    facesDetected,
    initialize,
    processFrame
  } = useMediaPipe();

  // Initialize MediaPipe and enumerate cameras on mount; clean up on unmount
  useEffect(() => {
    initialize();
    loadDevices();
    return () => {
      if (animationRef.current) cancelAnimationFrame(animationRef.current);
      stopStream();
    };
  }, [initialize, loadDevices, stopStream]);

  // Handle face detection results
  useEffect(() => {
    window.onFaceDetectionResults = (results) => {
      const canvas = canvasRef.current;
      const ctx = canvas?.getContext('2d');
      if (!ctx || !canvas) return;

      // Clear canvas and draw the processed frame scaled to the canvas
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);

      // Draw detection boxes
      for (const detection of results.detections) {
        const box = detection.boundingBox;
        const x = box.xCenter - box.width / 2;
        const y = box.yCenter - box.height / 2;
        ctx.strokeStyle = '#00FF00';
        ctx.lineWidth = 3;
        ctx.strokeRect(
          x * canvas.width,
          y * canvas.height,
          box.width * canvas.width,
          box.height * canvas.height
        );
      }
    };
    return () => {
      delete window.onFaceDetectionResults;
    };
  }, []);

  // Process video frames
  const processVideo = useCallback(() => {
    if (videoRef.current && isReady) {
      processFrame(videoRef.current);
    }
    animationRef.current = requestAnimationFrame(processVideo);
  }, [processFrame, isReady]);

  // Start camera
  const startCamera = async () => {
    try {
      const stream = await requestCameraStream(selectedCamera);
      if (videoRef.current) {
        videoRef.current.srcObject = stream;
        await videoRef.current.play();
        processVideo();
      }
    } catch (err) {
      console.error('Failed to start camera:', err);
    }
  };

  return (
    <div>
      <button onClick={startCamera}>Start Detection</button>
      <p>Faces detected: {facesDetected}</p>
      <div style={{ position: 'relative' }}>
        <video
          ref={videoRef}
          style={{ display: 'none' }}
          playsInline
          muted
        />
        <canvas
          ref={canvasRef}
          width={640}
          height={480}
          style={{ maxWidth: '100%' }}
        />
      </div>
    </div>
  );
};

export default FaceDetection;
This component combines the camera and MediaPipe hooks to create a functional face detection interface. The video element captures the webcam feed, while the canvas displays the processed output with detection boxes.
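To try it out, render the component like any other React component. A minimal entry point might look like this, assuming a Vite-style index.html with a <div id="root"> element:

// main.tsx (illustrative entry point)
import { createRoot } from 'react-dom/client';
import FaceDetection from './FaceDetection';

createRoot(document.getElementById('root')!).render(<FaceDetection />);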
Implementation Pipeline
At a high level, each webcam frame is passed to processFrame(), MediaPipe processes it in the WebAssembly runtime, and the detection results return through the onResults callback, where they are drawn onto the canvas overlay.
Implementation Summary
Our face detection implementation follows a modular approach with clear separation of concerns:
- Camera Management: The useCamera hook handles all webcam-related operations, including permission requests, device enumeration, and stream management. This abstraction makes it easy to switch cameras or handle permission errors.
- MediaPipe Integration: The useMediaPipe hook encapsulates the MediaPipe library initialization and configuration. It provides a simple interface for processing frames and handles the asynchronous model loading process.
- Rendering Pipeline: The main component orchestrates the video capture and canvas rendering. It uses requestAnimationFrame for smooth real-time processing and draws detection results directly on the canvas for immediate visual feedback.
This architecture ensures clean code organization, reusable components, and efficient real-time processing suitable for production applications.