Project Overview

A browser-based face detection system that overlays AR masks in real time using MediaPipe FaceMesh and HTML Canvas. It was built as a class project exploring computer vision and machine learning: we rebuilt face detection from scratch to understand the mechanics of landmark detection, then added custom mask rendering to create interactive visualizations. It runs entirely client-side; no server is required.

Live Demo

Select a Mask

Bear
Cat
Custom 1
Custom 2

Key Features

  • Real-time Face Detection: Uses MediaPipe FaceMesh to detect and track facial landmarks in real time
  • Dynamic Mask Application: Applies masks that follow facial movements and rotations
  • Multiple Mask Options: Choose from various animal masks and custom designs
  • Smooth Performance: Optimized for real-time processing (30+ FPS) with low latency
  • Responsive Design: Works across different devices and screen sizes
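To illustrate how a mask can follow facial movement and rotation, here is a minimal sketch that derives a mask pose (position, angle, scale) from two eye landmarks. It is an illustration under stated assumptions, not the project's actual code: maskPose and drawMask are hypothetical helpers, landmarks are assumed to be normalized to [0, 1] as FaceMesh reports them, and using the outer eye corners (commonly indices 33 and 263) is an assumption.

```javascript
// Hypothetical helper: compute where and how to draw a mask from two eye landmarks.
// leftEye / rightEye: objects with x, y normalized to [0, 1]
//   (assumption: e.g. FaceMesh landmarks 33 and 263, the outer eye corners).
// maskEyeDistancePx: distance between the eyes in the mask image itself, in pixels.
function maskPose(leftEye, rightEye, canvasWidth, canvasHeight, maskEyeDistancePx) {
  // Convert normalized landmark coordinates to canvas pixels.
  const lx = leftEye.x * canvasWidth;
  const ly = leftEye.y * canvasHeight;
  const rx = rightEye.x * canvasWidth;
  const ry = rightEye.y * canvasHeight;

  const dx = rx - lx;
  const dy = ry - ly;

  return {
    centerX: (lx + rx) / 2,                         // midpoint between the eyes
    centerY: (ly + ry) / 2,
    angle: Math.atan2(dy, dx),                      // in-plane head roll, in radians
    scale: Math.hypot(dx, dy) / maskEyeDistancePx,  // face size relative to the mask art
  };
}

// Hypothetical helper: draw the mask so that the eye midpoint of the mask image
// (maskEyeCenterX, maskEyeCenterY, in mask-image pixels) lands on the detected midpoint.
function drawMask(ctx, maskImage, pose, maskEyeCenterX, maskEyeCenterY) {
  ctx.save();
  ctx.translate(pose.centerX, pose.centerY);
  ctx.rotate(pose.angle);
  ctx.scale(pose.scale, pose.scale);
  ctx.drawImage(maskImage, -maskEyeCenterX, -maskEyeCenterY);
  ctx.restore();
}
```

This handles translation, in-plane rotation, and uniform scale only; head turns (yaw and pitch) would need a richer warp than a single rotate-and-scale.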

Technical Learning & Implementation

Rather than treating MediaPipe as a black box, we reconstructed the fundamentals of face detection to understand how modern ML models work:

  • Landmark Detection Pipeline: Face detection begins with a neural network trained to predict 468 3D facial landmarks from 2D input. The model learns to identify subtle geometry (jawlines, eye corners, mouth contours) by detecting local features such as edges and textures and combining them hierarchically. We explored how convolution, pooling, and non-linear activations enable this spatial reasoning.
  • Real-Time Performance Constraints: Running inference on every frame (30+ FPS) requires model quantization and optimization. We analyzed the trade-offs between model size, latency, and accuracy, which explained why mobile-optimized models sacrifice some precision for speed.
  • Mask Registration & Warping: Once landmarks are detected, applying a 2D mask image to a 3D face requires a perspective transformation. We implemented affine and homography-based warping on the Canvas, learning how to map image coordinates between coordinate systems and handle occlusion (e.g., hair covering landmarks). Because Canvas 2D transforms are themselves affine, a perspective warp has to be approximated, for example by warping the mask triangle by triangle.
  • Interactive Design for Learning: The mask filter adds engagement by converting an abstract ML pipeline into a tangible, fun output. This taught us that effective ML visualization requires bridging technical depth with user experience.
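The coordinate mapping behind mask warping can be sketched as follows. The Canvas 2D API accepts only affine matrices (the six values passed to setTransform), so one common way to approximate a perspective warp is to split the mask into triangles and map each one with its own affine transform. The functions affineFromTriangles and applyAffine below are hypothetical names for illustration and are not taken from the project; this is a sketch of the technique, not the implementation used here.

```javascript
// Hypothetical helper: solve for the affine transform that maps three source
// points (mask-image pixels) onto three destination points (canvas pixels).
// The result uses the Canvas 2D convention:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
// Returns null if the source points are collinear (no unique solution).
function affineFromTriangles(src, dst) {
  const [[x0, y0], [x1, y1], [x2, y2]] = src;
  const [[u0, v0], [u1, v1], [u2, v2]] = dst;

  // Determinant of the source triangle (twice its signed area).
  const det = x0 * (y1 - y2) + x1 * (y2 - y0) + x2 * (y0 - y1);
  if (Math.abs(det) < 1e-12) return null;

  // Cramer's rule, applied once for the x' row and once for the y' row.
  const solve = (p0, p1, p2) => [
    (p0 * (y1 - y2) + p1 * (y2 - y0) + p2 * (y0 - y1)) / det,
    (p0 * (x2 - x1) + p1 * (x0 - x2) + p2 * (x1 - x0)) / det,
    (p0 * (x1 * y2 - x2 * y1) + p1 * (x2 * y0 - x0 * y2) + p2 * (x0 * y1 - x1 * y0)) / det,
  ];

  const [a, c, e] = solve(u0, u1, u2);
  const [b, d, f] = solve(v0, v1, v2);
  return { a, b, c, d, e, f };
}

// Hypothetical helper: apply the transform to a single point (useful for checking).
function applyAffine(m, [x, y]) {
  return [m.a * x + m.c * y + m.e, m.b * x + m.d * y + m.f];
}
```

In a browser, the result could be passed to ctx.setTransform(m.a, m.b, m.c, m.d, m.e, m.f) after clipping to the destination triangle, then followed by ctx.drawImage for the mask. Repeating this per triangle gives a piecewise-affine approximation of the full warp.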

Technical Implementation

Custom Facial Landmark Model, WebGL, HTML5 Canvas, WebRTC, JavaScript

Key Technologies

  • Custom-trained 468-point facial landmark detection model, optimized for browser use
  • WebGL & HTML5 Canvas for high-performance rendering
  • Real-time AR experience with sub-50ms latency
  • Cross-platform support, including mobile browsers

This project showcases how real-time computer vision and augmented reality can be delivered through the web without any server-side processing or native mobile apps.
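A latency figure such as the sub-50ms target above can be checked with a small rolling tracker around the per-frame work. The sketch below is one hedged way to do that: LatencyTracker is a hypothetical class, not part of the project or of any library, and the 60-sample window is an arbitrary assumption.

```javascript
// Hypothetical helper: keep the most recent per-frame durations and summarize them.
class LatencyTracker {
  constructor(windowSize = 60) {
    this.windowSize = windowSize; // number of recent frames to keep (assumed value)
    this.samples = [];
  }

  // Record the duration of one frame's processing, in milliseconds.
  record(ms) {
    this.samples.push(ms);
    if (this.samples.length > this.windowSize) this.samples.shift();
  }

  // Mean duration over the window, or 0 if nothing has been recorded yet.
  average() {
    if (this.samples.length === 0) return 0;
    return this.samples.reduce((sum, v) => sum + v, 0) / this.samples.length;
  }

  // Worst frame in the window, or 0 if nothing has been recorded yet.
  max() {
    return this.samples.length === 0 ? 0 : Math.max(...this.samples);
  }

  // True when every frame in the window finished within the budget.
  withinBudget(budgetMs) {
    return this.max() <= budgetMs;
  }
}

// Browser usage sketch (detectAndDraw is a hypothetical per-frame function):
//   const tracker = new LatencyTracker();
//   const t0 = performance.now();
//   await detectAndDraw(videoFrame);
//   tracker.record(performance.now() - t0);
//   console.log(tracker.average().toFixed(1), tracker.withinBudget(50));
```

This measures processing time per frame only; end-to-end latency also includes camera capture and display delay, which this sketch does not capture.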

Current Status

The project is fully functional: it detects faces in real time, tracks facial movement, and applies masks with correct scaling and rotation. The implementation is optimized for performance and runs smoothly across different devices.