Knowledge Base / Computer Vision

MediaPipe Hands

Overview

MediaPipe Hands is Google's real-time hand tracking solution that runs entirely in the browser. It detects hands in video streams and returns 21 3D landmarks per hand - giving you precise finger positions to build interactive experiences.

This guide covers everything from basic setup to building fun projects like virtual pianos, air drawing apps, and animated hand puppets. No special hardware required - just a webcam!

Beginner Friendly
MediaPipe handles all the ML complexity. You just get clean landmark coordinates to work with - perfect for creative coding!

How It Works

MediaPipe Hands uses a two-stage ML pipeline:

  1. Palm Detection - A lightweight model finds hands in the frame and returns bounding boxes.
  2. Hand Landmark Model - A more detailed model runs on each detected hand, outputting 21 precise 3D points.

The pipeline runs at 30+ FPS on most devices, with the models running on GPU via WebGL. Each landmark includes x, y coordinates (0-1 normalized) plus a z value for depth estimation.
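Because coordinates are normalized, drawing overlays means scaling them to your canvas size. A minimal illustrative helper (`toPixels` is not part of the MediaPipe API, just a sketch of the conversion):

```javascript
// Each landmark is a plain object { x, y, z }, with x/y normalized to [0, 1]
// relative to the frame. Scale to pixel coordinates before drawing overlays.
function toPixels(landmark, canvasWidth, canvasHeight) {
    return {
        x: landmark.x * canvasWidth,
        y: landmark.y * canvasHeight
    };
}

// Example: an index fingertip at the center of a 1280x720 frame
const indexTip = { x: 0.5, y: 0.5, z: -0.02 };
const p = toPixels(indexTip, 1280, 720);
// p.x === 640, p.y === 360
```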

Hand Landmarks

MediaPipe tracks 21 landmarks on each hand. Understanding this numbering is key to building gesture recognition:

21 hand landmarks, numbered 0-20:

  0     - WRIST
  1-4   - Thumb (4 = THUMB_TIP)
  5-8   - Index finger (8 = INDEX_TIP)
  9-12  - Middle finger (12 = MIDDLE_TIP)
  13-16 - Ring finger (16 = RING_TIP)
  17-20 - Pinky (20 = PINKY_TIP)

Setup

MediaPipe offers two ways to integrate: CDN scripts or npm packages. For quick prototyping, CDN is easiest:

CDN Setup

HTML
<!-- MediaPipe Scripts -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js" crossorigin="anonymous"></script>

NPM Setup

Bash
npm install @mediapipe/hands @mediapipe/camera_utils @mediapipe/drawing_utils
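With the npm packages, the same classes are imported into a bundler build (webpack, Vite, etc.) instead of loaded as globals. The wasm and model assets still need to be resolvable at runtime; pointing `locateFile` at the CDN is a common shortcut. A minimal sketch:

```javascript
// ES module imports for a bundler setup
import { Hands } from '@mediapipe/hands';
import { Camera } from '@mediapipe/camera_utils';
import { drawConnectors, drawLandmarks } from '@mediapipe/drawing_utils';

// Resolve the wasm/model files from the CDN (or copy them into your
// build output and point locateFile there instead)
const hands = new Hands({
    locateFile: (file) =>
        `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`
});
```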

Basic Hand Tracking

Here's a complete example that shows hand landmarks overlaid on your webcam feed:

HTML
<!DOCTYPE html>
<html>
<head>
    <title>MediaPipe Hands Demo</title>
    <style>
        body { margin: 0; background: #0a0f1a; display: flex; 
               justify-content: center; align-items: center; 
               min-height: 100vh; }
        .container { position: relative; }
        video { display: none; }
        canvas { border-radius: 12px; }
    </style>
</head>
<body>
    <div class="container">
        <video id="video"></video>
        <canvas id="canvas" width="1280" height="720"></canvas>
    </div>

    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js"></script>
    
    <script>
        const video = document.getElementById('video');
        const canvas = document.getElementById('canvas');
        const ctx = canvas.getContext('2d');

        // Initialize MediaPipe Hands
        const hands = new Hands({
            locateFile: (file) => 
                `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`
        });

        hands.setOptions({
            maxNumHands: 2,
            modelComplexity: 1,
            minDetectionConfidence: 0.5,
            minTrackingConfidence: 0.5
        });

        // Process results
        hands.onResults((results) => {
            ctx.save();
            ctx.clearRect(0, 0, canvas.width, canvas.height);
            ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);

            if (results.multiHandLandmarks) {
                for (const landmarks of results.multiHandLandmarks) {
                    // Draw connectors
                    drawConnectors(ctx, landmarks, HAND_CONNECTIONS, 
                        { color: '#06b6d4', lineWidth: 3 });
                    // Draw landmarks
                    drawLandmarks(ctx, landmarks, 
                        { color: '#10b981', lineWidth: 2, radius: 5 });
                }
            }
            ctx.restore();
        });

        // Start camera
        const camera = new Camera(video, {
            onFrame: async () => await hands.send({ image: video }),
            width: 1280,
            height: 720
        });
        camera.start();
    </script>
</body>
</html>

Gesture Recognition

With landmark positions, you can detect gestures by checking finger states. Here are common patterns:

Gesture        | Detection Logic               | Use Case
👆 Pointing    | Index extended, others curled | Cursor control, selection
✌️ Peace       | Index + Middle extended       | Screenshot, mode switch
👍 Thumbs Up   | Thumb extended upward         | Confirm, like action
✊ Fist        | All fingers curled            | Grab, drag objects
🖐️ Open Palm   | All fingers extended          | Stop, release, pause
🤏 Pinch       | Thumb + Index close together  | Zoom, fine control

Gesture Detection Code

JavaScript
// Landmark indices
const LANDMARKS = {
    WRIST: 0,
    THUMB_TIP: 4,
    INDEX_TIP: 8,
    MIDDLE_TIP: 12,
    RING_TIP: 16,
    PINKY_TIP: 20,
    INDEX_MCP: 5,  // Knuckle
    MIDDLE_MCP: 9,
    RING_MCP: 13,
    PINKY_MCP: 17
};

// Check if finger is extended (tip above knuckle; y grows downward, so this assumes an upright hand)
function isFingerExtended(landmarks, fingerTip, fingerMcp) {
    return landmarks[fingerTip].y < landmarks[fingerMcp].y;
}

// Check thumb (uses x-axis since thumb points sideways)
function isThumbExtended(landmarks, handedness) {
    const isRightHand = handedness === 'Right';
    const thumbTip = landmarks[LANDMARKS.THUMB_TIP];
    const thumbCmc = landmarks[1]; // THUMB_CMC (base joint near the wrist)
    
    return isRightHand 
        ? thumbTip.x < thumbCmc.x 
        : thumbTip.x > thumbCmc.x;
}

// Detect gesture from landmarks
function detectGesture(landmarks, handedness) {
    const thumb = isThumbExtended(landmarks, handedness);
    const index = isFingerExtended(landmarks, 8, 5);
    const middle = isFingerExtended(landmarks, 12, 9);
    const ring = isFingerExtended(landmarks, 16, 13);
    const pinky = isFingerExtended(landmarks, 20, 17);
    
    // Open palm - all extended
    if (thumb && index && middle && ring && pinky) {
        return 'OPEN_PALM';
    }
    
    // Fist - all curled
    if (!thumb && !index && !middle && !ring && !pinky) {
        return 'FIST';
    }
    
    // Pointing - only index extended
    if (!thumb && index && !middle && !ring && !pinky) {
        return 'POINTING';
    }
    
    // Peace sign
    if (!thumb && index && middle && !ring && !pinky) {
        return 'PEACE';
    }
    
    // Thumbs up
    if (thumb && !index && !middle && !ring && !pinky) {
        return 'THUMBS_UP';
    }
    
    // Rock on 🤘
    if (!thumb && index && !middle && !ring && pinky) {
        return 'ROCK';
    }
    
    return 'UNKNOWN';
}

// Calculate pinch distance
function getPinchDistance(landmarks) {
    const thumb = landmarks[LANDMARKS.THUMB_TIP];
    const index = landmarks[LANDMARKS.INDEX_TIP];
    
    return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}

// Usage in onResults callback
hands.onResults((results) => {
    if (results.multiHandLandmarks && results.multiHandedness) {
        results.multiHandLandmarks.forEach((landmarks, i) => {
            const handedness = results.multiHandedness[i].label;
            const gesture = detectGesture(landmarks, handedness);
            const pinch = getPinchDistance(landmarks);
            
            console.log(`${handedness} hand: ${gesture}, pinch: ${pinch.toFixed(3)}`);
        });
    }
});
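One caveat with getPinchDistance above: normalized distances shrink as the hand moves away from the camera, so a fixed pinch threshold only works at one distance. A common workaround divides by a palm-size reference (wrist to middle-finger knuckle). `getNormalizedPinch` is a hypothetical helper sketching this heuristic, not part of the MediaPipe API:

```javascript
// Scale-invariant pinch: divide the thumb-index distance by palm size
// (wrist to middle MCP), so the value stays stable as the hand moves
// closer to or farther from the camera.
function getNormalizedPinch(landmarks) {
    const thumb = landmarks[4];     // THUMB_TIP
    const index = landmarks[8];     // INDEX_TIP
    const wrist = landmarks[0];     // WRIST
    const middleMcp = landmarks[9]; // MIDDLE_MCP

    const pinch = Math.hypot(thumb.x - index.x, thumb.y - index.y);
    const palmSize = Math.hypot(middleMcp.x - wrist.x, middleMcp.y - wrist.y);

    return palmSize > 0 ? pinch / palmSize : 0;
}

// Synthetic landmarks array with just the four points we need
const fake = [];
fake[0] = { x: 0.5, y: 0.8 };    // wrist
fake[4] = { x: 0.45, y: 0.5 };   // thumb tip
fake[8] = { x: 0.45, y: 0.53 };  // index tip
fake[9] = { x: 0.5, y: 0.6 };    // middle MCP
const ratio = getNormalizedPinch(fake);  // ≈ 0.15 regardless of hand distance
```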

Fun Projects

Now for the fun part! Here are creative projects you can build with MediaPipe Hands:

Virtual Piano (Intermediate)
Play piano keys by tapping your fingers in the air. Each fingertip triggers a different note!
Tech: Web Audio API, Canvas, Tone.js

Air Drawing (Beginner)
Draw in the air with your index finger! Pinch to change colors, open palm to clear canvas.
Tech: Canvas 2D, Gestures

Hand Puppets (Intermediate)
Animate cute characters with your hand movements. Open/close hand to make them talk!
Tech: SVG Animation, GSAP

Gesture Slideshow (Beginner)
Control presentations with hand gestures. Swipe to navigate, thumbs up to like slides!
Tech: Swipe Detection, CSS Transitions

Hand-Controlled Game (Advanced)
Build a simple game where you dodge obstacles or catch objects using hand position!
Tech: Game Loop, Collision Detection

3D Object Viewer (Advanced)
Rotate and zoom 3D models using hand gestures. Pinch to zoom, rotate palm to spin!
Tech: Three.js, OrbitControls
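The slideshow project relies on swipe detection, which none of the later examples cover. One simple approach tracks the wrist's x-position over a short window of frames and fires when the net movement exceeds a threshold. `SwipeDetector` is a hypothetical helper, and both thresholds are illustrative values you would tune for your setup:

```javascript
// Minimal swipe detector: keeps a short history of wrist x-positions and
// reports 'LEFT'/'RIGHT' when the hand travels far enough within the window.
class SwipeDetector {
    constructor({ windowSize = 10, threshold = 0.25 } = {}) {
        this.windowSize = windowSize; // frames of history to keep
        this.threshold = threshold;   // normalized x-distance that counts as a swipe
        this.history = [];
    }

    // Call once per frame with the wrist landmark (landmarks[0]);
    // returns 'LEFT', 'RIGHT', or null
    update(wrist) {
        this.history.push(wrist.x);
        if (this.history.length > this.windowSize) this.history.shift();
        if (this.history.length < this.windowSize) return null;

        const delta = this.history[this.history.length - 1] - this.history[0];
        if (Math.abs(delta) >= this.threshold) {
            this.history = []; // reset so one swipe only fires once
            return delta > 0 ? 'RIGHT' : 'LEFT';
        }
        return null;
    }
}
```

In the onResults callback, call `update(landmarks[0])` once per frame and advance the slideshow when it returns a direction.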

Project: Virtual Piano

Let's build a piano you play by tapping fingers in the air! Each fingertip plays a different note when it moves downward quickly.

JavaScript
// Virtual Piano with MediaPipe Hands
// Requires Tone.js: npm install tone

import * as Tone from 'tone';

// Piano synth
const synth = new Tone.PolySynth(Tone.Synth).toDestination();

// Note mapping for each fingertip
const fingerNotes = {
    4: 'C4',   // Thumb
    8: 'D4',   // Index  
    12: 'E4',  // Middle
    16: 'F4',  // Ring
    20: 'G4'  // Pinky
};

// Track previous Y positions for tap detection
let prevPositions = {};
const TAP_VELOCITY = 0.02;   // Minimum downward speed per frame (normalized units)

// Check for finger taps
function detectTaps(landmarks) {
    const fingerTips = [4, 8, 12, 16, 20];
    const taps = [];
    
    fingerTips.forEach(tip => {
        const currentY = landmarks[tip].y;
        const prevY = prevPositions[tip] || currentY;
        const velocity = currentY - prevY;
        
        // Detect downward movement (increasing Y = moving down)
        if (velocity > TAP_VELOCITY) {
            taps.push({
                finger: tip,
                note: fingerNotes[tip],
                velocity: Math.min(velocity * 10, 1)  // Normalize for volume
            });
        }
        
        prevPositions[tip] = currentY;
    });
    
    return taps;
}

// Play detected taps
function playTaps(taps) {
    taps.forEach(tap => {
        synth.triggerAttackRelease(tap.note, '8n', Tone.now(), tap.velocity);
        
        // Visual feedback
        showTapEffect(tap.finger);
    });
}

// Visual feedback for taps
function showTapEffect(finger) {
    const keyElement = document.querySelector(`[data-finger="${finger}"]`);
    if (keyElement) {
        keyElement.classList.add('active');
        setTimeout(() => keyElement.classList.remove('active'), 150);
    }
}

// In your MediaPipe onResults callback:
hands.onResults((results) => {
    // ... drawing code ...
    
    if (results.multiHandLandmarks) {
        results.multiHandLandmarks.forEach(landmarks => {
            const taps = detectTaps(landmarks);
            if (taps.length > 0) {
                playTaps(taps);
            }
        });
    }
});

// Start audio context on user interaction
document.addEventListener('click', async () => {
    await Tone.start();
    console.log('Audio ready!');
}, { once: true });

Project: Air Drawing

Draw with your index finger, change colors with pinch gestures, and clear the canvas with an open palm:

JavaScript
// Air Drawing Canvas
const drawCanvas = document.getElementById('drawCanvas');
const drawCtx = drawCanvas.getContext('2d');

// Drawing state
let isDrawing = false;
let lastPoint = null;
let currentColor = '#06b6d4';
let brushSize = 5;

const colors = ['#06b6d4', '#10b981', '#f59e0b', '#ef4444', '#8b5cf6'];
let colorIndex = 0;

// Get pinch distance
function getPinchDistance(landmarks) {
    const thumb = landmarks[4];
    const index = landmarks[8];
    return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}

// Check if only index finger is extended (pointing gesture)
function isPointing(landmarks) {
    const indexExtended = landmarks[8].y < landmarks[6].y;
    const middleCurled = landmarks[12].y > landmarks[10].y;
    const ringCurled = landmarks[16].y > landmarks[14].y;
    const pinkyCurled = landmarks[20].y > landmarks[18].y;
    
    return indexExtended && middleCurled && ringCurled && pinkyCurled;
}

// Check for open palm (all fingers extended)
function isOpenPalm(landmarks) {
    const fingers = [[8,6], [12,10], [16,14], [20,18]];
    return fingers.every(([tip, pip]) => landmarks[tip].y < landmarks[pip].y);
}

// Drawing logic
let pinchCooldown = 0;

hands.onResults((results) => {
    // Draw video feed
    ctx.drawImage(results.image, 0, 0);
    
    if (results.multiHandLandmarks && results.multiHandLandmarks[0]) {
        const landmarks = results.multiHandLandmarks[0];
        const indexTip = landmarks[8];
        
        // Convert normalized coords to canvas pixels
        const x = indexTip.x * drawCanvas.width;
        const y = indexTip.y * drawCanvas.height;
        
        // Check gestures
        const pinchDist = getPinchDistance(landmarks);
        const pointing = isPointing(landmarks);
        const palm = isOpenPalm(landmarks);
        
        // Pinch to change color
        if (pinchDist < 0.05 && pinchCooldown <= 0) {
            colorIndex = (colorIndex + 1) % colors.length;
            currentColor = colors[colorIndex];
            pinchCooldown = 30;  // Cooldown frames
            showColorIndicator(currentColor);
        }
        if (pinchCooldown > 0) pinchCooldown--;
        
        // Open palm to clear
        if (palm) {
            drawCtx.clearRect(0, 0, drawCanvas.width, drawCanvas.height);
            lastPoint = null;
        }
        
        // Pointing to draw
        if (pointing) {
            if (lastPoint) {
                drawCtx.beginPath();
                drawCtx.moveTo(lastPoint.x, lastPoint.y);
                drawCtx.lineTo(x, y);
                drawCtx.strokeStyle = currentColor;
                drawCtx.lineWidth = brushSize;
                drawCtx.lineCap = 'round';
                drawCtx.stroke();
            }
            lastPoint = { x, y };
            
            // Draw cursor
            drawCtx.beginPath();
            drawCtx.arc(x, y, brushSize / 2, 0, Math.PI * 2);
            drawCtx.fillStyle = currentColor;
            drawCtx.fill();
        } else {
            lastPoint = null;
        }
        
        // Draw hand landmarks on video canvas
        drawConnectors(ctx, landmarks, HAND_CONNECTIONS, { color: '#ffffff44' });
    }
});

function showColorIndicator(color) {
    const indicator = document.getElementById('colorIndicator');
    indicator.style.background = color;
    indicator.classList.add('pulse');
    setTimeout(() => indicator.classList.remove('pulse'), 300);
}

Project: Hand Puppets

Create an animated character controlled by your hand. The puppet's mouth opens when you open your hand!

JavaScript
// Hand Puppet Controller
class HandPuppet {
    constructor(svgElement) {
        this.svg = svgElement;
        this.head = this.svg.querySelector('.puppet-head');
        this.topJaw = this.svg.querySelector('.puppet-top-jaw');
        this.bottomJaw = this.svg.querySelector('.puppet-bottom-jaw');
        this.leftEye = this.svg.querySelector('.puppet-eye-left');
        this.rightEye = this.svg.querySelector('.puppet-eye-right');
    }
    
    // Calculate hand openness (0 = closed, 1 = fully open)
    getHandOpenness(landmarks) {
        const wrist = landmarks[0];
        const middleTip = landmarks[12];
        const distance = Math.hypot(
            middleTip.x - wrist.x, 
            middleTip.y - wrist.y
        );
        // Normalize: closed hand ~0.15, open hand ~0.35
        return Math.min(Math.max((distance - 0.15) / 0.2, 0), 1);
    }
    
    // Get hand rotation (tilt)
    getHandRotation(landmarks) {
        const wrist = landmarks[0];
        const middleMcp = landmarks[9];
        const angle = Math.atan2(
            middleMcp.y - wrist.y,
            middleMcp.x - wrist.x
        );
        return angle * (180 / Math.PI) + 90;  // Degrees, offset so fingers-up = 0°
    }
    
    // Get hand center position
    getHandCenter(landmarks) {
        const wrist = landmarks[0];
        const middleMcp = landmarks[9];
        return {
            x: (wrist.x + middleMcp.x) / 2,
            y: (wrist.y + middleMcp.y) / 2
        };
    }
    
    // Update puppet based on hand
    update(landmarks) {
        const openness = this.getHandOpenness(landmarks);
        const rotation = this.getHandRotation(landmarks);
        const center = this.getHandCenter(landmarks);
        
        // Position puppet
        const puppetX = center.x * 100;  // Percent
        const puppetY = center.y * 100;
        this.svg.style.left = `${puppetX}%`;
        this.svg.style.top = `${puppetY}%`;
        
        // Rotate puppet
        this.head.style.transform = `rotate(${rotation}deg)`;
        
        // Animate mouth (jaw opens based on hand openness)
        const mouthOpen = openness * 20;  // Max 20 degrees
        this.topJaw.style.transform = `rotate(${-mouthOpen / 2}deg)`;
        this.bottomJaw.style.transform = `rotate(${mouthOpen / 2}deg)`;
        
        // Eyes follow hand movement (slight delay for character)
        const eyeOffsetX = (center.x - 0.5) * 10;
        const eyeOffsetY = (center.y - 0.5) * 10;
        this.leftEye.style.transform = `translate(${eyeOffsetX}px, ${eyeOffsetY}px)`;
        this.rightEye.style.transform = `translate(${eyeOffsetX}px, ${eyeOffsetY}px)`;
        
        // Add "talking" effect when mouth moves
        if (openness > 0.3) {
            this.svg.classList.add('talking');
        } else {
            this.svg.classList.remove('talking');
        }
    }
}

// Initialize puppet
const puppet = new HandPuppet(document.getElementById('puppet'));

// In MediaPipe callback
hands.onResults((results) => {
    if (results.multiHandLandmarks && results.multiHandLandmarks[0]) {
        puppet.update(results.multiHandLandmarks[0]);
    }
});
Puppet SVG Structure
Create your puppet SVG with separate groups for head, jaws, and eyes. Each part needs a class name for JavaScript targeting. Use transform-origin to set rotation pivot points.

Performance Tips

Keep your MediaPipe apps running smoothly with these optimizations:

1. Adjust Model Complexity

JavaScript
hands.setOptions({
    maxNumHands: 1,          // Reduce if you only need one hand
    modelComplexity: 0,      // 0 = Lite (fastest), 1 = Full (accurate)
    minDetectionConfidence: 0.7,  // Higher = fewer false positives
    minTrackingConfidence: 0.5
});

2. Reduce Video Resolution

JavaScript
// Lower resolution = faster processing
const camera = new Camera(video, {
    onFrame: async () => await hands.send({ image: video }),
    width: 640,   // Reduced from 1280
    height: 480   // Reduced from 720
});

3. Throttle Processing

JavaScript
// Process every other frame for better performance
let frameCount = 0;
const camera = new Camera(video, {
    onFrame: async () => {
        frameCount++;
        if (frameCount % 2 === 0) {  // Process every other frame
            await hands.send({ image: video });
        }
    },
    width: 640,
    height: 480
});

4. Use requestAnimationFrame for Rendering

JavaScript
// Store latest results, render on animation frame
let latestResults = null;

hands.onResults((results) => {
    latestResults = results;
});

function render() {
    if (latestResults) {
        // Do your drawing here
        drawHands(latestResults);
    }
    requestAnimationFrame(render);
}

render();
Mobile Considerations
On mobile devices, use modelComplexity: 0, limit to 1 hand, and consider reducing to 30fps. Test on real devices - emulators don't reflect actual performance.
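One way to cap processing at 30fps, as suggested above, is a timestamp-based throttle. Unlike the frame-counting approach in tip 3, it holds regardless of the camera's native frame rate. A sketch, with the camera wiring shown as comments:

```javascript
// Time-based throttle: process at most TARGET_FPS frames per second.
const TARGET_FPS = 30;
const FRAME_INTERVAL = 1000 / TARGET_FPS;  // ms between processed frames
let lastProcessed = 0;

function shouldProcess(now) {
    if (now - lastProcessed >= FRAME_INTERVAL) {
        lastProcessed = now;
        return true;
    }
    return false;
}

// Wire into the camera loop:
// const camera = new Camera(video, {
//     onFrame: async () => {
//         if (shouldProcess(performance.now())) {
//             await hands.send({ image: video });
//         }
//     },
//     width: 640, height: 480
// });
```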
Ready to Build!
You now have everything needed to create amazing hand-tracking experiences. Start with the basic tracking example, add gesture detection, then build your own creative projects!