Overview
MediaPipe Hands is Google's real-time hand tracking solution that runs entirely in the browser. It detects hands in video streams and returns 21 3D landmarks per hand, giving you precise finger positions for building interactive experiences.
This guide covers everything from basic setup to building fun projects like virtual pianos, air drawing apps, and animated hand puppets. No special hardware required - just a webcam!
How It Works
MediaPipe Hands uses a two-stage ML pipeline:
- Palm Detection - A lightweight model finds hands in the frame and returns bounding boxes.
- Hand Landmark Model - A more detailed model runs on each detected hand, outputting 21 precise 3D points.
The pipeline runs at 30+ FPS on most devices, with the models running on the GPU via WebGL. Each landmark includes x and y coordinates (normalized to 0-1) plus a z value for relative depth, measured from the wrist: smaller values mean closer to the camera.
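In code, that translates to a simple nested-array shape. A quick sketch, using the results object that the onResults callback (shown in the setup below) receives:
// results.multiHandLandmarks holds one array of 21 points per detected hand
const hand = results.multiHandLandmarks[0]; // first detected hand
const wrist = hand[0]; // { x: 0.52, y: 0.71, z: -0.04 } (illustrative values)
// x/y are normalized to the frame; z is depth relative to the wrist,
// with smaller (more negative) values closer to the camera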
Hand Landmarks
MediaPipe tracks 21 landmarks on each hand. Understanding this numbering is key to building gesture recognition:
- 0: wrist
- 1-4: thumb (CMC, MCP, IP, tip)
- 5-8: index finger (MCP, PIP, DIP, tip)
- 9-12: middle finger (MCP, PIP, DIP, tip)
- 13-16: ring finger (MCP, PIP, DIP, tip)
- 17-20: pinky (MCP, PIP, DIP, tip)
Setup
MediaPipe offers two ways to integrate: CDN scripts or npm packages. For quick prototyping, CDN is easiest:
CDN Setup
<!-- MediaPipe Scripts -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js" crossorigin="anonymous"></script>
NPM Setup
npm install @mediapipe/hands @mediapipe/camera_utils @mediapipe/drawing_utils
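With npm you then import the same APIs as ES modules. A sketch of the equivalent imports (named exports as published by the @mediapipe packages; verify against the version you install):
import { Hands, HAND_CONNECTIONS } from '@mediapipe/hands';
import { Camera } from '@mediapipe/camera_utils';
import { drawConnectors, drawLandmarks } from '@mediapipe/drawing_utils';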
Basic Hand Tracking
Here's a complete example that shows hand landmarks overlaid on your webcam feed:
<!DOCTYPE html>
<html>
<head>
<title>MediaPipe Hands Demo</title>
<style>
body { margin: 0; background: #0a0f1a; display: flex;
justify-content: center; align-items: center;
min-height: 100vh; }
.container { position: relative; }
video { display: none; }
canvas { border-radius: 12px; }
</style>
</head>
<body>
<div class="container">
<video id="video"></video>
<canvas id="canvas" width="1280" height="720"></canvas>
</div>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js"></script>
<script>
const video = document.getElementById('video');
const canvas = document.getElementById('canvas');
const ctx = canvas.getContext('2d');
// Initialize MediaPipe Hands
const hands = new Hands({
locateFile: (file) =>
`https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`
});
hands.setOptions({
maxNumHands: 2,
modelComplexity: 1,
minDetectionConfidence: 0.5,
minTrackingConfidence: 0.5
});
// Process results
hands.onResults((results) => {
ctx.save();
ctx.clearRect(0, 0, canvas.width, canvas.height);
ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);
if (results.multiHandLandmarks) {
for (const landmarks of results.multiHandLandmarks) {
// Draw connectors
drawConnectors(ctx, landmarks, HAND_CONNECTIONS,
{ color: '#06b6d4', lineWidth: 3 });
// Draw landmarks
drawLandmarks(ctx, landmarks,
{ color: '#10b981', lineWidth: 2, radius: 5 });
}
}
ctx.restore();
});
// Start camera
const camera = new Camera(video, {
onFrame: async () => await hands.send({ image: video }),
width: 1280,
height: 720
});
camera.start();
</script>
</body>
</html>
Gesture Recognition
With landmark positions, you can detect gestures by checking finger states. Here are common patterns:
| Gesture | Detection Logic | Use Case |
|---|---|---|
| 👆 Pointing | Index extended, others curled | Cursor control, selection |
| ✌️ Peace | Index + middle extended | Screenshot, mode switch |
| 👍 Thumbs Up | Thumb extended upward | Confirm, like action |
| ✊ Fist | All fingers curled | Grab, drag objects |
| 🖐️ Open Palm | All fingers extended | Stop, release, pause |
| 🤏 Pinch | Thumb + index close together | Zoom, fine control |
Gesture Detection Code
// Landmark indices
const LANDMARKS = {
WRIST: 0,
THUMB_TIP: 4,
INDEX_TIP: 8,
MIDDLE_TIP: 12,
RING_TIP: 16,
PINKY_TIP: 20,
INDEX_MCP: 5, // Knuckle
MIDDLE_MCP: 9,
RING_MCP: 13,
PINKY_MCP: 17
};
// Check if a finger is extended
// (simple heuristic: tip above knuckle, so it assumes a roughly upright hand)
function isFingerExtended(landmarks, fingerTip, fingerMcp) {
  return landmarks[fingerTip].y < landmarks[fingerMcp].y;
}
// Check thumb (uses x-axis since the thumb points sideways)
function isThumbExtended(landmarks, handedness) {
  const isRightHand = handedness === 'Right';
  const thumbTip = landmarks[LANDMARKS.THUMB_TIP];
  const thumbCmc = landmarks[1]; // THUMB_CMC, at the base of the thumb
  return isRightHand
    ? thumbTip.x < thumbCmc.x
    : thumbTip.x > thumbCmc.x;
}
// Detect gesture from landmarks
function detectGesture(landmarks, handedness) {
  const thumb = isThumbExtended(landmarks, handedness);
  const index = isFingerExtended(landmarks, LANDMARKS.INDEX_TIP, LANDMARKS.INDEX_MCP);
  const middle = isFingerExtended(landmarks, LANDMARKS.MIDDLE_TIP, LANDMARKS.MIDDLE_MCP);
  const ring = isFingerExtended(landmarks, LANDMARKS.RING_TIP, LANDMARKS.RING_MCP);
  const pinky = isFingerExtended(landmarks, LANDMARKS.PINKY_TIP, LANDMARKS.PINKY_MCP);
// Open palm - all extended
if (thumb && index && middle && ring && pinky) {
return 'OPEN_PALM';
}
// Fist - all curled
if (!thumb && !index && !middle && !ring && !pinky) {
return 'FIST';
}
// Pointing - only index extended
if (!thumb && index && !middle && !ring && !pinky) {
return 'POINTING';
}
// Peace sign
if (!thumb && index && middle && !ring && !pinky) {
return 'PEACE';
}
// Thumbs up
if (thumb && !index && !middle && !ring && !pinky) {
return 'THUMBS_UP';
}
// Rock on 🤘
if (!thumb && index && !middle && !ring && pinky) {
return 'ROCK';
}
return 'UNKNOWN';
}
// Calculate pinch distance
function getPinchDistance(landmarks) {
const thumb = landmarks[LANDMARKS.THUMB_TIP];
const index = landmarks[LANDMARKS.INDEX_TIP];
return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}
// Usage in onResults callback
hands.onResults((results) => {
if (results.multiHandLandmarks && results.multiHandedness) {
results.multiHandLandmarks.forEach((landmarks, i) => {
const handedness = results.multiHandedness[i].label;
const gesture = detectGesture(landmarks, handedness);
const pinch = getPinchDistance(landmarks);
console.log(`${handedness} hand: ${gesture}, pinch: ${pinch.toFixed(3)}`);
});
}
});
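One practical caveat: raw per-frame gestures flicker when fingers hover near the thresholds. A minimal stabilizer (not part of MediaPipe, just a common debouncing pattern) reports a gesture only after it persists for a few consecutive frames:
// Report a gesture only after it has been seen for N consecutive frames
const STABLE_FRAMES = 5; // Higher = steadier output, but more lag
let candidate = 'UNKNOWN';
let candidateCount = 0;
let stableGesture = 'UNKNOWN';
function stabilizeGesture(rawGesture) {
  if (rawGesture === candidate) {
    candidateCount++;
  } else {
    candidate = rawGesture;
    candidateCount = 1;
  }
  if (candidateCount >= STABLE_FRAMES) {
    stableGesture = candidate;
  }
  return stableGesture;
}
// Usage: const gesture = stabilizeGesture(detectGesture(landmarks, handedness));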
Fun Projects
Now for the fun part! Here are creative projects you can build with MediaPipe Hands:
- Virtual Piano - Play piano keys by tapping your fingers in the air. Each fingertip triggers a different note!
- Air Drawing - Draw in the air with your index finger! Pinch to change colors, open palm to clear the canvas.
- Hand Puppets - Animate cute characters with your hand movements. Open and close your hand to make them talk!
- Presentation Control - Control presentations with hand gestures. Swipe to navigate, thumbs up to like slides!
- Motion Game - Build a simple game where you dodge obstacles or catch objects using hand position!
- 3D Model Viewer - Rotate and zoom 3D models using hand gestures. Pinch to zoom, rotate your palm to spin!
Project: Virtual Piano
Let's build a piano you play by tapping fingers in the air! Each fingertip plays a different note when it moves downward quickly.
// Virtual Piano with MediaPipe Hands
// Requires Tone.js: npm install tone
import * as Tone from 'tone';
// Piano synth
const synth = new Tone.PolySynth(Tone.Synth).toDestination();
// Note mapping for each fingertip
const fingerNotes = {
4: 'C4', // Thumb
8: 'D4', // Index
12: 'E4', // Middle
16: 'F4', // Ring
20: 'G4' // Pinky
};
// Track previous Y positions for tap detection
let prevPositions = {};
let lastTapTime = {};
const TAP_VELOCITY = 0.02; // Downward speed threshold (normalized units per frame)
const TAP_COOLDOWN_MS = 200; // Stop one tap from retriggering across frames
// Check for finger taps
function detectTaps(landmarks) {
  const fingerTips = [4, 8, 12, 16, 20];
  const taps = [];
  const now = Date.now();
  fingerTips.forEach(tip => {
    const currentY = landmarks[tip].y;
    const prevY = prevPositions[tip] ?? currentY;
    const velocity = currentY - prevY;
    // Detect downward movement (increasing Y = moving down),
    // skipping fingers that tapped within the cooldown window
    if (velocity > TAP_VELOCITY && now - (lastTapTime[tip] || 0) > TAP_COOLDOWN_MS) {
      lastTapTime[tip] = now;
      taps.push({
        finger: tip,
        note: fingerNotes[tip],
        velocity: Math.min(velocity * 10, 1) // Normalize for volume
      });
    }
    prevPositions[tip] = currentY;
  });
  return taps;
}
// Play detected taps
function playTaps(taps) {
taps.forEach(tap => {
synth.triggerAttackRelease(tap.note, '8n', Tone.now(), tap.velocity);
// Visual feedback
showTapEffect(tap.finger);
});
}
// Visual feedback for taps
function showTapEffect(finger) {
const keyElement = document.querySelector(`[data-finger="${finger}"]`);
if (keyElement) {
keyElement.classList.add('active');
setTimeout(() => keyElement.classList.remove('active'), 150);
}
}
// In your MediaPipe onResults callback:
hands.onResults((results) => {
// ... drawing code ...
if (results.multiHandLandmarks) {
results.multiHandLandmarks.forEach(landmarks => {
const taps = detectTaps(landmarks);
if (taps.length > 0) {
playTaps(taps);
}
});
}
});
// Start audio context on user interaction
document.addEventListener('click', async () => {
await Tone.start();
console.log('Audio ready!');
}, { once: true });
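showTapEffect looks up elements with data-finger attributes, which the snippet above never creates. A hypothetical on-screen key strip might look like this:
<!-- Hypothetical key strip; one element per fingertip landmark index -->
<div class="piano-keys">
  <div class="key" data-finger="4">C4</div>
  <div class="key" data-finger="8">D4</div>
  <div class="key" data-finger="12">E4</div>
  <div class="key" data-finger="16">F4</div>
  <div class="key" data-finger="20">G4</div>
</div>
<style>
  .key.active { background: #06b6d4; }
</style>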
Project: Air Drawing
Draw with your index finger, change colors with pinch gestures, and clear the canvas with an open palm:
// Air Drawing Canvas
const drawCanvas = document.getElementById('drawCanvas');
const drawCtx = drawCanvas.getContext('2d');
// Drawing state
let isDrawing = false;
let lastPoint = null;
let currentColor = '#06b6d4';
let brushSize = 5;
const colors = ['#06b6d4', '#10b981', '#f59e0b', '#ef4444', '#8b5cf6'];
let colorIndex = 0;
// Get pinch distance
function getPinchDistance(landmarks) {
const thumb = landmarks[4];
const index = landmarks[8];
return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}
// Check if the index finger is extended while the others curl (thumb is ignored here)
function isPointing(landmarks) {
const indexExtended = landmarks[8].y < landmarks[6].y;
const middleCurled = landmarks[12].y > landmarks[10].y;
const ringCurled = landmarks[16].y > landmarks[14].y;
const pinkyCurled = landmarks[20].y > landmarks[18].y;
return indexExtended && middleCurled && ringCurled && pinkyCurled;
}
// Check for open palm (all fingers extended)
function isOpenPalm(landmarks) {
const fingers = [[8,6], [12,10], [16,14], [20,18]];
return fingers.every(([tip, pip]) => landmarks[tip].y < landmarks[pip].y);
}
// Drawing logic
let lastPinchDistance = 0;
let pinchCooldown = 0;
hands.onResults((results) => {
  // Draw the video feed (ctx is the landmark canvas from the basic example)
  ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);
if (results.multiHandLandmarks && results.multiHandLandmarks[0]) {
const landmarks = results.multiHandLandmarks[0];
const indexTip = landmarks[8];
// Convert normalized coords to canvas pixels
const x = indexTip.x * drawCanvas.width;
const y = indexTip.y * drawCanvas.height;
// Check gestures
const pinchDist = getPinchDistance(landmarks);
const pointing = isPointing(landmarks);
const palm = isOpenPalm(landmarks);
    // Pinch to change color
    if (pinchDist < 0.05 && pinchCooldown <= 0) {
      colorIndex = (colorIndex + 1) % colors.length;
      currentColor = colors[colorIndex];
      pinchCooldown = 30; // Frames to wait before the next color change
      showColorIndicator(currentColor);
    }
    if (pinchCooldown > 0) pinchCooldown--;
// Open palm to clear
if (palm) {
drawCtx.clearRect(0, 0, drawCanvas.width, drawCanvas.height);
lastPoint = null;
}
// Pointing to draw
if (pointing) {
if (lastPoint) {
drawCtx.beginPath();
drawCtx.moveTo(lastPoint.x, lastPoint.y);
drawCtx.lineTo(x, y);
drawCtx.strokeStyle = currentColor;
drawCtx.lineWidth = brushSize;
drawCtx.lineCap = 'round';
drawCtx.stroke();
}
lastPoint = { x, y };
// Draw cursor
drawCtx.beginPath();
drawCtx.arc(x, y, brushSize / 2, 0, Math.PI * 2);
drawCtx.fillStyle = currentColor;
drawCtx.fill();
} else {
lastPoint = null;
}
// Draw hand landmarks on video canvas
drawConnectors(ctx, landmarks, HAND_CONNECTIONS, { color: '#ffffff44' });
}
});
function showColorIndicator(color) {
const indicator = document.getElementById('colorIndicator');
indicator.style.background = color;
indicator.classList.add('pulse');
setTimeout(() => indicator.classList.remove('pulse'), 300);
}
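The snippet also assumes markup that isn't shown: a drawing canvas stacked on top of the video canvas, plus a #colorIndicator element. One hypothetical layout:
<!-- Hypothetical markup: drawing canvas layered over the video canvas -->
<div class="container" style="position: relative;">
  <canvas id="canvas" width="1280" height="720"></canvas>
  <canvas id="drawCanvas" width="1280" height="720"
          style="position: absolute; top: 0; left: 0;"></canvas>
  <div id="colorIndicator"></div>
</div>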
Project: Hand Puppets
Create an animated character controlled by your hand. The puppet's mouth opens when you open your hand!
// Hand Puppet Controller
class HandPuppet {
constructor(svgElement) {
this.svg = svgElement;
this.head = this.svg.querySelector('.puppet-head');
this.topJaw = this.svg.querySelector('.puppet-top-jaw');
this.bottomJaw = this.svg.querySelector('.puppet-bottom-jaw');
this.leftEye = this.svg.querySelector('.puppet-eye-left');
this.rightEye = this.svg.querySelector('.puppet-eye-right');
}
// Calculate hand openness (0 = closed, 1 = fully open)
getHandOpenness(landmarks) {
const wrist = landmarks[0];
const middleTip = landmarks[12];
const distance = Math.hypot(
middleTip.x - wrist.x,
middleTip.y - wrist.y
);
// Normalize: closed hand ~0.15, open hand ~0.35
return Math.min(Math.max((distance - 0.15) / 0.2, 0), 1);
}
// Get hand rotation (tilt)
getHandRotation(landmarks) {
const wrist = landmarks[0];
const middleMcp = landmarks[9];
const angle = Math.atan2(
middleMcp.y - wrist.y,
middleMcp.x - wrist.x
);
    return angle * (180 / Math.PI) + 90; // Degrees; +90 so an upright hand reads as 0
}
// Get hand center position
getHandCenter(landmarks) {
const wrist = landmarks[0];
const middleMcp = landmarks[9];
return {
x: (wrist.x + middleMcp.x) / 2,
y: (wrist.y + middleMcp.y) / 2
};
}
// Update puppet based on hand
update(landmarks) {
const openness = this.getHandOpenness(landmarks);
const rotation = this.getHandRotation(landmarks);
const center = this.getHandCenter(landmarks);
// Position puppet
const puppetX = center.x * 100; // Percent
const puppetY = center.y * 100;
this.svg.style.left = `${puppetX}%`;
this.svg.style.top = `${puppetY}%`;
// Rotate puppet
this.head.style.transform = `rotate(${rotation}deg)`;
// Animate mouth (jaw opens based on hand openness)
const mouthOpen = openness * 20; // Max 20 degrees
this.topJaw.style.transform = `rotate(${-mouthOpen / 2}deg)`;
this.bottomJaw.style.transform = `rotate(${mouthOpen / 2}deg)`;
    // Eyes follow the hand position for added character
const eyeOffsetX = (center.x - 0.5) * 10;
const eyeOffsetY = (center.y - 0.5) * 10;
this.leftEye.style.transform = `translate(${eyeOffsetX}px, ${eyeOffsetY}px)`;
this.rightEye.style.transform = `translate(${eyeOffsetX}px, ${eyeOffsetY}px)`;
// Add "talking" effect when mouth moves
if (openness > 0.3) {
this.svg.classList.add('talking');
} else {
this.svg.classList.remove('talking');
}
}
}
// Initialize puppet
const puppet = new HandPuppet(document.getElementById('puppet'));
// In MediaPipe callback
hands.onResults((results) => {
if (results.multiHandLandmarks && results.multiHandLandmarks[0]) {
puppet.update(results.multiHandLandmarks[0]);
}
});
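HandPuppet only cares about the class names it queries; the artwork is up to you. A bare-bones placeholder SVG (purely illustrative shapes, absolutely positioned so the left/top updates work):
<!-- Hypothetical puppet SVG; only the class names matter to HandPuppet -->
<svg id="puppet" viewBox="0 0 100 100" style="position: absolute; width: 200px;">
  <g class="puppet-head">
    <path class="puppet-top-jaw" d="M10 40 Q50 10 90 40 L50 50 Z" fill="#f59e0b"/>
    <path class="puppet-bottom-jaw" d="M10 45 Q50 80 90 45 L50 50 Z" fill="#d97706"/>
    <circle class="puppet-eye-left" cx="35" cy="30" r="5" fill="#0a0f1a"/>
    <circle class="puppet-eye-right" cx="65" cy="30" r="5" fill="#0a0f1a"/>
  </g>
</svg>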
Performance Tips
Keep your MediaPipe apps running smoothly with these optimizations:
1. Adjust Model Complexity
hands.setOptions({
maxNumHands: 1, // Reduce if you only need one hand
modelComplexity: 0, // 0 = Lite (fastest), 1 = Full (accurate)
minDetectionConfidence: 0.7, // Higher = fewer false positives
minTrackingConfidence: 0.5
});
2. Reduce Video Resolution
// Lower resolution = faster processing
const camera = new Camera(video, {
onFrame: async () => await hands.send({ image: video }),
width: 640, // Reduced from 1280
height: 480 // Reduced from 720
});
3. Throttle Processing
// Process every other frame for better performance
let frameCount = 0;
const camera = new Camera(video, {
onFrame: async () => {
frameCount++;
if (frameCount % 2 === 0) { // Skip every other frame
await hands.send({ image: video });
}
},
width: 640,
height: 480
});
4. Use requestAnimationFrame for Rendering
// Store latest results, render on animation frame
let latestResults = null;
hands.onResults((results) => {
latestResults = results;
});
function render() {
if (latestResults) {
    // Do your drawing here (drawHands stands in for your own render code)
drawHands(latestResults);
}
requestAnimationFrame(render);
}
render();