Learn how perceptual hashing creates compact fingerprints of images that survive modifications. Build your own pHash implementation in TypeScript.

Perceptual Image Hashing - Creating Visual Fingerprints


How can we detect similar images even after modifications?

Imagine running a photo sharing platform where users upload millions of images daily. How do you detect when someone uploads the same image twice, even if they’ve cropped it, applied a filter, or saved it in a different format? Traditional cryptographic hashes like SHA-256 fail spectacularly here - change a single pixel and you get a completely different hash.

Perceptual hashing algorithms generate compact signatures (typically 64-256 bits) that represent the visual essence of an image. Unlike cryptographic hashes designed for security, perceptual hashes are designed for similarity - similar images produce similar hashes. It creates a “fingerprint” based on visual characteristics that survive common modifications.

Perceptual Hash Demo
No image loaded
No image loaded

How Perceptual Hashing Works

Most perceptual hashing algorithms follow a similar pattern:

  1. Reduce Size - Shrink image to remove high-frequency details
  2. Simplify Colors - Convert to grayscale to focus on structure
  3. Transform - Apply mathematical transformation (often DCT)
  4. Extract Features - Keep only the most significant information
  5. Generate Hash - Convert features to a binary fingerprint

The magic is in choosing transformations that preserve perceptually important features while discarding noise.

Real-World Applications

Before diving into the implementation, let’s see where perceptual hashing is used:

  • Duplicate Detection - Finding re-posts on social media
  • Copyright Protection - Identifying unauthorized use of images
  • Content Moderation - Blocking previously flagged inappropriate content
  • Similar Image Search - “Find images like this one”
  • Video Deduplication - Detecting re-uploaded videos with minor edits

Building pHash from Scratch

Let’s implement a popular perceptual hashing algorithm called pHash. We’ll start simple and gradually add sophistication.

Step 1: Define the Foundation

First, let’s establish our types and basic structure:

// Core types for our perceptual hash implementation
interface ImageData {
  width: number;
  height: number;
  data: Uint8Array; // RGBA pixel data
}

interface PHashConfig {
  hashSize?: number;      // Size of the hash in bits (default: 64)
  imageSize?: number;     // Size to resize image before hashing (default: 32)
  highFreqFactor?: number; // Factor for DCT coefficient selection (default: 4)
}

// Result of hashing includes the hash and intermediate steps for visualization
interface PHashResult {
  hash: bigint;           // The actual hash value
  hashBits: string;       // Binary string representation
  // Debug info for understanding the algorithm
  debug?: {
    resized: number[][];     // Resized grayscale image
    dct: number[][];        // DCT coefficients
    median: number;         // Median DCT value
  };
}

// Main function signature
function perceptualHash(
  image: ImageData,
  config: PHashConfig = {}
): PHashResult {
  const { hashSize = 64, highFreqFactor = 4 } = config;
  // We'll build this step by step
  return {
    hash: 0n,
    hashBits: ''
  };
}

Step 2: Image Preprocessing

The first step is reducing the image to its essential structure. We resize it to a small square to remove fine details:

// Convert RGBA image data to grayscale
function toGrayscale(image: ImageData): number[][] {
  const { width, height, data } = image;
  const grayscale: number[][] = [];
  
  for (let y = 0; y < height; y++) {
    const row: number[] = [];
    for (let x = 0; x < width; x++) {
      const idx = (y * width + x) * 4;
      // Standard grayscale conversion weights
      const gray = 0.299 * data[idx] +     // Red
                   0.587 * data[idx + 1] +  // Green
                   0.114 * data[idx + 2];   // Blue
      row.push(gray);
    }
    grayscale.push(row);
  }
  
  return grayscale;
}

// Resize image using bilinear interpolation
function resizeImage(
  pixels: number[][],
  newWidth: number,
  newHeight: number
): number[][] {
  const oldHeight = pixels.length;
  const oldWidth = pixels[0].length;
  const resized: number[][] = [];
  
  // Calculate scaling factors
  const xRatio = oldWidth / newWidth;
  const yRatio = oldHeight / newHeight;
  
  for (let y = 0; y < newHeight; y++) {
    const row: number[] = [];
    for (let x = 0; x < newWidth; x++) {
      // Map to original image coordinates
      const srcX = x * xRatio;
      const srcY = y * yRatio;
      
      // Get the four surrounding pixels
      const x0 = Math.floor(srcX);
      const x1 = Math.min(x0 + 1, oldWidth - 1);
      const y0 = Math.floor(srcY);
      const y1 = Math.min(y0 + 1, oldHeight - 1);
      
      // Calculate interpolation weights
      const wx = srcX - x0;
      const wy = srcY - y0;
      
      // Bilinear interpolation
      const value = (1 - wx) * (1 - wy) * pixels[y0][x0] +
                    wx * (1 - wy) * pixels[y0][x1] +
                    (1 - wx) * wy * pixels[y1][x0] +
                    wx * wy * pixels[y1][x1];
      
      row.push(value);
    }
    resized.push(row);
  }
  
  return resized;
}

Step 3: The Heart of pHash - Discrete Cosine Transform

The DCT is what makes pHash robust. It transforms the image into frequency space, where low frequencies represent structure and high frequencies represent details:

// 2D Discrete Cosine Transform Type II
function dct2d(matrix: number[][]): number[][] {
  const size = matrix.length;
  const result: number[][] = Array(size).fill(null)
    .map(() => Array(size).fill(0));
  
  // DCT formula constants
  const c0 = 1 / Math.sqrt(size);
  const c1 = Math.sqrt(2 / size);
  
  for (let u = 0; u < size; u++) {
    for (let v = 0; v < size; v++) {
      let sum = 0;
      
      // Sum over all pixels
      for (let x = 0; x < size; x++) {
        for (let y = 0; y < size; y++) {
          sum += matrix[y][x] * 
                 Math.cos(((2 * x + 1) * u * Math.PI) / (2 * size)) *
                 Math.cos(((2 * y + 1) * v * Math.PI) / (2 * size));
        }
      }
      
      // Apply normalization factors
      const cu = u === 0 ? c0 : c1;
      const cv = v === 0 ? c0 : c1;
      result[v][u] = cu * cv * sum;
    }
  }
  
  return result;
}

// Extract low-frequency DCT coefficients (top-left corner)
function extractLowFrequencies(dct: number[][], size: number): number[] {
  const coefficients: number[] = [];
  
  // Take top-left corner, excluding DC component (0,0)
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      if (x === 0 && y === 0) continue; // Skip DC component
      coefficients.push(dct[y][x]);
    }
  }
  
  return coefficients;
}

Step 4: Generate the Binary Hash

Now we convert the DCT coefficients into a binary hash by comparing each to the median:

// Calculate median of an array
function calculateMedian(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  
  if (sorted.length % 2 === 0) {
    return (sorted[mid - 1] + sorted[mid]) / 2;
  }
  return sorted[mid];
}

// Convert DCT coefficients to binary hash
function generateHash(coefficients: number[]): bigint {
  const median = calculateMedian(coefficients);
  let hash = 0n;
  
  // Each coefficient becomes one bit
  for (let i = 0; i < coefficients.length; i++) {
    if (coefficients[i] > median) {
      // Set bit i to 1
      hash |= (1n << BigInt(i));
    }
  }
  
  return hash;
}

// Convert hash to binary string for visualization
function hashToBinaryString(hash: bigint, bits: number): string {
  return hash.toString(2).padStart(bits, '0');
}

Step 5: Complete Implementation

Let’s put it all together:

function perceptualHash(
  image: ImageData,
  config: PHashConfig = {}
): PHashResult {
  const {
    hashSize = 64,
    imageSize = 32,
    highFreqFactor = 4
  } = config;
  
  // Calculate DCT size needed for desired hash size
  const dctSize = Math.ceil(Math.sqrt(hashSize)) + 1;
  
  // Step 1: Convert to grayscale
  const grayscale = toGrayscale(image);
  
  // Step 2: Resize to small square
  const resized = resizeImage(grayscale, imageSize, imageSize);
  
  // Step 3: Apply DCT
  const dct = dct2d(resized);
  
  // Step 4: Extract low frequencies
  const coefficients = extractLowFrequencies(dct, dctSize);
  
  // Step 5: Generate hash
  const hash = generateHash(coefficients.slice(0, hashSize));
  const hashBits = hashToBinaryString(hash, hashSize);
  
  return {
    hash,
    hashBits,
    debug: {
      resized,
      dct,
      median: calculateMedian(coefficients)
    }
  };
}

Step 6: Comparing Hashes

The beauty of perceptual hashes is how we compare them. We use Hamming distance - counting bits that differ:

// Calculate Hamming distance between two hashes
function hammingDistance(hash1: bigint, hash2: bigint): number {
  // XOR shows bits that differ
  let xor = hash1 ^ hash2;
  let distance = 0;
  
  // Count set bits (Brian Kernighan's algorithm)
  while (xor !== 0n) {
    xor &= xor - 1n;
    distance++;
  }
  
  return distance;
}

// Calculate similarity percentage
function calculateSimilarity(
  hash1: bigint,
  hash2: bigint,
  hashSize: number = 64
): number {
  const distance = hammingDistance(hash1, hash2);
  return ((hashSize - distance) / hashSize) * 100;
}

// Determine if images are similar based on threshold
function areSimilar(
  hash1: bigint,
  hash2: bigint,
  threshold: number = 5
): boolean {
  return hammingDistance(hash1, hash2) <= threshold;
}

Advanced Techniques

Our implementation covers the basics, but production systems often include:

Rotation Invariance

To handle rotated images, compute hashes for multiple rotations:

function rotationInvariantHash(image: ImageData): bigint[] {
  const hashes: bigint[] = [];
  
  // Generate hashes for 0°, 90°, 180°, 270°
  for (let rotation = 0; rotation < 360; rotation += 90) {
    const rotated = rotateImage(image, rotation);
    const result = perceptualHash(rotated);
    hashes.push(result.hash);
  }
  
  return hashes;
}

Radial Hash Projection

For even better rotation invariance, use radial projections:

function radialHash(image: ImageData): bigint {
  // Convert to polar coordinates
  const polar = cartesianToPolar(image);
  
  // Apply 1D DCT to radial projections
  const projections = computeRadialProjections(polar);
  
  // Generate hash from projections
  return generateRadialHash(projections);
}

Multi-Resolution Hashing

Combine hashes at different scales for better accuracy:

function multiResolutionHash(image: ImageData): bigint[] {
  const scales = [16, 32, 64];
  return scales.map(size => {
    const config = { imageSize: size };
    return perceptualHash(image, config).hash;
  });
}

Performance Considerations

For production use, consider these optimizations:

  1. SIMD Instructions - Use WebAssembly SIMD for faster DCT
  2. GPU Acceleration - Compute multiple hashes in parallel
  3. Approximate DCT - Trade accuracy for speed with faster transforms
  4. Cached Preprocessing - Store resized versions for common sizes

Practical Usage Example

Here’s how you might use perceptual hashing in a real application:

class ImageDatabase {
  private hashes = new Map<string, bigint>();
  
  addImage(id: string, image: ImageData): void {
    const { hash } = perceptualHash(image);
    this.hashes.set(id, hash);
  }
  
  findSimilar(image: ImageData, threshold: number = 5): string[] {
    const { hash: queryHash } = perceptualHash(image);
    const similar: string[] = [];
    
    for (const [id, hash] of this.hashes) {
      if (hammingDistance(queryHash, hash) <= threshold) {
        similar.push(id);
      }
    }
    
    return similar;
  }
  detectDuplicate(image: ImageData): boolean {
    return this.findSimilar(image, 3).length > 0;
  }
}

Summary

Perceptual hashing creates robust visual fingerprints that survive common image modifications. By combining image preprocessing, frequency transforms, and binary encoding, we can reduce megabytes of image data to just 8 bytes while preserving visual similarity.

Key takeaways:

  • Use DCT to extract frequency information
  • Low frequencies capture structure, high frequencies capture noise
  • Hamming distance efficiently measures hash similarity
  • Small implementation (~200 lines) with powerful applications

Whether you’re building a photo sharing platform, copyright detection system, or content moderation tool, perceptual hashing provides an elegant solution for image similarity at scale.

References