Learn how to use WASM-SIMD by implementing a NURBS curve editor in Rust

Canvas Rendering with WebAssembly SIMD


Interactive NURBS Curve Editor

Before we dive into the technical details, let’s see what we’re building. This interactive demo showcases a NURBS curve editor powered by WebAssembly SIMD.

The curve is calculated in real-time using our Rust WASM module with SIMD optimizations, processing multiple control points in parallel for smooth performance.

Starting Simple: What is WebAssembly?

If you’ve been doing web development for a while, you’ve probably heard about WebAssembly (WASM). At its core, WebAssembly is a way to run code written in languages like Rust, C++, or Go in the browser at near-native speeds.

Think of it this way: JavaScript is great for many things, but when you need to do heavy number crunching - like calculating thousands of points on a curve - it can struggle. That’s where WebAssembly shines.

Taking It Further: What is SIMD?

Now, let’s add another layer. SIMD stands for “Single Instruction, Multiple Data.” Instead of adding numbers one at a time like this:

// Traditional approach - one operation at a time
result[0] = a[0] + b[0];
result[1] = a[1] + b[1];
result[2] = a[2] + b[2];
result[3] = a[3] + b[3];

SIMD lets you do this:

// SIMD approach - all four additions in one instruction!
result = simd_add(a, b);

This parallelism is particularly valuable for graphics, physics simulations, and other computation-heavy applications. WebAssembly SIMD extends the WASM instruction set with 128-bit packed SIMD operations, similar to SSE instructions in x86 processors.
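To make the pseudocode above concrete, here is a minimal sketch of a four-lane addition in Rust. The function name add4 is made up for illustration. The first version uses the f32x4 intrinsics from core::arch::wasm32 and is only compiled when targeting wasm32 with simd128 enabled; the second is a scalar fallback with the same result, so the snippet also builds on a desktop target.

```rust
/// Lane-wise addition of four f32 values using one SIMD add.
/// Only compiled for wasm32 builds with the simd128 feature enabled.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::wasm32::*;
    // Pack each array into a 128-bit vector, then add all lanes at once.
    let sum = f32x4_add(
        f32x4(a[0], a[1], a[2], a[3]),
        f32x4(b[0], b[1], b[2], b[3]),
    );
    [
        f32x4_extract_lane::<0>(sum),
        f32x4_extract_lane::<1>(sum),
        f32x4_extract_lane::<2>(sum),
        f32x4_extract_lane::<3>(sum),
    ]
}

/// Scalar fallback: four separate additions, same result.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]) yields [11.0, 22.0, 33.0, 44.0] on either path.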

A Real-World Example: NURBS Curves

To demonstrate WASM SIMD in action, I’ll walk you through implementing a NURBS (Non-Uniform Rational B-Spline) curve editor. Why NURBS? They’re widely used in computer graphics and CAD applications, requiring complex mathematical calculations that benefit greatly from SIMD optimizations.

If you’ve ever used CAD programs, you’ve probably worked with NURBS without knowing it. Even the Bézier paths in vector graphics software like Illustrator are a special case of NURBS. They’re the math behind those smooth, adjustable curves.
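For reference, this is the standard textbook definition of a NURBS curve of degree p, with control points P_i, weights w_i, and B-spline basis functions N_{i,p} defined over the knot vector. The symbol names follow common convention rather than anything specific to this project:

```latex
\mathbf{C}(u) \;=\;
\frac{\sum_{i=0}^{n} N_{i,p}(u)\, w_i\, \mathbf{P}_i}
     {\sum_{i=0}^{n} N_{i,p}(u)\, w_i}
```

The numerator and denominator are exactly the kind of repeated multiply-and-add work that maps well onto SIMD lanes. At any given u only p + 1 of the basis functions are non-zero, so each evaluation touches just p + 1 control points.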

Setting Up Your Development Environment

Let’s start with the basics. To work with WASM SIMD, you’ll need:

  1. Rust toolchain - We’ll use Rust to write our SIMD-optimized code
  2. wasm-pack - For building and packaging Rust-generated WebAssembly
  3. Node.js and npm - For our web application

Step 1: Install Rust

If you don’t have Rust installed:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then add the WebAssembly target:

rustup target add wasm32-unknown-unknown

Step 2: Install wasm-pack

cargo install wasm-pack

Step 3: Set up your project structure

Create a project structure like this:

project-root/
├── nurbs_wasm/         # Rust WASM module
│   ├── .cargo/
│   │   └── config.toml # SIMD configuration
│   ├── src/
│   │   └── lib.rs      # Rust implementation
│   └── Cargo.toml      # Rust dependencies
├── web/                # Web application
│   ├── src/
│   │   └── ...         # React components
│   └── ...             # Web app configuration
├── copy-wasm.js        # Script to copy WASM files
└── package.json        # Project scripts

Enabling SIMD in Your Rust Project

Here’s where things get interesting. The key to enabling SIMD is proper configuration.

Configure Rust for WASM SIMD

Create a .cargo/config.toml file in your Rust project:

[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]

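This tells Rust to enable SIMD instructions when compiling to WebAssembly. Note that the flag applies to the whole module: a browser without SIMD support will reject the resulting .wasm file, which is why the fallback discussed under Browser Compatibility matters.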
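A quick way to confirm the flag actually reached the compiler is to check the target feature at compile time. This is a small sketch; the function names simd_enabled and build_flavor are made up for illustration.

```rust
/// Returns true when this crate was compiled with the simd128 target feature.
/// The check is resolved at compile time, so it costs nothing at runtime.
pub fn simd_enabled() -> bool {
    cfg!(target_feature = "simd128")
}

/// Short label describing the build, e.g. for a log line at startup.
pub fn build_flavor() -> &'static str {
    if simd_enabled() {
        "simd128"
    } else {
        "scalar"
    }
}
```

If build_flavor() reports "scalar" in a wasm32 build, the .cargo/config.toml is probably not being picked up, for example because cargo was invoked from a different directory.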

Set up your Cargo.toml

[package]
name = "nurbs_wasm"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib", "rlib"]

[dependencies]
wasm-bindgen = "0.2.87"
js-sys = "0.3.64"
console_error_panic_hook = { version = "0.1.7", optional = true }

[profile.release]
opt-level = 3
lto = true

The crate-type line is crucial: "cdylib" tells Cargo to produce the standalone library that becomes the .wasm file, while "rlib" keeps the crate usable as a normal Rust dependency (for example in native tests).

Writing SIMD-Optimized Rust Code

Now for the fun part - let’s write some Rust code that explicitly uses SIMD instructions:

use wasm_bindgen::prelude::*;
use std::arch::wasm32::*; // Import WASM SIMD intrinsics

#[wasm_bindgen]
pub struct ControlPoint {
    x: f64,
    y: f64,
    weight: f64,
}

#[wasm_bindgen]
impl ControlPoint {
    #[wasm_bindgen(constructor)]
    pub fn new(x: f64, y: f64, weight: f64) -> ControlPoint {
        ControlPoint { x, y, weight }
    }
    
    // Getters and setters...
}

#[wasm_bindgen]
pub struct NurbsCurve {
    control_points: Vec<ControlPoint>,
    knots: Vec<f64>,
    degree: usize,
}

Using SIMD for NURBS Evaluation

Here’s where SIMD really shines. When evaluating a NURBS curve, we need to process multiple control points. Let’s use SIMD to process them in parallel:

// core::arch::wasm32 provides the stable WASM SIMD intrinsics
// (std::arch::wasm32 re-exports the same items)
use core::arch::wasm32::*;

// Evaluate the NURBS curve at parameter u using SIMD
pub fn evaluate(&self, u: f64) -> Option<ControlPoint> {
    // ... setup code ...
    
    // SIMD-optimized evaluation using vectorized operations
    let mut i = 0;
    while i + 1 <= self.degree {
        let idx1 = span - self.degree + i;
        let idx2 = span - self.degree + i + 1;
        
        if idx1 < self.control_points.len() && idx2 < self.control_points.len() {
            let cp1 = &self.control_points[idx1];
            let cp2 = &self.control_points[idx2];
            
            // Create SIMD vectors for 2 control points at once.
            // f64x2(a, b) returns a v128 holding two f64 lanes.
            let x_vec = f64x2(cp1.x, cp2.x);
            let y_vec = f64x2(cp1.y, cp2.y);
            let weight_vec = f64x2(cp1.weight, cp2.weight);
            let basis_vec = f64x2(basis[i], basis[i + 1]);
            
            // SIMD multiplication - process 2 points in parallel!
            let weighted_basis_vec = f64x2_mul(basis_vec, weight_vec);
            let x_contrib = f64x2_mul(weighted_basis_vec, x_vec);
            let y_contrib = f64x2_mul(weighted_basis_vec, y_vec);
            
            // Extract and accumulate results
            numerator_x += f64x2_extract_lane::<0>(x_contrib) + 
                          f64x2_extract_lane::<1>(x_contrib);
            numerator_y += f64x2_extract_lane::<0>(y_contrib) + 
                          f64x2_extract_lane::<1>(y_contrib);
            denominator += f64x2_extract_lane::<0>(weighted_basis_vec) + 
                          f64x2_extract_lane::<1>(weighted_basis_vec);
        }
        
        i += 2;
    }
    
    // Handle remaining control point if odd number
    // ... handle remainder ...
}
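The elided parts make the snippet above hard to run on its own, so here is a self-contained sketch of just the weighted accumulation, including the remainder step. It assumes the basis values are already computed and passed in as a slice. The names Cp, pair, and rational_point are invented for this illustration and are not part of the module above. A scalar pair function stands in when the code is built without simd128, so both paths share the same loop.

```rust
/// Plain-data control point for this sketch (not the wasm_bindgen struct).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cp {
    pub x: f64,
    pub y: f64,
    pub w: f64,
}

/// Contribution of two control points: [sum N*w*x, sum N*w*y, sum N*w].
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn pair(a: Cp, b: Cp, na: f64, nb: f64) -> [f64; 3] {
    use core::arch::wasm32::*;
    let nw = f64x2_mul(f64x2(na, nb), f64x2(a.w, b.w));
    let x = f64x2_mul(nw, f64x2(a.x, b.x));
    let y = f64x2_mul(nw, f64x2(a.y, b.y));
    [
        f64x2_extract_lane::<0>(x) + f64x2_extract_lane::<1>(x),
        f64x2_extract_lane::<0>(y) + f64x2_extract_lane::<1>(y),
        f64x2_extract_lane::<0>(nw) + f64x2_extract_lane::<1>(nw),
    ]
}

/// Scalar stand-in with the same result, used when simd128 is not enabled.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn pair(a: Cp, b: Cp, na: f64, nb: f64) -> [f64; 3] {
    let (wa, wb) = (na * a.w, nb * b.w);
    [wa * a.x + wb * b.x, wa * a.y + wb * b.y, wa + wb]
}

/// Rational weighted sum over the active control points.
/// Returns None on mismatched lengths or a zero denominator.
pub fn rational_point(cps: &[Cp], basis: &[f64]) -> Option<(f64, f64)> {
    if cps.len() != basis.len() {
        return None;
    }
    let mut acc = [0.0_f64; 3];
    let mut i = 0;
    // Main loop: two control points per iteration.
    while i + 2 <= cps.len() {
        let p = pair(cps[i], cps[i + 1], basis[i], basis[i + 1]);
        acc[0] += p[0];
        acc[1] += p[1];
        acc[2] += p[2];
        i += 2;
    }
    // Remainder: one leftover point when the count is odd.
    if i < cps.len() {
        let nw = basis[i] * cps[i].w;
        acc[0] += nw * cps[i].x;
        acc[1] += nw * cps[i].y;
        acc[2] += nw;
    }
    if acc[2] == 0.0 {
        None
    } else {
        Some((acc[0] / acc[2], acc[1] / acc[2]))
    }
}
```

With three unit-weight points (0, 0), (2, 0), (4, 4) and basis values 0.25, 0.5, 0.25, the pair step handles the first two points and the remainder step handles the third, giving the point (2.0, 1.0).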

Understanding WASM SIMD Intrinsics

The key SIMD functions we’re using:

  • f64x2(a, b) - Creates a 128-bit SIMD vector containing two 64-bit floats
  • f64x2_mul(a, b) - Multiplies two f64x2 vectors element-wise in parallel
  • f64x2_add(a, b) - Adds two f64x2 vectors element-wise in parallel
  • f64x2_extract_lane::<N>(v) - Extracts the Nth element from the vector

By processing two control points at once, each multiplication does twice the work, which can come close to doubling throughput for these mathematical operations.
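One refinement worth knowing about: f64x2_add lets you keep the running sum inside a vector and extract lanes only once at the end, instead of after every iteration. The sketch below shows that pattern on a plain dot product. The function name dot is illustrative, and the non-SIMD version emulates the two lanes with a two-element array so the structure stays comparable.

```rust
/// Dot product with a two-lane vector accumulator (wasm32 + simd128 builds).
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn dot(a: &[f64], b: &[f64]) -> f64 {
    use core::arch::wasm32::*;
    let n = a.len().min(b.len());
    let mut acc = f64x2_splat(0.0);
    let mut i = 0;
    while i + 2 <= n {
        let prod = f64x2_mul(f64x2(a[i], a[i + 1]), f64x2(b[i], b[i + 1]));
        acc = f64x2_add(acc, prod); // stay in vector form inside the loop
        i += 2;
    }
    // Single horizontal sum after the loop.
    let mut sum = f64x2_extract_lane::<0>(acc) + f64x2_extract_lane::<1>(acc);
    if i < n {
        sum += a[i] * b[i]; // odd-length remainder
    }
    sum
}

/// Same algorithm with the two lanes emulated by a small array.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn dot(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len().min(b.len());
    let mut acc = [0.0_f64; 2];
    let mut i = 0;
    while i + 2 <= n {
        acc[0] += a[i] * b[i];
        acc[1] += a[i + 1] * b[i + 1];
        i += 2;
    }
    let mut sum = acc[0] + acc[1];
    if i < n {
        sum += a[i] * b[i]; // odd-length remainder
    }
    sum
}
```

Because the additions happen in a different order than in a simple left-to-right loop, results can differ in the last bits for general floating-point inputs.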

How Rust SIMD Maps to WebAssembly Instructions

Here’s what actually happens when our Rust code compiles to WebAssembly:

// Rust code
let x_vec = f64x2(cp1.x, cp2.x);
let y_vec = f64x2(cp1.y, cp2.y);
let result = f64x2_mul(x_vec, y_vec);

Translates to these WASM instructions:

;; Build a v128 from two f64 values: splat the first, then replace lane 1
;; (the exact sequence the compiler emits may differ)
local.get $cp1_x
f64x2.splat
local.get $cp2_x
f64x2.replace_lane 1
local.set $x_vec   ;; v128 type

;; Multiply two v128 vectors
local.get $x_vec   ;; v128 type
local.get $y_vec   ;; v128 type
f64x2.mul          ;; SIMD multiplication
local.set $result

;; Extract lanes (the lane index is an immediate, not a stack operand)
local.get $result
f64x2.extract_lane 0 ;; Get first element
local.get $result
f64x2.extract_lane 1 ;; Get second element

The v128 Type: WASM’s SIMD Foundation

WebAssembly SIMD introduces a new 128-bit value type called v128. This can be interpreted as:

  • 2 × f64 (what we’re using)
  • 2 × i64
  • 4 × f32
  • 4 × i32
  • 8 × i16
  • 16 × i8

Our NURBS implementation uses f64x2 because we need double precision for accurate curve calculations.
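The idea that one 128-bit value can be read in several ways is easy to demonstrate without any SIMD at all. The sketch below packs two f64 values into 16 little-endian bytes (WebAssembly memory is little-endian) and then reads the same bytes as four i32 lanes. Both function names are made up for this illustration.

```rust
/// Lay out two f64 lanes as the 16 bytes a v128 would hold in memory.
pub fn f64x2_to_bytes(v: [f64; 2]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&v[0].to_le_bytes());
    out[8..].copy_from_slice(&v[1].to_le_bytes());
    out
}

/// Reinterpret the same 16 bytes as four 32-bit integer lanes.
pub fn bytes_to_i32x4(b: [u8; 16]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for (lane, chunk) in b.chunks_exact(4).enumerate() {
        out[lane] = i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    out
}
```

For example, the vector [1.0, 0.0] read as i32 lanes is [0, 0x3FF00000, 0, 0]: the bits did not change, only the interpretation.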

Actual Rust WASM SIMD Implementation

The intrinsics used above live in core::arch::wasm32 and are available on stable Rust. Most of them, including the f64x2 constructor, are safe functions when compiling for wasm32; only raw memory access such as v128_load requires unsafe. Here is a self-consistent version of the snippet, showing two equivalent ways to build a vector:

use core::arch::wasm32::*;

// Method 1: build the vector directly from two f64 values
let x_vec = f64x2(cp1.x, cp2.x);
let y_vec = f64x2(cp1.y, cp2.y);

// Method 2: splat one value into both lanes, then overwrite lane 1
let x_alt = f64x2_replace_lane::<1>(f64x2_splat(cp1.x), cp2.x);

// Perform operations (x_vec and x_alt hold the same lanes)
let result = f64x2_mul(x_vec, y_vec);

// Extract values
let val0 = f64x2_extract_lane::<0>(result);
let val1 = f64x2_extract_lane::<1>(result);

Or using the experimental portable_simd feature (nightly Rust):

#![feature(portable_simd)]
use std::simd::f64x2;

let x_vec = f64x2::from_array([cp1.x, cp2.x]);
let y_vec = f64x2::from_array([cp1.y, cp2.y]);
let result = x_vec * y_vec;
let [val0, val1] = result.to_array();

Available WASM SIMD Instructions for f64x2

The actual WASM instructions available for f64x2 operations include:

;; Construction
f64x2.splat          ;; Create vector with same value in both lanes
f64x2.replace_lane   ;; Replace one lane with a new value

;; Arithmetic
f64x2.abs           ;; Absolute value
f64x2.neg           ;; Negation
f64x2.sqrt          ;; Square root
f64x2.add           ;; Addition
f64x2.sub           ;; Subtraction
f64x2.mul           ;; Multiplication
f64x2.div           ;; Division
f64x2.min           ;; Minimum
f64x2.max           ;; Maximum

;; Comparison
f64x2.eq            ;; Equal
f64x2.ne            ;; Not equal
f64x2.lt            ;; Less than
f64x2.gt            ;; Greater than
f64x2.le            ;; Less than or equal
f64x2.ge            ;; Greater than or equal

;; Lane operations
f64x2.extract_lane  ;; Extract one f64 from vector
f64x2.replace_lane  ;; Replace one f64 in vector

Building Your WASM Module

With your Rust code ready, let’s build it:

Create build scripts

In your package.json:

{
  "type": "module",
  "scripts": {
    "build:wasm": "cd nurbs_wasm && wasm-pack build --target web",
    "copy:wasm": "node copy-wasm.js",
    "build": "npm run build:wasm && npm run copy:wasm && cd web && npm run build",
    "dev": "npm run build:wasm && npm run copy:wasm && cd web && npm run dev"
  }
}

Create a script to copy WASM files

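Create copy-wasm.js to move the built files to your web app. The script uses ES module syntax, so the root package.json needs "type": "module" (alternatively, name the file copy-wasm.mjs and update the copy:wasm script to match):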

import { copyFileSync, mkdirSync, existsSync } from 'node:fs';
import { join, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';

// Get the directory of the current module
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

// Define source and destination paths
const sourceDir = join(__dirname, 'nurbs_wasm', 'pkg');
const destDir = join(__dirname, 'web', 'nurbs_wasm', 'pkg');

// Create destination directory if it doesn't exist
if (!existsSync(destDir)) {
  mkdirSync(destDir, { recursive: true });
}

// Files to copy
const files = [
  'nurbs_wasm.js',
  'nurbs_wasm_bg.wasm',
  'nurbs_wasm.d.ts',
  'nurbs_wasm_bg.wasm.d.ts',
];

// Copy each file
for (const file of files) {
  const sourcePath = join(sourceDir, file);
  const destPath = join(destDir, file);
  
  try {
    copyFileSync(sourcePath, destPath);
    console.log(`Copied ${file} to ${destDir}`);
  } catch (error) {
    console.error(`Error copying ${file}:`, error);
  }
}

Build the WASM module

npm run build:wasm

This compiles your Rust code to WebAssembly with SIMD optimizations enabled.

Integrating WASM with Your Web Application

Now let’s use our WASM module in a web app. I’ll use React, but the principles apply to any framework.

Configure your bundler

If using Vite, configure it to handle WASM:

// vite.config.ts
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import wasm from 'vite-plugin-wasm';

export default defineConfig({
  plugins: [
    react(),
    wasm(),
  ],
  build: {
    target: 'esnext',
  },
  optimizeDeps: {
    exclude: ['nurbs_wasm'],
  },
});

Use the WASM module in React

import { useEffect, useState } from 'react';
import initWasm, { ControlPoint, NurbsCurve } from '../nurbs_wasm/pkg/nurbs_wasm';

function App() {
  const [wasmLoaded, setWasmLoaded] = useState(false);
  const [curve, setCurve] = useState<NurbsCurve | null>(null);
  
  // Initialize WASM module
  useEffect(() => {
    async function loadWasm() {
      try {
        await initWasm();
        const initialCurve = new NurbsCurve(3);
        setWasmLoaded(true);
        setCurve(initialCurve);
      } catch (error) {
        console.error('Failed to load WASM module:', error);
      }
    }
    
    loadWasm();
  }, []);
  
  // Use the WASM module to generate curve points
  const generateCurvePoints = () => {
    if (!curve) return [];
    
    try {
      return curve.generate_points(100);
    } catch (error) {
      console.error('Error generating curve points:', error);
      return [];
    }
  };
  
  // Rest of your component...
}
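The React code calls curve.generate_points(100), but the Rust side of that method is not shown. As a rough sketch of what such a method might do, the function below samples a curve at evenly spaced parameters and returns a flat [x0, y0, x1, y1, ...] vector, a shape that is easy to hand across the WASM boundary. The name generate_points_flat, its parameters, and the closure-based eval argument are all assumptions made for this illustration; the real method would call the curve's own evaluate.

```rust
/// Sample a curve at `resolution` evenly spaced parameters in [u_min, u_max]
/// and return the points as a flat [x0, y0, x1, y1, ...] vector.
/// `eval` stands in for the curve's evaluate method (hypothetical signature).
pub fn generate_points_flat<F>(resolution: usize, u_min: f64, u_max: f64, eval: F) -> Vec<f64>
where
    F: Fn(f64) -> Option<(f64, f64)>,
{
    let mut out = Vec::with_capacity(resolution * 2);
    for i in 0..resolution {
        // Map the sample index to [0, 1]; a single sample sits at u_min.
        let t = if resolution > 1 {
            i as f64 / (resolution - 1) as f64
        } else {
            0.0
        };
        let u = u_min + t * (u_max - u_min);
        // Skip parameters where evaluation fails instead of emitting garbage.
        if let Some((x, y)) = eval(u) {
            out.push(x);
            out.push(y);
        }
    }
    out
}
```

Returning one flat vector avoids creating a JavaScript object per point; on the JS side the values can be read in pairs when drawing to the canvas.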

Running Your Application

With everything set up:

npm run dev

This will:

  1. Build the Rust WASM module with SIMD optimizations
  2. Copy the WASM files to your web application
  3. Start the development server

Performance Benefits

So, was all this setup worth it? Let’s look at what we’re actually optimizing:

Where SIMD Makes a Difference

  1. Control Point Processing: Instead of processing one control point at a time, we process 2 in parallel using f64x2 vectors
  2. Basis Function Calculation: The Cox-de Boor recursion can process multiple basis functions simultaneously
  3. Array Conversion: Converting points to flat arrays processes 2 points at once
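For readers who want to see where the basis values come from, here is a scalar sketch of the Cox-de Boor computation in the iterative form given by Piegl and Tiller in The NURBS Book (algorithm A2.2). It is a reference version without SIMD, not the implementation used in the demo, and it assumes the caller has already found a valid knot span with span >= p and span + p < knots.len().

```rust
/// Non-zero B-spline basis functions N_{span-p, p}(u) ..= N_{span, p}(u).
/// Assumes knots[span] <= u < knots[span + 1] (or u at the end of the range)
/// and span >= p; no bounds are re-checked here.
pub fn basis_funs(span: usize, u: f64, p: usize, knots: &[f64]) -> Vec<f64> {
    let mut n = vec![0.0; p + 1];
    let mut left = vec![0.0; p + 1];
    let mut right = vec![0.0; p + 1];
    n[0] = 1.0;
    for j in 1..=p {
        left[j] = u - knots[span + 1 - j];
        right[j] = knots[span + j] - u;
        let mut saved = 0.0;
        // Each pass raises the degree by one, reusing the previous row in place.
        for r in 0..j {
            let temp = n[r] / (right[r + 1] + left[j - r]);
            n[r] = saved + right[r + 1] * temp;
            saved = left[j - r] * temp;
        }
        n[j] = saved;
    }
    n
}
```

With the knot vector [0, 0, 0, 1, 1, 1], degree 2, span 2, and u = 0.5, this returns the quadratic Bernstein values [0.25, 0.5, 0.25], which sum to 1 as basis functions should.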

Real Performance Gains

In our NURBS implementation:

  • Scalar version: Processes 1 control point per iteration
  • SIMD version: Processes 2 control points per iteration

Because B-spline basis functions have local support, each evaluation touches only degree + 1 control points, no matter how many control points the curve has. For a cubic curve (degree 3) evaluated at 1000 positions:

  • Without SIMD: ~4,000 scalar accumulation steps (4 per position)
  • With SIMD: ~2,000 two-lane steps (each doing 2x work)

In the best case this approaches a 2x speedup for the core evaluation loop. In practice the gain is smaller, because building the vectors, extracting lanes, and handling remainders all cost time, so benchmark your own workload before relying on a specific number.

Try It Yourself!

In the demo above, try these experiments to see SIMD in action:

  1. Stress Test: Add 20-30 control points and set resolution to maximum (500). The curve still updates smoothly thanks to SIMD optimization.

  2. Weight Manipulation: Select points and adjust their weights. Notice how the curve responds instantly even with many points.

  3. Complex Shapes: Load the circle preset and increase the degree. The mathematics behind this involves heavy computation, all handled efficiently by our SIMD implementation.

Browser Compatibility

Before you ship to production, note that WASM SIMD is supported in:

  • Chrome/Edge 91+
  • Firefox 89+
  • Safari 16.4+

For older browsers, you’ll want to provide a fallback or use feature detection:

// Feature detection for WASM SIMD
async function loadWasmModule() {
  // Check if SIMD is supported
  const simdSupported = WebAssembly.validate(new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b, 0x03,
    0x02, 0x01, 0x00, 0x0a, 0x0a, 0x01, 0x08, 0x00,
    0x41, 0x00, 0xfd, 0x0f, 0xfd, 0x62, 0x0b
  ]));
  
  if (simdSupported) {
    // Load SIMD-optimized version
    return import('./nurbs_wasm_simd');
  } else {
    // Fall back to non-SIMD version
    return import('./nurbs_wasm_scalar');
  }
}

Key Takeaways

  1. Explicit intrinsics give predictable results - Enabling the simd128 flag lets the compiler auto-vectorize some loops, but that is not guaranteed; intrinsics make the vector code explicit
  2. Process data in batches - SIMD works best when you can process multiple data points together
  3. Handle remainders - Real-world data isn’t always perfectly divisible by your SIMD vector size
  4. Measure performance - SIMD adds complexity, so make sure the performance gain justifies it
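For the last point, a very small timing helper is often enough to compare a scalar and a SIMD version of the same function. The sketch below is meant for native test runs: std::time::Instant is not available on the wasm32-unknown-unknown target, so in the browser you would time from JavaScript (for example with performance.now()) instead. The name time_it is illustrative.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `iters` times and return the elapsed time plus a checksum.
/// The checksum and black_box keep the optimizer from removing the work.
pub fn time_it<F: FnMut() -> f64>(iters: u32, mut f: F) -> (Duration, f64) {
    let start = Instant::now();
    let mut checksum = 0.0;
    for _ in 0..iters {
        checksum += black_box(f());
    }
    (start.elapsed(), checksum)
}
```

Run both variants with the same inputs and iteration count, compare the durations, and check that the checksums agree so you know the two versions compute the same thing.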

Wrapping Up

WebAssembly SIMD brings powerful parallel processing to the web. In our NURBS implementation, we used explicit SIMD intrinsics to:

  • Process 2 control points simultaneously with f64x2 vectors
  • Parallelize multiplications and additions in the curve evaluation
  • Optimize array conversions for better throughput

The NURBS curve editor demonstrates how WASM SIMD can provide smooth, responsive performance even for complex mathematical operations. While the compiler can sometimes auto-vectorize code, explicit SIMD usage gives you precise control over parallelization.

Remember: start simple, get the basics working, measure performance, then optimize where it matters. Happy coding!