Vercel AI TypeScript SDK: Privacy Middleware
The Problem: PII Leaks Through Abstractions
Your AI application uses the Vercel AI SDK. It's clean — `generateText`, `streamText`, a few lines of code, done. But every user message flows through your system intact:
> "I'm Alice Smith from Acme Corp. My email is alice@example.com and my phone is 555-0123."
That string hits your:
- Logs — Sentry, Datadog, whatever you're using
- Context windows — passed to the LLM, stored in conversation history
- Tool calls — maybe forwarded to a CRM or booking API
- Error traces — when something breaks, the full message is in the stack
One misconfigured logger, one debug endpoint left open, one analytics pipeline you forgot about — and you've got a data breach waiting to happen.
In production AI systems, PII isn't a compliance checkbox — it's a systemic vulnerability that travels through every abstraction layer.
The Solution: Middleware That Redacts
ai-sdk-privacy-filter-middleware is a Vercel AI SDK [1] middleware that wraps any language model and automatically redacts PII from messages. It uses OpenAI's privacy-filter model [3] running locally via Transformers.js. [4]
No API calls. No cloud dependency. The 1.5B parameter model runs in your process — browser or Node.js.
How It Works
Model Architecture Note: The privacy-filter model uses a sparse Mixture-of-Experts (MoE) architecture with 128 experts and top-4 routing per token. [3] While the full model is ~1.5GB, only ~50M parameters are active per forward pass — inference is fast once loaded.
Wrap your model with the middleware:
```ts
import { wrapLanguageModel } from 'ai';
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { privacyFilterMiddleware } from 'ai-sdk-privacy-filter-middleware';

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const model = wrapLanguageModel({
  model: openrouter('openrouter/free'),
  middleware: privacyFilterMiddleware(),
});
```

Now every `generateText` or `streamText` call automatically:
- Detects PII in user/system messages
- Replaces it with typed placeholders (`[PERSON_1]`, `[EMAIL_1]`)
- Restores original values in the LLM's response
```
User sends:  "I'm Alice, my email is alice@corp.com"
LLM sees:    "I'm [PERSON_1], my email is [EMAIL_1]"
LLM replies: "Hello [PERSON_1], I'll contact [EMAIL_1]"
User sees:   "Hello Alice, I'll contact alice@corp.com"
```

The LLM never sees the real PII. Your logs only contain placeholders. Compliance becomes architectural.
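The round-trip above can be sketched in a few lines of TypeScript. This is an illustrative reconstruction, not the library's actual code: the hard-coded `entities` array stands in for the spans the privacy-filter model would detect.

```typescript
// Illustrative redact/restore round-trip, not the library's implementation.
type Entity = { type: string; value: string };

function redact(text: string, entities: Entity[]) {
  const mapping = new Map<string, string>(); // placeholder -> original value
  const counters = new Map<string, number>(); // per-type numbering
  let redacted = text;
  for (const e of entities) {
    const n = (counters.get(e.type) ?? 0) + 1;
    counters.set(e.type, n);
    const placeholder = `[${e.type.toUpperCase()}_${n}]`;
    mapping.set(placeholder, e.value);
    redacted = redacted.split(e.value).join(placeholder);
  }
  return { redacted, mapping };
}

function unredact(text: string, mapping: Map<string, string>): string {
  let out = text;
  for (const [placeholder, value] of mapping) {
    out = out.split(placeholder).join(value);
  }
  return out;
}

const { redacted, mapping } = redact("I'm Alice, my email is alice@corp.com", [
  { type: 'person', value: 'Alice' },
  { type: 'email', value: 'alice@corp.com' },
]);
// redacted === "I'm [PERSON_1], my email is [EMAIL_1]"
const restored = unredact("Hello [PERSON_1], I'll contact [EMAIL_1]", mapping);
// restored === "Hello Alice, I'll contact alice@corp.com"
```

A real implementation also has to handle overlapping spans and the model's token offsets; this sketch only shows the placeholder bookkeeping.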
What It Detects
The model catches 8 categories of PII:
| Type | Placeholder | Example |
|---|---|---|
| `private_person` | `[PERSON_N]` | Alice Smith |
| `private_email` | `[EMAIL_N]` | `alice@example.com` |
| `private_phone` | `[PHONE_N]` | +1-555-0123 |
| `private_address` | `[ADDRESS_N]` | 123 Main St |
| `private_date` | `[DATE_N]` | 1990-01-15 |
| `private_url` | `[URL_N]` | https://example.com |
| `account_number` | `[ACCOUNT_N]` | 1234-5678-9012 |
| `secret` | `[SECRET_N]` | API keys, passwords |
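Detections in these categories can be narrowed by type and confidence, which is what the `entityTypes` and `minScore` configuration options do. The sketch below is illustrative, not the library's code; the entity shape mirrors the `onRedact` debug output shown later in this post.

```typescript
// Illustrative filter over raw model detections, mirroring the described
// `entityTypes` / `minScore` behavior (not the library's actual code).
type Detection = { entity_group: string; score: number; word: string };

function filterDetections(
  detections: Detection[],
  opts: { entityTypes?: string[]; minScore?: number } = {},
): Detection[] {
  const { entityTypes, minScore = 0.8 } = opts; // 0.8 is the documented default
  return detections.filter(
    (d) =>
      d.score >= minScore &&
      (entityTypes === undefined || entityTypes.includes(d.entity_group)),
  );
}

const raw: Detection[] = [
  { entity_group: 'private_person', score: 0.99, word: 'Alice Smith' },
  { entity_group: 'private_email', score: 0.97, word: 'alice@example.com' },
  { entity_group: 'private_date', score: 0.55, word: 'next Tuesday' }, // below threshold
];

const kept = filterDetections(raw, {
  entityTypes: ['private_person', 'private_email'],
  minScore: 0.9,
});
// kept contains only the person and email detections
```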
The Middleware Pattern
The Vercel AI SDK's middleware system is the right abstraction for this. It sits between your code and the LLM — exactly where redaction belongs.
```
Your Code ──► transformParams (detect + redact PII)
                        │
                        ▼
              LLM sees redacted text
                        │
                        ▼
          LLM response (with placeholders)
                        │
                        ▼
   wrapGenerate/wrapStream (unredact placeholders)
                        │
                        ▼
          User sees original PII restored
```

The middleware implements three hooks:

- `transformParams` — redacts user and system messages before they reach the LLM
- `wrapGenerate` — unredacts placeholders in non-streaming responses
- `wrapStream` — unredacts placeholders in streaming responses via a transform stream
Only user and system messages are redacted. Assistant and tool messages pass through untouched — the LLM never outputs real PII, only placeholders that get restored.
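Streaming is the tricky case: a placeholder can arrive split across two chunks, so the stream transform has to hold back any suffix that might be the start of one. The sketch below is a stand-alone illustration of that buffering idea, not the middleware's actual `wrapStream` code (which operates on AI SDK stream parts).

```typescript
// Illustrative chunk-safe unredactor: hold back any buffer suffix that could
// be the beginning of a placeholder, so split placeholders still get restored.
function createUnredactor(mapping: Map<string, string>) {
  const placeholders = [...mapping.keys()];
  let buffer = '';

  const replaceAll = (s: string): string => {
    for (const [p, v] of mapping) s = s.split(p).join(v);
    return s;
  };

  // Length of the longest suffix of `s` that is a proper prefix of a placeholder.
  const heldLen = (s: string): number => {
    const max = Math.min(s.length, Math.max(...placeholders.map((p) => p.length)) - 1);
    for (let len = max; len > 0; len--) {
      const suffix = s.slice(s.length - len);
      if (placeholders.some((p) => p.startsWith(suffix))) return len;
    }
    return 0;
  };

  return {
    push(chunk: string): string {
      buffer = replaceAll(buffer + chunk);
      const hold = heldLen(buffer);
      const out = buffer.slice(0, buffer.length - hold);
      buffer = buffer.slice(buffer.length - hold);
      return out;
    },
    flush(): string {
      const out = buffer;
      buffer = '';
      return out;
    },
  };
}

const u = createUnredactor(new Map([['[PERSON_1]', 'Alice']]));
const first = u.push('Hello [PER'); // placeholder split mid-chunk: '[PER' is held back
const rest = u.push('SON_1]!') + u.flush();
// first + rest === 'Hello Alice!'
```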
Why Local Inference Matters
You could send messages to a cloud PII detection API. Many services offer this. But now you have:
- Network latency — every request adds 100-500ms
- Data residency issues — user data leaves your infrastructure
- Rate limits and costs — per-request pricing adds up fast
- Availability dependency — their downtime is your downtime
Running the model locally via Transformers.js eliminates all of this. The model loads once (~1.5GB), then inference runs in milliseconds on CPU (WASM) or GPU (WebGPU).
Runtime Support
| Runtime | Device | Status |
|---|---|---|
| Node.js | WASM | Supported |
| Browser | WebGPU | Supported |
| Browser | WASM | Supported (fallback) |
Auto-detects the best available device. WebGPU when available, WASM fallback otherwise.
Warning: WebGPU does not work in Safari. Use WASM fallback or deploy server-side for Safari users.
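The auto-detection described above amounts to a capability check. A minimal sketch, assuming only that `navigator.gpu` is the WebGPU capability marker (it is present in WebGPU-capable browsers and absent in Safari and Node.js); the middleware performs this internally via its `device` default:

```typescript
// Hypothetical helper sketching device auto-detection: WebGPU when the
// environment exposes `navigator.gpu`, otherwise the WASM fallback.
type Device = 'webgpu' | 'wasm';

function detectDevice(): Device {
  const nav = (globalThis as { navigator?: { gpu?: unknown } }).navigator;
  return nav?.gpu !== undefined ? 'webgpu' : 'wasm';
}

// In Node.js this returns 'wasm'; in a WebGPU-capable browser, 'webgpu'.
```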
Configuration
```ts
import { wrapLanguageModel } from 'ai';
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { privacyFilterMiddleware } from 'ai-sdk-privacy-filter-middleware';

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const model = wrapLanguageModel({
  model: openrouter('openrouter/free'),
  middleware: privacyFilterMiddleware({
    // Only redact specific entity types (default: all)
    entityTypes: ['private_person', 'private_email', 'private_phone'],
    // Minimum confidence score (default: 0.8)
    minScore: 0.9,
    // Whether to unredact LLM responses (default: true)
    redactResponses: true,
    // Custom placeholder format
    placeholderFormat: (type, n) => `<<${type}_${n}>>`,
    // Device override ('wasm' | 'webgpu')
    device: 'webgpu',
    // Model loading progress callback
    onProgress: ({ status, progress }) => {
      console.log(status, progress);
    },
  }),
});
```

Eager Initialization
The model loads lazily on first use. First request will be slower (~5-15s depending on hardware) while the model downloads and initializes. To pre-load:
```ts
import { wrapLanguageModel } from 'ai';
import { createPrivacyFilter } from 'ai-sdk-privacy-filter-middleware';
import { createOpenRouter } from '@openrouter/ai-sdk-provider';

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const middleware = await createPrivacyFilter({ device: 'webgpu' });

const model = wrapLanguageModel({
  model: openrouter('openrouter/free'),
  middleware,
});
```

Multi-Provider Support
Works with any AI SDK provider:
```ts
// OpenRouter (default example)
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

// OpenAI
import { openai } from '@ai-sdk/openai';

// Anthropic, Gemini, Cohere, etc.
// Any provider that returns the standard AI SDK interface
```

The middleware is provider-agnostic — it operates on messages, not model implementations.
Browser vs Server: Where to Run It
The middleware works in both environments, but the trade-offs differ.
| Environment | Device | Model Size | Best For |
|---|---|---|---|
| Browser | WebGPU (or WASM fallback) | Downloaded per user | Client-side apps, data never leaves the device |
| Server | WASM | Loaded once, shared across requests | API routes, guaranteed availability, no per-user download |
Browser (WebGPU)
Running in the browser means:
- Zero server cost — model inference happens on the user's GPU
- Maximum privacy — PII never leaves the device, not even to your server
- Per-user download — every user downloads ~1.5GB on first use
- Hardware variance — WebGPU support varies; WASM fallback is slower
Ideal for: Chat UIs, client-only apps, compliance requirements that demand on-device processing.
Server (Node.js + WASM)
Running on the server means:
- Predictable performance — consistent hardware, no WebGPU variability
- No per-user overhead — model loads once, serves all requests
- Standard deployment — works on any Node.js host (Vercel, Railway, etc.)
- Slightly slower inference — WASM vs WebGPU, but still milliseconds
Ideal for: API routes, serverless functions, applications where you control the infrastructure.
Both work. Browser for privacy guarantees; server for operational consistency.
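If you deploy server-side, make sure the model loads once per process rather than per request. A minimal memoization sketch; `loadMiddleware` here is a stand-in for the real `createPrivacyFilter` call:

```typescript
// Server-side pattern: initialize the middleware once per process and share
// it across requests. `loadMiddleware` stands in for `createPrivacyFilter`.
let loadCount = 0;

async function loadMiddleware(): Promise<{ name: string }> {
  loadCount++; // stands in for the ~1.5GB model download + initialization
  return { name: 'privacy-filter' };
}

let middlewarePromise: ReturnType<typeof loadMiddleware> | undefined;

function getMiddleware() {
  // The first caller kicks off loading; later (and concurrent) callers await
  // the same promise, so the model is initialized at most once per process.
  middlewarePromise ??= loadMiddleware();
  return middlewarePromise;
}
```

Each request handler then does `const middleware = await getMiddleware();` before wrapping its model, paying the load cost only on the first request after a cold start.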
Get Started
Repository: github.com/TheGreatAxios/ai-sdk-privacy-filter-middleware
Install:

```sh
npm install ai-sdk-privacy-filter-middleware
```

```ts
import { generateText, wrapLanguageModel } from 'ai';
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { privacyFilterMiddleware } from 'ai-sdk-privacy-filter-middleware';

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const model = wrapLanguageModel({
  model: openrouter('openrouter/free'),
  middleware: privacyFilterMiddleware(),
});

const result = await generateText({
  model,
  prompt: "My name is Alice Smith and my email is alice@example.com",
});

// PII is redacted before reaching the LLM
// Response placeholders are restored before you see them
console.log(result.text);
```

Validating Redaction with Debug Logging
To verify the middleware is working correctly, use the debug callbacks to inspect what's happening at each stage:
```ts
import { generateText, wrapLanguageModel } from 'ai';
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { privacyFilterMiddleware } from 'ai-sdk-privacy-filter-middleware';

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const model = wrapLanguageModel({
  model: openrouter('openrouter/free'),
  middleware: privacyFilterMiddleware({
    entityTypes: ['private_person', 'private_email'],
    minScore: 0.8,
    redactResponses: true,
    placeholderFormat: (type, n) => `<<${type}_${n}>>`,
    onRedact: ({ original, redacted, entities }) => {
      console.log('[onRedact] original:', original);
      console.log('[onRedact] redacted:', redacted);
      console.log('[onRedact] entities:', JSON.stringify(entities, null, 2));
    },
    onUnredact: ({ raw, unredacted }) => {
      console.log('[onUnredact] raw:', raw.slice(0, 200));
      console.log('[onUnredact] unredacted:', unredacted.slice(0, 200));
    },
  }),
});

const result = await generateText({
  model,
  prompt: "My name is Alice Smith and my email is alice@example.com",
});

console.log('=== FINAL OUTPUT ===');
console.log(result.text);
```

Output:

```
[onRedact] original: My name is Alice Smith and my email is alice@example.com
[onRedact] redacted: My name is <<PERSON_1>> and my email is <<EMAIL_1>>
[onRedact] entities: [
  {
    "entity_group": "private_person",
    "score": 0.9999969899654388,
    "word": " Alice Smith"
  },
  {
    "entity_group": "private_email",
    "score": 0.9995789726575216,
    "word": " alice@example.com"
  }
]
[onUnredact] raw: Hello <<PERSON_1>>, I've noted your email <<EMAIL_1>>
[onUnredact] unredacted: Hello Alice Smith, I've noted your email alice@example.com
=== FINAL OUTPUT ===
Hello Alice Smith, I've noted your email alice@example.com
```

This confirms the LLM only saw placeholders, never the real PII. The `onRedact` callback shows exactly what entities were detected and their confidence scores, while `onUnredact` demonstrates the placeholder-to-original mapping being restored.
When to Use This
Use browser deployment when:
- You're building a client-side chat interface
- PII must never reach your server (maximum privacy)
- Users have modern browsers with WebGPU support
- You can tolerate the per-user model download

Use server deployment when:
- You're building API routes or serverless functions
- You need predictable, consistent performance
- You want to avoid pushing 1.5GB to every user
- You're already running Node.js infrastructure

Don't use this when:
- You need centralized redaction across multiple languages — use privacy-python-server instead
- You can't afford 1.5GB of memory per instance (browser or server)
Sources
1. Vercel, "AI SDK" documentation. https://ai-sdk.dev
2. OpenRouter, "OpenRouter AI SDK Provider" documentation. https://openrouter.ai/docs
3. OpenAI, "privacy-filter" model card, HuggingFace. https://huggingface.co/openai/privacy-filter
4. HuggingFace, "Transformers.js" documentation. https://huggingface.co/docs/transformers.js