
Introducing sensitive information detection & redaction + the Arcjet LangChain integration

Detect, block, and redact PII locally without sending it to the cloud using Arcjet's sensitive information detection, plus the new integration with LangChain.

Today we’re introducing a new core primitive in the Arcjet developer security SDK: sensitive information detection!

Input validation helps protect your app from attacks like SQL injection, but it does not stop users from submitting personally identifiable information (PII) that you never intended to collect. Arcjet’s sensitive information detection helps you prevent these leaks by blocking requests that contain PII in real time.

What if?

Imagine a user accidentally submitting their credit card details in a support form, which then gets routed to a third-party ticketing system and emailed to your support team. Suddenly, sensitive data is now scattered across systems and employees, significantly increasing your data exposure risks.

What if you’re using a large language model (LLM) with a long context window for chat responses, and a user unintentionally submits sensitive data, like a credit card number? Without detection, this PII could be included in the AI’s processing and responses, potentially violating privacy regulations.

Arcjet prevents these accidental exposures.

Arcjet Sensitive Information Detection

The new sensitive information detection rule in the Arcjet SDK allows you to block requests that contain detected PII. With Arcjet, sensitive data never leaves your environment. The analysis happens locally in a secure WebAssembly sandbox, which helps with privacy compliance and keeps latency low.

Arcjet provides out-of-the-box detection for common sensitive data types like email addresses, credit card numbers, IP addresses, and phone numbers. You can also define custom detection rules to meet your specific application needs.
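As a rough illustration of what a custom detection rule could look like, here is a minimal sketch of a detector function. It assumes the same callback shape as the `detect` option shown in the LangChain example later in this post: it receives a list of tokens and returns one label (or `undefined`) per token. The `employee-id` label and the `EMP-12345` format are hypothetical, chosen only for the example.

```typescript
// Hypothetical custom entity label; not a built-in Arcjet type.
type CustomEntity = "employee-id";

// Hypothetical format for illustration: "EMP-" followed by exactly five digits.
const EMPLOYEE_ID = /^EMP-\d{5}$/;

// Receives a window of tokens and returns one entry per token:
// the entity label when the token matches, otherwise undefined.
function detectEmployeeId(tokens: string[]): Array<CustomEntity | undefined> {
  return tokens.map((token) =>
    EMPLOYEE_ID.test(token) ? "employee-id" : undefined
  );
}

// Simple whitespace tokenization, used here only to exercise the function.
const sampleTokens = "Please reset the badge for EMP-40213 today".split(" ");
const labels = detectEmployeeId(sampleTokens);
console.log(labels);
```

Only the sixth token is labeled in this sample. Keeping the detector a pure function of its tokens makes it easy to unit test before wiring it into a rule.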

Here’s how to set up Arcjet to detect and block email addresses submitted to a Next.js API route in the body of a POST request, so you can stop sensitive data before it’s processed:

// Import the Arcjet SDK
import arcjet, { sensitiveInfo } from "@arcjet/next";
import { NextResponse } from "next/server";

// Create the Arcjet client with the detection rule
const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    // This allows all sensitive entities other than email addresses
    sensitiveInfo({
      deny: ["EMAIL"], // Will block email addresses
      mode: "LIVE", // Will block requests, use "DRY_RUN" to log only
    }),
  ],
});

// A Next.js API route demonstrating the detection feature.
// It handles POST because the curl example below sends a request body.
export async function POST(req: Request) {
  const decision = await aj.protect(req); // Performs the detection

  // Loop through the results and output them to the logs
  for (const result of decision.results) {
    console.log("Rule Result", result);
  }

  console.log("Conclusion", decision.conclusion);

  if (decision.isDenied() && decision.reason.isSensitiveInfo()) {
    return NextResponse.json(
      {
        error: "PII detected!",
      },
      { status: 400 },
    );
  }

  return NextResponse.json({
    message: "Hello world",
  });
}

If we call the API route with curl:

curl -v http://localhost:3000 --data "My email address is test@example.com"

The logs will show:

Rule Result ArcjetRuleResult {
  state: 'RUN',
  conclusion: 'DENY',
  reason: ArcjetSensitiveInfoReason {
    type: 'SENSITIVE_INFO',
    denied: [ { start: 5, end: 21, identifiedType: 'EMAIL' } ],
    allowed: []
  }
}
Conclusion DENY

You can inspect the decision to find out what type of PII was detected and where it appears in the body (the start and end offsets). You can then deny the request, or do further processing.
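As one example of further processing, the offsets could be used to mask the detected values before the text is logged or stored. The sketch below is self-contained and does not call the Arcjet SDK: the `DetectedSpan` interface simply mirrors the fields visible in the `denied` array in the log output above, and it assumes `start` is inclusive and `end` is exclusive. The sample body is hypothetical, chosen so that its offsets line up with the logged span.

```typescript
// Shape of one entry in `denied`, mirroring the fields in the log output.
interface DetectedSpan {
  start: number;
  end: number;
  identifiedType: string;
}

// Replace each detected span with a placeholder such as "<EMAIL>".
// Assumption: `start` is inclusive and `end` is exclusive.
function maskSpans(text: string, spans: DetectedSpan[]): string {
  // Work from the end of the string so earlier offsets stay valid.
  const ordered = spans.slice().sort((a, b) => b.start - a.start);
  let result = text;
  for (const span of ordered) {
    result =
      result.slice(0, span.start) +
      `<${span.identifiedType}>` +
      result.slice(span.end);
  }
  return result;
}

// Hypothetical body whose offsets match the span shown in the logs.
const sampleBody = "I am test@example.com";
const maskedBody = maskSpans(sampleBody, [
  { start: 5, end: 21, identifiedType: "EMAIL" },
]);
console.log(maskedBody); // "I am <EMAIL>"
```

Processing the spans from last to first avoids having to adjust offsets after each replacement changes the length of the string.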

Redacting PII sent to LLMs with LangChain

When handling sensitive information in user queries, balancing security and user experience is key. Blocking inputs like credit card numbers may be necessary, but for real-time interactions such as chatbots, redacting PII while continuing the request gives a smoother user experience.

For example, if a customer accidentally submits their credit card info in an eCommerce chatbot, it’s better to redact the PII and carry on with a friendly response than to block the request outright. The new @arcjet/redact library makes this easy by automatically removing detected PII and integrating with LangChain’s LLMs and chat models. All redaction happens locally, so no sensitive data leaves your environment, maintaining privacy and compliance.

Here’s a simple example of how to use Arcjet Redact in a chatbot to automatically redact sensitive information like email addresses, phone numbers, and IP addresses, plus a custom entity:

import {
  ArcjetRedact,
  ArcjetSensitiveInfoType,
} from "@langchain/community/chat_models/arcjet";
import { ChatOpenAI } from "@langchain/openai";

// Create an instance of another chat model for Arcjet to wrap
const openai = new ChatOpenAI({
  temperature: 0.8,
  model: "gpt-3.5-turbo-0125",
});

const arcjetRedactOptions = {
  // Specify a LLM that Arcjet Redact will call once it has redacted the input.
  chatModel: openai,

  // Specify the list of entities that should be redacted.
  // If this isn't specified then all entities will be redacted.
  entities: [
    "email",
    "phone-number",
    "ip-address",
    "custom-entity",
  ] as ArcjetSensitiveInfoType[],

  // You can provide a custom detect function to detect entities that we don't support yet.
  // It takes a list of tokens and returns a list with an identified type, or undefined, for each token.
  // Any custom types that you return should also be added to the entities list above.
  detect: (tokens: string[]) => {
    return tokens.map((t) =>
      t === "some-sensitive-info" ? "custom-entity" : undefined
    );
  },

  // The number of tokens to provide to the custom detect function. This defaults to 1.
  // It can be used to provide additional context when detecting custom entity types.
  contextWindowSize: 1,

  // This allows you to provide custom replacements when redacting. Please ensure
  // that the replacements are unique so that unredaction works as expected.
  replace: (identifiedType: string) => {
    return identifiedType === "email" ? "redacted@example.com" : undefined;
  },
};

const arcjetRedact = new ArcjetRedact(arcjetRedactOptions);

const response = await arcjetRedact.invoke(
  "My email address is test@example.com, here is some-sensitive-info"
);

By incorporating Arcjet Redact into your chat interface, you can ensure that detected sensitive user information is removed before the model processes it. As with detection, all redaction happens locally, so no sensitive data is sent outside your environment.
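The comment in the example about keeping replacements unique is easier to see with a toy version of the redact and unredact round trip. The sketch below is illustrative only and is not Arcjet’s implementation: the regular expression, the `<email-N>` placeholder format, and the function names are all assumptions made for the example.

```typescript
// Illustration only: a tiny email redactor, not Arcjet's implementation.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+\.[\w.-]+/g;

interface Redaction {
  redacted: string;
  // Maps each placeholder back to the original value it replaced.
  originals: Record<string, string>;
}

function redactEmails(input: string): Redaction {
  const originals: Record<string, string> = {};
  let counter = 0;
  const redacted = input.replace(EMAIL_PATTERN, (match) => {
    // A counter keeps every placeholder unique.
    const placeholder = `<email-${counter++}>`;
    originals[placeholder] = match;
    return placeholder;
  });
  return { redacted, originals };
}

function unredact(output: string, originals: Record<string, string>): string {
  let restored = output;
  for (const placeholder of Object.keys(originals)) {
    // split/join replaces every occurrence without regex escaping.
    restored = restored.split(placeholder).join(originals[placeholder]);
  }
  return restored;
}

const round = redactEmails("Contact a@example.com or b@example.com");
console.log(round.redacted); // "Contact <email-0> or <email-1>"

// Stand-in for a model reply that echoes one of the placeholders.
const modelReply = "I will write to <email-1> first.";
console.log(unredact(modelReply, round.originals));
// "I will write to b@example.com first."
```

If both addresses had been replaced with the same string, the lookup table could hold only one original per placeholder, and the reply could not be mapped back to the right address.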

Get started

Start integrating sensitive information detection in your apps today with Arcjet. Check out our full documentation and see how you can add this powerful protection in just a few lines of code.
