Arcjet’s framework-native SDK helps developers implement security features directly in code. Bot protection is our most popular feature so far, which makes sense given the concern around AI services scraping content from websites.
Your site has probably already been used as part of the training datasets for many existing large language models, but that doesn’t mean you need to continue to allow them!
And if you’re not that bothered about AI, maybe you just want to clean up your website analytics reports, stop form submission spam, and prevent other types of content scraping. That's what bot detection is for.
Today we’re announcing new bot detection functionality that makes it easier to protect your site from specific categories of bot, including AI scrapers. We're also announcing our open source bot list that powers the core of the bot detection built into our SDK.
Blocking specific bot categories
Each identified bot is grouped into a top-level category so you can allow or deny multiple bots in one go. Whether that's internet archivers, social link preview bots, or monitoring agents, you can easily allow or deny whole groups from our bot list.
For example, if you want to allow all search engines, social link preview bots, and curl requests, you can add the CATEGORY:PREVIEW and CATEGORY:SEARCH_ENGINE categories to your rules, plus the specific identifier for curl:
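Here's a minimal sketch of what that could look like with the Next.js SDK; the individual identifier shown for curl (CURL) is an assumption, so check the bot list for the exact names:

```ts
import arcjet, { detectBot } from "@arcjet/next";

const aj = arcjet({
  // Your Arcjet site key, set as an environment variable
  key: process.env.ARCJET_KEY!,
  rules: [
    detectBot({
      mode: "LIVE", // blocks matching requests; use "DRY_RUN" to only log
      // Allow whole categories plus the individual identifier for curl.
      // Every other detected bot will be denied.
      allow: ["CATEGORY:SEARCH_ENGINE", "CATEGORY:PREVIEW", "CURL"],
    }),
  ],
});
```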
Blocking AI bots
Using CATEGORY:AI, you can create a rule to detect AI scrapers and customize the list to allow specific ones.
For example, if you wanted to block all AI bots but make an exception for Perplexity (because it’s more like a search engine), you could set up a rule like this:
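A minimal sketch of that rule, assuming a detectBot rule accepts either an allow or a deny list but not both, so the exception for a specific bot is applied when inspecting the decision; the next example shows that pattern for OpenAI's crawler:

```ts
import arcjet, { detectBot } from "@arcjet/next";

const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    detectBot({
      mode: "LIVE",
      // Deny every bot in the AI category. The exception for Perplexity is
      // applied when handling the decision, as shown below.
      deny: ["CATEGORY:AI"],
    }),
  ],
});
```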
When a decision is returned, you can inspect both the category match and the specific bot identifier that was detected. This means you can apply different responses to different bots.
For example, let’s say you have a commercial agreement with OpenAI to use your content, so you want to allow their crawler but deny all others. You could customize the response logic as follows:
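Here's a sketch of that handler for a Next.js route, assuming the deny rule above and that the decision reason exposes the matched bot identifiers. The OPENAI_CRAWLER name and the import path for the Arcjet client are illustrative, so check the bot list and your own project layout:

```ts
import { NextResponse } from "next/server";
// Hypothetical import: however you share the Arcjet client configured
// with the deny: ["CATEGORY:AI"] rule above
import { aj } from "@/lib/arcjet";

export async function GET(req: Request) {
  const decision = await aj.protect(req);

  if (decision.reason.isBot()) {
    // The matched bot identifiers are reported on the reason (assumed to be
    // a `denied` array here). "OPENAI_CRAWLER" is an illustrative name;
    // check the bot list for the exact identifier used for GPTBot.
    if (decision.reason.denied.includes("OPENAI_CRAWLER")) {
      // Commercial agreement in place, so let OpenAI's crawler through
      return NextResponse.json({ message: "Hello GPTBot" });
    }
  }

  if (decision.isDenied()) {
    // Any other detected bot is blocked
    return NextResponse.json({ error: "Forbidden" }, { status: 403 });
  }

  return NextResponse.json({ message: "Hello, world" });
}
```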
The great thing about defining these rules in code is that you can also test them locally. Use curl with a different user agent to check that the response is as expected:
```bash
curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)" http://localhost:3000
```
Local testing allows you to confirm that bots get the desired response and that you don’t accidentally block legitimate users. This can be incorporated into automated integration tests to run as part of your test suite. See our Testing docs for more information.
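As a sketch, an integration test with Node's built-in test runner could hit the locally running app with different user agents. This assumes a rule that denies CATEGORY:AI with no exceptions and a handler that responds with a 403:

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Assumes the app is running locally with a rule denying CATEGORY:AI
const BASE_URL = "http://localhost:3000";

test("denies an AI scraper user agent", async () => {
  const res = await fetch(BASE_URL, {
    headers: {
      "User-Agent":
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    },
  });
  assert.equal(res.status, 403);
});

test("allows a regular browser user agent", async () => {
  const res = await fetch(BASE_URL, {
    headers: {
      "User-Agent":
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    },
  });
  assert.equal(res.status, 200);
});
```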
Open source list of bots
Understanding which bots and AI scrapers can be detected, and how they’re identified, is important for creating accurate rules. You want to ensure that only the bots you intend to block are actually denied access.
We’ve contributed new bot identifiers and some general tidying to the upstream project our list is based on, and forked it to add our own identifiers. These additions are used in the Arcjet SDK to provide auto-complete and type checking for valid bot names.
This is part of our basic bot detection functionality, which uses the user agent to identify well-behaved bots. Contributions are welcome!
New bot functionality available now
This new functionality is available to all users for free today! Sign up now to get going with Arcjet’s security SDK for developers.