Low latency global routing with AWS Global Accelerator

Arcjet performs real-time security analysis in the critical path of API and authentication flows. That means latency isn’t just a consideration - it’s a core design constraint.

To meet our end-to-end p50 latency SLA of 20–30ms: we deploy globally, use persistent HTTP/2 connections, and rely on AWS's network to ensure routing to the nearest healthy region.

A core component of our architecture is AWS Global Accelerator. This service routes traffic through a set of Anycast IPs via distributed AWS edge locations (points of presence) to the closest healthy AWS region, utilizing AWS's private global network for more stable and lower-latency pathways compared to the public internet.

This post explains the details of how we use AWS Global Accelerator and other AWS services to achieve our SLA targets. While our context is delivering a low-overhead security product for developers, the principles and benefits discussed are applicable to any mission-critical, latency-sensitive global application.

The challenge - request analysis in the hot path

Arcjet is delivered as an SDK so it can be tightly integrated into the logic of the application and benefit from the full request context. Security rules can be adjusted dynamically based on user and session characteristics, and the results can be incorporated into how the application behaves.

Everything is downstream of latency. That’s why Arcjet takes a local-first approach: rules are evaluated in-process via a WebAssembly module bundled in the SDK. Many decisions can complete within 1-2ms, but others require a network call, such as when our dynamic IP reputation database is used as one of the security signals.

In cases where our cloud decision API is involved, we set ourselves a p50 response time goal of 20ms which leaves 5-10ms for the network round trip so that we can hit our end to end latency goal of 20-30ms.

This goal applies globally. Developers deploy their applications everywhere, so centralizing our API is not an option.

Architecture diagram showing the request being processed by the Arcjet SDK within the application environment. An Arcjet API call may be used to enrich the decision. Read more about the Arcjet Architecture.

The solution - Global Accelerator + multi-region

Arcjet’s cloud decision API is written in Go and provides both JSON and gRPC interfaces to support different environments, with gRPC preferred due to the more optimized Protocol Buffers message format.

Our cloud decision API is containerized and deployed across multiple availability zones within each AWS region using AWS Elastic Kubernetes Service (EKS), fronted by an AWS Application Load Balancer (ALB). The ALB distributes incoming API traffic across our container instances; its cross-zone load balancing capability ensures that if one availability zone experiences an issue, traffic is automatically routed to instances in the other healthy zones within that region, enhancing resilience.

From the developer’s perspective, Arcjet is a single endpoint. Behind the scenes, Global Accelerator ensures each request is routed to the closest healthy AWS region using latency-based routing and active health checks. This happens automatically - no configuration or region awareness required from the developer.

AWS Global Accelerator uses a global network of 119 Points of Presence in 94 cities across 51 countries (source).

Application to edge to region

Setting up a new connection is often the slowest part of communicating with our API. Establishing a TCP connection and performing a TLS handshake takes multiple round trips, adding many milliseconds to each new request. This can quickly eat into our 5-10ms network round trip budget and is particularly problematic in serverless environments where cold starts are common.

To mitigate this our SDK establishes a persistent HTTP/2 connection to our API, allowing multiple requests to be multiplexed over a single, long-lived connection. Global Accelerator helps minimize the round trip time because the initial TLS handshake can be completed by the closest network edge, which is usually much closer than the AWS region serving the request.

The network path between the edge location and the AWS region uses the AWS global network, which is much more optimized compared to routing over the public internet. Global Accelerator helps eliminate network jitter and unpredictable routing by using AWS’s private backbone between edge locations and regions. This stability is critical for maintaining consistent performance, especially in bursty serverless environments.

We run in >10 AWS regions, launching more based on customer demand. Each regional deployment is completely independent, a key design choice for maximizing availability and fault isolation, orchestrated seamlessly by AWS Global Accelerator's intelligent routing. If traffic is re-routed then this happens within AWS’s network without requiring our SDK client to reconnect and avoiding a cold start.

Performance in practice

We measure both internal processing latency and end-to-end cold start latency from the SDK’s perspective. Over the last 7 days, the median internal API response time across all regions was 12ms (p95: 20ms).

In a full cold start scenario where a new connection is initiated by the Arcjet SDK, we recorded a p50 of 25ms from request initiation to response delivery - including 2ms for TCP and 6ms for TLS handshakes.

Conclusions

Low latency isn’t just a feature - it’s fundamental to Arcjet’s design for real-time application security. By leveraging AWS’s global edge network and services like Global Accelerator, we offload the hardest parts of distributed networking and stay focused on building developer-first security features.

If you're building latency-sensitive APIs, especially in a multi-region or serverless world, Global Accelerator can probably help.

Low latency global routing with AWS Global Accelerator

The challenge - request analysis in the hot path

The solution - Global Accelerator + multi-region

Application to edge to region

Performance in practice

Conclusions

Related articles

How to differentiate between DoS attacks and large volume customer traffic

Minimizing latency for humans & machines

Subscribe by email