Guide 20 min read Intermediate

Cloudflare rate limiting design guide

This guide helps teams design rate limits that reduce abuse without blocking legitimate users. It covers endpoint sensitivity, threshold selection, bot score alignment, bypasses, alerts, and staged rollout.

Step by step

Implementation steps

  1. List the endpoints that need rate limiting and why - credential abuse on login and password reset, enumeration on account-lookup and coupon APIs, scraping on search and pricing, spam on contact and signup forms, and cost or capacity pressure on expensive backend calls.
  2. For each endpoint, pick the counting key deliberately: client IP for anonymous abuse, a header or cookie for authenticated sessions, an API token or JA3/JA4 fingerprint for clients behind shared NAT, or a combination so you don't punish everyone behind one address.
  3. Write the counting expression that defines what a request 'counts as' - matching the exact host, path, method, and where relevant a body field or response status - so the limit only counts the traffic you actually want to throttle.
  4. Choose the threshold and time window per endpoint from observed traffic, sizing the limit above legitimate peak behaviour (including bursty SPAs and mobile retries) and below the abuse pattern you measured.
  5. Decide the action and its duration: log first, then managed challenge for browser flows, block or a 429 with a Retry-After for APIs, and a longer mitigation timeout for clearly abusive sources.
  6. Layer rate limiting with bot score and WAF rather than in isolation - for example only counting requests where the bot score is low, or letting verified bots and allowlisted partners bypass the counter entirely.
  7. Roll out in stages: deploy each rule in log/simulate mode, compare counted requests against real user sessions and support data, then promote to challenge or block one endpoint at a time.
  8. Wire alerting and review: notify when a rule fires far above baseline, keep the rule's key, expression, threshold, window, and action documented with an owner, and schedule re-tuning around releases and campaigns.
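
One way to keep the outcome of these steps reviewable is to hold each rule as a small record that starts in log mode and is promoted explicitly. The sketch below is illustrative only: the class, field names, hostname, and the 10-requests-per-60-seconds figures are assumptions for the example, not Cloudflare's API schema or recommended values.

```python
from dataclasses import dataclass, replace

# Actions named in the steps above; "log" is the observe-only stage.
ACTIONS = ("log", "managed_challenge", "block")


@dataclass(frozen=True)
class RateLimitRule:
    """One documented rule. All field names are hypothetical."""
    endpoint: str            # which path is protected and in what role (step 1)
    abuse_pattern: str       # why it is limited (step 1)
    counting_key: tuple      # what the counter is keyed on (step 2)
    expression: str          # which requests increment the counter (step 3)
    threshold: int           # allowed requests per window (step 4)
    window_seconds: int      # counting window (step 4)
    action: str              # current action, one of ACTIONS (steps 5 and 7)
    mitigation_seconds: int  # how long the action applies once triggered (step 5)
    owner: str               # who reviews and re-tunes the rule (step 8)


def promote(rule: RateLimitRule, enforced_action: str) -> RateLimitRule:
    """Return a copy of the rule moved from log mode to an enforcing action."""
    if enforced_action not in ACTIONS or enforced_action == "log":
        raise ValueError(f"not an enforcing action: {enforced_action!r}")
    return replace(rule, action=enforced_action)


def revert_to_log(rule: RateLimitRule) -> RateLimitRule:
    """Documented revert: drop the rule back to observe-only."""
    return replace(rule, action="log")


# Hypothetical example rule; every value here is a placeholder.
login_rule = RateLimitRule(
    endpoint="POST /login",
    abuse_pattern="credential stuffing",
    counting_key=("client_ip",),
    expression='host == "shop.example.com" and path == "/login" and method == "POST"',
    threshold=10,
    window_seconds=60,
    action="log",
    mitigation_seconds=600,
    owner="identity-team",
)
```

Because the record is immutable, `promote(login_rule, "managed_challenge")` returns a new rule and leaves the logged version available as the revert target.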

Risk register

Risks to control

Risk: Limiting purely by client IP throttles many legitimate users sharing one NAT, proxy, or mobile carrier gateway.
Control: Choose a counting key that fits the endpoint - session cookie, API token, or fingerprint - and only fall back to IP where no better identifier exists.

Risk: Thresholds are guessed rather than derived from traffic, so they either never trigger or fire on normal peaks.
Control: Set the limit and window from observed legitimate peak behaviour, leaving headroom for bursty clients and retries, and validate in log mode first.

Risk: The counting expression is too broad and counts unrelated requests, or too narrow and misses the abuse.
Control: Scope the expression to the exact host, path, method, and where needed a body field or status code so only the targeted traffic increments the counter.

Risk: A hard block with no Retry-After breaks well-behaved API clients during a transient spike.
Control: Return 429 with Retry-After on API paths, use managed challenge on browser paths, and reserve long blocks for sources that stay abusive.

Risk: Rate limiting and Bot Management fight each other, challenging the same request twice or blocking traffic the other control already cleared.
Control: Design the rules together - bypass verified bots and allowlisted partners, and gate counters on bot score so the controls layer instead of conflict.

Risk: A release or marketing campaign changes legitimate request rates and the static threshold suddenly throttles real users.
Control: Make thresholds release-aware, alert on abnormal firing, and keep a documented revert so a rule can drop to log mode quickly.
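
The final item above depends on noticing when a rule fires abnormally. A minimal sketch of that check follows, assuming you already export per-rule hit counts for a fixed interval. The function name, the 3x multiplier, and the 50-hit floor are illustrative assumptions to tune, not recommended values.

```python
def firing_alert(current_hits: int, baseline_hits: float,
                 multiplier: float = 3.0, min_hits: int = 50) -> bool:
    """Return True when a rule fires far above its usual rate.

    current_hits  -- times the rule triggered in the latest interval
    baseline_hits -- typical triggers per interval (e.g. a trailing average)
    multiplier    -- how far above baseline counts as abnormal (assumed 3x)
    min_hits      -- floor that keeps tiny baselines from alerting on noise
    """
    if current_hits < min_hits:
        return False
    return current_hits > baseline_hits * multiplier
```

An alert of this kind is a prompt to check for a release or campaign before assuming an attack, and if real users are affected, to drop the rule back to log mode.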

Output

Useful deliverables

  • Endpoint inventory with the abuse pattern and protection goal for each rate-limited path.
  • Counting-key decision per endpoint (IP, header, cookie, token, or fingerprint) with the reasoning.
  • Counting-expression and matching-criteria specification per rule (host, path, method, body field, status).
  • Threshold and time-window table derived from observed legitimate peak traffic with headroom noted.
  • Action plan per endpoint covering log, managed challenge, block, 429 with Retry-After, and mitigation duration.
  • Layering map showing how each rule interacts with bot score, WAF, and allowlisted bypasses.
  • Staged rollout and alerting plan with per-rule owner, documentation, and revert procedure.

FAQ

Frequently asked questions

Common questions teams ask when putting this resource into practice.

What should I use as the rate-limiting key instead of just the IP address?

It depends on the endpoint. For authenticated traffic, key on a session cookie or a header that identifies the user so you count per user, not per gateway. For APIs, key on the token. For anonymous traffic behind shared NAT, a JA3/JA4 fingerprint or a combination of fingerprint and IP is fairer than raw IP. Plain IP is the fallback when nothing better identifies the client.
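
The preference order in that answer can be sketched as a small selection function. This illustrates the decision only and is not how Cloudflare evaluates rule characteristics; the function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def counting_key(api_token: Optional[str],
                 session_id: Optional[str],
                 tls_fingerprint: Optional[str],
                 client_ip: str) -> Tuple[str, str]:
    """Pick the most specific identifier available for this request.

    Returns (kind, value) so the counter is keyed on both, which keeps a
    token and an IP that happen to share a string from colliding.
    """
    if api_token:
        return ("token", api_token)       # APIs: count per token
    if session_id:
        return ("session", session_id)    # authenticated: count per user session
    if tls_fingerprint:
        # Anonymous behind shared NAT: fingerprint plus IP is fairer than IP alone.
        return ("fingerprint+ip", f"{tls_fingerprint}|{client_ip}")
    return ("ip", client_ip)              # fallback when nothing better exists
```

For example, `counting_key(None, "sess-42", None, "198.51.100.7")` returns `("session", "sess-42")`, so two users behind the same gateway get separate counters.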

How do I pick a threshold and window without blocking real users?

Measure first. Look at legitimate peak request rates for that exact endpoint - including bursty single-page apps and mobile retry behaviour - set the limit above that peak with headroom, and validate the rule in log mode before enforcing. A threshold chosen as a round number rather than derived from your own traffic is a common cause of false positives.
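
As a worked sketch of "measure, then add headroom": take the per-client request counts you observed per window for legitimate sessions, pick a high percentile, and multiply by a headroom factor. The function names, the 99th percentile, and the 1.5x headroom are assumptions to adjust, and the nearest-rank percentile is just one simple definition.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[int], pct: float) -> int:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_threshold(per_client_peaks: Sequence[int],
                      pct: float = 99.0, headroom: float = 1.5) -> int:
    """Suggest a per-window limit above legitimate peak behaviour."""
    return math.ceil(nearest_rank_percentile(per_client_peaks, pct) * headroom)


# Hypothetical sample: peak requests per 60-second window, one value per session.
observed = [2, 3, 3, 4, 5, 6, 8, 9, 12, 40]
print(suggest_threshold(observed, pct=90))  # 90th percentile is 12, so prints 18
```

The input should contain legitimate sessions only. If the 40 in this sample turned out to be a scraper, leaving it in and using the 99th percentile would push the suggestion to 60 and size the limit around the abuse rather than below it.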

What is a counting expression and why does it matter?

It's the match that decides which requests increment the counter. If it's too broad it counts unrelated traffic and trips early; if it's too narrow it misses the abuse. Scoping it to the precise host, path, method - and where useful a body field or response status - is what makes a rate limit target the right requests.
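
To make the scoping concrete, the sketch below models a counting expression as a predicate over host, path, method, and response status, counting only failed login attempts. The hostname, path, and status codes are hypothetical. The `CLOUDFLARE_STYLE_EXPRESSION` string shows roughly how the same match might read in Cloudflare's Rules language, and should be checked against the current field reference before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    host: str
    path: str
    method: str
    status: int  # response status returned for this request


def counts_toward_login_limit(req: Request) -> bool:
    """Only failed POSTs to the login path on one host increment the counter."""
    return (
        req.host == "shop.example.com"
        and req.path == "/login"
        and req.method == "POST"
        and req.status in (401, 403)
    )


# Assumed equivalent in Cloudflare's expression syntax; verify before deploying.
CLOUDFLARE_STYLE_EXPRESSION = (
    'http.host eq "shop.example.com" '
    'and http.request.uri.path eq "/login" '
    'and http.request.method eq "POST" '
    'and http.response.code in {401 403}'
)

sample = [
    Request("shop.example.com", "/login", "POST", 401),   # counted
    Request("shop.example.com", "/login", "POST", 200),   # successful login
    Request("shop.example.com", "/login", "GET", 200),    # page view
    Request("shop.example.com", "/search", "POST", 401),  # different path
    Request("api.example.com", "/login", "POST", 401),    # different host
    Request("shop.example.com", "/login", "POST", 403),   # counted
]
counted = sum(counts_toward_login_limit(r) for r in sample)
```

Dropping the method or status condition would make page views and successful logins count too, which is the "too broad" failure described above.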

Should rate limiting block, challenge, or just return 429?

Match the action to the client. Browser flows like login tolerate a managed challenge; API clients should get a 429 with Retry-After so well-behaved callers back off gracefully; sources that stay abusive get a longer block. Starting in log mode lets you confirm the choice before it affects anyone.
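
For the API case, the sketch below shows what a 429 with Retry-After looks like from a limiter's point of view. It is a simple fixed-window counter for illustration and is not a description of how Cloudflare counts requests. The class name is hypothetical, time is passed in explicitly to keep the example deterministic, and old windows are never evicted, which a real implementation would need to handle.

```python
import math
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Counts requests per key in fixed windows and answers with 200 or 429."""

    def __init__(self, threshold: int, window_seconds: int) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._counts: Dict[Tuple[str, int], int] = {}

    def check(self, key: str, now: float) -> Tuple[int, Dict[str, str]]:
        """Record one request and return (status_code, response_headers)."""
        window = int(now // self.window_seconds)
        count = self._counts.get((key, window), 0) + 1
        self._counts[(key, window)] = count
        if count <= self.threshold:
            return 200, {}
        # Tell the client how many seconds remain until the window resets.
        seconds_left = math.ceil((window + 1) * self.window_seconds - now)
        return 429, {"Retry-After": str(max(1, seconds_left))}


limiter = FixedWindowLimiter(threshold=2, window_seconds=60)
first = limiter.check("token-abc", now=0.0)    # (200, {})
second = limiter.check("token-abc", now=1.0)   # (200, {})
third = limiter.check("token-abc", now=10.0)   # (429, {"Retry-After": "50"})
later = limiter.check("token-abc", now=60.0)   # (200, {}) in the next window
```

A well-behaved client that honours the header waits the stated seconds and then succeeds, while a bare block gives it nothing to act on.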

How does rate limiting fit with Bot Management and WAF?

They are layers, not alternatives. You can gate a rate-limit counter on a low bot score, let verified bots and allowlisted partners bypass it, and keep WAF handling payload-based attacks. Designing them together prevents the same request being challenged twice or one control undoing another.
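
The layering in that answer can be sketched as a gate that runs before the counter: bypasses first, then the bot-score condition. The names, the partner address, and the cut-off of 30 are assumptions for illustration. The sketch takes bot scores to run from 1 to 99 with lower values meaning more likely automated, so confirm the scale and a suitable cut-off for your own traffic.

```python
from dataclasses import dataclass

# Hypothetical allowlist; 203.0.113.0/24 is a documentation-only range.
ALLOWLISTED_PARTNER_IPS = frozenset({"203.0.113.10"})

# Assumed cut-off: scores below this are treated as likely automated.
LOW_BOT_SCORE = 30


@dataclass(frozen=True)
class RequestSignals:
    client_ip: str
    bot_score: int       # assumed 1-99, lower means more likely automated
    verified_bot: bool   # e.g. a recognised search engine crawler


def should_count(signals: RequestSignals) -> bool:
    """Decide whether this request increments the rate-limit counter."""
    # Bypasses come first so cleared traffic is never counted or challenged again.
    if signals.verified_bot or signals.client_ip in ALLOWLISTED_PARTNER_IPS:
        return False
    # Only likely-automated traffic counts toward the limit.
    return signals.bot_score < LOW_BOT_SCORE
```

With this gate, a human-looking request with a score of 85 never touches the counter, while an unverified client with a score of 5 is counted on every request.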
