Rate Limiting + DDoS Protection Pattern
How to absorb traffic spikes — both benign (popular feature) and malicious (DDoS / abuse) — without breaking for legitimate users.
Rate Limiting + DDoS Protection Pattern
How to absorb traffic spikes — both benign (popular feature) and malicious (DDoS / abuse) — without breaking for legitimate users.
TL;DR (human)
Multi-layer defense: edge (CDN / WAF) absorbs L3/L4 + bulk L7; per-route rate limits stop per-user abuse; per-tenant quotas stop noisy-neighbor. Different limits for unauth vs auth, for sensitive routes (login, signup, reset). Tokens-bucket / sliding-window algorithms; consistent across services via shared store.
For agents
Defense layers
| Layer | Defends against | Tool |
|---|---|---|
| L3/L4 (network) | Volumetric attacks (SYN flood, UDP flood) | Cloud provider's DDoS service (CloudFront, Cloudflare, AWS Shield) |
| L7 (HTTP) — bulk | High-volume HTTP floods | WAF + CDN rate limit (Cloudflare, Akamai) |
| Per-route rate limit | Per-user / per-IP abuse | Application middleware (Redis-backed) |
| Per-tenant quota | Noisy neighbor | Application logic (per multi-tenant-isolation-pattern.md) |
| Backend circuit breakers | Cascading failure | Library (Polly, resilience4j) |
Edge absorbs volume cheaply. Application layer handles per-identity logic. Both required.
Rate-limit algorithms
| Algorithm | Mechanic | Pros | Cons |
|---|---|---|---|
| Fixed window | Counter per N seconds | Simple | Burst at window boundary |
| Sliding log | Timestamps; remove old | Accurate | Memory cost; slow |
| Sliding window counter | Two windows; weighted | Good balance | Approximation |
| Token bucket | Refill rate + bucket size | Bursty allowed; smooth steady | Tunable |
| Leaky bucket | Constant drain rate | Smooths bursts | No burst allowed |
Default: token bucket for user-facing; leaky bucket for cost-sensitive (downstream-rate-limited).
Configuration discipline
Per route, per identity class, set:
- Rate: requests per time window.
- Burst: short-term excess allowed.
- Identity: by IP, user, tenant, API key.
Examples:
GET /api/users/me:
per-user: 60/min; burst 10
per-ip (unauth): 100/min; burst 20
POST /auth/login:
per-ip: 5/min; burst 0 ← anti-credential-stuffing
per-username: 5/min ← anti-targeted-attack
POST /api/llm/complete:
per-tenant: 100/min (free), 1000/min (pro), 10000/min (enterprise)
per-user: 30/minSensitive routes (login, signup, password reset, OTP, payments) get tighter limits.
Per-identity choice
| Identity | Use when |
|---|---|
| User id (auth) | Default for authenticated routes |
| API key | Programmatic access |
| Tenant id | Multi-tenant; aggregate across tenant's users |
| IP address | Unauth + signup + login |
| Session id | Pre-auth flows |
Combine: a route may have multiple limits (per-user + per-tenant + per-IP); the strictest fires.
IP-only is fragile (NAT, shared corporate networks). Use IP + something else where possible.
Response when limit exceeded
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000000
{ "code": "RATE_LIMITED", "message": "...", "hint": "Try again in 30 seconds." }Retry-After is honored by good clients; tells them when to retry. Returns code: "RATE_LIMITED" for app-level handling.
Shared store
Per-instance rate limits don't work in horizontal-scale environments. A user hitting different replicas would bypass.
Use a shared store:
- Redis with atomic ops (INCR + EXPIRE; or Lua scripts for sliding window).
- DynamoDB with conditional writes.
- Memcached (less common; eviction can break rate limits).
Cost: one Redis hop per request. Mitigate via:
- Lazy refill (compute bucket on-demand rather than per-request).
- Local approximation + periodic sync.
Login + signup specifics
Specific patterns for abuse-prone endpoints:
Login:
- Per-IP rate: 5-10/min.
- Per-username rate: 5/min (catches credential stuffing against specific accounts).
- Per-failed-attempt backoff: exponential.
- After N failures: CAPTCHA challenge or temp lockout.
- Notify user on multiple failed attempts.
Signup:
- Per-IP rate: 3-5/min.
- Email verification mandatory (no instant access).
- Phone verification for higher-risk products.
- Abuse signals (disposable email domain, VPN/Tor, signup pattern) → manual review queue.
Password reset:
- Per-email rate: 3/hour.
- Per-IP rate: 10/hour.
- Never confirm whether email exists (avoid enumeration).
- Tokens single-use; short-lived (10-30 min); IP-bound where viable.
WAF + bot protection
A managed WAF (Cloudflare, AWS WAF, Akamai) provides:
- Geo blocking (when needed).
- Bot detection (signal-based; not perfect).
- Managed rule sets (OWASP top 10 patterns).
- Bot challenge (CAPTCHA on suspicion).
- IP reputation lists.
Configure to block / challenge:
- Suspicious user-agents.
- Known bad IP lists.
- Geographic restrictions (per business needs).
Avoid: blocking entire countries by default — legitimate users get blocked. Geo restrict only when business reason.
CAPTCHA / challenge
When abuse detected, escalate before block:
- Invisible challenge (Cloudflare Turnstile, hCaptcha, reCAPTCHA v3): scoring; transparent to good users.
- Visible challenge: image puzzle; user friction.
- Magic link / 2FA: highest friction; for verified compromise scenarios.
Friction sequence: invisible → visible → block. Match severity.
Backend circuit breakers
When an upstream is failing, fail fast rather than pile up requests:
- After N consecutive failures: open circuit; immediate failure for all requests.
- After cool-down: half-open; try one request; reset on success.
Avoids:
- Thundering herd on recovery.
- Cascading failure (your service can't catch up; the next one chokes).
Libraries: Polly (.NET), resilience4j (JVM), opossum (Node).
Per-tenant abuse signals
Beyond limits, detect:
- One tenant generating > Nx the median request rate.
- One tenant's error rate suddenly spikes (broken integration).
- One tenant accessing many distinct resources rapidly (scraping signal).
- Unusual times / patterns (per
../quality/observability-pattern.md).
Each surfaces in a dashboard; on-call has a runbook for "noisy tenant".
Costs
DDoS protection isn't free:
- CDN traffic for legit users is the bulk; DDoS spike usually doesn't change much for the CDN.
- WAF rule evaluation adds latency (sub-ms typically).
- AWS Shield Advanced / Cloudflare Enterprise: tens of K USD / year for full protection.
For most products, the managed-CDN's default DDoS protection is enough. Advanced tier when you're a high-value target.
Common failure modes
- No rate limit on login. Credential stuffing succeeds. → Per-IP + per-username.
- Rate limit per-instance. Bypassed by hitting different replicas. → Shared store.
- Rate limit returns 200 with empty body. Clients retry; defeat purpose. → 429 + Retry-After.
- No backend circuit breaker. Upstream goes down; your service piles up; goes down too. → Library.
- CAPTCHA on every login. Friction kills good users. → Scoring; only when suspicious.
- Geo-block based on bad data. Real users blocked. → Geographic restrictions narrowly + via human decision.
- Per-IP limits don't account for NAT. A small office blocked because shared IP. → IP + identity hybrid.
- No alerting on rate-limit fires. Abuse goes unnoticed. → Per-route monitoring; spike alerts.
Tooling stack (typical)
| Concern | Tool |
|---|---|
| CDN + edge WAF | Cloudflare, Fastly, Akamai, AWS CloudFront + WAF |
| DDoS L3/L4 | Cloud provider native + AWS Shield Advanced |
| Application rate-limit | express-rate-limit (Node), Flask-Limiter, ASP.NET Core RateLimiting, rate-limiter-flexible |
| CAPTCHA | Cloudflare Turnstile, hCaptcha, reCAPTCHA |
| Bot detection | DataDome, PerimeterX, Cloudflare Bot Management |
| Circuit breaker | Polly, resilience4j, opossum |
| Shared store | Redis, DynamoDB |
See also
multi-tenant-isolation-pattern.md— per-tenant quotas.session-mgmt-pattern.md— login-specific limits.audit-ledger-pattern.md— rate-limit fires logged.../architecture/caching-cdn-pattern.md— CDN is the absorption layer.../quality/observability-pattern.md— rate-limit metrics.