Rate Limiting and Throttling in API Gateway
In a modern distributed system such as a Microservices Architecture, your API Gateway serves as the entry point to dozens of microservices, including UserService, OrderService, and ProductService. When hundreds or thousands of users, mobile apps, browser tabs, or even bots try to hit your APIs simultaneously, it creates Traffic Overload.
If one client sends too many requests in a short period, by accident or on purpose, your system can crash, slow down, or become unavailable to other users. That’s where Rate Limiting and Throttling come into play. These two techniques help your API Gateway act as a smart traffic controller, ensuring backend services remain healthy, available, and fairly shared.
What is Rate Limiting?
Rate Limiting means defining how many requests a client is allowed to make in a fixed period of time. For example, you might decide that every client can make only 100 requests per minute. If a user or application tries to go beyond that limit, the Gateway will simply block further requests for that minute and send back a clear response:
- HTTP 429 – Too Many Requests
This mechanism ensures that no single user can consume all available resources and that everyone gets a fair share of system capacity.
You can think of Rate Limiting like a speed limit on a highway: if you drive too fast (send too many requests), you get a penalty (the 429 error). It’s a strict rule; once you exceed the limit, you must wait until the next time window.
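When the limit is exceeded, the Gateway's response typically looks like the following. This is a hypothetical example payload; the response body is up to you, but the 429 status code and the Retry-After header are the standard signals clients look for:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json

{ "error": "Rate limit exceeded. Please retry after 60 seconds." }
```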
What is Throttling?
Throttling is slightly different. Instead of blocking requests immediately, the Gateway controls traffic flow by slowing it or temporarily queuing additional requests. This approach is useful during short-term traffic spikes, for instance, when a flash sale starts, and everyone clicks “Buy Now” at the same moment.
Throttling behaves more like a traffic cop managing a crowded junction. When too many cars (requests) arrive, the cop doesn’t close the road completely. They slow the flow, allowing cars to pass one by one to keep traffic moving safely. The API Gateway does the same. It smooths out sudden surges without rejecting every extra request.
Why Both Are Important
Rate Limiting and Throttling aren’t just security measures; they’re essential tools for maintaining API health and stability. Together, they:
- Prevent Server Overload and Downtime,
- Keep the system Responsive Under High Traffic,
- Ensure Fair Usage among all clients, and
- Protect services from Malicious Bots or Accidental Misuse.
Why Do We Need Rate Limiting and Throttling?
In a Microservices Architecture, multiple clients (web apps, mobile apps, partner systems, and third-party integrations) send API requests simultaneously. While that’s a sign of high engagement, it can become dangerous when a single client sends too many requests in a short period.
This can happen in several ways:
- A developer writes an infinite loop that keeps calling your API.
- A user refreshes the app repeatedly while waiting for a response.
- A competitor or bot tries to overload your service intentionally.
- A sale or promotional event causes a sudden traffic spike.
Without proper controls, these situations can lead to:
- Service Overload or Crashes: Your backend servers get overwhelmed and stop responding.
- Increased Latency (Response Time): One user’s heavy load slows down response times for everyone else.
- High Infrastructure Costs: Servers scale up rapidly due to unregulated load.
- Complete Downtime: Worst-case scenario, your entire system goes offline.
This is where Rate Limiting and Throttling come into play.
Rate Limiting ensures Fair Usage by setting limits Per User, IP Address, API Key, Client ID, or Authentication Token. Each client is given a fair quota, for example, 100 requests per minute.
Throttling ensures Smooth Traffic Flow. Instead of rejecting every extra request, it queues or delays them slightly. This way, your services don’t crash under pressure, and clients experience predictable, stable responses even during heavy load.
Together, they protect your system from:
- DDoS (Distributed Denial of Service) attacks, where attackers flood your APIs with requests.
- Accidental Overuse, where genuine users or buggy clients unintentionally send too many requests.
- Sudden Traffic Surges, like during flash sales or product launches.
- System Instability, ensuring backend services remain healthy and responsive.
In short, Rate Limiting and Throttling keep your APIs safe, scalable, and sustainable.
How Do Rate Limiting and Throttling Work?
Let’s break the process down step by step to understand how the API Gateway handles incoming traffic.
Step 1 – Receiving Requests
Every incoming request first passes through the API Gateway, which acts as the single entry point for all clients.
Step 2 – Identifying the Client
The Gateway identifies the requester. It uses attributes like:
- IP Address (for anonymous clients)
- User ID from JWT Token (for logged-in users)
This helps the system track each client’s activity separately.
Step 3 – Counting Requests
The Gateway keeps a counter of the number of requests that the client has made in a specific period (e.g., the last 60 seconds). This count can be stored in memory or in a distributed cache, such as Redis, if multiple gateway servers are running.
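To make the distributed-cache idea concrete, here is a minimal sketch of a fixed-window counter backed by Redis. This is illustrative only: it assumes the StackExchange.Redis package, and the class and method names (RedisRequestCounter, CountRequestAsync) are hypothetical, not part of the tutorial's codebase.

```csharp
using StackExchange.Redis;

// Sketch: a per-client, per-minute request counter shared across gateway instances.
public class RedisRequestCounter
{
    private readonly IDatabase _db;

    public RedisRequestCounter(IConnectionMultiplexer redis) => _db = redis.GetDatabase();

    // Returns how many requests this client has made in the current 60-second window.
    public async Task<long> CountRequestAsync(string clientId)
    {
        // One key per client per minute; the key itself defines the window.
        string key = $"ratelimit:{clientId}:{DateTime.UtcNow:yyyyMMddHHmm}";

        long count = await _db.StringIncrementAsync(key); // atomic INCR in Redis

        if (count == 1)
        {
            // First request in this window: let the key expire when the window ends.
            await _db.KeyExpireAsync(key, TimeSpan.FromSeconds(60));
        }

        return count;
    }
}
```

The middleware would compare the returned count against the policy's PermitLimit and reject the request once it exceeds it.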
Step 4 – Applying the Limit
If the client’s request count is within the allowed limit, the Gateway forwards the request to the appropriate microservice (like ProductService or PaymentService). But if the limit is exceeded, one of two things happens:
Rate Limiting: The Gateway blocks the request immediately and responds with
- HTTP/1.1 429 Too Many Requests
- Retry-After: 30
This means you’ve hit the limit and should try again after 30 seconds. Well-behaved API clients and HTTP libraries can read the Retry-After header and back off automatically.
Throttling: The Gateway temporarily holds the request or slows its processing to prevent the backend from being overwhelmed.
This intelligent filtering ensures that legitimate requests continue smoothly while automatically controlling abusive or excessive traffic.
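On the client side, honoring the 429 response is straightforward. The following is a minimal sketch (the endpoint URL is a placeholder, and a production client would cap the number of retries and add jitter):

```csharp
using System.Net;

// Sketch: call the gateway and back off when it answers 429 Too Many Requests.
using var http = new HttpClient();
HttpResponseMessage response = await http.GetAsync("https://gateway.example.com/api/products");

if (response.StatusCode == HttpStatusCode.TooManyRequests)
{
    // Retry-After may carry a delta (seconds); fall back to a default if absent.
    TimeSpan wait = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(30);
    await Task.Delay(wait);

    // Retry once after the suggested wait.
    response = await http.GetAsync("https://gateway.example.com/api/products");
}
```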
Example to Understand Rate Limiting and Throttling in API Gateway
In our example, we will implement Rate Limiting and Throttling inside an API Gateway built using Ocelot in ASP.NET Core. The goal is to control and manage the number of requests from clients (web, mobile, or external systems) to different microservices, such as Product, Order, and Payment, to prevent overloading and ensure fair usage.
The implementation will use:
- Fixed Window Rate Limiter for time-based request limits (e.g., 100 requests per minute).
- Concurrency Limiter for sensitive APIs (like Payments) to control simultaneous requests.
Each policy is defined in appsettings.json, with specific rules for the number of requests allowed per minute and the number that can be queued.
In short, the example demonstrates how to protect backend microservices by applying smart request controls (Rate Limiting & Throttling) through a custom middleware in the API Gateway layer.
Step 1: Install Required Packages
In Visual Studio, open the Package Manager Console and select your API Gateway project (not the individual microservices). Run the following command:
- Install-Package System.Threading.RateLimiting
This is the native .NET rate-limiting library, part of the framework since .NET 7 and available in .NET 8.
Step 2: Define Configuration in appsettings.json
Think of the “RateLimiting” section in our appsettings.json as the Control Panel or Rulebook that our API Gateway follows to decide:
- How many requests can each user or IP make?
- Within how much time?
- What happens when the limit is reached?
This configuration is read once during startup, then used by our middleware (RateLimitingMiddleware) and policy service (RateLimitPolicyService) to enforce runtime throttling. So, open the appsettings.json file and add this section:
"RateLimiting": {
  "IsEnabled": true,
  "DefaultPolicy": {
    "PermitLimit": 60,
    "Window": "00:01:00",
    "QueueLimit": 5,
    "QueueProcessingOrder": "OldestFirst"
  },
  "ProductApiPolicy": {
    "PermitLimit": 10,
    "Window": "00:01:00",
    "QueueLimit": 0,
    "QueueProcessingOrder": "OldestFirst"
  },
  "OrderApiPolicy": {
    "PermitLimit": 50,
    "Window": "00:01:00",
    "QueueLimit": 2,
    "QueueProcessingOrder": "OldestFirst"
  },
  "PaymentApiPolicy": {
    "PermitLimit": 5,
    "QueueLimit": 2,
    "QueueProcessingOrder": "OldestFirst"
  }
}
Code Explanation:
IsEnabled
This property acts as the master switch for your entire Rate Limiting and Throttling system.
- When IsEnabled is set to true, the API Gateway actively enforces all the rate-limiting rules defined under each policy.
- When set to false, rate limiting is disabled entirely. The middleware will use a NoRateLimiter, which allows every request to pass through without restriction.
- This makes it easy for developers or administrators to temporarily disable throttling for debugging or maintenance without needing to remove or comment out any code.
- It provides Environment-Level Flexibility, so you can keep rate limiting disabled in development and testing environments but enabled in production.
PermitLimit
This property defines the Maximum Number of Requests Allowed for a single identity (UserId or IP address) within one defined time window.
- It sets the “capacity,” or the number of requests a user or client can make before being temporarily blocked.
- Once the number of requests exceeds this limit, any further requests are either queued (if a queue limit exists) or rejected immediately with an HTTP 429 Too Many Requests response.
- It helps protect your downstream services from overload by limiting excessive request bursts.
- The appropriate value depends on your API’s expected traffic, for example:
- Product APIs (mostly read-only) can have higher limits, such as 100 per minute.
- Payment or booking APIs (write-heavy operations) should have lower limits, such as 5 or 10 per minute.
Example: If “PermitLimit”: 60, each user or IP can make 60 requests per 1-minute window.
Window
This property defines the Time Interval or Duration over which the system tracks and enforces the PermitLimit.
- It tells the Rate Limiter how long to count incoming requests before resetting the counter.
- The format follows the standard hh:mm:ss pattern (hours, minutes, seconds).
- Once the time window expires, the request counter resets, and users are allowed to issue a new batch of requests.
- For example, a Window of “00:01:00” means that the rate limiter will count requests per minute.
Example Use Case: If “PermitLimit”: 60 and “Window”: “00:01:00”, each user can send 60 requests per minute. After one minute, their count resets automatically.
QueueLimit
This property specifies the Maximum Number of Requests That Can Wait in the Queue when all available permits are exhausted.
- When a user hits the PermitLimit, any additional requests are not immediately rejected; they are placed in a waiting queue.
- If the number of waiting requests exceeds the QueueLimit, new requests will be denied with a 429 Too Many Requests response.
- It helps smooth out short spikes in traffic by allowing a few extra requests to wait instead of being dropped instantly.
- Setting this value too high may increase response time under load; too low may cause early rejections.
Example: If “QueueLimit”: 5, after reaching 60 requests (PermitLimit), five more requests can wait before the system starts rejecting new ones.
QueueProcessingOrder
This property defines the order in which queued requests are processed once permits become available again.
- It controls how the system decides which waiting request gets served first.
- The possible options are:
  - “OldestFirst” → Processes requests in the same order they arrived (First-In, First-Out).
  - “NewestFirst” → Gives priority to the most recent requests (Last-In, First-Out).
- In most APIs, “OldestFirst” is preferred to ensure fairness, preventing earlier requests from being starved by newer ones.
- This setting becomes active only when there are queued requests due to limited permits.
Example: If “QueueProcessingOrder”: “OldestFirst”, the limiter will always handle the earliest waiting request first when a slot frees up.
Policy Sections (DefaultPolicy, ProductApiPolicy, OrderApiPolicy, PaymentApiPolicy)
Each policy section represents a separate rule set for a specific category of APIs within your Gateway.
- DefaultPolicy acts as a fallback for routes that don’t match any specific category.
- ProductApiPolicy defines rules for product-related APIs, typically more tolerant because they handle read-heavy traffic.
- OrderApiPolicy applies to order-processing endpoints and provides slightly tighter control due to transaction sensitivity.
- PaymentApiPolicy is the strictest, often using concurrency-based limits because payment operations must be serialized or strictly rate-controlled.
Example Flow: If a user sends a request to /api/orders/create, the middleware will automatically detect that it falls under the OrderApiPolicy and use its specific limits.
Step 3: Create the Model Classes
The RateLimitSettings class acts as a bridge between our configuration file (appsettings.json) and our runtime logic (middleware and services). It ensures that the rate-limiting settings defined in appsettings.json are strongly typed and easily accessible throughout our codebase using .NET’s Options Pattern.
In short, the RateLimitSettings class defines the rate-limiting rules and how they should be applied for each API category. So, create a class file named RateLimitSettings.cs within the Models folder and copy-paste the following code.
namespace APIGateway.Models
{
    // Represents the global rate limiting configuration.
    // This model maps directly to the "RateLimiting" section in appsettings.json.
    public class RateLimitSettings
    {
        // Enables or disables rate limiting globally.
        // When false, the middleware bypasses all limiter checks (uses NoRateLimiter).
        public bool IsEnabled { get; set; }

        // Default global policy applied to all endpoints
        // that do not have a specific API policy defined.
        public Policy DefaultPolicy { get; set; } = new();

        // Rate limiting rules specifically for Product-related APIs.
        // Example: /api/products/*
        public Policy ProductApiPolicy { get; set; } = new();

        // Rate limiting rules specifically for Order-related APIs.
        // Example: /api/orders/*
        public Policy OrderApiPolicy { get; set; } = new();

        // Rate limiting rules specifically for Payment-related APIs.
        // Example: /api/payments/*
        // Typically configured with a stricter limit
        // since payment operations are sensitive and require throttling.
        public Policy PaymentApiPolicy { get; set; } = new();

        // Inner class representing an individual rate limiting policy.
        // Each policy defines how many requests can be made in a specific time window
        // and how excess requests are queued or rejected.
        public class Policy
        {
            // Maximum number of requests permitted within a given time window.
            // Example: PermitLimit = 100 means 100 requests per window per user/IP.
            public int PermitLimit { get; set; }

            // The duration of the fixed or sliding window.
            // Format: "hh:mm:ss" (e.g., "00:01:00" = 1 minute window).
            public string Window { get; set; } = "00:01:00";

            // The number of requests that can wait in the queue
            public int QueueLimit { get; set; }

            // The order in which queued requests are processed:
            // "OldestFirst" → FIFO (first-come, first-served)
            // "NewestFirst" → LIFO (newest request gets priority)
            // This maps to the QueueProcessingOrder enum in System.Threading.RateLimiting.
            public string QueueProcessingOrder { get; set; } = "OldestFirst";
        }
    }
}
RateLimitPolicy Enum
The RateLimitPolicy enum represents the different categories or groups of APIs in our Gateway, each with its own rate-limiting behavior. Instead of hardcoding strings like “ProductApiPolicy” or “OrderApiPolicy” throughout our code, we define a strongly typed enum that expresses which policy applies to which route.
The RateLimitPolicy enum tells the API Gateway which set of rules to apply to each incoming request, ensuring that every route (Products, Orders, Payments, etc.) follows the correct rate-limiting policy defined in your configuration file. So, create a class file named RateLimitPolicy.cs within the Models folder and copy-paste the following code.
namespace APIGateway.Models
{
    // Each enum value corresponds to a specific configuration section
    // inside the RateLimiting section of the appsettings.json file.
    // Example (JSON):
    // "RateLimiting": {
    //     "DefaultPolicy": { ... },
    //     "ProductApiPolicy": { ... },
    //     "OrderApiPolicy": { ... },
    //     "PaymentApiPolicy": { ... }
    // }
    public enum RateLimitPolicy
    {
        Default,
        ProductApi,
        OrderApi,
        PaymentApi
    }
}
Understanding RateLimiter and RateLimitLease:
These two (RateLimiter and RateLimitLease) are the Core Building Blocks of the native .NET 8 rate-limiting system.
What is RateLimiter?
RateLimiter is an Abstract Base Class in .NET that defines the rules and behaviour for controlling how many requests can be performed within a specific period of time. It’s like a Traffic Controller that ensures not too many requests are processed at once, preventing system overload, API abuse, or resource exhaustion.
How RateLimiter Works in General
When a request comes in:
- The limiter checks how many requests have already been made recently.
- It decides whether to allow, delay, or reject the new request.
- It returns a RateLimitLease, a token that indicates whether the request was granted.
- The caller (Middleware or App Code) acts accordingly: either processes the request or returns an HTTP 429 (Too Many Requests) response.
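The steps above can be sketched with the real System.Threading.RateLimiting API. The policy values below are arbitrary examples, not the tutorial's actual configuration:

```csharp
using System.Threading.RateLimiting;

// Any RateLimiter implementation can stand behind this variable;
// a FixedWindowRateLimiter is used here as a concrete example.
RateLimiter limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 100,                 // 100 requests...
    Window = TimeSpan.FromMinutes(1),  // ...per one-minute window
    QueueLimit = 0,                    // no queuing: reject overflow immediately
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

// Ask the limiter for permission; the lease is the answer.
using RateLimitLease lease = await limiter.AcquireAsync(permitCount: 1);

if (lease.IsAcquired)
{
    // Allowed: forward the request to the downstream microservice.
}
else
{
    // Rejected: return HTTP 429 Too Many Requests to the client.
}
```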
What is RateLimitLease?
RateLimitLease represents the result of a Rate-Limit check. It tells whether the current request was allowed or rejected. It’s like the Permission Slip or Entry Pass issued by the RateLimiter.
Creating No Rate Limiter Helper:
The NoRateLimiter class is a custom implementation of .NET’s built-in RateLimiter base class. It’s designed to do nothing, that is, to always allow every request immediately, without applying any throttling, counting, or queuing.
It’s used when the global configuration (IsEnabled = false in appsettings.json) says, “I don’t want to enforce any rate limiting right now.” So instead of completely removing the rate-limiting middleware or adding dozens of if (IsEnabled) checks throughout the code, we swap the normal limiter for this one.
First, create a folder named Helpers at the root directory of the API Gateway layer project. Then, create a class file named NoRateLimiter.cs within the Helpers folder, and copy-paste the following code.
using System.Threading.RateLimiting;

namespace APIGateway.Helpers
{
    // Used when global rate limiting is turned off (IsEnabled = false).
    // Always allows requests without applying any throttling.
    public sealed class NoRateLimiter : RateLimiter
    {
        // Reusable "always-allowed" lease for all requests.
        private static readonly RateLimitLease _lease = new NoLease();

        // Indicates the limiter is always active (never idle).
        public override TimeSpan? IdleDuration => Timeout.InfiniteTimeSpan;

        // Synchronously grants permission for every request.
        protected override RateLimitLease AttemptAcquireCore(int permitCount)
        {
            return _lease;
        }

        // Asynchronous method that grants permission immediately for all requests.
        // Used when rate limiting is globally disabled.
        protected override ValueTask<RateLimitLease> AcquireAsyncCore(
            int permitCount,
            CancellationToken cancellationToken)
        {
            // Always returns a successful lease without any delay or throttling.
            return new ValueTask<RateLimitLease>(_lease);
        }

        // No statistics tracking for this limiter.
        public override RateLimiterStatistics? GetStatistics()
        {
            return null;
        }

        // Represents a successful lease that always allows requests.
        private sealed class NoLease : RateLimitLease
        {
            // Always grants permission.
            public override bool IsAcquired => true;

            // No metadata (e.g., retry info) is provided.
            public override IEnumerable<string> MetadataNames => Array.Empty<string>();

            // Always returns false since no metadata exists.
            public override bool TryGetMetadata(string name, out object? metadata)
            {
                metadata = null;
                return false;
            }
        }
    }
}
What is RateLimitLease?
A RateLimitLease represents a “Permission Slip” that determines whether a request is allowed to proceed.
- Every time a client request arrives, the rate limiter must decide, “Can I allow this request or not?”
- This decision is returned as a RateLimitLease object.
- If IsAcquired = true, the request can continue; otherwise, the request is rejected (usually with a 429 Too Many Requests status).
- In a NoRateLimiter, the lease is always granted, because throttling is disabled.
- The _lease instance is reused to save memory and reduce object allocation, making it lightweight and thread-safe.
Think of it like: Each API request must show a “Permit Card.” If the card (lease) is valid → the gate opens. If it’s invalid → the request is stopped.
What is IdleDuration?
The IdleDuration property tells how long the limiter has been idle before it resets or is considered inactive.
- In normal limiters, this helps track when to reset counters or recycle resources.
- In NoRateLimiter, there is no reset concept, it’s always active.
- That’s why it returns Timeout.InfiniteTimeSpan, which literally means “never idle”.
- This signals to the framework that the limiter doesn’t pause, expire, or rest, it’s permanently available.
Think of it like: A guard who never sleeps, always awake, always letting people through.
What is AttemptAcquireCore()?
This is the synchronous method used to request permission from the limiter.
- It’s called when a request arrives and you want an immediate yes/no answer, no waiting.
- In real limiters, this method checks current usage, counts, and limits before deciding.
- In NoRateLimiter, it just returns the reusable _lease object, meaning, every request is instantly approved.
- The permitCount parameter (number of tokens requested) is ignored, since all requests are accepted.
Think of it like: Someone knocks → the guard instantly waves them through without checking anything.
What is AcquireAsyncCore()?
This is the asynchronous version of AttemptAcquireCore(), used when the limiter might need to wait for an available slot (e.g., when queueing is involved).
- In most real-world limiters, this method would pause the request until a slot opens up.
- But in NoRateLimiter, there’s no waiting or queuing, the limiter always approves immediately.
- Hence, it simply returns a ValueTask containing the reusable _lease that grants access.
Think of it like: Even if you try to wait for a spot, the gatekeeper immediately says, “No need to wait, you can go right in.”
What is GetStatistics()?
The GetStatistics() method provides runtime insights about limiter performance, such as:
- Number of acquired permits
- Number of rejected requests
- Queue length or average wait time
But since NoRateLimiter never tracks or limits anything, it doesn’t collect data. Therefore, it always returns null, meaning “no statistics available.”
Think of it like: A system that never keeps logs because everything is always allowed.
Nested NoLease Class
The NoLease class defines the actual “permit” returned when a request is approved.
- It inherits from RateLimitLease and overrides its key properties.
- Since this is an “always-successful” lease, it doesn’t track metadata, limits, or retry times.
- It’s simply a success flag (IsAcquired = true) that indicates the request passed without restriction.
Think of it like: A pre-signed, reusable “access card” that’s valid forever, no expiry, no data, no restrictions.
What is IsAcquired?
The IsAcquired property indicates whether the current request successfully obtained permission.
- In normal limiters, IsAcquired = false means the limit has been reached, and the request must wait or be rejected.
- In this case, it’s always true, meaning every request gets an automatic green light.
Think of it like: Every visitor automatically gets their entry pass stamped “Approved”.
What is MetadataNames?
MetadataNames holds extra information attached to the lease, such as:
- “RetryAfter” → how long to wait before retrying
- “RemainingPermits” → how many requests can still be made in this window
But the NoRateLimiter doesn’t use or expose any metadata. Hence, it simply returns an empty list.
Think of it like: An access card that doesn’t have any extra fields — just “Approved”.
What is TryGetMetadata()?
The TryGetMetadata() is used to retrieve specific metadata values by key. For example, a real limiter could return:
- “RetryAfter” → 30 seconds
- “RemainingRequests” → 5
In NoRateLimiter, since nothing is tracked or delayed, there’s no metadata to return; it always returns false.
Think of it like: Asking the guard, “When can I come back?” and he replies, “You’re always allowed, no need to wait.”
Why do We Need this No Rate Limiter?
In a production API Gateway, there are situations where we might want to temporarily disable rate limiting, such as:
- During Load Testing or Stress Testing, we don’t want throttling to interfere.
- While Debugging or Profiling performance.
- When Rate Limits Are Controlled Externally (for example, by a reverse proxy like Nginx or Cloudflare).
- When running in Development or Staging Environments, you don’t want to slow down developers with artificial limits.
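The IsEnabled switch makes the environment-level scenario particularly easy: a per-environment settings file can override the flag without touching any code. For example, a hypothetical appsettings.Development.json could contain just:

```json
{
  "RateLimiting": {
    "IsEnabled": false
  }
}
```

ASP.NET Core layers this file over appsettings.json in the Development environment, so rate limiting stays enabled in production while developers work without artificial limits.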
Types of Rate Limiters
Rate-limiting mechanisms can be broadly categorized based on how they control the flow of requests into a system. Their primary purpose is to prevent overloading, ensure fair usage, and maintain consistent service quality across all clients.
In .NET, the System.Threading.RateLimiting namespace provides multiple built-in limiter strategies, each designed for a specific use case. The most commonly used types are:
- Fixed Window Rate Limiter – Time-based limiting
- Concurrency Limiter – Parallelism-based limiting
What is FixedWindowRateLimiter?
The FixedWindowRateLimiter is a time-based rate limiting mechanism that defines how many requests can be processed within a specific, fixed-duration time window, for example, 100 requests per minute per user or IP address.
It works by dividing time into fixed, non-overlapping intervals, known as windows. Each window maintains a counter representing how many requests have been processed so far.
How It Works
- At the start of each window (say 1 minute), the limiter allows up to N requests (as defined in configuration).
- Each incoming request increments the counter.
- Once the counter reaches the limit (e.g., 100 requests), any additional requests within the same time window are either queued (if a queue limit is configured) or rejected immediately.
- When the time window expires, the counter resets to zero, and the next window starts fresh.
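This behavior is easy to observe with a small sketch. With a permit limit of 3 and no queue, the fourth request inside the same window is rejected (the numbers here are chosen for illustration):

```csharp
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,                   // 3 requests allowed per window
    Window = TimeSpan.FromMinutes(1),  // 1-minute fixed window
    QueueLimit = 0,                    // no queue: overflow is rejected immediately
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

for (int i = 1; i <= 4; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
    // Requests 1 to 3 are allowed; request 4 is rejected in this window.
    Console.WriteLine($"Request {i}: {(lease.IsAcquired ? "allowed" : "rejected (429)")}");
}
```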
Real-World Analogy
Imagine a movie theatre ticket counter that only sells 100 tickets per minute. If the 101st person arrives before that minute ends, they must either:
- Wait until the next minute (new window), or
- Leave if the queue is full.
This ensures predictable load and avoids sudden traffic bursts that could overload downstream services.
Note: Ideal for public APIs, gateway-level control, or uniform time-based quotas
What is ConcurrencyLimiter?
The ConcurrencyLimiter is a Parallelism-Based rate-limiting mechanism. Instead of limiting requests per time interval, it focuses on how many requests are actively being processed at the same time, in real time.
It defines a fixed number of concurrent permits (active slots). When a request arrives, it must acquire a permit before being processed. If all permits are in use, the incoming request:
- Waits in a queue until a slot becomes available, or
- Is rejected immediately with a 429 Too Many Requests response if the queue is full.
How It Works
- Suppose your limiter allows five concurrent requests.
- The first five incoming requests acquire permits and start processing immediately.
- The sixth request is paused and placed in a waiting queue.
- As soon as one of the first five completes its work, it releases its permit.
- The limiter then immediately allows the next waiting request to proceed.
This cycle continues, ensuring the system never processes more than 5 requests at once and maintaining stable performance.
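The permit-and-release cycle can be sketched directly with ConcurrencyLimiter. Disposing a lease is what releases the permit (the limit of 2 is chosen only to keep the example short):

```csharp
using System.Threading.RateLimiting;

var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,   // at most 2 requests in flight at once
    QueueLimit = 0,    // no waiting queue in this sketch
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

RateLimitLease first = limiter.AttemptAcquire();   // takes slot 1
RateLimitLease second = limiter.AttemptAcquire();  // takes slot 2
RateLimitLease third = limiter.AttemptAcquire();   // no slots left

Console.WriteLine(third.IsAcquired);               // False: both permits are in use

first.Dispose();                                   // request 1 completes, permit released
Console.WriteLine(limiter.AttemptAcquire().IsAcquired); // True: a slot is free again

second.Dispose();
```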
Real-World Analogy
Think of an elevator that can hold only five people at a time. If 10 people are waiting:
- The first five get in (active requests).
- The next five wait (queued requests).
- When the elevator returns (a request completes), another person in the queue gets their turn.
This ensures the elevator (system) never carries more load than it safely can.
Note: Ideal for long-running, high-cost, or sensitive operations, such as payment processing or data uploads.
Step 4: Create the Rate Limit Policy Service
IRateLimitPolicyService is an interface that defines how your API Gateway should create and configure rate limiters based on the API being accessed, without hardcoding limiter logic into the middleware. Think of it like a Factory Contract that produces rate limiters (e.g., FixedWindow, ConcurrencyLimiter) according to the rules defined in your appsettings.json.
Create an interface named IRateLimitPolicyService.cs within the Services folder, then copy-paste the following code. It defines a contract so you can later plug in another provider (e.g., Redis-based, database-driven, etc.) without touching the middleware logic.
using APIGateway.Models;
using System.Threading.RateLimiting;

namespace APIGateway.Services
{
    // Defines a contract for creating rate limiters dynamically
    // based on API category (policy) and request identity.
    // The implementing service (RateLimitPolicyService) uses this interface
    // to generate the appropriate RateLimiter instance (FixedWindow, ConcurrencyLimiter, etc.)
    // depending on the policy selected by the middleware.
    public interface IRateLimitPolicyService
    {
        RateLimiter CreateRateLimiter(RateLimitPolicy policy);
    }
}
Creating RateLimitPolicyService
The RateLimitPolicyService is the factory and configuration brain of our rate-limiting system. Its job is to:
- Read settings from appsettings.json,
- Choose the correct limiter type (FixedWindow or Concurrency),
- Configure it with our limits (like request count, queue size, etc.),
- Return a ready-to-use RateLimiter object to the middleware.
Create a class file named RateLimitPolicyService.cs within the Services folder, then copy-paste the following code.
using APIGateway.Helpers;
using APIGateway.Models;
using Microsoft.Extensions.Options;
using System.Threading.RateLimiting;

namespace APIGateway.Services
{
    // Responsible for creating and configuring rate limiter instances
    // based on policy definitions found in appsettings.json.
    // Design Goals:
    // • Centralize all limiter creation logic in one place.
    // • Support multiple limiter strategies (FixedWindow, ConcurrencyLimiter, etc.).
    // • Allow global enable/disable toggle via configuration.
    public class RateLimitPolicyService : IRateLimitPolicyService
    {
        // Holds the strongly typed configuration from appsettings.json ("RateLimiting" section).
        private readonly RateLimitSettings _settings;

        // Initializes the policy service using the options pattern to bind configuration.
        // The IOptions<T> abstraction automatically injects RateLimitSettings values from appsettings.json.
        public RateLimitPolicyService(IOptions<RateLimitSettings> options)
        {
            _settings = options.Value;
        }

        // Factory method for creating the appropriate RateLimiter based on the selected policy.
        public RateLimiter CreateRateLimiter(RateLimitPolicy policy)
        {
            // If rate limiting is globally disabled (via appsettings.json),
            // return a lightweight NoRateLimiter that allows all requests.
            if (!_settings.IsEnabled)
                return new NoRateLimiter();

            // Dynamically choose limiter type based on policy
            return policy switch
            {
                // Product APIs → Use Fixed Window Limiter (e.g., 100 requests per minute)
                RateLimitPolicy.ProductApi => CreateFixedWindowLimiter(_settings.ProductApiPolicy),

                // Order APIs → Also Fixed Window, typically stricter limits than Product
                RateLimitPolicy.OrderApi => CreateFixedWindowLimiter(_settings.OrderApiPolicy),

                // Payment APIs → Use ConcurrencyLimiter (limits simultaneous requests)
                RateLimitPolicy.PaymentApi => CreateConcurrencyLimiter(_settings.PaymentApiPolicy),

                // Fallback to Default Policy for all other APIs
                _ => CreateFixedWindowLimiter(_settings.DefaultPolicy)
            };
        }

        // Creates a FixedWindowRateLimiter based on configuration values.
        // Fixed Window Limiter:
        // Allows up to N requests within a defined "window" duration (e.g., 60/sec, 100/min).
        // Once the window is full, requests are queued or rejected.
        private RateLimiter CreateFixedWindowLimiter(RateLimitSettings.Policy policy)
        {
            return new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
            {
                // Maximum number of allowed requests in a window (PermitLimit = 100 → 100 req/min)
                PermitLimit = policy.PermitLimit,

                // The time duration defining each window (e.g., "00:01:00" = 1 minute)
                Window = TimeSpan.Parse(policy.Window),

                // Number of extra requests that can be queued after reaching the limit
                QueueLimit = policy.QueueLimit,

                // Defines how queued requests are processed (OldestFirst or NewestFirst)
                QueueProcessingOrder = Enum.Parse<QueueProcessingOrder>(policy.QueueProcessingOrder)
            });
        }

        // Creates a ConcurrencyLimiter based on configuration values.
        // Concurrency Limiter:
        // Restricts the number of concurrent (simultaneous) operations instead of per-minute requests.
        // Ideal for Payment APIs, where only a few payment transactions
        // should be processed at the same time to avoid gateway overload or double submissions.
        private RateLimiter CreateConcurrencyLimiter(RateLimitSettings.Policy policy)
        {
            return new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
// Number of concurrent operations permitted
PermitLimit = policy.PermitLimit,
// Additional requests waiting in queue (optional)
QueueLimit = policy.QueueLimit,
// Defines how queued requests are processed when slots free up
QueueProcessingOrder = Enum.Parse<QueueProcessingOrder>(policy.QueueProcessingOrder)
});
}
}
}
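The factory above returns a NoRateLimiter when the feature is globally disabled, but its implementation is not shown in this step. A minimal pass-through version could look like the following sketch; the class name and the APIGateway.Helpers namespace are taken from the code above, while the rest is an assumption based on the System.Threading.RateLimiting base class:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.RateLimiting;

namespace APIGateway.Helpers
{
    // Pass-through limiter used when rate limiting is globally disabled.
    // Every acquisition succeeds immediately; no state is tracked.
    public sealed class NoRateLimiter : RateLimiter
    {
        public override TimeSpan? IdleDuration => null;

        public override RateLimiterStatistics? GetStatistics() => null;

        protected override RateLimitLease AttemptAcquireCore(int permitCount)
            => new AlwaysAcquiredLease();

        protected override ValueTask<RateLimitLease> AcquireAsyncCore(
            int permitCount, CancellationToken cancellationToken)
            => new(new AlwaysAcquiredLease());

        // A lease that always reports success and carries no metadata.
        private sealed class AlwaysAcquiredLease : RateLimitLease
        {
            public override bool IsAcquired => true;
            public override IEnumerable<string> MetadataNames => Array.Empty<string>();
            public override bool TryGetMetadata(string metadataName, out object? metadata)
            {
                metadata = null;
                return false;
            }
        }
    }
}
```

Because every lease reports IsAcquired == true, the middleware treats all requests as allowed and the system behaves exactly as if the middleware were not installed.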
Step 5: Create the Custom Middleware
RateLimitingMiddleware is a custom middleware component that sits inside your API Gateway’s HTTP request pipeline. Its job is to inspect every incoming request and decide whether to:
- Allow it (if within rate limits), or
- Block it (if the client has exceeded the configured limit).
It acts as the traffic controller for your APIs, ensuring that no single user, IP address, or system can overwhelm your backend microservices.
Key Responsibilities
- Identify who is making the request (UserId from JWT or IP address).
- Identify the type of API being accessed (Product, Order, or Payment).
- Apply the correct rate-limiting rule based on configuration.
- Keep track of how many requests the user/IP has made recently.
- Decide whether to allow or block the request.
Create a class file named RateLimitingMiddleware.cs within the Middlewares folder, then copy and paste the following code.
using APIGateway.Models;
using APIGateway.Services;
using System.Collections.Concurrent;
using System.Security.Claims;
using System.Threading.RateLimiting;
namespace APIGateway.Middlewares
{
// This middleware enforces rate limiting for every incoming request:
// For authenticated users → limits are applied per UserId (from JWT).
// For anonymous users → limits are applied per IP address.
// Workflow:
// 1. Detect which API endpoint is being accessed (Product, Order, Payment, etc.).
// 2. Resolve identity (UserId or IP).
// 3. Retrieve or create a RateLimiter (based on policy).
// 4. Attempt to acquire permission (token) from the limiter.
// 5. If denied → return HTTP 429 Too Many Requests.
// 6. Otherwise → continue down the pipeline.
public class RateLimitingMiddleware
{
// Points to the next middleware in the pipeline
private readonly RequestDelegate _next;
// A dependency that creates the correct RateLimiter
private readonly IRateLimitPolicyService _policyService;
// An in-memory, thread-safe dictionary that stores a separate limiter per user/IP per policy
// (note: entries are never evicted in this simple in-memory design)
private static readonly ConcurrentDictionary<string, RateLimiter> _limiters = new();
public RateLimitingMiddleware(RequestDelegate next, IRateLimitPolicyService policyService)
{
_next = next;
_policyService = policyService;
}
// This is called automatically by the ASP.NET Core pipeline for every HTTP request.
public async Task InvokeAsync(HttpContext context)
{
// Step 1: Identify the Request Path
// Extracts the URL path, e.g. /api/products/get-all.
// Helps the middleware determine which API category the request belongs to.
var path = context.Request.Path.Value ?? string.Empty;
// Step 2: Resolve the Request Identity (IP or User Id)
// This method determines who is making the request.
var identityKey = ResolveIdentity(context);
// Step 3: Determine Which API Policy Applies
// Product, Order, Payment, etc.
var policy = GetPolicyFromPath(path);
// Step 4: Combine both to form a unique limiter key
// Builds a unique key combining policy + user/IP
// Example: "OrderApi_user:123" or "Default_ip:203.91.45.10"
var limiterKey = $"{policy}_{identityKey}";
// Step 5: Retrieve or Create a Limiter
// Looks up an existing limiter in the dictionary.
// If not found, creates a new one via RateLimitPolicyService.
var limiter = _limiters.GetOrAdd(
limiterKey,
_ => _policyService.CreateRateLimiter(policy));
// Step 6: Try to Acquire a Permit
// This is the actual enforcement line.
// Requests 1 token (permit) from the limiter.
// The limiter (either FixedWindow or Concurrency) checks:
// Has this user already hit their request limit?
// If not → allow and mark a token as used.
// If yes → deny access.
// The lease object tells you whether the request was accepted:
// lease.IsAcquired == true → proceed.
// lease.IsAcquired == false → reject (too many requests).
using var lease = await limiter.AcquireAsync(1, context.RequestAborted);
// Step 7: Handle Rejected Requests
if (!lease.IsAcquired)
{
// Responds with HTTP 429 Too Many Requests.
context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
context.Response.ContentType = "application/json";
// Adds a standard Retry-After header to inform the client when to retry.
context.Response.Headers["Retry-After"] = "60";
await context.Response.WriteAsync(
"{\"error\":\"rate_limit_exceeded\",\"message\":\"Too many requests. Please try again later.\"}");
// Stops the request pipeline here; the request doesn't reach your downstream APIs.
return;
}
// Step 8: Pass Allowed Requests Forward
// If the limiter allows the request,
// it’s forwarded to the next middleware or API controller as normal.
await _next(context);
}
// Determines which rate-limiting policy applies based on request path.
// This ensures different API categories can have different rate limits
// Example:
// /api/products → ProductApiPolicy
// /api/orders → OrderApiPolicy
// /api/payments → PaymentApiPolicy
private static RateLimitPolicy GetPolicyFromPath(string path)
{
if (path.Contains("/products", StringComparison.OrdinalIgnoreCase))
return RateLimitPolicy.ProductApi;
if (path.Contains("/orders", StringComparison.OrdinalIgnoreCase))
return RateLimitPolicy.OrderApi;
if (path.Contains("/payments", StringComparison.OrdinalIgnoreCase))
return RateLimitPolicy.PaymentApi;
// Default fallback for all other routes
return RateLimitPolicy.Default;
}
// Resolves a unique identity key for the current requester.
// Priority order:
// 1. Authenticated user → Extract UserId from JWT claims.
// 2. Anonymous user → Use client IP address (proxy-aware).
private static string ResolveIdentity(HttpContext context)
{
// 1. Try JWT-based user identification.
// Looks for claims like NameIdentifier, sub, or userId.
// If found → the identity key becomes user:{userId}.
var userId = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
?? context.User?.FindFirst("sub")?.Value
?? context.User?.FindFirst("userId")?.Value;
if (!string.IsNullOrWhiteSpace(userId))
return $"user:{userId}";
// 2. If no JWT token (anonymous user):
// Uses the IP address as the identity key.
// Checks the X-Forwarded-For header (important when using proxies or CDNs).
// Example: ip:203.91.44.88
var ip = context.Request.Headers["X-Forwarded-For"].FirstOrDefault()
?? context.Connection.RemoteIpAddress?.ToString()
?? "unknown";
return $"ip:{ip}";
}
}
}
Step 6: Add Extension for Clean Integration
This file provides extension methods that make it easy to:
- Register your custom Rate Limiting services (like IRateLimitPolicyService and RateLimitSettings), and
- Enable your Rate Limiting middleware (RateLimitingMiddleware) inside the request pipeline.
Instead of writing long code directly in the Program.cs, you can simply call:
- builder.Services.AddCustomRateLimiting(configuration);
- app.UseCustomRateLimiting();
Extension methods let you attach reusable setup logic to IServiceCollection (for dependency injection) and IApplicationBuilder (for middleware). Instead of manually wiring services and middleware in Program.cs, these two methods act as one-line shortcuts that register and activate your entire rate limiting system.
First, create a folder named Extensions at the root directory of the API Gateway layer project. Then, create a class file named RateLimitingExtensions.cs within the Extensions folder, and copy-paste the following code.
using APIGateway.Models;
using APIGateway.Services;
using APIGateway.Middlewares;
namespace APIGateway.Extensions
{
// Provides extension methods to register and enable
// custom in-memory rate limiting within the API Gateway.
public static class RateLimitingExtensions
{
// Adds and configures the custom rate limiting services to the dependency injection container.
public static IServiceCollection AddCustomRateLimiting(
this IServiceCollection services, IConfiguration configuration)
{
// 1. Bind "RateLimiting" section in appsettings.json → RateLimitSettings model
// Example:
// "RateLimiting": {
// "IsEnabled": true,
// "DefaultPolicy": { ... },
// "ProductApiPolicy": { ... }
// }
services.Configure<RateLimitSettings>(
configuration.GetSection("RateLimiting"));
// 2. Register the policy service that creates limiters based on configuration
// This allows the middleware to request a RateLimiter via IRateLimitPolicyService
services.AddSingleton<IRateLimitPolicyService, RateLimitPolicyService>();
// Why Singleton?
// Rate limiting depends on shared in-memory limiters.
// A singleton ensures consistent throttling across requests.
// Return for fluent chaining in Program.cs
return services;
}
// Adds the "RateLimitingMiddleware" into the application request pipeline.
// What this does:
// Inserts your custom rate-limiting logic before controller execution.
// Every request will pass through this middleware where it will either:
// Be allowed → proceed to next middleware or controller, OR
// Be throttled → immediately return HTTP 429 Too Many Requests.
// Recommended Placement:
// Place this middleware early in the pipeline (after authentication but before business logic).
public static IApplicationBuilder UseCustomRateLimiting(this IApplicationBuilder app)
{
// Insert custom middleware into the ASP.NET Core pipeline
return app.UseMiddleware<RateLimitingMiddleware>();
}
}
}
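For reference, here is one possible shape of the "RateLimiting" section this extension binds from appsettings.json. The property names mirror the RateLimitSettings and Policy members used in the earlier steps (IsEnabled, PermitLimit, Window, QueueLimit, QueueProcessingOrder); the values are illustrative, and Window applies only to the fixed-window policies:

```json
{
  "RateLimiting": {
    "IsEnabled": true,
    "DefaultPolicy": {
      "PermitLimit": 100,
      "Window": "00:01:00",
      "QueueLimit": 0,
      "QueueProcessingOrder": "OldestFirst"
    },
    "ProductApiPolicy": {
      "PermitLimit": 100,
      "Window": "00:01:00",
      "QueueLimit": 10,
      "QueueProcessingOrder": "OldestFirst"
    },
    "OrderApiPolicy": {
      "PermitLimit": 60,
      "Window": "00:01:00",
      "QueueLimit": 5,
      "QueueProcessingOrder": "OldestFirst"
    },
    "PaymentApiPolicy": {
      "PermitLimit": 5,
      "Window": "00:01:00",
      "QueueLimit": 2,
      "QueueProcessingOrder": "OldestFirst"
    }
  }
}
```

Adjust the numbers to match the limits your own policies require; the PaymentApiPolicy values here feed the ConcurrencyLimiter, so its PermitLimit means simultaneous requests rather than requests per window.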
Step 7: Program.cs
In an ASP.NET Core API Gateway, Program.cs acts as the entry point that integrates:
- Ocelot (Routing + Reverse Proxy)
- JWT Authentication
- Serilog (Structured Logging)
- Custom Response Compression
- Custom Rate Limiting & Throttling
- Swagger for local API testing
- Microservice HttpClients
So, Program.cs is the final integration point where your rate limiting and throttling system becomes active inside the API Gateway. It ensures that:
- The rate limiter is registered (AddCustomRateLimiting()),
- The middleware is activated (UseCustomRateLimiting()), and
- Both Custom Gateway Routes and Ocelot Proxy Routes are protected from overuse before requests ever reach your backend microservices.
Please modify the Program.cs class file as follows:
using APIGateway.Extensions;
using APIGateway.Middlewares;
using APIGateway.Models;
using APIGateway.Services;
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;
using Newtonsoft.Json.Serialization;
using Ocelot.DependencyInjection;
using Ocelot.Middleware;
using Serilog;
using System.Text;
namespace APIGateway
{
public class Program
{
public static async Task Main(string[] args)
{
var builder = WebApplication.CreateBuilder(args);
// MVC Controllers + Newtonsoft JSON Configuration
builder.Services
.AddControllers()
.AddNewtonsoftJson(options =>
{
options.SerializerSettings.ContractResolver = new DefaultContractResolver
{
NamingStrategy = new DefaultNamingStrategy()
};
});
// Reads the RateLimiting section from appsettings.json
// Binds it to your RateLimitSettings model
// Registers the RateLimitPolicyService as a singleton
builder.Services.AddCustomRateLimiting(builder.Configuration);
// Bind CompressionSettings section to our model using the options pattern
builder.Services.Configure<CompressionSettings>(
builder.Configuration.GetSection("CompressionSettings"));
// Ocelot Configuration (API Gateway Routing Layer)
builder.Configuration.AddJsonFile("ocelot.json", optional: false, reloadOnChange: true);
builder.Services.AddOcelot(builder.Configuration);
// Structured Logging Setup (Serilog)
Log.Logger = new LoggerConfiguration()
.ReadFrom.Configuration(builder.Configuration)
.Enrich.FromLogContext()
.CreateLogger();
builder.Host.UseSerilog(); // Replace default .NET logger with Serilog.
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();
// JWT Authentication (Bearer Token Validation)
builder.Services
.AddAuthentication(options =>
{
// Define the default authentication scheme as Bearer
options.DefaultAuthenticateScheme = JwtBearerDefaults.AuthenticationScheme;
options.DefaultChallengeScheme = JwtBearerDefaults.AuthenticationScheme;
})
.AddJwtBearer(options =>
{
// Token validation configuration
options.TokenValidationParameters = new TokenValidationParameters
{
ValidateIssuer = true,
ValidIssuer = builder.Configuration["JwtSettings:Issuer"],
// We’re not validating audience because microservices share same gateway.
ValidateAudience = false,
// Enforce token expiry check
ValidateLifetime = true,
// Ensure token signature integrity using secret key
ValidateIssuerSigningKey = true,
IssuerSigningKey = new SymmetricSecurityKey(
Encoding.UTF8.GetBytes(builder.Configuration["JwtSettings:SecretKey"]!)
),
// No extra grace period for expired tokens
ClockSkew = TimeSpan.Zero
};
});
builder.Services.AddAuthorization(); // Enables [Authorize] attributes.
// Downstream Microservice Clients (typed HttpClientFactory)
var urls = builder.Configuration.GetSection("ServiceUrls");
builder.Services.AddHttpClient("OrderService", c =>
{
c.BaseAddress = new Uri(urls["OrderService"]!);
});
builder.Services.AddHttpClient("UserService", c =>
{
c.BaseAddress = new Uri(urls["UserService"]!);
});
builder.Services.AddHttpClient("ProductService", c =>
{
c.BaseAddress = new Uri(urls["ProductService"]!);
});
builder.Services.AddHttpClient("PaymentService", c =>
{
c.BaseAddress = new Uri(urls["PaymentService"]!);
});
// Custom Aggregation Service Registration
builder.Services.AddScoped<IOrderSummaryAggregator, OrderSummaryAggregator>();
var app = builder.Build();
// Swagger (API Explorer for development/debugging)
if (app.Environment.IsDevelopment())
{
app.UseSwagger();
app.UseSwaggerUI();
}
app.UseHttpsRedirection();
// Custom conditional compression middleware
app.UseMiddleware<ConditionalResponseCompressionMiddleware>();
// Global Cross-Cutting Middleware
app.UseCorrelationId();
app.UseRequestResponseLogging();
// BRANCH 1: Custom Aggregated Endpoints (/gateway/*)
app.MapWhen(
ctx => ctx.Request.Path.StartsWithSegments("/gateway", StringComparison.OrdinalIgnoreCase),
gatewayApp =>
{
// Enable endpoint routing for this sub-pipeline
gatewayApp.UseRouting();
// Apply authentication & authorization
gatewayApp.UseAuthentication();
gatewayApp.UseAuthorization();
// Apply rate limiting also inside this sub-pipeline if needed
gatewayApp.UseCustomRateLimiting();
// Register controller actions under this branch
gatewayApp.UseEndpoints(endpoints =>
{
endpoints.MapControllers();
});
});
// BRANCH 2: Ocelot Reverse Proxy
app.UseAuthentication();
// Register rate limiting globally here
app.UseCustomRateLimiting();
// Middleware for pre-validation of Bearer tokens (optional)
app.UseGatewayBearerValidation();
// Ocelot middleware handles routing, transformation, and load-balancing
await app.UseOcelot();
// Start the Application
app.Run();
}
}
}
Testing Rate Limiting and Throttling Using Postman
After implementing all the steps (Middleware, Services, and Configuration in appsettings.json), you’ll verify that your API Gateway correctly limits and throttles requests to different microservices.
Test Rate Limiter (Product / Order APIs)
Step 1: Create a request
- In Postman, click + New → HTTP Request.
- Choose the method (GET or POST) depending on the endpoint you want to test.
Example: GET https://localhost:7204/products/products?pageNumber=1&pageSize=20
- Test it once, and you should get a 200 OK response.
Step 2. Save the request into a collection
- Click the Save button (top-right of the request).
- Choose + New Collection and give it a name, e.g., Rate Limit Tests.
- Save your request inside that collection.
Step 3. Open the Postman Collection Runner
- In the left sidebar, expand Collections.
- Hover over your collection (Rate Limit Tests) and click the Run button (or right-click → “Run collection”).
- This opens the Collection Runner window.
Step 4. Configure the runner
In the runner window:
- Iteration count: 120 (sends the same request 120 times)
- Delay (ms): 0 (no delay between requests; all fire rapidly)
- Data File: leave empty (not needed here)
- Environment: select your environment (optional)
Step 5: Run the collection
Click Run Rate Limit Tests. Postman will start firing 120 requests rapidly (0 ms gap).
You’ll see results appear live:
- Requests within the configured limit (e.g., the first 60 or 100, depending on your policy) → 200 OK
- Requests beyond that threshold → 429 Too Many Requests
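If you prefer the command line to Postman, a simple shell loop can reproduce the same burst. The URL and request count are taken from the Postman example above, and the -k flag accepts the local development certificate; adjust these to your setup:

```shell
# Fire 120 requests back-to-back and tally the HTTP status codes.
# Once the configured window fills up, the tally should show a mix
# of 200s (allowed) and 429s (rate limited).
for i in $(seq 1 120); do
  curl -k -s -o /dev/null -w "%{http_code}\n" \
    "https://localhost:7204/products/products?pageNumber=1&pageSize=20"
done | sort | uniq -c
```

The `sort | uniq -c` pipeline at the end groups identical status codes and prints how many times each occurred, giving you an at-a-glance view of how many requests were throttled.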
When to Use Rate Limiting and Throttling
Use Rate Limiting and Throttling when:
- Your APIs are publicly exposed to multiple clients.
- You want to prevent abuse or overuse.
- Your system must handle traffic surges gracefully.
- You want to offer tier-based usage plans.
- Your backend services are resource-sensitive (like Payment, Search, or Analytics APIs).
These mechanisms help you maintain stability, predictable performance, and user fairness, even when traffic spikes suddenly.
When to Avoid or Relax It
While Rate Limiting is essential for public APIs, there are some situations where you may choose to disable or relax it:
- For internal communication between trusted microservices. (Services within your own cluster should not be restricted.)
- For critical real-time operations, such as internal message queues or system health checks.
- During low-traffic development or testing environments, where performance isn’t a concern.
- When your API serves batch jobs or analytics queries that legitimately need to fetch large volumes of data at once.
In those cases, you can either turn it off or apply higher limits for specific users or roles.
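For example, the global toggle from the configuration step makes this easy in a development environment: setting IsEnabled to false in appsettings.Development.json lets every request pass straight through, because the policy service then returns the pass-through limiter. The section name matches the binding shown earlier:

```json
{
  "RateLimiting": {
    "IsEnabled": false
  }
}
```

Because environment-specific settings override the base appsettings.json, production keeps its limits while local development runs unrestricted.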
Real-Time Scenarios in E-Commerce Microservices
Preventing API Abuse
Imagine your Public Product API (/api/products) is open for third-party integrations. If one client starts sending hundreds of requests per second, it could overload the Product Service.
- Rate limiting can cap requests to 100 per minute per client.
- Beyond that, the Gateway automatically blocks further requests for that minute.
This prevents misuse and ensures fair access for all clients.
Protecting Payment Service
The Payment Service is sensitive; multiple payment attempts can trigger duplicate transactions or fraud checks. To prevent this, the Gateway can enforce:
- Maximum five payment API calls per minute per user.
- If a client exceeds this, it receives a 429 Too Many Requests response.
This protects your payment gateway and avoids unnecessary load on financial APIs.
Managing High Traffic During Sales
During festival sales or flash deals, thousands of users may hit your APIs simultaneously. Throttling can slow down requests instead of rejecting them outright, allowing the Gateway to queue extra requests briefly, handle them gradually, and maintain system stability. This ensures that the system remains responsive even under heavy load.
Tier-Based Access Control
In your platform, you might have Basic, Premium, and Partner users. Each can have different limits:
- Basic → 60 requests per minute
- Premium → 600 requests per minute
- Partner → 2000 requests per minute
The API Gateway enforces these dynamically based on user roles or API keys, giving you flexible, policy-based control.
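This tutorial's middleware selects policies by URL path, but the same idea extends to roles. The sketch below is hypothetical (the TierLimits class and the method are not part of the code above); it shows how a role claim from the JWT could map to a per-minute permit limit:

```csharp
using System.Security.Claims;

public static class TierLimits
{
    // Maps the caller's role claim to a requests-per-minute permit limit.
    // Tier names and numbers are illustrative, matching the example tiers above.
    public static int GetPermitLimitForRole(ClaimsPrincipal user)
    {
        var role = user.FindFirst(ClaimTypes.Role)?.Value ?? "Basic";
        return role switch
        {
            "Partner" => 2000, // highest tier
            "Premium" => 600,
            _ => 60            // Basic and unknown roles
        };
    }
}
```

A middleware could call a method like this instead of (or alongside) GetPolicyFromPath when building the limiter key, so each tier gets its own limiter configuration rather than sharing one path-based policy.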
Conclusion
Rate Limiting and Throttling are like security guards for your API Gateway. They ensure no single user, app, or script can send too many requests and slow down your system.
In your E-Commerce Microservices Architecture, they protect critical services such as Product, Order, and Payment APIs from being flooded with excessive calls, keeping your platform reliable, secure, and efficient in both normal and high-traffic situations.

