How do most APIs handle rate limiting?

I got into a discussion this past week with one of my colleagues about rate limiting (throttling) for APIs. In particular, we debated how to respond when a user goes beyond their limit, and how to tell them what the threshold values are so they know when they can resume calling. We didn't come to an agreement: he took the 503 route and I took the 429 route.

As a side effect, though, we looked at how various companies handle this and found only a handful of HTTP response codes and headers in use. They all follow the same model, and for the most part they use these exact headers or variations of them with slightly different names:

  • X-RateLimit-Limit – The maximum number of calls you can make in a given amount of time.
  • X-RateLimit-Remaining – The number of calls you have left until a given reset timestamp, or as calculated over some sort of sliding time window.
  • X-RateLimit-Reset – The UTC timestamp, formatted per RFC 1123 as the HTTP spec requires, at which the limits will be reset.
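
To make the three headers concrete, here is a minimal client-side sketch of reading them from a response. The function names `parse_rate_limit` and `parse_reset` are made up for illustration. It assumes the reset value is an RFC 1123 date as described above; as a further assumption on my part, an all-digit value is treated as Unix epoch seconds, since I believe some providers send that instead.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_reset(value):
    """Turn an X-RateLimit-Reset value into a timezone-aware datetime."""
    value = value.strip()
    if value.isdigit():
        # Assumption: an all-digit value is Unix epoch seconds.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    # Otherwise assume an RFC 1123 HTTP date,
    # e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    return parsedate_to_datetime(value)


def parse_rate_limit(headers):
    """Extract limit, remaining, and reset from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        raw = lowered.get(name)
        return int(raw) if raw is not None else None

    reset = lowered.get("x-ratelimit-reset")
    return {
        "limit": as_int("x-ratelimit-limit"),
        "remaining": as_int("x-ratelimit-remaining"),
        "reset": parse_reset(reset) if reset is not None else None,
    }
```

Any header that is missing simply comes back as `None`, so the caller can decide what to do when a provider only sends some of them.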

In my colleague’s defense, he wasn’t the only one to go the 503 route for rate limiting, as this StackExchange post covered (along with the ‘Retry-After’ header), but we couldn’t find a company that actually does this on an API. It is, however, a convention used by web browsers for websites, according to a StackOverflow post. We were really hoping for a uniform standard, but at least practice isn’t all over the place.


GitHub

They use the three response headers listed above.


If you hit the limit, they return an HTTP 403 (Forbidden) with a JSON body containing a message about hitting the limit.

GitHub API Documentation


Vimeo

They use the same response headers as GitHub.


It appears that they also respond with a 403 (Forbidden), with a JSON body that is similar to GitHub’s but also includes an error code.

Vimeo API Documentation

Atlassian HipChat

Again, the same response headers.


It appears that they too respond with a 403 (Forbidden), but there is no word on what the response body would say.

Atlassian HipChat API Documentation


Reddit

A slight variation on the response headers (only the first one is different).


It appears that they too respond with a 403 (Forbidden), but there is no word on what the response body would say.

Reddit API Documentation


Twitter

Almost the same response headers as before, but with an additional dash in ‘rate-limit’.


However, when you get throttled by Twitter, the response code you get back depends on which API you are using, and on which version of that API:

  • HTTP 403 (Forbidden) – Tweets
  • HTTP 420 (Enhance your calm) – API 1 (at least for search and trends)
  • HTTP 429 (Too Many Requests) – API 1.1

Twitter API Documentation


Imgur

These folks go a bit further and split up the headers to show limits on a per-user and a per-client basis.


They will return a 429 on throttled calls.

Imgur API Documentation

Google Maps

Interestingly enough, Google opted not to use any rate limit headers on its responses. At least, if they do use them, no documentation says so.

They do, however, use a 403 (Forbidden) with a JSON body that has an OVER_QUERY_LIMIT element, but it doesn’t appear to include any details about when you should come back or how many calls you have available in a given timeframe. The only thing I could find was that they pass back a cache header like this on all of their calls, which they expect you to honor by actually caching the response:

Cache-Control: public, max-age=86400
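
Honoring that header means pulling out the max-age value and keeping the response for that many seconds (86400 seconds is one day). A minimal sketch of the parsing step, where `max_age_seconds` is a made-up helper name and a real client would also need to consider other directives such as `no-store`:

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age directive in seconds, or None if it is absent."""
    # Match "max-age=<digits>" at the start of the value or after a comma,
    # so that a directive like "s-maxage" is not picked up by mistake.
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


# Example with the header value shown above.
ttl = max_age_seconds("public, max-age=86400")
```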


Now that we know that companies are mostly split between 429 and 403, while using almost exactly the same headers, what do the standards (if any) say about this?

It appears that RFC 6585, Section 4 extended the original HTTP specification with a few status codes, including the 429 response code, which was created specifically for limiting requests. Regarding the 429, it says this:

The response representations SHOULD include details explaining the condition, and MAY include a Retry-After header indicating how long to wait before making a new request.

Note that this specification does not define how the origin server identifies the user, nor how it counts requests. For example, an origin server that is limiting request rates can do so based upon counts of requests on a per-resource basis, across the entire server, or even among a set of servers.
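
For a client, the practical part of that passage is the optional Retry-After header, whose value can be either a number of seconds or an HTTP date. Here is a rough sketch of turning either form into a wait time; `retry_after_seconds` is a name I made up, and the `now` parameter exists only so the example can be checked deterministically.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into a number of seconds to wait."""
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if value.isdigit():
        # Delay form, e.g. "Retry-After: 120".
        return float(value)
    # Date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    # Raises an exception if the value is not a parseable date.
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Assumption: treat a date without a usable zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (when - now).total_seconds())
```

Since the header is only a MAY in the RFC, a caller still needs a fallback delay of its own for 429 responses that omit it.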

The downside is that this RFC doesn’t define a way to convey back to the client what the limits are, and there is no documented standard for the ‘x-ratelimit-*’ headers that I could find. In fact, I couldn’t even find out who implemented them in the first place. The upside is that, even without a documented standard, almost everyone has landed on some sort of consensus by using this set of headers as is or with minimal variation.
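
Given that split, a client that talks to several of these APIs can't rely on the status code alone. The sketch below is one way to decide whether a response was throttled, based only on the codes and header spellings seen in this survey. `looks_throttled` is a hypothetical helper, and treating a 403 as throttling only when the remaining count is zero is my own heuristic, not something any of these providers document.

```python
# Status codes that, in the survey above, were only used for throttling.
THROTTLE_STATUSES = {420, 429}

# The two spellings of the "remaining" header seen above (lower-cased).
REMAINING_HEADERS = ("x-ratelimit-remaining", "x-rate-limit-remaining")


def looks_throttled(status, headers):
    """Guess whether a response means the caller has been rate limited."""
    if status in THROTTLE_STATUSES:
        return True
    if status == 403:
        # 403 is also used for ordinary permission errors, so only treat it
        # as throttling when a remaining-calls header says we are at zero.
        lowered = {name.lower(): value for name, value in headers.items()}
        return any(lowered.get(name, "").strip() == "0" for name in REMAINING_HEADERS)
    return False
```

An API that sends no headers at all, as Google Maps appeared to, would need its own check on the response body instead.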