
Shipping SP-API: surviving Amazon's rate limits at high volume

How MarginLock keeps shipment data fresh within SP-API's per-merchant rate limits using a Postgres-backed token bucket, and the trade-offs that came with it.

By Kenderson Tripaldi · April 26, 2026


Amazon's Selling Partner API rate limits are stricter than most people assume going in. Every endpoint has its own per-merchant rate limit, expressed as a sustained rate plus a burst, and the limits are not generous. The Orders endpoint, for example, allows roughly 0.0167 requests per second (one per minute) sustained, with a burst of 20. Across a few hundred merchants and the dozen-plus endpoints we depend on, the rate-limit budget becomes the dominant engineering constraint on data freshness.
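To make that budget concrete, here is a back-of-the-envelope sketch in Python. It uses the Orders figures quoted above; the function names are illustrative and not part of any SP-API client.

```python
import math

ORDERS_RATE = 0.0167   # sustained requests per second, as quoted above
ORDERS_BURST = 20      # burst size, as quoted above


def max_calls(window_seconds: float,
              rate: float = ORDERS_RATE,
              burst: int = ORDERS_BURST) -> int:
    """Upper bound on calls in a window that starts with a full bucket."""
    return burst + math.floor(rate * window_seconds)


def seconds_to_refill(rate: float = ORDERS_RATE,
                      burst: int = ORDERS_BURST) -> float:
    """Time for an empty bucket to return to full burst capacity."""
    return burst / rate
```

Under these numbers, one merchant gets at most 80 Orders calls in the first hour (20 from the burst plus 60 from the refill), and an emptied bucket takes roughly 20 minutes to fill again.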

This post is about how MarginLock manages that budget. It's an engineering post, not a product post; expect SQL.

The naive approach (and why it doesn't work)

The simplest possible approach: every time you need data from SP-API, call the endpoint. Trust the LWA (Login with Amazon) access token, retry on 429, and log when you hit limits.

This breaks down fast. With more than a handful of merchants, you'll hit a 429 on Orders within minutes. The retries push your effective rate-limit budget further down (because retries themselves count). You build a backoff mechanism, then realize that across 500 merchants the queue of pending requests is a queue of doomed retries. The data freshness gets worse, not better, because everybody's retrying the same hot endpoints simultaneously.
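A toy simulation shows how quickly uncoordinated retries turn into wasted calls. It is a sketch under simplifying assumptions: one merchant, workers that each try once per second, a rate of exactly 1/60 per second standing in for 0.0167, and rejected calls that are not charged against the bucket (the optimistic case). `simulate_naive` is a hypothetical helper, not production code.

```python
from fractions import Fraction


def simulate_naive(workers: int, seconds: int,
                   burst: int = 20,
                   rate: Fraction = Fraction(1, 60)) -> tuple[int, int]:
    """Every worker calls once per second and just tries again when rejected.

    Returns (accepted, rejected) as seen by one merchant's server-side bucket.
    Fractions keep the refill arithmetic exact.
    """
    tokens = Fraction(burst)
    accepted = rejected = 0
    for tick in range(seconds):
        if tick > 0:
            # One second of refill, capped at the burst size.
            tokens = min(Fraction(burst), tokens + rate)
        for _ in range(workers):
            if tokens >= 1:
                tokens -= 1
                accepted += 1
            else:
                rejected += 1  # this call would come back as a 429
    return accepted, rejected
```

With 10 workers over two minutes, `simulate_naive(10, 120)` returns `(21, 1179)`: the burst is gone within two seconds, and almost every call after that is rejected.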

You need centralized rate-limiting that's aware of every active worker.

A Postgres-backed token bucket

We chose Postgres for the rate-limiter state, for the same reasons we chose Postgres for the job queue: it's already there, it's transactionally consistent, and the operational simplicity of one less moving piece is worth the small performance trade-off. The pattern:

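-- One row per (merchant, endpoint) pair, matching how SP-API scopes its limits.
CREATE TABLE sp_api_rate_bucket (
  merchant_id  UUID NOT NULL,
  endpoint     TEXT NOT NULL,
  tokens       NUMERIC NOT NULL,      -- tokens available as of last_refill
  last_refill  TIMESTAMPTZ NOT NULL,  -- when tokens was last recomputed
  capacity     NUMERIC NOT NULL,      -- burst size (maximum tokens)
  refill_rate  NUMERIC NOT NULL,      -- tokens per second (sustained rate)
  PRIMARY KEY (merchant_id, endpoint)
);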

Every SP-API call goes through a function that:

Locks the bucket row (FOR UPDATE).
Computes the new token count, min(capacity, tokens + (now - last_refill) * refill_rate), and sets last_refill to now.
If tokens >= 1, decrements the count and proceeds.
Otherwise, releases the lock, sleeps until enough tokens have accrued, then retries.

The lock is short-lived — a few milliseconds per call — and Postgres handles contention gracefully at our scale.
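The acquire steps described above can be sketched in plain Python. This is an in-memory illustration, not the production function: a `threading.Lock` stands in for the row lock taken with `FOR UPDATE`, and the `Bucket` class and `try_acquire` name are assumptions that mirror the table's columns.

```python
import threading
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    last_refill: float   # seconds on any monotonic clock
    capacity: float
    refill_rate: float   # tokens per second


# Stand-in for the Postgres row lock (SELECT ... FOR UPDATE).
_row_lock = threading.Lock()


def try_acquire(bucket: Bucket, now: float) -> float:
    """Take one token if available.

    Returns 0.0 on success, otherwise the number of seconds the caller
    should wait before trying again.
    """
    with _row_lock:
        elapsed = max(0.0, now - bucket.last_refill)
        bucket.tokens = min(bucket.capacity,
                            bucket.tokens + elapsed * bucket.refill_rate)
        bucket.last_refill = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return 0.0
        # Not enough tokens: report how long until one full token accrues.
        return (1 - bucket.tokens) / bucket.refill_rate
```

The caller sleeps for the returned duration after the lock has been released, so a waiting worker never blocks others that are contending for the same bucket.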

Why not Redis?

Redis is the obvious choice for token buckets. We use it for cache and session state. We didn't use it here for two reasons:

Operational simplicity. Postgres is on the critical path already. One less hop, one less thing to monitor, one less thing that can drift between workers and database.
Transactional alignment. When a worker calls SP-API and writes the result to Postgres, we want the rate-limiter consumption and the result write to land in the same transaction. With Redis, that's a two-phase pattern; with Postgres, it's just a transaction.

The Postgres approach has higher worst-case latency (low single-digit milliseconds versus sub-millisecond for Redis), but that is well within budget for SP-API calls, where the network round-trip dominates anyway.

Retry budgets

The other piece worth talking about is retry budgets. When a 429 does happen — it does, sometimes, when an upstream rate limit changes or a token bucket desyncs — we don't retry forever. Each request gets a budget:

2 in-process retries with exponential backoff (after 1s, then 2s).
3rd retry deferred to a job queue with at-least-once semantics, scheduled one minute out.
After 5 total attempts, the request is failed and a circuit breaker trips for that endpoint for that merchant.

The circuit breaker is what saves you in cascading-failure scenarios. If Amazon temporarily drops the rate limit for an endpoint (it happens), the breaker keeps every worker from compounding the problem.

The trade-offs we made

A few decisions were not slam-dunks. For posterity:

Decision	Why we chose it	Alternative considered
Postgres for bucket state	Operational simplicity, transactional	Redis: faster, more idiomatic
Circuit breaker per endpoint	Fails fast on persistent issues	No breaker: try until success
Bucket per (merchant, endpoint)	Matches SP-API's actual limits	Bucket per merchant: simpler but wastes budget
Async retry via job queue	Worker doesn't block on retries	Synchronous: simpler but blocks workers

What this gets us

Steady-state, MarginLock keeps shipment, inventory, order, and settlement data fresh against SP-API's rate limits across hundreds of active merchants. The 429 rate sits below 0.1% of total calls; when one does occur it's absorbed by the retry path without merchant-visible impact.

The two metrics we watch closely:

Bucket exhaustion events per hour. A bucket hitting zero is a sign that we should request a higher rate-limit allocation from Amazon, or that a workflow is calling an endpoint more than it needs to.
Median data staleness per merchant. This is the user-visible metric. Our target is <60 minutes for hot data (orders, inventory) and <2 hours for cold data (catalog metadata).
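The staleness metric is simple to compute once each merchant's last successful sync time is known. The sketch below assumes staleness means "now minus last successful sync" and that the median is taken across merchants within a tier; the post does not spell out either definition, and the function names are illustrative.

```python
from statistics import median

# Targets quoted above, in minutes.
TARGET_MINUTES = {"hot": 60, "cold": 120}


def median_staleness_minutes(last_sync_by_merchant: dict[str, float],
                             now: float) -> float:
    """Median age, in minutes, of each merchant's last successful sync.

    Timestamps and `now` are seconds on the same clock.
    """
    ages = [(now - synced_at) / 60 for synced_at in last_sync_by_merchant.values()]
    return median(ages)


def within_target(last_sync_by_merchant: dict[str, float],
                  now: float, tier: str) -> bool:
    """True if the median staleness is under the target for the given tier."""
    return median_staleness_minutes(last_sync_by_merchant, now) < TARGET_MINUTES[tier]
```

For three merchants last synced 10, 30, and 90 minutes ago, the median is 30 minutes, which is inside the hot-data target even though one merchant is well outside it. That is a reason to watch a tail percentile alongside the median.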

Engineering investment in the rate-limiter pays for itself the first time a new merchant onboards and SP-API immediately tries to rate-limit them out of existence. Without the bucket and breaker, that's a paged engineer. With them, it's a non-event.
