Why cache at all
The token endpoint is itself rate-limited, and minting a token per request is pure waste: the same token works for every call until it expires. A single live token, shared across processes, saves a token-endpoint round trip on every call and spends almost none of your token-endpoint rate-limit budget. See the token lifecycle.
Key and store correctly
Key the cache by (client_id, scope-set) so a narrowly-scoped token and a broadly-scoped one do not overwrite each other. Store the absolute expiry — now + expires_in − margin — where the margin (a few seconds) covers clock skew and in-flight requests. In a multi-process deployment, put the token in a shared cache like Redis so all workers reuse it.
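A minimal in-process sketch of the keying and expiry rules, assuming a plain dict stands in for the shared cache (in a multi-process deployment you would put the same entries in Redis instead). The names `cache_key`, `make_entry`, `CachedToken` and the 5-second `MARGIN_SECONDS` are hypothetical choices for illustration, not part of any library.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

MARGIN_SECONDS = 5.0  # hypothetical "few seconds" margin for clock skew and in-flight calls

CacheKey = Tuple[str, FrozenSet[str]]


@dataclass(frozen=True)
class CachedToken:
    access_token: str
    expires_at: float  # absolute epoch time after which we stop serving this token


def cache_key(client_id: str, scopes: Iterable[str]) -> CacheKey:
    # Normalise scopes to a frozenset so ["read", "write"] and ["write", "read"]
    # share an entry, while ["read"] and ["read", "write"] stay separate.
    return (client_id, frozenset(scopes))


def make_entry(access_token: str, expires_in: int,
               now: Optional[float] = None,
               margin: float = MARGIN_SECONDS) -> CachedToken:
    # Store the absolute expiry: now + expires_in - margin.
    if now is None:
        now = time.time()
    return CachedToken(access_token, now + expires_in - margin)


# Usage: a narrow and a broad token for the same client do not overwrite each other.
cache: Dict[CacheKey, CachedToken] = {}
cache[cache_key("client-a", ["read"])] = make_entry("tok-narrow", 3600, now=1000.0)
cache[cache_key("client-a", ["write", "read"])] = make_entry("tok-broad", 3600, now=1000.0)
```

Storing the absolute `expires_at` rather than the raw `expires_in` means any worker can decide freshness with a single comparison against its own clock, without knowing when the token was minted.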
Re-mint on the boundary
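Serve the cached token until the stored expiry passes (that is, once it is within the margin of its real expiry), then mint a fresh one and swap it in atomically so no caller sees a half-updated entry. Handle the rare mid-flight expiry with a single re-mint-and-retry on a 401. Do not loop: an invalid credential also returns 401, so an unbounded retry loop would never stop on its own and would burn your token-endpoint budget. This pairs with back-off on the token endpoint.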
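The following is a rough single-process sketch of re-minting on the boundary and retrying once on a 401. It assumes a hypothetical `mint` callable that returns `(access_token, expires_in)` and a hypothetical `send` callable that performs the API call and returns an HTTP status code; a real client would wrap its own HTTP library and the shared cache described above.

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Optional, Tuple

MARGIN_SECONDS = 5.0  # hypothetical margin, as in the storage rule above

# Hypothetical mint function: returns (access_token, expires_in_seconds).
MintFn = Callable[[], Tuple[str, int]]


class TokenProvider:
    def __init__(self, mint: MintFn, clock: Callable[[], float] = time.time,
                 margin: float = MARGIN_SECONDS) -> None:
        self._mint = mint
        self._clock = clock
        self._margin = margin
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        with self._lock:
            # Serve the cached token until its margin-adjusted expiry passes.
            if self._token is None or self._clock() >= self._expires_at:
                issued_at = self._clock()  # take "now" before the mint call
                token, expires_in = self._mint()
                # Token and expiry are replaced together while the lock is held.
                self._token = token
                self._expires_at = issued_at + expires_in - self._margin
            return self._token

    def invalidate(self, stale: str) -> None:
        with self._lock:
            # Drop the token only if another thread has not already replaced it.
            if self._token == stale:
                self._token = None


def call_with_retry(provider: TokenProvider, send: Callable[[str], int]) -> int:
    """Send once; on a 401, re-mint and retry exactly once (never loop)."""
    token = provider.get()
    status = send(token)
    if status == 401:
        provider.invalidate(token)
        status = send(provider.get())
    return status
```

Holding the lock during the mint means that, within one process, only one thread mints at the boundary while the others wait for the new token. Coordinating across processes would additionally need the shared cache, and this sketch does not attempt that.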
Frequently asked questions
Should each worker mint its own token?
No. Share one live token per (client, scope) across all workers via a shared cache. Per-worker or per-request minting wastes your token-endpoint rate-limit budget for no benefit.
What margin should I subtract from expires_in?
A few seconds is enough to cover clock skew and requests already in flight. Re-mint once the token is within that margin of expiry so no request ever goes out with an already-expired token.
Related reading

Handle rate limits and back off correctly
Do not guess your remaining budget — read the RateLimit- headers, slow down before the wall, and on a 429…
Rotate partner API credentials with zero downtime
Rotate a partner secret with no outage: create a second credential with the same scopes, cut traffic to it,…