Stop recomputing
the same tokens.

LMCache stores and reuses KV caches across requests, so your LLM serves repeated context from cache instead of re-running prefill on it every time.

INSTALL LMCACHE · READ THE DOCS

CACHE TELEMETRY READOUT (ILLUSTRATIVE · GAINS DEPEND ON WORKLOAD)

cache hit rate · TTFT reduction · reused KV blocks: WARM · backend: vLLM

// why a cache layer

KV-01

Shared prefixes and repeated documents are served from cache, not recomputed per request.
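To make the idea of prefix reuse concrete, here is a minimal toy sketch, not LMCache's actual implementation. It assumes KV blocks are keyed by a hash of the whole token prefix up to each block boundary; `ToyPrefixCache`, `chunk_keys`, and the `CHUNK` size are hypothetical names chosen for illustration, and a string stands in for the real KV tensors.

```python
import hashlib

CHUNK = 4  # tokens per cache block (hypothetical; chosen small for readability)


def chunk_keys(tokens, chunk=CHUNK):
    """Return one key per full chunk; each key hashes the entire prefix so far."""
    keys, h = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk  # ignore a trailing partial chunk
    for i in range(0, full, chunk):
        h.update(repr(tokens[i:i + chunk]).encode())
        keys.append(h.copy().hexdigest())
    return keys


class ToyPrefixCache:
    def __init__(self):
        self.store = {}  # key -> stand-in for a stored KV block

    def lookup(self, tokens):
        """Count the leading tokens whose blocks are already cached."""
        hit = 0
        for key in chunk_keys(tokens):
            if key not in self.store:
                break  # the first miss ends the reusable prefix
            hit += CHUNK
        return hit

    def insert(self, tokens):
        for key in chunk_keys(tokens):
            self.store.setdefault(key, "kv-block")


cache = ToyPrefixCache()
system_prompt = list(range(8))  # shared 8-token prefix
req_a = system_prompt + [100, 101, 102, 103]
req_b = system_prompt + [200, 201, 202, 203]

cold_hit = cache.lookup(req_a)  # nothing cached yet
cache.insert(req_a)
warm_hit = cache.lookup(req_b)  # only the shared prefix matches
print(cold_hit, warm_hit)
```

Because each key covers the full prefix, two requests share a block only when everything before it is identical, which is the property that lets a shared system prompt or document be served from cache.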

KV-02

Skipping prefill on cached context cuts time-to-first-token on long, repeated prompts.
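A back-of-the-envelope model shows why this helps, under stated assumptions: time-to-first-token is approximated as the time to prefill the uncached tokens plus the time to load the cached ones. The function name `estimate_ttft` and every throughput figure below are placeholders for illustration, not measurements of LMCache or any backend.

```python
def estimate_ttft(prompt_tokens, cached_tokens, prefill_tok_per_s, load_tok_per_s):
    """Rough TTFT in seconds: prefill the uncached suffix, load the cached prefix."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    prefill = (prompt_tokens - cached_tokens) / prefill_tok_per_s
    load = cached_tokens / load_tok_per_s
    return prefill + load


# Placeholder numbers: an 8000-token prompt, 7000 tokens of which are cached,
# with cache loading assumed to be 10x faster per token than prefill.
cold = estimate_ttft(8000, 0, prefill_tok_per_s=4000, load_tok_per_s=40000)
warm = estimate_ttft(8000, 7000, prefill_tok_per_s=4000, load_tok_per_s=40000)
reduction = 1 - warm / cold  # fraction of TTFT saved in this toy scenario
print(cold, warm, reduction)
```

The model ignores queueing, batching, and decode, so treat it as intuition only: the saving grows with the cached share of the prompt and shrinks as cache loading gets slower relative to prefill.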

KV-03

Built to slot into existing LLM serving stacks instead of replacing them.