Prompt Caching #234
Conversation
|
@crmne As I don't have an Anthropic key, I'll need you to generate the VCR cassettes for that provider. Hoping everything just works, but let me know if not. |
|
@tpaulshippy this would be great to have! Would you be willing to enable it on all providers? I'll do a proper review when I can. |
|
My five minutes of research indicates that at least OpenAI and Gemini automatically cache for you based on the size and structure of your request. So the only support I think we'd really need for those two is to populate the cached token counts on the response messages. We could also try to support explicit caching on the Gemini API, but that looks complex and less commonly needed. Do you know of other providers that require payload changes for prompt caching? |
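For OpenAI and Gemini, that reporting might look something like the sketch below; `cached_tokens` is an assumed accessor name for illustration, not an existing RubyLLM property.

```ruby
# Sketch: surface provider-reported cache hits on the response message.
# input_tokens/output_tokens already exist on RubyLLM::Message;
# cached_tokens is a hypothetical addition shown for illustration only.
chat = RubyLLM.chat(model: "gpt-4o-mini")
response = chat.ask("Answer using the long, frequently reused system prompt.")

puts response.input_tokens   # prompt tokens billed for this request
puts response.output_tokens  # completion tokens
puts response.cached_tokens  # portion of the prompt served from the provider cache (assumed name)
```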
```ruby
def with_cache_control(hash, cache: false)
  return hash unless cache

  hash.merge(cache_control: { type: 'ephemeral' })
end
```
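For context, a hedged sketch of how a helper like this gets applied when building the provider payload; the variable names below are illustrative, not the PR's actual code.

```ruby
# Illustrative only: variable names are made up for the example.
system_prompt = "Very long system prompt reused across requests."
payload = {}
cache_system = true

system_block = { type: 'text', text: system_prompt }
payload[:system] = [with_cache_control(system_block, cache: cache_system)]
# => the ephemeral cache_control marker is merged in only when cache_system is true
```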
Realizing this might cause errors on older models that do not support caching. If it does, we could raise here, or just let the API validation handle it. I'm torn on whether the capabilities check complexity is worth it as these models are probably so rarely used.
Scratch that. I decided to stop being a cheapskate and just pay Anthropic their $5. |
|
Looking to implement this in our project, and now I'm wondering whether it should be opt-out rather than opt-in. If you're using unique prompts every time, I guess caching them adds some cost, but my guess is that in most applications prompts will get repeated, especially system prompts. |
crmne left a comment
Thank you for this feature @tpaulshippy; however, there are several improvements I'd like you to make before we merge this.
On top of the ones noted in the comments, the most important one: I'd like prompt caching implemented in all providers.
Also, I haven't fully checked the logic in providers/anthropic, but at first glance the patch seems a bit heavy-handed in the amount of change required. Were all those changes necessary, or could it be done in a simpler manner?
Did you see this? Is the request to populate the cached token counts on the response messages for OpenAI and Gemini? |
Thank you for pointing that out; I had missed it. It would certainly be a nice addition to RubyLLM for all providers to offer roughly the same level of caching support. |
OK, we have a bit of a naming issue: each provider reports its cache token counts under different property names (Anthropic, OpenAI, and Gemini all differ). My reading of the docs indicates that the OpenAI and Gemini values correspond pretty closely with Anthropic's cache-read count. What should we call these properties in the Message? |
|
For the naming, let's go with the names that keep it consistent with our existing token count properties. Can you update the Message properties to use these names? Thanks Paul! |
|
One-shot prompt scenarios are our main use case, so the above would work great. Caching support is also a blocker on our making the jump to RubyLLM. Thanks, all! |
No need to wait, use:

```ruby
RubyLLM
  .chat(model: "claude-sonnet-4-20250514")
  .with_params(system: [{
    type: "text",
    text: "This is my very long system prompt that will get cached.",
    cache_control: { type: "ephemeral" },
  }])
```
Hm, when trying that, @maximevaillancourt, and later doing a .ask, my system prompt doesn't end up getting through to OpenRouter. Are you using this approach successfully? |
Yes, but worth noting that I'm using |
|
Hi @tpaulshippy @crmne -- anything we can do to help this along? Happy to help out if needed. |
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #234      +/-   ##
==========================================
+ Coverage   89.72%   89.78%   +0.06%
==========================================
  Files          36       36
  Lines        1761     1772      +11
  Branches      481      487       +6
==========================================
+ Hits         1580     1591      +11
  Misses        181      181
```
|
|
Thanks for picking this back up! Have you played around much with the 1h TTL, @tpaulshippy? https://docs.claude.com/en/docs/build-with-claude/prompt-caching#1-hour-cache-duration |
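Per that doc, the 1-hour cache is requested by adding a ttl field to the cache_control block; a raw-payload sketch follows (shape taken from Anthropic's docs, worth double-checking before relying on it, and priced differently from the default 5-minute cache).

```ruby
# Raw Anthropic payload shape for the 1-hour cache (per the linked docs);
# omitting ttl gives the default 5-minute ephemeral cache.
system_block = {
  type: "text",
  text: "Very long system prompt reused across many requests.",
  cache_control: { type: "ephemeral", ttl: "1h" }
}
```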
No, I haven't. It didn't seem that useful for our scenarios. It would be a good addition to this library, though. |
|
@tpaulshippy, I really appreciate the work you poured into this. However, I've had a nagging gut feeling the whole time: the amount of churn here never felt proportionate to the feature. This ended up rewriting a good chunk of the library for what's ultimately an Anthropic quirk.

In the end it was impossible to review this in a way that steered toward what I had in mind without actually building it myself. Once I went hands-on, I revisited my own earlier suggestion, and the exploration led to Raw Content Blocks (869a755): raw messages that go straight to the LLM. This way Anthropic gets its caching hooks, we can support any weird provider-specific quirk of the message contents, and we keep the core clean and provider-agnostic.

I've shipped the docs (https://rubyllm.com/chat/#raw-content-blocks), updated the Rails integration, and added an update generator for 1.9. Thanks again for your work, and enjoy Raw Content Blocks! |
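A sketch of how that might look for Anthropic-style caching, using the RubyLLM::Providers::Anthropic::Content raw block mentioned further down; the constructor arguments and call site below are assumptions, so check the docs above for the real interface.

```ruby
# Sketch only: both the constructor arguments and passing the block to #ask are
# assumptions; the point is that provider-specific fields such as cache_control
# pass straight through to the API untouched.
raw_block = RubyLLM::Providers::Anthropic::Content.new(
  [
    {
      type: "text",
      text: "This is my very long system prompt that will get cached.",
      cache_control: { type: "ephemeral" }
    }
  ]
)

chat = RubyLLM.chat(model: "claude-sonnet-4-20250514")
chat.ask(raw_block)
```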
|
This new change is working almost perfectly for us, @crmne! One callout: we're using the OpenRouter provider (but primarily Anthropic models within that), and while message caching with raw blocks (using RubyLLM::Providers::Anthropic::Content in an OpenRouter chat) is working great, the tool with_params pattern (in the tool subclasses) is not caching tool definitions. I think it's because of the subclass hierarchy. Also, Anthropic is a bit unusual in that their docs have you add the cache marker to the last tool: https://docs.claude.com/en/docs/build-with-claude/prompt-caching#prompt-caching-examples (Caching tool definitions). |
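For reference, this is the shape Anthropic's docs describe for caching tool definitions: only the final tool in the list carries the cache_control marker, and everything before it is included in the cached prefix. The tool definitions below are illustrative, and this is the raw API payload, not a RubyLLM helper.

```ruby
# Raw Anthropic-style tools array: marking the last tool caches the whole
# tool-definition prefix up to and including that tool.
tools = [
  {
    name: "get_weather",
    description: "Look up current weather for a city",
    input_schema: { type: "object", properties: { city: { type: "string" } } }
  },
  {
    name: "get_time",
    description: "Look up the current time for a timezone",
    input_schema: { type: "object", properties: { tz: { type: "string" } } },
    cache_control: { type: "ephemeral" } # only the final tool gets the marker
  }
]
```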
|
As mentioned in the docs, Tool's |
I really liked this idea. Any chance we could get it? Cache the last system message, last user message, and last tool by default? |
This would mean changing the whole thing again and re-adding a lot of your code, only for a bit of magic around a provider quirk. Hard pass. This belongs in your app. Also, that comment precedes the whole investigation I did. |
|
OK, fair enough. I bring it up because one of the strengths of this library is the ability to switch between providers and models seamlessly. Since OpenAI and Gemini cache by default, setting up Anthropic to do the same would be nice. |
|
I think the difference is that Gemini and OpenAI don't charge the user extra for the cache writes, while Anthropic does. |
|
That is true. But in most use cases you end up paying even more if you don't cache at all. Hence this PR. |
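To make that concrete, a rough back-of-the-envelope comparison using Anthropic's published multipliers (cache writes bill at about 1.25x the base input rate, cache reads at about 0.1x, for the default 5-minute cache; exact rates can change):

```ruby
# Illustrative arithmetic in "base input token" units for a 2,000-token system
# prompt reused across 10 requests (multipliers assumed from Anthropic's docs).
base_tokens   = 2_000
without_cache = 10 * base_tokens                           # 20_000
with_cache    = 1.25 * base_tokens + 9 * 0.1 * base_tokens # 4_300

ratio = (with_cache / without_cache.to_f).round(3)
puts ratio # => 0.215, i.e. roughly 78% cheaper than not caching
```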
|
Even if it were opt-in, a way to properly turn on caching for Anthropic in one line, without having to track which tool will be your last, seems like it would be nice.
What this does
Automatically opts into prompt caching in the Anthropic and Bedrock providers for Claude models that support it, and reports prompt-caching token counts for OpenAI and Gemini, which cache automatically.
Usage covers the following cases (illustrative sketch after this list):
- Disabling prompt caching
- Caching just system prompts
- Caching just user prompts
- Caching just tool definitions
- Caching system prompts and tool definitions
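A single sketch covering those cases; the `cache_prompts` method name and its keyword arguments are hypothetical, shown only to illustrate the intended granularity.

```ruby
# Hypothetical API sketch (method name and keywords assumed, not the merged interface).
chat = RubyLLM.chat(model: "claude-sonnet-4-20250514")

chat.cache_prompts(system: false, user: false, tools: false) # disable prompt caching
chat.cache_prompts(system: true)                             # cache just system prompts
chat.cache_prompts(user: true)                               # cache just user prompts
chat.cache_prompts(tools: true)                              # cache just tool definitions
chat.cache_prompts(system: true, tools: true)                # system prompts + tool definitions
```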
Type of change
Scope check
Quality check
- `overcommit --install` run and all hooks pass
- Auto-generated files left untouched (`models.json`, `aliases.json`)

API changes
Related issues
Resolves #13