Skip to content

Conversation

@dentiny
Copy link
Contributor

@dentiny dentiny commented Sep 21, 2025

What changes are included in this PR?

object storage cache is introduced at #512, which seems to be used for (internal) scan only for now.

I did some profiling on moonlink CPU, and found loading manifest list takes 23% CPU.
image

So I would like to expose access to object storage and attempt to get manifest list myself.
I think it's beneficial for all accesses requiring manifest and manifest list, which is not confined to only scan workload.

Are these changes tested?

No-op change, just visibility update.

@liurenjie1024
Copy link
Contributor

It's a small change, but I have concerns about the purpose of this pr. Do you mean to fill the cache by yourself? Why not just provide your own cache implementation?

@dentiny
Copy link
Contributor Author

dentiny commented Sep 22, 2025

Do you mean to fill the cache by yourself? Why not just provide your own cache implementation?

Thanks for the quick review!

I don't plan to do any writes, the only purpose for exposure is to share the cache content for multiple reads from both iceberg-rust internal scan and external usage.

@liurenjie1024
Copy link
Contributor

I have concerns exposing this cache. It's for internal usage only and should be transparent to user.

@dentiny
Copy link
Contributor Author

dentiny commented Sep 24, 2025

I have concerns exposing this cache. It's for internal usage only and should be transparent to user.

My question is:

  • Could you please share more about your concern? If we don't want exposed cache to be written, we could hide the write interface by visbility?
  • My hope is to share object cache inside and outside the crate, without which I have to afford an IO operation anyway, and likely a re-implementation of whatever-cache

@liurenjie1024
Copy link
Contributor

Could you please share more about your concern? If we don't want exposed cache to be written, we could hide the write interface by visbility?

The concern is as I said, cache should be transparent to user, so it should not show up in public interface.

My hope is to share object cache inside and outside the crate, without which I have to afford an IO operation anyway, and likely a re-implementation of whatever-cache

I think the right direction is to enhance the logic to use cache for manifest load/store, rather than exposing to user to allow them to manipulate it outside.

@dentiny
Copy link
Contributor Author

dentiny commented Sep 25, 2025

I think the right direction is to enhance the logic to use cache for manifest load/store, rather than exposing to user to allow them to manipulate it outside.

gotcha, sounds like the write through / read through cache I mentioned in the comment
Let me have a try

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants