Consider Using object_store as IO Abstraction #172

@tustvold

I have debated filing this ticket for a while, but largely held off as I wasn't sure how well it would be received, especially as I am acutely aware that this crate currently makes use of OpenDAL and @Xuanwo is an active contributor to both repositories. However, I feel it is important to have these discussions, and part of my role as a maintainer of object_store is to engage with others in the community and hear about how its offering could be made more compelling.

That all being said, I think object_store provides some quite compelling functionality that might be of particular interest to this project:

  • First-party integration with arrow-rs, parquet, DataFusion and polars, including sophisticated vectored and streaming IO
  • Support for conditional writes, which would allow iceberg-rs to support multiple concurrent writers directly against object storage, without needing an external catalog
  • A flexible configuration system developed in partnership with, and used by, both the polars and delta-rs communities
  • Extensive support for the various cloud provider credential sources, with extension points for users to further customise this
  • APIs that mirror those of object stores rather than filesystems, which makes it easier to understand what IO is being performed and how, and allows support for object-store-specific functionality like tags, partial range requests, and more...
  • Battle-tested in multiple production systems, and with a substantial and growing user-base
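
To make the conditional-write point above more concrete, here is a minimal sketch of how "create only if absent" semantics let several writers commit without an external catalog. It uses only the Rust standard library and does not call the object_store API: `MemoryStore`, `put_if_absent`, `commit`, `run_writers` and the `metadata/v{N}.json` naming are all hypothetical names chosen for illustration, and a real implementation would re-read and rebase its changes on conflict rather than simply moving to the next version number.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory stand-in for a store with conditional writes.
#[derive(Default)]
struct MemoryStore {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

#[derive(Debug, PartialEq)]
enum PutError {
    AlreadyExists,
}

impl MemoryStore {
    /// Conditional write: succeeds only if nothing exists at `path` yet.
    fn put_if_absent(&self, path: &str, data: Vec<u8>) -> Result<(), PutError> {
        let mut objects = self.objects.lock().unwrap();
        if objects.contains_key(path) {
            return Err(PutError::AlreadyExists);
        }
        objects.insert(path.to_string(), data);
        Ok(())
    }
}

/// Commit `payload` as the first free version at or after `start_version`.
/// Losing the race for a version is detected by the store, not by a catalog.
fn commit(store: &MemoryStore, start_version: u64, payload: &[u8]) -> u64 {
    let mut version = start_version;
    loop {
        // Assumed naming scheme, for illustration only.
        let path = format!("metadata/v{version}.json");
        match store.put_if_absent(&path, payload.to_vec()) {
            Ok(()) => return version,
            // Another writer won this version; a real writer would
            // re-read the table state and rebase before retrying.
            Err(PutError::AlreadyExists) => version += 1,
        }
    }
}

/// Run `n` concurrent writers and return the versions they committed, sorted.
fn run_writers(n: usize) -> Vec<u64> {
    let store = Arc::new(MemoryStore::default());
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || commit(&store, 1, format!("writer-{i}").as_bytes()))
        })
        .collect();
    let mut versions: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    versions.sort();
    versions
}

fn main() {
    // Each writer ends up owning a distinct version; none is overwritten.
    println!("{:?}", run_writers(4));
}
```

The point of the sketch is that the store itself arbitrates the race: exactly one writer can create a given version object, so no two commits can silently overwrite each other.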

The major area in which object_store is limited, somewhat intentionally, is the number of first-party implementations: it supports only S3-compatible stores, Google Cloud Storage, Azure Blob Storage, in-memory storage and local filesystems. However, the object-safe design does allow for third-party implementations, for things like HDFS.

I look forward to hearing your thoughts, but also fully understand if this is not a discussion you would like to engage with at this time.
