@morningman
Contributor

Add blog about how to integrate Apache Doris with Polaris

Contributor

@dimas-b dimas-b left a comment

Thanks for your contribution, @morningman! The guide looks pretty useful to me! I have some thoughts about the location for this page (below).

Contributor

I do not see any easy links to existing blog posts from the main project site 😅

Contributor

PR to add it: #2575


With the continuous evolution of data lake technologies, efficiently and securely managing massive datasets stored on object storage (such as AWS S3) while providing unified access endpoints for upstream analytics engines (like [Apache Doris](https://doris.apache.org)) has become a core challenge in modern data architectures. [Apache Polaris](https://polaris.apache.org/), as an open and standardized REST Catalog service for Iceberg, provides an ideal solution to this challenge. It not only handles centralized metadata management but also significantly enhances data lake security and manageability through fine-grained access control and flexible credential management mechanisms.

This document provides a detailed guide to integrating Apache Doris with Polaris to query and manage Iceberg data on S3 efficiently. We'll walk you through the complete process step by step, from environment preparation to querying the data.
Contributor

I wonder if it may be worth placing this page under the docs section > Getting Started... WDYT?

```sql
-- Enable credential vending
'iceberg.rest.vended-credentials-enabled' = 'true',
-- S3 basic configuration (no keys required)
's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
```
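
For readers following the thread, the quoted lines are properties inside a Doris `CREATE CATALOG` statement that points at a Polaris-backed Iceberg REST catalog. The sketch below shows roughly how such a statement might look. Apart from the two quoted properties, the property names, the host `polaris-host:8181`, the catalog names `polaris_vended` and `doris_catalog`, and the `client_id:client_secret` placeholder are assumptions; check them against the Doris Iceberg catalog documentation for your Doris version.

```sql
-- Sketch only: apart from the two properties quoted in the diff, names and
-- values here are assumptions to verify against the Doris Iceberg catalog docs.
CREATE CATALOG polaris_vended PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    -- Polaris REST catalog endpoint (hypothetical host)
    'iceberg.rest.uri' = 'http://polaris-host:8181/api/catalog',
    -- Name of the catalog created in Polaris (hypothetical)
    'warehouse' = 'doris_catalog',
    -- OAuth2 client-credentials flow against Polaris (placeholder credentials)
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'client_id:client_secret',
    'iceberg.rest.oauth2.server-uri' = 'http://polaris-host:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:ALL',
    -- Enable credential vending
    'iceberg.rest.vended-credentials-enabled' = 'true',
    -- S3 basic configuration (no keys required)
    's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
    's3.region' = 'us-west-2'
);
```

Because Polaris vends temporary storage credentials, the statement carries no static S3 access keys; only the endpoint and region are configured on the Doris side.
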
Contributor

Side note: starting with 1.1.0, Polaris can provide endpoints to clients automatically. Cf. #1913

Contributor Author

Thanks for pointing this out! This is actually a limitation on the Doris side — we currently need to recognize the storage type through an explicit parameter. We’ll look into improving this in the future.

Contributor

@flyrain flyrain left a comment

+1 Thanks a lot for working on it, @morningman! This blog not only showcases how Apache Doris works together with Polaris, but also demonstrates a detailed end-to-end setup. It'd be super helpful for anyone who wants to try a similar deployment.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 15, 2025

### 2. Polaris Deployment and Catalog Creation

With the environment ready, we'll now deploy the Polaris service and configure the Iceberg Catalog.
Contributor

[not a blocker] Wondering if you had a chance to look at this script in the repo:
https://github.com/apache/polaris/blob/main/getting-started/assets/cloud_providers/deploy-aws.sh

It automatically sets up the Polaris environment, including bucket creation. Wondering if that is something we can leverage.

Contributor

@adnanhemani adnanhemani left a comment

Hi @morningman, thanks for this contribution - but I'm against the blog as it stands in the PR now for a few different reasons:

  • You've rewritten the instructions for most of what is already available here and here. For maintenance of the instructions in this PR, we should really put the Getting Started instructions for Doris within this section - similar to how we have it for Spark and Trino. That way we can reuse the existing flows for all cloud providers as well.
  • There are currently no instructions for how to start Apache Doris. Personally, I find it important to include instructions on how to start Apache Doris somewhere as well, since not all users will already have instance(s) of it. Either we should add them to the existing Kubernetes Docker Compose files (if Doris has a prebuilt image) or explain how to install it locally within this section here.

I'm excited to see Apache Doris' Getting Started flow added to our documentation - but would prefer if we conform to the formats we've already created to maintain uniformity :)

@github-project-automation github-project-automation bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board Sep 15, 2025
Contributor

@adnanhemani adnanhemani left a comment

Chatting with @flyrain offline, I still believe we should be augmenting the Getting Started documentation rather than publishing a stand-alone blog. IMO it is a much higher ROI to do that than to release a blog stating that we've updated the documentation with that information (or to publish the blog in addition to adding it to the Getting Started documentation). I strongly believe that keeping Apache Doris congruent with other query engines would be a win-win for both Doris and Polaris.

But I understand that this is a PR solely for blog purposes. Approving here to remove the "Request Changes", but I still highly recommend making the changes for the Getting Started documentation as a higher priority.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 16, 2025
@morningman
Contributor Author

morningman commented Sep 16, 2025

> Chatting with @flyrain offline, I still believe we should be augmenting the Getting Started documentation rather than publishing a stand-alone blog. IMO it is a much higher ROI to do that than to release a blog stating that we've updated the documentation with that information (or to publish the blog in addition to adding it to the Getting Started documentation). I strongly believe that keeping Apache Doris congruent with other query engines would be a win-win for both Doris and Polaris.
>
> But I understand that this is a PR solely for blog purposes. Approving here to remove the "Request Changes", but I still highly recommend making the changes for the Getting Started documentation as a higher priority.

I’m fine with following the same rules as other engines. For now, this blog is mainly meant to showcase an integration case (not a "document"). Later on, I can also improve the current “Getting Started” guide to include Doris as another query engine, similar to Spark or Trino.

@flyrain
Contributor

flyrain commented Sep 16, 2025

Awesome! I'm excited to see Doris as another engine in the docs. We could make it happen in another PR. Glad @adnanhemani pointed it out.

Thanks a lot @morningman for adding the blog! Thanks @dimas-b @singhpk234 @adnanhemani for the review!

@flyrain flyrain merged commit 97c191a into apache:main Sep 16, 2025
14 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Sep 16, 2025
5 participants