
Conversation

Contributor

@Youngwb Youngwb commented Oct 21, 2025

Add blog about how to integrate StarRocks with Apache Polaris

Contributor

@dimas-b dimas-b left a comment


Thanks for your blog post, @Youngwb ! It looks good to me overall... some comments below.

Comment on lines 49 to 50
### Business Value of the Pairing
**StarRocks** delivers performance; **Polaris** delivers openness. Together they let you:
Contributor


Polaris as an ASF project should remain vendor-neutral. I'm not sure this paragraph as it stands fits that model 🤔

Contributor Author


I think this mainly explains the advantages of combining Polaris and StarRocks, which are conclusions drawn from the characteristics of the two. Moreover, the descriptions below mention not only StarRocks but also other engines, such as Spark, Flink, and Trino.

Contributor


Spark, Flink, Trino are mentioned as sub-cases, while StarRocks appears to be elevated as a solution with a "performance advantage". I do not question that property of StarRocks, but it does not feel right to me to highlight that in a Polaris blog post.

Also "Business Value" appears to suggest deployment choices to the reader. I believe Polaris blogs should remain purely technical to avoid the impression of favouring one compatible engine over another.

In other words, why is this section specific (pairing) to StarRocks? IMHO, the previous sections provide sufficient details to the reader in order to form an independent opinion.

It is perfectly fine to link to StarRocks documentation that provides more details, of course (as done in other sections).

Member


"Business Value" is indeed a controversial term and I'd prefer to not not use it and stay purely technical in the blog post.

Contributor Author


Modified, please review it again.

@dimas-b dimas-b requested a review from jbonofre October 21, 2025 17:04
flyrain previously approved these changes Oct 22, 2025
Contributor

@flyrain flyrain left a comment


Thanks @Youngwb for adding it.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Oct 22, 2025

@xxubai xxubai left a comment


nit: starrocks-polaris-intergration.md -> starrocks-polaris-integration.md


## Configure StarRocks Iceberg Catalog

First, you need to have a StarRocks cluster up and running. Please refer to the [StarRocks Quick Start Guide](https://docs.starrocks.io/docs/quick_start) for instructions on setting up a StarRocks cluster.

Do we need to specify that it uses a Shared-Data architecture?

Contributor Author


I think it's not necessary, users can choose the architecture they need for deployment.


## Configure StarRocks Iceberg Catalog

First, you need to have a StarRocks cluster up and running. Please refer to the [StarRocks Quick Start Guide](https://docs.starrocks.io/docs/quick_start) for instructions on setting up a StarRocks cluster.

Please refer to the StarRocks Quick Start Guide for instructions on setting up a StarRocks cluster.

Which section should we include that in?

Contributor Author


I think there is no need to specify a particular section; users can choose a suitable deployment method through this page.

```
cd apache-polaris-1.1.0-incubating
```

2. Build Polaris
Member


There are published Docker images. Why do users have to build Polaris and the admin tool?

Contributor Author


Added a link to the Polaris quick-start guide; users can choose the deployment method that suits them to deploy Polaris.

```
cd apache-polaris-1.1.0-incubating
```

2. Build Polaris
Contributor


nit: now that we use 1.1.0 in this example, why bother with building from source? 1.1.0 has binary artifacts.

Contributor Author


Actually, I also started learning about Polaris from the Polaris Quick Start (https://polaris.apache.org/releases/1.1.0/getting-started/quickstart). This document starts with building Polaris, so I used this method and only later learned about the binary artifacts. I added a link to the Polaris Quick Start in this documentation. I think we can add content about the binary artifacts to the Polaris Quick Start later; that way, new users can deploy directly using the binary artifacts.

Contributor


Fair point.
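As a side note on the reviewers' point above: with published images, users can skip the source build entirely. A minimal sketch, assuming the `apache/polaris` image name and the default ports described in the Polaris docs (tag and port numbers should be verified against the current quick-start guide):

```shell
# Sketch: start a Polaris server from a published image instead of building
# from source. Image name, tag, and ports are assumptions -- check the
# Polaris quick-start docs for the current values.
docker run -p 8181:8181 -p 8182:8182 apache/polaris:latest
```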

Here is an example of creating an external catalog in StarRocks that connects to Polaris using credential vending:

```sql
CREATE EXTERNAL CATALOG polaris_catalog
```
Contributor


Would it make sense to add per-engine docs under https://polaris.apache.org/in-dev/unreleased/getting-started/using-polaris/ ?

Contributor


that makes sense. It's out of scope of this PR though, this isn't a doc.

Contributor


Certainly not for this PR. This was just a general idea for enhancing our docs.
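For context on the `CREATE EXTERNAL CATALOG` snippet quoted in this thread (which is truncated in the excerpt above), a fuller sketch of such a statement might look like the following. The endpoint, warehouse name, and credentials are placeholders, and the exact property names for authentication and credential vending should be checked against the StarRocks Iceberg catalog documentation:

```sql
-- Sketch only: "type", "iceberg.catalog.type", "iceberg.catalog.uri", and
-- "iceberg.catalog.warehouse" follow the StarRocks Iceberg REST catalog docs;
-- the URI and warehouse values below are placeholders.
CREATE EXTERNAL CATALOG polaris_catalog
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "iceberg.catalog.uri" = "http://<polaris-host>:8181/api/catalog",
    "iceberg.catalog.warehouse" = "<catalog_name>"
    -- OAuth2 credential and vended-credential settings go here;
    -- see the StarRocks and Polaris docs for the current property names.
);
```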

Contributor

@dimas-b dimas-b left a comment


LGTM 👍 Thanks @Youngwb !

Let's wait for a few more reviews.

@flyrain flyrain merged commit b9c2ee2 into apache:main Oct 28, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Oct 28, 2025
Contributor

flyrain commented Oct 28, 2025

Thanks @Youngwb for working on it. Thanks @dimas-b @snazy @xxubai for review.

@Youngwb Youngwb deleted the starrocks-integration branch October 30, 2025 02:09
vchag pushed a commit to vchag/polaris that referenced this pull request Nov 5, 2025
snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* Fix Jandex Maven coordinates (apache#2888)

The entry `jandex = { module = "io.smallrye.jandex:jandex", version ="3.5.0" }` is wrong (coordinates are `io.smallrye:jandex`), and Jandex is defined elsewhere as `smallrye-jandex`.
Interestingly, these (broken) coordinates seem to cause the consistent re-creation of the Quarkus 3.29.0 PR (the cause is a mystery).

* Update plugin com.gradle.develocity to v4.2.2 (apache#2597)

* Site: Hugo docs relative links (apache#2892)

* Update dependency software.amazon.awssdk:bom to v2.36.2 (apache#2901)

* Update GitHub Artifact Actions (apache#2895)

* Formatting: apply Spotless to :polaris-distribution (apache#2900)

* Build: Capture jcstress output in a log file (apache#2890)

The jcstress output is pretty verbose and prints a lot to the console.
This change captures the output in a log file. In case of a test failure, the output is logged to the console, but only in case of a failure.

* Prep: Site for 1.2 release  (apache#2877)

* Adding 1.2.0 as one of active releases (apache#2916)

Co-authored-by: Yufei Gu <yufei.apache.org>

* Use official spark image (apache#2899)

* Update dependency ipykernel to v7.1.0 (apache#2918)

* Added missing features doc (apache#2898)

* Added missing features doc

* Added missing features doc

* Site: Add a blog for StarRocks and Apache Polaris Integration (apache#2851)

* NoSQL: Node IDs - API, SPI + general implementation (apache#2728)

* NoSQL: Node IDs - API, SPI + general implementation

This PR provides a mechanism to assign a Polaris-cluster-wide unique node-ID to each Polaris instance, which is then used when generating Polaris-cluster-wide unique Snowflake-IDs.

The change is fundamental for the NoSQL work, but is also needed for the existing relational JDBC persistence.

Does not include any persistence specific implementation.

* NoSQL: Fail node-management-impl init after timeout

Also move the expensive part to a `@PostConstruct` to not block CDI entirely from initializing.

* Update dependency io.prometheus:prometheus-metrics-exporter-servlet-jakarta to v1.4.2 (apache#2929)

* Build-logic: `GitInfo` refactor (apache#2908)

Allows use of `GitInfo` for other use cases than just Jar manifest attributes.
SBOM generation will be another use case for Git information.

* Memoize ASF project information (apache#2909)

Information included in Polaris publications pulls some information about the project from ASF project metadata sources (Whimsy).
This information is currently only used when generating Maven poms, but will also be needed in SBOMs.

This change adds a new, memoized `AsfProject` information object, which holds the project information from Whimsy.

* Build: Simplify signing + fix execution in polaris-distribution (apache#2906)

This change simplifies generation of non-publication artifacts by adding a function taking the task which outputs shall be signed. That function takes care of setting up the correct task dependencies and task execution.

Also fixes an issue that signing does not always happen when running `./gradlew :polaris-distribution:assemble`, because the task dependency graph for the archive tasks and the corresponding signing tasks isn't properly set up.

* Proposed Test Fix (apache#2936)

Co-authored-by: Travis Michael Bowen <[email protected]>

* Update docker.io/prom/prometheus Docker tag to v3.7.3 (apache#2944)

* Update Quarkus Platform and Group to v3.29.0 (apache#2934)

* Update Gradle to v9.2.0 (apache#2938)

Co-authored-by: Robert Stupp <[email protected]>

* Update dependency openapi-generator-cli to v7.17.0 (apache#2940)

* Implement OpaPolarisAuthorizer (apache#2680)

* Update dependency com.github.ben-manes.caffeine:caffeine to v3.2.3 (apache#2923)

* Prefer PolarisPrincipal.getRoles in Resolver (apache#2925)

it should be sufficient to rely on `SecurityContext.getUserPrincipal`
alone; we don't need to call `isUserInRole` explicitly.

Note: because `ResolverTest` tests with non-existent roles, we have
to add null-filtering to the `Resolver`.

* Move `nodeids` to `nosql` package parent (apache#2931)

Following up on apache#2728 this change moves "nodeids" code to the
`org.apache.polaris.persistence.nosql.nodeids` package.

* Update actions/stale digest to 39bea7d (apache#2950)

* Update dependency org.junit:junit-bom to v5.14.1 (apache#2951)

* docs(2843): Add documentation around Polaris-Tools (apache#2946)

* Add documentation around Polaris-Tools
* Related to apache#2843

* Add getting started with Apache Ozone (apache#2853)

* Add getting started with Apache Ozone

Use Apache Ozone as an example S3 impl. that does not have STS.

* fix typo in MinIO readme

* Update dependency com.azure:azure-sdk-bom to v1.3.0 (apache#2754)

* docs: add feature configuration section to Hive federation guide (apache#2952)

Add documentation for required feature flags when enabling
Hive Metastore federation. Users must configure three properties
in `application.properties` before Hive federation will work:

- `SUPPORTED_CATALOG_CONNECTION_TYPES`
- `SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES`
- `ENABLE_CATALOG_FEDERATION`

Inspired from [this](https://apache-polaris.slack.com/archives/C084XDM50CB/p1761851426511259) Slack thread.

Co-authored-by: Prathyush Shankar <[email protected]>

* Change getting-start docker file to use official spark image from outdated jupyter image (apache#2943)

* Use official spark image

* Use official spark image

* Use official spark image

* Use official spark image

* Use official spark image

* Use Iterable for realms in BootstrapCommand (apache#2956)

* Simplify digest generation (apache#2907)

Similarly to the change to simplify artifact signing, this change simplifies digest generation by introducing a function to digest the output files of any task. That function takes care of setting up the correct task dependencies and task execution.

Also removes an unnecessary double buffering during digest generation.

* Build: `GitInfo` function to build a raw github content URL (apache#2910)

* NoSQL: nodeids renames

* NoSQL: Update test for Caffeine 3.2.3

The read of `Eviction` properties is "just" a volatile read since Caffeine 3.2.3, and cleanups are triggered asynchronously. Before 3.2.3, cleanups happened synchronously. This change breaks the initially present assertions of this test, but not the functionality of the production code.

See ben-manes/caffeine#1897

* Last merged commit cec41c4

---------

Co-authored-by: Mend Renovate <[email protected]>
Co-authored-by: olsoloviov <[email protected]>
Co-authored-by: Prashant Singh <[email protected]>
Co-authored-by: Yufei Gu <[email protected]>
Co-authored-by: Yong Zheng <[email protected]>
Co-authored-by: Youngwb <[email protected]>
Co-authored-by: Travis Bowen <[email protected]>
Co-authored-by: Travis Michael Bowen <[email protected]>
Co-authored-by: Sung Yun <[email protected]>
Co-authored-by: Christopher Lambert <[email protected]>
Co-authored-by: Dmitri Bourlatchkov <[email protected]>
Co-authored-by: Adam Christian <[email protected]>
Co-authored-by: carc-prathyush-shankar <[email protected]>
Co-authored-by: Prathyush Shankar <[email protected]>