@@ -133,11 +133,6 @@ constructor(private val softwareComponentFactory: SoftwareComponentFactory) : Pl

suppressPomMetadataWarningsFor("testFixturesApiElements")
suppressPomMetadataWarningsFor("testFixturesRuntimeElements")
-
-if (project.tasks.findByName("createPolarisSparkJar") != null) {
-// if the project contains spark client jar, also publish the jar to maven
-artifact(project.tasks.named("createPolarisSparkJar").get())
-}
}

if (
8 changes: 5 additions & 3 deletions plugins/spark/README.md
@@ -29,15 +29,17 @@ Right now, the plugin only provides support for Spark 3.5, Scala version 2.12 an
and depends on iceberg-spark-runtime 1.9.0.

# Build Plugin Jar
-A task createPolarisSparkJar is added to build a jar for the Polaris Spark plugin, the jar is named as:
+A shadowJar task is added to build a jar for the Polaris Spark plugin; the jar is named:
`polaris-spark-<sparkVersion>_<scalaVersion>-<polarisVersion>-bundle.jar`. For example:
`polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar`.

-- `./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.12.
-- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.13.
+- `./gradlew :polaris-spark-3.5_2.12:shadowJar` -- build jar for Spark 3.5 with Scala version 2.12.
+- `./gradlew :polaris-spark-3.5_2.13:shadowJar` -- build jar for Spark 3.5 with Scala version 2.13.

Contributor:

nit: `:assemble` is higher level (fewer internal details visible to end users) and does not have much overhead compared to `shadowJar`.

Contributor (author):

I think it is still good to let users know that there is a specific task to produce just the jar, in case they don't want to run assemble or build. I added one sentence below to mention that this task is also executed when running gradlew assemble or gradlew build, so users can choose whichever way they prefer.


The resulting jar is located at `plugins/spark/v3.5/build/<scala_version>/libs` after the build.

+The shadowJar task is also executed automatically when you run `gradlew assemble` or `gradlew build`.
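
For instance (a usage sketch; the module path assumes Spark 3.5 with Scala 2.12, matching the examples above):

```shell
# the bundle jar is also produced as part of the standard lifecycle build
./gradlew :polaris-spark-3.5_2.12:assemble
```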

# Start Spark with Local Polaris Service using built Jar
Once the jar is built, we can manually test it with Spark and a local Polaris service.
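
As a minimal sketch (the catalog class `org.apache.polaris.spark.SparkCatalog` and the REST catalog options follow the Polaris Spark client docs; the local URI, catalog name, and jar path are placeholder assumptions for a default local setup):

```shell
# launch spark-shell with the locally built bundle jar against a local Polaris service
bin/spark-shell \
  --jars /path/to/polaris-spark-3.5_2.12-<version>-bundle.jar \
  --conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.warehouse=<catalog-name> \
  --conf spark.sql.catalog.polaris.token-refresh-enabled=true
```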

14 changes: 10 additions & 4 deletions plugins/spark/v3.5/spark/build.gradle.kts
@@ -19,7 +19,10 @@

import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

-plugins { id("polaris-client") }
+plugins {
+  id("polaris-client")
+  id("com.gradleup.shadow")
+}

// get version information
val sparkMajorVersion = "3.5"
@@ -112,7 +115,7 @@ dependencies {
}
}

-tasks.register<ShadowJar>("createPolarisSparkJar") {
+tasks.named<ShadowJar>("shadowJar") {
archiveClassifier = "bundle"
isZip64 = true

@@ -135,8 +138,11 @@ tasks.register<ShadowJar>("createPolarisSparkJar") {
exclude(dependency("org.apache.avro:avro*.*"))
}

relocate("com.fasterxml", "org.apache.polaris.shaded.com.fasterxml.jackson")
relocate("com.fasterxml", "org.apache.polaris.shaded.com.fasterxml")
relocate("org.apache.avro", "org.apache.polaris.shaded.org.apache.avro")
}

-tasks.withType(Jar::class).named("sourcesJar") { dependsOn("createPolarisSparkJar") }
+// ensure the shadowJar task runs for both the `assemble` and `build` tasks
+tasks.named("assemble") { dependsOn("shadowJar") }
+
+tasks.named("build") { dependsOn("shadowJar") }
11 changes: 11 additions & 0 deletions site/content/in-dev/unreleased/polaris-spark-client.md
@@ -128,3 +128,14 @@ The Polaris Spark client has the following functionality limitations:
3) Renaming a Delta table is not supported.
4) ALTER TABLE ... SET LOCATION is not supported for Delta tables.
5) Other non-Iceberg tables, such as CSV, are not supported.
+
+## Iceberg Spark Client compatibility with Polaris Spark Client
+The Polaris Spark client depends on a specific Iceberg Spark client version; the dependency is described
+in the following table:
+
+| Spark Client Version | Iceberg Spark Client Version |
+|----------------------|------------------------------|
+| 1.0.0                | 1.9.0                        |
+
+The Iceberg dependency is automatically downloaded when the Polaris package is downloaded, so there is no need to
+add the Iceberg Spark client to the `packages` configuration.
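
For instance, a hypothetical sketch (assuming the published coordinates follow the same `polaris-spark-<sparkVersion>_<scalaVersion>` naming used for the bundle jar):

```shell
# only the Polaris Spark client is listed; its Iceberg Spark client
# dependency (1.9.0 for client version 1.0.0) is resolved transitively
bin/spark-shell \
  --packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0
```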