Skip to content

Conversation

@adutra
Copy link
Contributor

@adutra adutra commented Feb 11, 2025

3 integration tests currently require a temporary folder for various purposes. The folders are generally hard-coded to some local filesystem location, which makes those tests impossible to execute against a remote Polaris instance.

This PR aims at fixing this situation by leveraging a new optional environment variable that points to a location that is accessible by both the server and the client running the tests. This would typically be an S3 bucket, for example, but could also be a shared volume mounted on both machines at the same mount point. If this environment variable is not present, then local execution is assumed and a temporary directory managed by JUnit is used instead.

3 integration tests currently require a temporary folder for various purposes. The folders are generally hard-coded to some local filesystem location, which makes those tests impossible to execute against a remote Polaris instance.

This PR aims at fixing this situation by leveraging a new optional environment variable that points to a location that is accessible by both the server and the client running the tests. This would typically be an S3 bucket, for example, but could also be a shared volume mounted on both machines at the same mount point. If this environment variable is not present, then local execution is assumed and a temporary directory managed by JUnit is used instead.
@adutra
Copy link
Contributor Author

adutra commented Feb 11, 2025

@dimas-b @andrew4699 FYI.

.map(URI::create)
.orElse(tempDir.toUri());
s3BucketBase =
Optional.ofNullable(System.getenv("INTEGRATION_TEST_S3_PATH"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could get rid of this environment variable and rely solely on INTEGRATION_TEST_TEMP_DIR, but I wanted this change to remain backwards-compatible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep backward compatibility here? Is this environment variable expected to be used in CI pipelines, or is it mainly for local debugging? If it's mostly for local use, consolidating into a single variable might make things simpler and help avoid confusion. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's consolidate then 👍

@flyrain
Copy link
Contributor

flyrain commented Feb 12, 2025

cc @HonahX

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this PR LGTM, but I have minor URI handling concerns

Optional.ofNullable(System.getenv("INTEGRATION_TEST_TEMP_DIR"))
.map(URI::create)
.orElse(tempDir.toUri())
.resolve(realm + "/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have unexpected behaviour... IIRC s3://bucket/a/b .resolve("realm/") will yield s3://bucket/a/realm/... Did you mean that?

It might be best to do a simple string append and then call URI.normalize().

Also, some valid S3 locations do not pass URI validation checks cf. projectnessie/nessie#8328 .... so in the end I think it might be best to stick with string concat until Polaris gets tooling for dealing with S3 locations that are not proper URIs 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have unexpected behaviour... IIRC s3://bucket/a/b .resolve("realm/") will yield s3://bucket/a/realm/... Did you mean that?

No, of course; the expectation is that INTEGRATION_TEST_TEMP_DIR must end with a trailing slash.

It might be best to do a simple string append and then call URI.normalize().

Yep, let me change that.

Also, some valid S3 locations do not pass URI validation checks [...] I think it might be best to stick with string concat

I remember the pain, but OTOH I dislike writing stringly-typed code. Do you think the issues we had in Nessie would manifest here? Honestly I expect these URIs to be as simple as s3://test-bucket, so they should be parsable. And also, as you said, Polaris is not ready yet to handle those problematic S3 locations.

Copy link
Contributor

@dimas-b dimas-b Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we can assume that Polaris tests use only well-formed URIs as base locations.

* "s3://bucket/polaris"}. If the URI does not have a scheme, it will be assumed to be a local
* file URI.
*/
public static URI getTemporaryDirectory(Path defaultLocalDirectory) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be lovely if Path could be annotated with @TempDir but unfortunately that does not work:

junit-team/junit-framework#1786

@adutra adutra merged commit 839bc25 into apache:main Feb 25, 2025
5 checks passed
@adutra adutra deleted the temp-dir-it branch February 25, 2025 11:01
eric-maynard added a commit to eric-maynard/polaris that referenced this pull request Mar 13, 2025
* Publish proposals and roadmap on the website (apache#1043)

* Allow Polaris CLI to set property values that contain '=' (apache#1003)

Currently the CLI parsing is too pessimistic and throws an error if
it detects '=' in the parsed value of a property. However, the split 1
semantics are already standard behavior for most parsers where the
right-hand-side is allowed to contain '='.

This commonly comes up if the value is itself a comma-separated kv list
or if it the value is a base64-encoded string.

* main: Update postgres Docker tag to v17.4 (apache#1049)

* main: Update dependency org.junit:junit-bom to v5.12.0 (apache#1045)

* Revert "Remove CallContext and its ThreadLocal usage" (apache#1000)

* revert b84f462

* resolve conflicts

* autolint

* autolint

* ...not possible to automatically add a synthetic no-args constructor to an unproxyable bean class.

* fix

* fixed non-quarkus build issues

* fixed non-quarkus build issues

* autolint

* more di issues

* add more producers

* add another producer

* autolint

* change callcontext usage in FileIOFactoryTest

* stablize tests

* autolint

* fix a test

* refactor TableCleanupTaskHandlerTest

* another fix, still npe

* autolint

* only task cleanup is problematic

* autolint

* Stabilize more tests

* autolint

* Avoid propagating old callcontexts across TestServices instances

* testservices update

* autolint

* Revert "autolint"

This reverts commit 66a7f22.

* Revert "testservices update"

This reverts commit 897a5a1.

* changes to taskexecutorimpl

* autolint

---------

Co-authored-by: Dennis Huo <[email protected]>

* Enable console color when running Polaris with Gradle (apache#977)

* CLI: add subcommand access for principals (apache#1019)

* Access subcommand access for principals

* Access subcommand access for principals

* Fix getting start spark (apache#1054)

* Use correct link format for adoc and update menu title for proposals (apache#1051)

* Add a debug log with the full exception (apache#1023)

* clean up

* spotless

* clean up

* refactor message

* no CVEM

* change log levels

* revert polaris-java dep change

* add testImplementation

* uppercase

* Remove all tightly coupled EntityCache dependencies in the main persistence stack (apache#1055)

Remove the EntityCacheEntry wrapper since the timestamp fields were never used anyways;
instead the underlying Caffeine cache transparently handles access times, and the
types of entries we cache are simply the existing ResolvedPolarisEntity.

Remove interactions of business logic with explicit "cache entries", instead
operating on ResolvedPolarisEntity. Fix the equals()/hashCode() behavior of
PolarisEntity to be compatible with PolarisBaseEntity as intended.

Improve code comments to explain the (current) relationship between PolarisEntity
and PolarisBaseEntity, and clarify the behaviors in Resolver.java.

Fully remove the PolarisRemoteCache interface and its methods. Add different
methods that aren't cache-specific instead.

* Remove PolarisMetaStoreSession from FileIOFactory/FileIOUtil in favor of CallContext (apache#1057)

This appeared to be some leaky divergence that occurred after CallContext had been
removed, but PolarisMetaStoreSession really is only a low-level implementation detail
that should never be handled by BasePolarisCatalog/FileIOFactory.

This plumbs CallContext explicitly into the FileIOFactory and FileIOUtil methods and
thus removes a large source of CallContext.getCurrentContext calls; now the threadlocal
doesn't have to be set at all in BasePolarisCatalogTest.

* Cleanup Iceberg Manifest list files on Purge (apache#1039)

* Fix small quickstart typo (apache#1061)

* Enable rat check on md files and fix the existing ones (apache#1050)

* main: Update dependency boto3 to v1.37.0 (apache#1052)

* main: Update dependency software.amazon.awssdk:bom to v2.30.27 (apache#1053)

* Integration tests: improve support for temporary folders (apache#987)

3 integration tests currently require a temporary folder for various purposes. The folders are generally hard-coded to some local filesystem location, which makes those tests impossible to execute against a remote Polaris instance.

This PR aims at fixing this situation by leveraging a new optional environment variable that points to a location that is accessible by both the server and the client running the tests. This would typically be an S3 bucket, for example, but could also be a shared volume mounted on both machines at the same mount point. If this environment variable is not present, then local execution is assumed and a temporary directory managed by JUnit is used instead.

* build-logic: exclude generated code from errorprone (apache#1035)

* build-logic: let javadoc depend on jandex (apache#1034)

* Add Prashant in the collaborators list (apache#1064)

* main: Update dependency org.slf4j:slf4j-api to v2.0.17 (apache#1065)

---------

Co-authored-by: JB Onofré <[email protected]>
Co-authored-by: Dennis Huo <[email protected]>
Co-authored-by: Mend Renovate <[email protected]>
Co-authored-by: Eric Maynard <[email protected]>
Co-authored-by: Dennis Huo <[email protected]>
Co-authored-by: Alexandre Dutra <[email protected]>
Co-authored-by: MonkeyCanCode <[email protected]>
Co-authored-by: Andrew Guterman <[email protected]>
Co-authored-by: Prashant Singh <[email protected]>
Co-authored-by: Robert Stupp <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants