HADOOP-19385. S3A: Add iceberg bulk delete test #7316
base: trunk
Conversation
Force-pushed 09b4682 to d37310c.
🎊 +1 overall
      <type>test-jar</type>
    </dependency>

    <dependency>
revert
"fs.s3a.performance.flags"; | ||
|
||
/** | ||
* All performance flags in the enumeration. |
@value
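Presumably this means embedding the constant's value in the javadoc via the {@value} tag; a minimal sketch of the tag in use on the neighbouring key (assuming that is the intent; the constant name is taken from the surrounding diff):

  /**
   * Configuration key for the S3A performance flags: {@value}.
   */
  public static final String FS_S3A_PERFORMANCE_FLAGS =
      "fs.s3a.performance.flags";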
   * @return a configuration for subclasses to extend
   */

  @Override
cut for now, or make this suite parameterized on bulk delete enabled
 * Format tests.
 */

package org.apache.fs.test.formats;
(No newline at end of file)
add a trailing newline
  private static final int DELETE_FILE_COUNT = 7;

  @Parameterized.Parameters(name = "multiobjectdelete-{0}-usebulk-{1}")
flip params and add a javadoc
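A sketch of what that flipped, documented parameters method might look like (the parameter tuples and method name are assumptions; only the naming pattern comes from the PR):

  import java.util.Arrays;
  import java.util.Collection;
  import org.junit.runners.Parameterized;

  /**
   * Parameterization: iceberg bulk delete enabled/disabled crossed with
   * S3A multi-object delete enabled/disabled.
   * @return {bulk delete enabled, multi-object delete enabled} tuples
   */
  @Parameterized.Parameters(name = "usebulk-{0}-multiobjectdelete-{1}")
  public static Collection<Object[]> params() {
    return Arrays.asList(new Object[][]{
        {true, true},
        {true, false},
        {false, true},
        {false, false},
    });
  }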
   * The classic invocation mechanism reports a failure.
   */
  @Test
  public void testDeleteDirectory() throws Throwable {
expand the detail in the name: make clear that directory delete does not happen
  public void testDeleteManyFiles() throws Throwable {
    Path path = methodPath();
    final FileSystem fs = getFileSystem();
    final List<Path> files = createFiles(fs, path, 1, DELETE_FILE_COUNT, 0);
also add a local temp file to show mixing of filesystems
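Something like the following could demonstrate that (a sketch only; stringList and createFileIO are the test's own helpers, and the assumption is that HadoopFileIO resolves the filesystem per path):

    // create a file in the local filesystem and add it to the delete
    // list, so one deleteFiles() call spans two filesystems.
    java.io.File localFile = java.io.File.createTempFile("bulk-delete", ".txt");
    List<String> targets = stringList(files);
    targets.add(localFile.toURI().toString());
    try (HadoopFileIO fileIO = createFileIO()) {
      fileIO.deleteFiles(targets);
    }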
    assertSuccessfulBulkDelete(bulkDelete_delete(getFileSystem(), basePath, paths));
  }

  public static List<String> stringList(List<Path> files) {
javadocs and explain why not .toURI()
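A possible javadoc'd form (a sketch; the stated rationale for preferring toString() over toUri() is an assumption, not text from the PR):

  import java.util.List;
  import java.util.stream.Collectors;
  import org.apache.hadoop.fs.Path;

  /**
   * Convert paths to strings via Path.toString() rather than toUri():
   * toString() keeps the value in the form the filesystem itself uses,
   * whereas URI conversion may percent-encode characters and so produce
   * strings which no longer match the store's paths.
   * @param files paths to convert
   * @return the paths as strings
   */
  public static List<String> stringList(List<Path> files) {
    return files.stream().map(Path::toString).collect(Collectors.toList());
  }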
  }

  @Test
  public void testDeleteManyFiles() throws Throwable {
comment that this is how it is expected to be used
  /** Is bulk delete enabled on hadoop runtimes with API support: {@value}. */
  public static final String ICEBERG_BULK_DELETE_ENABLED = "iceberg.hadoop.bulk.delete.enabled";

  private static final int DELETE_PAGE_SIZE = 3;
explain choice of this and the delete file count
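A comment along these lines could capture the rationale (an assumption: that the values were picked for page-boundary coverage):

  // Page size 3 with 7 files to delete means the bulk delete issues
  // two full pages and one final partial page, covering both the
  // "full page" and "remainder" paths in the paging logic.
  private static final int DELETE_PAGE_SIZE = 3;
  private static final int DELETE_FILE_COUNT = 7;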
    final FileSystem fs = getFileSystem();
    final List<Path> files = createFiles(fs, path, 1, DELETE_FILE_COUNT, 0);
    try (HadoopFileIO fileIO = createFileIO()) {
      fileIO.deleteFiles(stringList(files));
add iostats assertions
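For example (a sketch, assuming the S3A statistic name and the IOStatisticAssertions helper from hadoop-common test code; the expected count depends on the configured page size):

    // statically import
    // org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticCounterValue
    // and org.apache.hadoop.fs.s3a.Statistic.OBJECT_BULK_DELETE_REQUEST.
    // With 7 files and a page size of 3, three multi-object delete
    // requests would be expected.
    verifyStatisticCounterValue(
        ((S3AFileSystem) fs).getIOStatistics(),
        OBJECT_BULK_DELETE_REQUEST.getSymbol(),
        3);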
  }

  /**
   * Get a file status value or, if the path doesn't exist, return null.
Perhaps an opportunity to return Optional<FileStatus>?
done. Actually, maybe we should put that into some new IOFunctions class in the util.functional package, which we can use to make some of the basic IO ops stuff we can chain together. Presumably you have stuff which can go in there too
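A minimal sketch of the Optional-returning variant (an assumed shape; the real helper, and any IOFunctions home for it, may differ):

  import java.io.FileNotFoundException;
  import java.io.IOException;
  import java.util.Optional;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  /** Get a file status, or Optional.empty() if the path is absent. */
  public static Optional<FileStatus> statusIfPresent(FileSystem fs, Path path)
      throws IOException {
    try {
      return Optional.of(fs.getFileStatus(path));
    } catch (FileNotFoundException e) {
      return Optional.empty();
    }
  }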
    basePath = path(getClass().getName());
    dynamicWrappedIO = new DynamicWrappedIO();
    pageSize = dynamicWrappedIO.bulkDelete_pageSize(fs, basePath);
    fs.mkdirs(basePath);
Assert that mkdirs succeeds (returns true)?
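e.g. (one-line sketch):

    assertTrue("mkdirs returned false on " + basePath, fs.mkdirs(basePath));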
  }

  public static List<String> stringList(List<Path> files) {
    return files.stream().map(p -> toString(p)).collect(Collectors.toList());
Can this be reduced to:

    return files.stream().map(Path::toString).collect(Collectors.toList());

...and then remove the static toString method below? It just seems to turn around and call toString() on the object.
Chris, not forgotten this. Need to get that iceberg PR in first, with this one showing we can do it. Though I would like to get that logging into the next hadoop release.
hadoop-tools/hadoop-aws/pom.xml (outdated)
    <!-- Adds a *test only* java 17 build profile -->
    <profile>
      <id>java-17-or-later</id>
Suggested change:
-      <id>java-17-or-later</id>
+      <id>jdk17+</id>

Better to follow
https://github.com/apache/maven-apache-parent/blob/ce14a4641f0f21f98e51f564e9d290910ae75e7c/pom.xml#L500
## <a name="java17"></a> Java 17 Tests

This module includes a test source tree which compiles and runs on
Java 17+ _only_. This is to allow external libraries to be used
Iceberg 1.8 requires Java 11+, instead of Java 17+
Really? Even better. I do seem to have to switch to a java17 JVM for a build though, but maybe that's because I do that for parquet.
This means I'll be able to wire Yetus up to CI for this even before we've got that java17 image up.
Actually, going to add a special fileformat profile which is needed to explicitly include the format stuff, rather than do it by java version.
Why so? It stops another loop getting into the build; there's enough going on there with Avro and HBase being Hadoop dependencies. Keeping format (regression) testing optional will keep release engineers happy. Or at least not unhappy with me.
Force-pushed d37310c to 8d9201e.
🎊 +1 overall
🎊 +1 overall
Add Iceberg core to the hadoop-aws test classpath. Iceberg is java17+ only, so this adds:
* A new test source path src/test/java17
* A new profile "java-17-or-later" which includes this and declares the dependency on iceberg-core.

The new test is ITestIcebergBulkDelete; it is parameterized on Iceberg bulk delete enabled/disabled and s3a multipart delete enabled/disabled.

There is a superclass contract test, org.apache.fs.test.formats.AbstractIcebergDeleteTest, to support future stores which implement bulk delete. This is currently a minimal superclass; all tests are currently in ITestIcebergBulkDelete.

Change-Id: Ica8682f4efdd0cb04ca7f4762b2e47d396552910
Code in sync with PRs, more assertions, use other parts of API for coverage there too.
use listing API to understand it
All parameterized contract test setup is utterly broken. Need to think of a way to address that, probably something where contract setup is postponed
Force-pushed d2fef70 to 1a3d25f.
💔 -1 overall
tested with local build as
This is now parameterized for junit5 and works; will now need to fix up the rest of the tests, which will need pulling d2aec90 to trunk, fixing up what was needed here (all setup/teardown to tag as beforeAll/afterAll even though the superclass method already is).
Tests are invoked iff the "format-tests" option is set. This means that a java17 build doesn't require a compatible version of iceberg (or, later, other formats) just to build. This avoids adding more loops in a from-scratch build; there are enough with Avro and HBase.
Force-pushed 0830758 to 3a3e6f3.
💔 -1 overall
💔 -1 overall
Forms initial iceberg page. To add:
* analytics stream
* using s3 as the URLs
💔 -1 overall
Add Iceberg core to the hadoop-aws test classpath to verify that it is correctly invoked through the HadoopFileIO component when enabled in Iceberg.

Iceberg is java17+ only, so this adds:
* A new test source path src/test/java17
* A new profile "java-17-or-later" which includes this and declares the dependency on iceberg-core.

The new test is ITestIcebergBulkDelete; it is parameterized on Iceberg bulk delete enabled/disabled and s3a multipart delete enabled/disabled.

There is a superclass contract test, org.apache.fs.test.formats.AbstractIcebergDeleteTest. This is to support future stores which implement bulk delete. This is currently a minimal superclass; all tests are currently in ITestIcebergBulkDelete.
How was this patch tested?
Fun!
Once we have iceberg PR 10233 merged in, we can merge this. Until then it doesn't currently compile for testing, unless we move to DynMethods there (maybe I should).
Important: Yetus cannot test this yet!
Yetus doesn't have a java17 run, so the new test is not executed, even if the Iceberg JAR were there.
For code changes: have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?