HADOOP-13230. S3A to optionally retain directory markers #2149
Conversation
Testing against S3 Ireland; I've not run anything other than the (now expanded) cost tests.
Tested S3 Ireland with params ... One failure: the cause is that when status == null is passed in, the failure is raised in the .get() on the future, but the operation expects it to fail in the first read() (which is only true when a file status is supplied). This test path is only followed on an unversioned bucket; presumably that's what my test setup now is, and why it always worked before. Works in the IDE. May be a TTL expiry of the S3Guard record on an overloaded test run.
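A sketch of the two failure paths being described, assuming the openFile() builder API; the class and variable names here are illustrative only:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FutureDataInputStreamBuilder;
import org.apache.hadoop.fs.Path;

final class OpenFileSketch {
  static void open(FileSystem fs, Path path, FileStatus status)
      throws Exception {
    FutureDataInputStreamBuilder builder = fs.openFile(path);
    if (status != null) {
      // a supplied status lets open() skip its existence probe, so a
      // missing object only fails on the first read()
      builder.withFileStatus(status);
    }
    FSDataInputStream in = builder.build().get(); // status == null: fails here
    in.read();                                    // status != null: fails here
  }
}
```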
Force-pushed 9fb5ebf to 7951cd3
Latest update has the marker tool with tests, no docs. We are going to need a full page on directory markers and compatibility, aren't we? Tests in progress; failure on ... Seen this before -assume the cause is that I've turned off bucket checks. Really the test should combine a create and a list, and expect a failure in either place.
See #2170 for the <= 3.2 subset of this patch, needed to not mistake paths with dir markers as empty. As well as not deleting markers on file creation, I'm going to add the option of not doing any re-creation of markers when files are deleted. This reduces IO on file deletion at the cost of a common assumption: "after you delete all files in a directory, the directory still exists". I don't know what that's going to break, so it is going to be explicitly optional, with the recovery policy to be true/false/authoritative.
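For context, a minimal sketch of how a retention policy like this surfaces as client configuration, using the fs.s3a.directory.marker.retention key that appears later in this thread; the bucket name is hypothetical and the wiring illustrative, not the final API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerRetentionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // one of: "delete" (classic behaviour), "keep", "authoritative"
    conf.set("fs.s3a.directory.marker.retention", "keep");
    Path file = new Path("s3a://example-bucket/dir/file");
    try (FileSystem fs = FileSystem.newInstance(file.toUri(), conf)) {
      // with "keep", creating this file no longer deletes the
      // directory marker of its parent path
      fs.create(file).close();
    }
  }
}
```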
Force-pushed 92ef217 to fa83f8d
@iwasakims yeah, been working on that new test. Here's my latest test run: S3 London, bucket KMS-encrypted, build params ... Two failures are related to empty dir markers; the encryption-over-rename one could be unrelated (only just turned KMS on yesterday, but...).
Filed [HADOOP-17167](https://issues.apache.org/jira/browse/HADOOP-17167) over the ITestS3AEncryptionWithDefaultS3Settings failure; unrelated, happens on trunk with my S3 config.
Overnight test run failures. This is with a change not yet pushed up, where during teardown, if the FS policy != keep, we do an audit of the store. The aim is to make sure that 100% of the FS operations don't leave markers in delete mode, so as to be fully compatible with existing releases. One unexpected seek failure (!), and the landsat select test is timing out in parallel (6) test runs, but not standalone. This is consistent for me, new this week. Really weird where it times out though -log4j. Deadlock?
Followup thought: I'm running with log @ debug... this log statement will only be hit in that mode. So until this week I wouldn't have been running in quite the same config. Maybe we've just found a log4j problem?
💔 -1 overall
This message was automatically generated.
Last local batch run; fails with ...
💔 -1 overall
This message was automatically generated.
Change-Id: I93097ff39f7254f18f8d382ad891002502c7112d
Change-Id: I1093fcce9737ac18a5e9a6cfb4f31da6731faf70
- always have one run in auth mode
- cleaning up and javadocing
- plan to pull up a base class and then move the delete and rename ops away

Change-Id: I7fbf885de73eccc0bc842a4987ac12682ff1c63c
…category

Change-Id: Ifadd671a6e39a4a8f4c4a6ab600132bc812ef79e
The latest patch tweaks the marker tool, but also adds PathCapabilities probes to the S3A store so you can see (a) if an instance is marker-aware and (b) whether markers are being kept or deleted on a given path. Look at the docs for details. branch-3.2 doesn't support path capabilities, but it does do it for stream capabilities; I'll extend cloudstore to have a ...
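For application code, a probe might look like the following sketch (capability names taken from the docs quoted further down; assumes the hasPathCapability() API and a hypothetical bucket name):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerCapabilityProbe {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://example-bucket/tables/");
    FileSystem fs = path.getFileSystem(new Configuration());
    // static probe: does this client understand surplus markers at all?
    boolean aware = fs.hasPathCapability(path,
        "fs.s3a.capability.directory.marker.aware");
    // dynamic probes: what will actually happen under this path?
    boolean keep = fs.hasPathCapability(path,
        "fs.s3a.capability.directory.marker.keep");
    boolean delete = fs.hasPathCapability(path,
        "fs.s3a.capability.directory.marker.delete");
    System.out.printf("aware=%s keep=%s delete=%s%n", aware, keep, delete);
  }
}
```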
💔 -1 overall
This message was automatically generated.
* Execute a full recursive rename.
* The source is a file: rename it to the destination.
* @throws IOException failure
* There is a special handling of directly markers here -only leaf markers
typo: directory
* @throws IOException if metadata store update failed
*/
@RetryTranslated
public static boolean refreshEntry(
Why don't we just put again? Or just update the modtime if needed? If we only update the modtime then the method name should reflect that.
And, historically, when a path is listed, if a marker to that path is found, *it has been interpreted as an empty directory.*

## <a name="problem"></a> Scale issues related to directory markers
Maybe put the title in the HTML tag?
changed the title :)
1. There's also the overhead of actually issuing the request and awaiting the response.

Issue #2 has turned out to cause significant problems on some interactions with
Yes, that's because of how markdown works; it will do the ordering itself.
@Test
public void testDeleteEncryptedObjectWithDifferentKey() throws Exception {
  requireUnguardedFilesystem();
  //requireUnguardedFilesystem();
This is commented out; should it be removed instead?
| |-------------------------|-------------------------| | ||
| | `fs.s3a.capability.directory.marker.aware` | Does the filesystem support surplus directory markers? | | ||
| | `fs.s3a.capability.directory.marker.keep` | Does the path retain directory markers? | | ||
| | `fs.s3a.capability.directory.marker.delete` | Does the path delete directory markers? | |
Please add all the possible values of this property to this doc.
It's on a specific path, meaning in auth mode it will actually give you the real retain/delete state, rather than just "we retain on authoritative"; that's why I'd left it out. At this level (application, not shell) my view is that the caller is more interested in what happens at that level... especially as the actual path to authoritativeness isn't visible.
...maybe we should have some props for bucket along with path, e.g. marker.path.keep and marker.path.delete for the dynamic ones?
Tested with fs.s3a.directory.marker.retention=delete against Ireland. No issues. There were some nits and things to change; otherwise LGTM.
It looks good to me overall, as I did not see relevant integration test failures with the patch applied. It is a bit confusing that there are ... It would help reviewing to split the S3Guard-related part and the performance tests into follow-up JIRAs. We can commit the essential part first if it retains the present default (DELETE) behaviour.
Adding the checks to bucket-info, with tests for this, provides a straightforward way to verify that an S3A client is compatible with kept markers. If the command `hadoop s3guard bucket-info -markers aware s3a://bucket/` succeeds, then the client has the modifications to support directory markers above files. If it fails as an unknown option: not compatible.

Change-Id: I2b58501eda160f9c2598bf492908bc6b3bf34f28
💔 -1 overall
This message was automatically generated.
LOG.debug("S3GetFileStatus {}", path);
Preconditions.checkArgument(!needEmptyDirectoryFlag
    || probes.contains(StatusProbeEnum.List),
    "s3GetFileStatus(%s) wants to know if a directory is empty but"
The condition and message don't seem consistent. Please review.
Overall looks good to me apart from some minor nits and doc changes. Really nice documentation.
It is. I'm adding a comment, and I'm going to put in a unit test to verify the condition. Key point: if needEmptyDirFlag is false, we don't care; if it is true, then the probe must contain a list, so ... And obviously, the second test of the flag state is superfluous, which is why IDEs and things complain about it.
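A unit test along these lines could pin the implication down (a sketch only: the helper restates the checkArgument condition rather than calling into S3AFileSystem, and assumes the Head and List members of StatusProbeEnum from the hadoop-aws impl package):

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.EnumSet;
import java.util.Set;

import org.junit.Test;

import org.apache.hadoop.fs.s3a.impl.StatusProbeEnum;

public class TestEmptyDirProbeCondition {

  /** Restates the condition: needEmptyDirectoryFlag implies a List probe. */
  private static boolean valid(boolean needEmptyDirectoryFlag,
      Set<StatusProbeEnum> probes) {
    return !needEmptyDirectoryFlag || probes.contains(StatusProbeEnum.List);
  }

  @Test
  public void testEmptyDirFlagRequiresListProbe() {
    // flag off: any probe set is acceptable
    assertTrue(valid(false, EnumSet.noneOf(StatusProbeEnum.class)));
    // flag on without a List probe: invalid
    assertFalse(valid(true, EnumSet.of(StatusProbeEnum.Head)));
    // flag on with a List probe: valid
    assertTrue(valid(true, EnumSet.of(StatusProbeEnum.List)));
  }
}
```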
The parameterized ones let us explicitly explore behaviours when we turn features on and off, to verify that they do what we expect in terms of keep vs delete. The -Dmarkers= option lets us switch the entire test run to keep/delete, to verify that the whole API works with keep=true and, with auditing enabled, that the delete option isn't somehow retaining markers. It's a PITA to have another option to run with, but making it explicit lets us show, when pasting in our test results, which option we used.
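A condensed sketch of that parameterization (the class name is invented for illustration; the configuration key and base class follow hadoop-aws test conventions):

```java
import java.util.Arrays;
import java.util.Collection;

import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;

@RunWith(Parameterized.class)
public class ITestMarkerPolicySketch extends AbstractS3ATestBase {

  @Parameterized.Parameters(name = "marker-policy-{0}")
  public static Collection<Object[]> params() {
    return Arrays.asList(new Object[][]{
        {"keep"}, {"delete"}, {"authoritative"}});
  }

  private final String policy;

  public ITestMarkerPolicySketch(String policy) {
    this.policy = policy;
  }

  @Override
  protected Configuration createConfiguration() {
    Configuration conf = super.createConfiguration();
    // pin the policy under test, overriding any -Dmarkers= setting
    conf.set("fs.s3a.directory.marker.retention", policy);
    return conf;
  }

  // the test cases then create and delete files under a marker and
  // assert that the marker survives for "keep" but not for "delete"
}
```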
The s3guard bit which Gabor was talking about? If everyone wants it, then I shall reluctantly do it.

The performance tests, I'd like them in. They are looking at the cost of delete/rename and listings. All the existing cost tests broke as fewer head calls were being made, more list etc., and trying to distinguish regressions from legitimate changes of the in-code constants was hard. The OperationCost type with simple aggregation makes it easy to determine what the costs are. Oh, they aren't really performance BTW -cost. Should I put them in a cost/ package instead?
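For readers following along, a minimal sketch of the cost-accounting idea (the real OperationCost class in the patch will differ; the fields and arithmetic here are assumptions):

```java
/** Expected cost of an S3A operation, counted in HEAD and LIST requests. */
final class OperationCost {
  private final int head;
  private final int list;

  OperationCost(int head, int list) {
    this.head = head;
    this.list = list;
  }

  /** Aggregate costs, e.g. cost(getFileStatus) + cost(listStatus). */
  OperationCost plus(OperationCost other) {
    return new OperationCost(head + other.head, list + other.list);
  }

  int head() { return head; }

  int list() { return list; }
}
```

Tests can then assert measured request counts against one aggregated expectation rather than hard-coded constants scattered through the suite.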
Thanks for the explanation. I agree that we need both the parameterized tests and the -Dmarkers= option.
the performance tests, I'd like them in. They are looking at the cost of delete/rename and listings.
The tests are a nice addition, though it takes time to walk through the added code and test results. I'm OK here given that the itests passed and the test logs look fine.
oh, they aren't really performance BTW -cost. Should I put them in a cost/ package instead?
We can do it later, after other performance tests are added. ITestDirectoryMarkerListing could be moved under another package since it is for neither performance nor cost.
I would like to move this forward. +1, pending checkstyle and findbugs warnings.
* specific bucket-level "marker.policy.{delete, keep, authoritative}" probes, which ...
* dynamic probes renamed marker.action.{keep, delete}
Capability logic all moved into DirectoryMarkerPolicyImpl
...which means it is now testable in unit tests. Tests added.
Checkstyles.
Change-Id: I27db716097a3bc1e1fe2639d2e90c1e855658675
@iwasakims thanks for that review. Checkstyles in, plus feedback from Gabor: we now have static path capability probes for bucket policy as well as the dynamic "what will happen on this path" ones. All in DirectoryPolicyImpl, so it can be unit tested too!

Testing: S3 London, -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Dscale, and ...

The first test run failed in ITestS3AConfiguration.testProxyConnection; one I've seen before, where an expected failure on proxies succeeds -this on a bucket where I've disabled existence checks. Doesn't happen when guarded. Hypothesis: when S3Guard is on, the proxy issues surface in initialize(); when off, not until the first FS connection. Filed https://issues.apache.org/jira/browse/HADOOP-17207
💔 -1 overall
This message was automatically generated.
The checkstyles are either unfixable (initialize()) or deliberate (test case naming). I'm going to merge this in now; just doing one final review of the docs. I think I'll add a compatibility statement on the index.md page too, something to repeat for the other releases as I backport the getFileStatus change.
Change-Id: If76d9f3c6918d5c3cfd9bb28d4a97e35654139ea
And it's merged in. Thanks to everyone for their reviews and feedback! Next steps: ...
💔 -1 overall
This message was automatically generated.
doing some followup tuning of the CLI tooling in #2254 ; nothing major. Better for integration test and general message formatting...the things which only surface after playing with the tool for a few days |
|
Thanks Steve. Looking forward to a Hadoop 2 backport! |
|
@liuml07 "Looking forward to a Hadoop 2 backport!" I'll do a hadoop-2 fixup of the list-time changes, but if you look at #2310, delete gets complex, especially handling partial failures of deletes. I'm going to have to say that any branch without that (major) change will be able to read data in a path which keeps markers, but it must not delete things -otherwise S3Guard can get out of sync.
*/
@Override
@SuppressWarnings("deprecation")
public boolean isDirectory(Path f) throws IOException {
@steveloughran it looks like the change to this function was meant to optimize performance, but I am experiencing a performance regression when upgrading Spark from 3.1.2 to 3.5.1, and I found it is caused by this change.
When I do a spark.read.parquet('s3a://path/to/1.parquet', ..., 's3a://path/to/10000.parquet') before this change, ONLY HEAD requests are sent to build the DataFrame. However, after this change, LIST requests are sent, which is significantly slower as I am reading quite a lot of parquet files.
The docstring "it is optimized to a single HEAD" also confuses me because StatusProbeEnum.DIRECTORIES is just an alias for StatusProbeEnum.LIST_ONLY.
Am I missing anything here?
Not good. File a PR, including what you can of the stack of calls.
What is probably happening is that the method calling this is assuming all the paths are directories (which this call is optimised for), but as all the paths are files it ends up doing
LIST path
so yes, it would be a step backwards. The code should be calling getFileStatus to really get everything about a file.
Now, why are you providing a list of many, many files, given that Spark expects to be working on a directory at a time?
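For reference, a sketch of the pattern being suggested, with getFileStatus() doing the work (illustrative only; the class and method names are invented):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class StatusProbeWorkaround {
  /**
   * Instead of isDirectory(path), which after this change issues a LIST
   * probe, ask for the full status: a plain file is answered by a HEAD
   * request, and the returned status says everything about the path.
   */
  static boolean isDir(FileSystem fs, Path path) throws IOException {
    try {
      FileStatus st = fs.getFileStatus(path);
      return st.isDirectory();
    } catch (FileNotFoundException e) {
      // isDirectory() returns false for missing paths
      return false;
    }
  }
}
```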
Thanks for the reply!
Not good. File a PR, including what you can of the stack of calls.
Will do.
Now, why are you providing a list of many, many files, given that Spark expects to be working on a directory at a time?
We have an upstream service generating many parquet files, whose paths are long and deep in hierarchy. I am responsible for ingesting them into an Iceberg table with a PySpark cron job. Reading a list of many files instead of a directory is to work around the slow and recursive S3 (in our case, Ceph RGW) LIST calls, so that I just need to do one LIST call before passing input to spark.read.parquet().
HADOOP-13230
successor to #1861