Skip to content

Conversation

@steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Oct 28, 2021

Contains MAPREDUCE-7367. #2971
Parallelize file moves in FileOutputCommitter v1 job commit

Contributed by Rajesh Balamohan

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?

@steveloughran steveloughran force-pushed the mr/HADOOP-17981-resilient-commit branch from 08e20d0 to ee6854c Compare October 29, 2021 09:40
Rajesh Balamohan and others added 2 commits October 29, 2021 18:06
…commit

Contributed by Rajesh Balamohan

Change-Id: I8d2cedb0ba8b6aba2f1d0f814f52d77b156f1206
Selective cherrypick of MAPREDUCE-7341 files; move etag tests
into hadoop-azure.
RawLocalFS implements the API; checksum fs does not

Change-Id: Ic68337742de6aefd067eda14313fd07088c73ac7
@steveloughran steveloughran force-pushed the mr/HADOOP-17981-resilient-commit branch from ee6854c to f4950b3 Compare October 29, 2021 17:07
@apache apache deleted a comment from hadoop-yetus Nov 1, 2021
@apache apache deleted a comment from hadoop-yetus Nov 1, 2021
@apache apache deleted a comment from hadoop-yetus Nov 1, 2021
adds options vararg similar to rename/3; allows caller to declare it is confident the
dest path doesn't exist, so no need to check for overwrites or similar.

test on local fs verifies that renaming on a filesystem without the new
interface matches that of the new behaviour,

committer test alternates between raw/local fs on parameterized runs.
minor reorg of the new tests in that class

Change-Id: Ic31d52c9a95b768e0a97583e906c20c53b120194
@apache apache deleted a comment from hadoop-yetus Nov 1, 2021
Path finalOutput = getOutputPath();
FileSystem fs = finalOutput.getFileSystem(context.getConfiguration());
// created resilient commit helper bonded to the destination FS/path
resilientCommitHelper = new ResilientCommitByRenameHelper(fs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism becomes very FileSystem specific. Implemented by Azure right now.
Other users of rename will not see the benefits without changing interfaces, which in turn requires shimming etc.

Would it be better for AzureFileSystem rename itself to add a config parameter which can lookup the src etag (at the cost of a performance hit for consistency), so that downstream components / any users of the rename operation can benefit from this change without having to change interfaces. Also, if the performance penalty is a big problem - Abfs could create very short-lived caches for FileStatus objects, and handle errors on discrepancies with the cached copy.
Essentially - don't force usage of the new interface to get the benefits.

Side note: The fs.getStatus within ResilientCommitByRenameHelper for FileSystems where this new functionality is not supported will lead to a performance penalty for the other FileSystems (performing a getFileStatus on src).

Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Pending ABFS side changes.

@steveloughran
Copy link
Contributor Author

This mechanism becomes very FileSystem specific. Implemented by Azure right now.

I agree, which is why the API is restricted for its uses to mr-client-core only.
as abfs is the only one which needs it for correctness under load,
And I'm not worried about that specifity.
Can I point to how much of the hadoop fs api are hdfs-only -and they are public.

Other users of rename will not see the benefits without changing interfaces, which in turn requires shimming etc.

Please don't try and use this particular interface in Hive.

Would it be better for AzureFileSystem rename itself to add a config parameter which can lookup the src etag (at the cost of a performance hit for consistency), so that downstream components / any users of the rename operation can benefit from this change without having to change interfaces.

We are going straight from a listing (1 request/500 entries) to that rename. doing a HEAD first cuts the throughtput in half. so no.

Also, if the performance penalty is a big problem - Abfs could create very short-lived caches for FileStatus objects, and handle errors on discrepancies with the cached copy.

Possible but convoluted.

Essentially - don't force usage of the new interface to get the benefits.

I understand the interests of the hive team, but this fix is not the place to do a better API.

Briefly cacheing the source FS entries is something to consider though. Not this week.

What I could do with is some help getting #2735 in, then we can start on a public rename() builder API which will take a file status, as openFile does.

Side note: The fs.getStatus within ResilientCommitByRenameHelper for FileSystems where this new functionality is not supported will lead to a performance penalty for the other FileSystems (performing a getFileStatus on src).

There is an option to say "i know it is not there"; this skips the check. the committer passes this option down because it issues a delete call first.

FWIW the manifest committer will make that pre-rename commit optional, saving that IO request. I am curious as to how well that will work I went executed on well formed tables.

Change-Id: I65dd20e2e0d6af2d347c07b4a3e13ab58763c1ac
Change-Id: I79297a308bf558d188592cadbe60e2622435c6e8
@steveloughran
Copy link
Contributor Author

@sidseth I discussed the idea of caching status etags with @mukund-thakur

considering that you'd need to remove entries from the cache on: create, rename, & delete, it would actually be a copy of org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore, as #3534 is removing all of s3guard, I'll consider that a WONTFIX. sorry.

@steveloughran
Copy link
Contributor Author

see #3611 . i think this pr had got too complex and so have gone back to what I had originally done in the manifest committer. filesystem provides etags, application code recovers.

This is something which we can expose to the hive team, as all that will be added is a single stable extension to filestatus to get the etag.

@sidseth
Copy link
Contributor

sidseth commented Nov 3, 2021

This mechanism becomes very FileSystem specific. Implemented by Azure right now.

I agree, which is why the API is restricted for its uses to mr-client-core only. as abfs is the only one which needs it for correctness under load, And I'm not worried about that specifity. Can I point to how much of the hadoop fs api are hdfs-only -and they are public.

Other users of rename will not see the benefits without changing interfaces, which in turn requires shimming etc.

Please don't try and use this particular interface in Hive.

Was referring to any potential usage - including Hive.

Would it be better for AzureFileSystem rename itself to add a config parameter which can lookup the src etag (at the cost of a performance hit for consistency), so that downstream components / any users of the rename operation can benefit from this change without having to change interfaces.

We are going straight from a listing (1 request/500 entries) to that rename. doing a HEAD first cuts the throughtput in half. so no.

In the scenario where this is encountered. Would not be the default behaviour, and limits the change to Abfs. Could also have the less consistent version which is not etag based, and responds only on failures. Again - limited to Abfs.

Also, if the performance penalty is a big problem - Abfs could create very short-lived caches for FileStatus objects, and handle errors on discrepancies with the cached copy.

Possible but convoluted.

Agree. Quite convoluted. Tossing in potential options - to avoid a new public API.

Essentially - don't force usage of the new interface to get the benefits.

I understand the interests of the hive team, but this fix is not the place to do a better API.

Briefly cacheing the source FS entries is something to consider though. Not this week.

What I could do with is some help getting #2735 in, then we can start on a public rename() builder API which will take a file status, as openFile does.

This particular change would be FSImpl agnostic, and potentially remove the need for the new interface here?

Side note: The fs.getStatus within ResilientCommitByRenameHelper for FileSystems where this new functionality is not supported will lead to a performance penalty for the other FileSystems (performing a getFileStatus on src).

There is an option to say "i know it is not there"; this skips the check. the committer passes this option down because it issues a delete call first.

EOD - this ends up being a new API (almost on the FileSystem), which is used by the committer first; then someone discovers it and decides to make use of it.

FWIW the manifest committer will make that pre-rename commit optional, saving that IO request. I am curious as to how well that will work I went executed on well formed tables.

@apache apache deleted a comment from hadoop-yetus Nov 3, 2021
@apache apache deleted a comment from hadoop-yetus Nov 3, 2021
@steveloughran
Copy link
Contributor Author

EOD - this ends up being a new API (almost on the FileSystem), which is used by the committer first; then someone discovers it and decides to make use of it.

yes, and both hive and hbase are known to do that, often ending forcing hdfs only apis to get pulled up without concern for the other stores (see git history of FileSystem there). a builder rename would be a big job with

  1. rename/3 public with tests and performant object store impls
  2. default builder to delegate to that with same semantics
  3. add options for etag etc on a store by store or cross store basis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants