-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19161. S3A: option "fs.s3a.performance.flags" to take list of performance flags #6789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19161. S3A: option "fs.s3a.performance.flags" to take list of performance flags #6789
Conversation
note the commented out bit where we considered adding options like "hive" or "spark". @HarshitGupta11 and I discussed this; for now lets go with a list of options and "*" |
/* case "hive": | ||
case "impala": | ||
case "spark": | ||
case "distcp": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not let downstreamers decide what flags they want (after extensive testing)? And across different releases, they might need different flags to be turned on (in case of any regression)?
We can just recommend the flags (as already commented out here) but not set the flags for them. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
harshit and I were discussing this. i think it's best to have that option list, as app settings could be too brittle to changes
public boolean isDelete() { | ||
return delete; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one also we want to tackle as separate task (after HADOOP-19072), correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. harshit did an experiment where he turned off all attempts at creating parent dirs after delete. fairly brittle, i think
I have a better design for this. changign this to draft. Proposed: we have a this makes it trivial to reuse/process. the implementation would be outside the actual Configuration class to make it easy for AbfsConfiguration to use too |
feb384f
to
28cd50a
Compare
|
28cd50a
to
f3571ac
Compare
a564aa4
to
82974c4
Compare
@mukund-thakur @saxenapranav @HarshitGupta11 @ahmarsuhail @virajjasani FYI, this is ready to review. Once in, viraj's mkdirs option can go in, and then we can document this option in performance.md ABFS can use it for their options |
206bfab
to
af2dd8d
Compare
OK, I'm happy with this. @mukund-thakur @virajjasani thoughts? |
Sounds good, i would take a detailed look at ConfigurationHelper and FlagSet soon. |
realised we need one more thing: a way to enumerate all enabled flags. |
d9cb6f9
to
64319c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left few nits, overall changes look solid
intercept(IllegalArgumentException.class, "unrecognized", () -> | ||
parseEnumSet("key", "c, unrecognized", SimpleEnum.class, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about intercepting the util method call instead of direct call to parseEnumSet:
intercept(IllegalArgumentException.class, "unrecognized", () -> assertEnumParse("c, unrecognized", SimpleEnum.class, false))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me do that as a separate test
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FlagSet.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FlagSet.java
Outdated
Show resolved
Hide resolved
sb.append(", partSize=").append(partSize); | ||
sb.append(", enableMultiObjectsDelete=").append(enableMultiObjectsDelete); | ||
sb.append(", maxKeys=").append(maxKeys); | ||
sb.append(", ").append(performanceFlags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key performanceFlags=
missing
public boolean enabled(final E flag) { | ||
return flags.contains(flag); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to how we check for mutability of flags while enabling/disabling a flag, i wonder if we should check for immutability while checking for enabled()
? Probably not, but then how would client know when to set immutable? or maybe it's fine to not set immutability ever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need it immutable, probably it's good case for builder or constructor so maybe keeping the object mutable is also fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i started off fully mutable, added immutabilty as an option to avoid all sync issues. you can manipulated it as much as you want and only once made immutable do you need to worry about thread safety etc. lets you buld by copying and updating, from a default value with changes etc
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/impl/TestFlagSet.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
|
Change-Id: Ie28b4ab01881cfe92fb622d4d27b7bdad8a690fd
🎊 +1 overall
This message was automatically generated. |
Change-Id: Ifa8941acbcc84be30a997b6097947c302158f583
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the design and implementation. Just need one clarification about the immutability comment
* trim the list, map to enum values in the message (case insensitive) | ||
* and return the set. | ||
* Special handling of "*" meaning: all values. | ||
* @param key key for error messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't see much use of key param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay I get it now. maybe mention - configuration key which was used to configure the flags.
I got confused initially because of the UT,
} | ||
|
||
@Test | ||
public void testStarEnum() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is already present above testEnumParseAll
} | ||
|
||
@Test | ||
public void testUnknownStarEnum() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test with repeated values "a, b, a". should pass
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FlagSet.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
@mukund-thakur thanks, altered the tests and tried to clarify the immutability. One final thought: should we have a default list of enum values to set if, when parsing the config string, the string isn't found? Currently if nothing is found we return the empty set. |
* Mukund's feedback * Remove unused "throws Throwable" on test cases as appropriate Save enumClass constructor parameter of FlagSet * add accessor * compare in equals * use in a copy() operation which creates a mutable deep copy of an object. * explicit checks for null enumclass, prefix in constructor Change-Id: I8080d772bfd301ab6e3ed802d54a060a63feea78
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
I am good with an empty set. What is the default value you are thinking? create is already getting added based on the create perf flag explicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits, looks good to go
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Show resolved
Hide resolved
*/ | ||
private void assertHasCapability(final String capability) { | ||
Assertions.assertThat(flagSet.hasCapability(capability)) | ||
.describedAs("Capabiilty of %s on %s", capability, flagSet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Capability
*/ | ||
private void assertLacksCapability(final String capability) { | ||
Assertions.assertThat(flagSet.hasCapability(capability)) | ||
.describedAs("Capabiilty of %s on %s", capability, flagSet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
|
||
| *Option* | *Meaning* | Since | | ||
|----------|--------------------|:------| | ||
| `create` | Create Performance | 3.4.1 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shall we mention 3.4.1 / 3.5.0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no
That is a complicated list of options which deliver speed if the person setting them | ||
understands the risks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just in case if it looks better? if the person or client application setting them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, its an individual who takes the blame
Change-Id: Id49ceae8badaaa80e35dbe62c75a7d6d107449c1
🎊 +1 overall
This message was automatically generated. |
ok, everything is addressed, going to merge to the branches |
…performance flags (apache#6789) 1. Configuration adds new method `getEnumSet()` to get a set of enum values from a configuration string. <E extends Enum<E>> EnumSet<E> getEnumSet(String key, Class<E> enumClass, boolean ignoreUnknown) Whitespace is ignored, case is ignored and the value "*" is mapped to "all values of the enum". If "ignoreUnknown" is true then when parsing, unknown values are ignored. This is recommended for forward compatiblity with later versions. 2. This support is implemented in org.apache.hadoop.fs.s3a.impl.ConfigurationHelper -it can be used elsewhere in the hadoop codebase. 3. A new private FlagSet class in hadoop common manages a set of enum flags. It implements StreamCapabilities and can be probed for a specific option being set (with a prefix) S3A adds an option fs.s3a.performance.flags which builds a FlagSet with enum type PerformanceFlagEnum * which initially contains {Create, Delete, Mkdir, Open} * the existing fs.s3a.create.performance option sets the flag "Create". * tests which configure fs.s3a.create.performance MUST clear fs.s3a.performance.flags in test setup. Future performance flags are planned, with different levels of safety and/or backwards compatibility. Contributed by Steve Loughran
…performance flags (#6789) (#6966) 1. Configuration adds new method `getEnumSet()` to get a set of enum values from a configuration string. <E extends Enum<E>> EnumSet<E> getEnumSet(String key, Class<E> enumClass, boolean ignoreUnknown) Whitespace is ignored, case is ignored and the value "*" is mapped to "all values of the enum". If "ignoreUnknown" is true then when parsing, unknown values are ignored. This is recommended for forward compatiblity with later versions. 2. This support is implemented in org.apache.hadoop.fs.s3a.impl.ConfigurationHelper -it can be used elsewhere in the hadoop codebase. 3. A new private FlagSet class in hadoop common manages a set of enum flags. It implements StreamCapabilities and can be probed for a specific option being set (with a prefix) S3A adds an option fs.s3a.performance.flags which builds a FlagSet with enum type PerformanceFlagEnum * which initially contains {Create, Delete, Mkdir, Open} * the existing fs.s3a.create.performance option sets the flag "Create". * tests which configure fs.s3a.create.performance MUST clear fs.s3a.performance.flags in test setup. Future performance flags are planned, with different levels of safety and/or backwards compatibility. Contributed by Steve Loughran
…performance flags (apache#6789) 1. Configuration adds new method `getEnumSet()` to get a set of enum values from a configuration string. <E extends Enum<E>> EnumSet<E> getEnumSet(String key, Class<E> enumClass, boolean ignoreUnknown) Whitespace is ignored, case is ignored and the value "*" is mapped to "all values of the enum". If "ignoreUnknown" is true then when parsing, unknown values are ignored. This is recommended for forward compatiblity with later versions. 2. This support is implemented in org.apache.hadoop.fs.s3a.impl.ConfigurationHelper -it can be used elsewhere in the hadoop codebase. 3. A new private FlagSet class in hadoop common manages a set of enum flags. It implements StreamCapabilities and can be probed for a specific option being set (with a prefix) S3A adds an option fs.s3a.performance.flags which builds a FlagSet with enum type PerformanceFlagEnum * which initially contains {Create, Delete, Mkdir, Open} * the existing fs.s3a.create.performance option sets the flag "Create". * tests which configure fs.s3a.create.performance MUST clear fs.s3a.performance.flags in test setup. Future performance flags are planned, with different levels of safety and/or backwards compatibility. Contributed by Steve Loughran
…performance flags (apache#6789) 1. Configuration adds new method `getEnumSet()` to get a set of enum values from a configuration string. <E extends Enum<E>> EnumSet<E> getEnumSet(String key, Class<E> enumClass, boolean ignoreUnknown) Whitespace is ignored, case is ignored and the value "*" is mapped to "all values of the enum". If "ignoreUnknown" is true then when parsing, unknown values are ignored. This is recommended for forward compatiblity with later versions. 2. This support is implemented in org.apache.hadoop.fs.s3a.impl.ConfigurationHelper -it can be used elsewhere in the hadoop codebase. 3. A new private FlagSet class in hadoop common manages a set of enum flags. It implements StreamCapabilities and can be probed for a specific option being set (with a prefix) S3A adds an option fs.s3a.performance.flags which builds a FlagSet with enum type PerformanceFlagEnum * which initially contains {Create, Delete, Mkdir, Open} * the existing fs.s3a.create.performance option sets the flag "Create". * tests which configure fs.s3a.create.performance MUST clear fs.s3a.performance.flags in test setup. Future performance flags are planned, with different levels of safety and/or backwards compatibility. Contributed by Steve Loughran
HADOOP-19161
Initial design
For testing we need to make sure ths is unset from all cost tests.
relates to #6543; the logic to set up that operation is here...that PR would
just be the implementation.
Same for a delete optimisation where we'd skip parent dir probe.
rename could do the same for its source dir too.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?