-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-16005: Add XAttr support to WASB and ABFS #452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
AvroDataFileHdfsWriter should include avro as the file suffix as some pig jobs couldn't read the avro files if they don't come with the proper suffix Author: Hai Lu <[email protected]> Reviewers: Xinyu Liu <[email protected]> Closes apache#452 from lhaiesp/master
|
Apologies for the delay on this pull request. I resolved the merge conflicts and pushed an update. All the unit testes pass, but when running the integration tests with my azure-auth-keys.xml file against StorageV2 accounts, I'm getting some failures. However, I'm also getting these failures when running against trunk so I doubt that the errors are related to this pull request. I ran the tests by executing As a new contributor to this project, I'd appreciate any guidance on how to resolve the errors and properly integration test my change. Any ideas @DadanielZ? Thanks in advance for your help! |
|
@c-w the configuration for ABFS is not correct, the account name and related suffix should be "dfs.core.windows.net", not "blob.core.windows.net". |
|
🎊 +1 overall
This message was automatically generated. |
|
Thanks for the help, @DadanielZ. I was able to run the tests by changing the ABFS endpoint to dfs as opposed to blob as you suggested. All test now pass for the branch except for one test which also fails on trunk: comparison, branch log, trunk log. Let me know if this is sufficient for the purposes of this pull request or if there's further testing/verification you'd like me to perform. |
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, there are minor checkstyle issues, once they are fixed I will help to commit it. Thanks.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/NativeAzureFileSystemBaseTest.java
Outdated
Show resolved
Hide resolved
...op-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAttributes.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too many line breaks in this method. Please remove the extra ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 54dcb1d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-activating this comment as i don't see this fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed some more newlines in 6ecbdc0. I'm finding it a bit difficult to discern what style to follow since the file seems to include multiple styles for newlines (e.g. my original implementation was close to setOwner in indentation style). If you have any additional feedback on-top of what is implemented now, I'd appreciate more concrete pointers on style/formatting. Thanks in advance!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about nit-picking but it would be good to just follow the formatting in other methods in this file. For example, you don't need new lines in the catch block and also remove 3588.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointers. I was previously following the style of setOwner. I removed the newlines you mentioned in f7c0d26.
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for using a different encoding when compared to the AzureBlobFileSystemStore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for Azure Data Lake Storage Gen2 (backing AzureBlobFileSystemStore) states that x-ms-properties must be encoded in ISO-8859-1 (see path/update#request-headers). The documentation for Azure Blob Storage (backing AzureNativeFileSystemStore) however only states that x-ms-meta must follow the conventions for C# identifiers (see set-blob-metadata#request-headers) which may contain Unicode letters and characters (see identifier-names). As such I believe that the two classes should use different encodings. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly curious about this one. I assume @DadanielZ can confirm this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is mostly looking good. If @DadanielZ can confirm the encoding issue, i can happy to commit this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being away so long, yes, this is fine
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-activating this comment as i don't see this fixed.
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
@DadanielZ Any chance you could take another look at this and provide some guidance on @virajith's question regarding encodings #452 (comment)? Would love to get this wrapped up and merged soon. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look fine for AzureBlobFileSystem.
Author: Gabor Szadovszky <[email protected]> Closes apache#452 from gszadovszky/PARQUET-1198 and squashes the following commits: 5b7ee44 [Gabor Szadovszky] PARQUET-1198: Bump java source and target to java8
As discussed in HADOOP-16005, this pull request implements
getXAttrandsetXAttron hadoop-azure's WASB and ABFS file-systems.The changes were tested against the following Azure storage account configurations:
WASB: StorageV2, RA-GRS replication in East US (primary) West US (secondary). WASB test session screenshot. All tests pass and the ABFS tests are skipped as expected.
ABFS: StorageV2 with Data Lake Storage Gen2 preview enabled, RA-GRS replication in East US (primary) West US (secondary). ABFS test session screenshot. All ABFS tests pass but the WASB tests fail since the storage account hasn't implemented the blob endpoints yet.
The test-patch script passed: test-patch output.