- 
                Notifications
    You must be signed in to change notification settings 
- Fork 9.1k
HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. #6010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good, though i want that review to make sure all is good (it should be)
| return null; | ||
| } finally { | ||
| IOUtils.close(blockUploadData); | ||
| blockToUpload.close(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, because the line above calls BlockUploadData.close(), a lot of the cleanup should take place already; that .startUpload() call on L315 sets the blockToUpload.buffer ref to null, so that reference doesn't retain a hold on the bytebuffer.
this is why we haven't seen problems yet like OOM/disk storage...cleanup was happening.
can you review the code to make sure the sequence of L348 always does the right thing and not fail due to some attempted double delete of the file...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the comment.
can you review the code to make sure the sequence of L348 always does the right thing and not fail due to some attempted double delete of the file...:
- disk:
- BlockUploadData.close() -> deletes the file: hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 266 in f6fa5bd LOG.debug("File deleted in BlockUploadData close: {}", file.delete()); 
- blockToUpload.close() -> innerClose() -> closeBlock() -> tries to delete the file (java.io.File#delete returns true in case path in the obj deleted succesfully, false otherwise, ref: https://docs.oracle.com/javase/8/docs/api/java/io/File.html#delete--) :hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 1118 in f6fa5bd if (!bufferFile.delete() && bufferFile.exists()) { 
 
- BlockUploadData.close() -> deletes the file: 
- bytebuffer:
- What BlockUploadData contains: ByteBufferInputStream of blockBuffer:hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 732 in f6fa5bd new ByteBufferInputStream(dataSize, blockBuffer)); 
- BlockUploadData.close() -> closes the ByteBufferInputSteam : -> unset thehadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 263 in f6fa5bd cleanupWithLogger(LOG, uploadStream); bytebuffervariable in the inputStream :hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 806 in f6fa5bd byteBuffer = null; 
- BlockToUpload.close() -> releases the buffer : -> adds back the buffer in pool:hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 764 in f6fa5bd releaseBuffer(blockBuffer); hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Line 667 in f6fa5bd private void releaseBuffer(ByteBuffer buffer) { 
 
- What BlockUploadData contains: ByteBufferInputStream of 
- ByteArrayBlock:
- BlockUploadData.close() -> closes the byteArrayInputStream(its wrapper around the buffer array): does nothing.
- BlockToUpload.close() -> does nothing: hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/store/DataBlocks.java Lines 623 to 626 in f6fa5bd protected void innerClose() { buffer = null; blockReleased(); } 
 
For checking that this flow works correctly on all the types of dataBuffers, have made change in the testCloseOfDataBlockOnAppendComplete. This will also help keep check the sanity in future on any change in the databuffer code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is safe in case of file delete, as java.io.File will return false in case file is not there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add blockToUpload in L347 alongside blockUploadData, for consistency and also checks for null.
| :::: AGGREGATED TEST RESULT :::: HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: AppendBlob-HNS-OAuth[INFO] Results: Time taken: 46 mins 36 secs.  | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good catch! Interesting to know if this was a source of any problems you saw in prod (Sorry for that 😅).
As Steve has mentioned we did do some cleanup for uploadBlock that may have saved us in a way (but could have hidden this issue).
Mostly happy with the change, just small nits.
| return null; | ||
| } finally { | ||
| IOUtils.close(blockUploadData); | ||
| blockToUpload.close(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add blockToUpload in L347 alongside blockUploadData, for consistency and also checks for null.
| BlockUploadStatistics.class)); | ||
| return factory; | ||
| }).when(store).getBlockFactory(); | ||
| OutputStream os = fs.create(new Path("/file_" + blockBufferType)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try with resources or try-finally to close the stream (Off chance we see failure after creating, we clean up)
| BlockUploadStatistics.class)); | ||
| return factory; | ||
| }).when(store).getBlockFactory(); | ||
| OutputStream os = fs.create(new Path("/file_" + blockBufferType)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "getMethodName()" instead of hardcoding "/file"
| OutputStream os = fs.create(new Path("/file_" + blockBufferType)); | ||
| os.write(new byte[1]); | ||
| os.close(); | ||
| Mockito.verify(dataBlock[0], Mockito.times(1)).close(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an assertion here that dataBlock[0] is closed.
*Suggestion, getState() can be used to verify the state of the block is in Closed state (May need to make the getter public). We can even add an assertion after L135 to assert it's in Writing state as well.
| :::: AGGREGATED TEST RESULT :::: HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: AppendBlob-HNS-OAuth[INFO] Results: Time taken: 48 mins 26 secs.  | 
| Thanks @mehakmeet for the review. I have taken the comments. Thank you. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| ...handing off review to @mehakmeet; if he's happy it's good | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| Thanks for this @saxenapranav, can you please cherry-pick the same in branch-3.3 and run the whole test suite again, then we'll merge it there as well. | 
…pache#6010) AbfsOutputStream to close the dataBlock object created for the upload. Contributed By: Pranav Saxena
| Thanks @mehakmeet for the review. I have cherry-picked the same in branch-3.3 in PR: #6105. Thanks. | 
…pache#6010) AbfsOutputStream to close the dataBlock object created for the upload. Contributed By: Pranav Saxena
| @saxenapranav @mehakmeet Hello. | 
| @zzccctv | 
| @zzccctv just checked this. direct memory is off heap, but it is still in process; process exit will reclaim it. It just doesn't move around the way objects on heap do. what is important is that as direct memory only consumes a small amount of on-heap memory (the object containing a pointer to it), it doesn't trigger GCs even if you have been leaking them. other GCs will trigger this, call finalize() on the references and release the DM then. we have a test for this somewhere | 
AbfsOutputStream doesnt close the dataBlock object created for the upload.
What is the implication of not doing that:
DataBlocks has three implementations:
ByteArrayBlock
ByteBufferBlock:
closemethod on the this object has the responsiblity of returning back the buffer to pool so it can be reused.close:DiskBlock:
startUpload() gives an object of BlockUploadData which gives method of
toByteArray()which is used in abfsOutputStream to get the byteArray in the dataBlock.Method which uses the DataBlock object:
hadoop/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
Line 298 in fac7d26
jira: https://issues.apache.org/jira/browse/HADOOP-18873
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
HNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:336 » TestTimedOut test timed o...
[ERROR] ITestAzureBlobFileSystemLease.testTwoWritersCreateAppendWithInfiniteLeaseEnabled:186->twoWriters:154 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 2, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
NonHNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 589, Failures: 0, Errors: 0, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44
AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 572, Failures: 0, Errors: 0, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
Time taken: 45 mins 31 secs.
azureuser@Hadoop-VM-EAST2:~/hadoop/hadoop-tools/hadoop-azure$ git log
commit 16455f8 (HEAD -> HADOOP-18873, origin/HADOOP-18873)
Author: Pranav Saxena <>
Date: Thu Aug 31 21:25:51 2023 -0700