-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-16852: Report read-ahead error back #1898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a hadoop JIRA and a link back. PRs without a matching JIRA do not exist and SHALL not be committed
|
Test results: HNS not enabled account: |
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
|
Made a fix where read-ahead thread will never read remote a length greater than its buffer size. HNS not enabled account: |
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBuffer.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
|
@DadanielZ - Thanks for the review. I have left the comment on the bufferstatus versus the timestamp check unresolved. As mentioned in my comments, the intention is to throw the exception from the read-ahead buffer for any reads that qualify the buffer's offset and length range. Please let me know if you have any concerns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, +1.
Have made the necessary updates. |
|
@steveloughran - Can you please help review this PR ? |
|
@DadanielZ is happy with the core patch, so I am too. just the checkstyle to fix |
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBuffer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looked at, tests look good, little bit of some logic in the production code I'm querying
Is there any way to avoid a 30s delay on every test run for the timeouts? As that will slow down the tests, and every change like this makes the tests slower and slower, costing us engineers time, our employers money *and reducing the likelihood that people run the tests
Side issue; do those read buffer manager threads ever get released? And what happens in large JVM processes where you have many abfs fs instances?, eg. hive LLAP, Spark. Does this become a bottleneck as irrespective of the #of FS instances, the buffer size and count is hard coded.
What I'm wondering here is should the buffer manager actually be something which belongs to a specific FS instance, uses its thread pool, and is released when the FS instance is destroyed.
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Show resolved
Hide resolved
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Show resolved
Hide resolved
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
|
things aren't building because the change made to the abfs constructor is breaking it. Sorry. That refactoring was done to try and reduce change conflict in future. |
Have merged and made test updates that were needed post the recent SAS updates. |
Timeout sleep duration in test have been reduced to 3sec. For the other issues on buffer management in Read buffer manager, will investigate separate and create JIRAs for improvement points. |
|
Tests rerun: HNS [INFO] Tests run: 69, Failures: 0, Errors: 0, Skipped: 0 non-HNS |
|
@steveloughran - Could you please help complete review and commit. |
|
are there plans to backport? |
| // As failed ReadBuffers (bufferIndx = -1) are saved in completedReadList, | ||
| // avoid adding it to freeList. | ||
| if (buf.getBufferindex() != -1) { | ||
| freeList.push(buf.getBufferindex()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @snvijaya
I am unable to understand the significance of this change. I couldn't find in code anywhere where bufferIndex is set to -1 in case of read failure apart from the default value in the class. But when the buffers initialised, they are always set to value from 0 to 15.
Trying to understand this for #3285. So please review that as well. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its set to -1 when read fails. You will find the diff for this in ReadBuffer.java line 110.
There is an issue with this commit though, for which a hotfix was made. Incase its relevant to your change -> https://issues.apache.org/jira/browse/HADOOP-17301
Will check on your PR by EOW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @snvijaya
Currently errors in read-ahead are silently ignored, thus failing to highlight any issues and causing slowness to the overall read request.
Any new read request in-turn triggers n num of read-aheads and all of them will silently fail.
This PR will report back error from the read-ahead issued by the active read call. Also, cause subsequent reads to only retry the respective read position based on the failure seen for the previous read-ahead failure on same position.