Conversation

@snvijaya
Contributor

Currently, errors in read-ahead are silently ignored, which hides real issues and slows down the overall read request.

Any new read request in turn triggers a number of read-aheads, and all of them will silently fail as well.

This PR reports back the error from the read-ahead issued by the active read call. It also makes subsequent reads retry only the respective read position, based on the failure seen for the previous read-ahead at the same position.
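As a rough, hypothetical sketch of the behaviour described above (class and member names here are illustrative, not the actual ABFS code): the background read-ahead records the failure it hit, and a foreground read covering the same range rethrows it instead of silently falling back.

```java
import java.io.IOException;

// Illustrative sketch only; not the ABFS implementation.
class ReadAheadBufferSketch {
  long offset;            // file offset this read-ahead covers
  int length;             // number of bytes requested
  IOException failure;    // recorded by the background read-ahead on error
  byte[] data;            // filled on success

  /**
   * Foreground read at {@code position}: if a read-ahead covering the position
   * failed, surface that error so the caller retries only this position,
   * instead of the failure being swallowed.
   */
  int readFromBuffer(long position, byte[] dest) throws IOException {
    if (position >= offset && position < offset + length) {
      if (failure != null) {
        throw failure;    // previously this was silently ignored
      }
      int toCopy = (int) Math.min(dest.length, offset + length - position);
      System.arraycopy(data, (int) (position - offset), dest, 0, toCopy);
      return toCopy;
    }
    return -1;            // no buffered data for this position; read remotely
  }
}
```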

Contributor

@steveloughran steveloughran left a comment

Needs a Hadoop JIRA and a link back. PRs without a matching JIRA do not exist and SHALL NOT be committed.

@snvijaya snvijaya changed the title Report read-ahead error back HADOOP-16852: Report read-ahead error back Mar 19, 2020
@snvijaya
Contributor Author

Test results:
HNS enabled account:
[INFO] Tests run: 58, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 412, Failures: 0, Errors: 0, Skipped: 66
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 140

HNS not enabled account:
[INFO] Tests run: 58, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 412, Failures: 0, Errors: 0, Skipped: 240
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 140

@snvijaya snvijaya requested review from goiri and steveloughran March 19, 2020 10:59
@snvijaya
Contributor Author

Made a fix so that the read-ahead thread will never read from remote a length greater than its buffer size.
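As a minimal sketch of that constraint (names are illustrative, not the actual ABFS fields): the length requested from the remote store is capped at the capacity of the read-ahead thread's own buffer.

```java
// Sketch only: a read-ahead never requests more bytes than its buffer can hold.
final class ReadLengthSketch {
  static int remoteReadLength(int requestedLength, byte[] readAheadBuffer) {
    return Math.min(requestedLength, readAheadBuffer.length);
  }
}
```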
HNS enabled account:
[INFO] Tests run: 58, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 412, Failures: 0, Errors: 0, Skipped: 66
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 140

HNS not enabled account:
[INFO] Tests run: 58, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 412, Failures: 0, Errors: 0, Skipped: 240
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 140

@snvijaya
Contributor Author

@DadanielZ - Thanks for the review. I have left the comment on the buffer status versus the timestamp check unresolved. As mentioned in my comments, the intention is to throw the exception from the read-ahead buffer for any read that falls within the buffer's offset and length range. Please let me know if you have any concerns.

Contributor

@DadanielZ DadanielZ left a comment

LGTM, +1.

@snvijaya
Contributor Author

snvijaya commented Apr 1, 2020

Needs a Hadoop JIRA and a link back. PRs without a matching JIRA do not exist and SHALL NOT be committed.

Have made the necessary updates.

@snvijaya
Contributor Author

snvijaya commented Apr 1, 2020

@steveloughran - Can you please help review this PR?

@steveloughran
Contributor

@DadanielZ is happy with the core patch, so I am too. Just the checkstyle to fix.

Contributor

@steveloughran steveloughran left a comment

Looked at it; the tests look good. There is a little bit of logic in the production code that I'm querying.

Is there any way to avoid a 30s delay on every test run for the timeouts? That will slow down the tests, and every change like this makes the tests slower and slower, costing us engineers time and our employers money, and reducing the likelihood that people run the tests at all.

Side issue: do those read buffer manager threads ever get released? And what happens in large JVM processes where you have many ABFS FS instances, e.g. Hive LLAP or Spark? Does this become a bottleneck, since the buffer size and count are hard-coded irrespective of the number of FS instances?

What I'm wondering here is whether the buffer manager should actually be something which belongs to a specific FS instance, uses its thread pool, and is released when the FS instance is destroyed.
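Roughly what that alternative could look like, as a purely hypothetical sketch of the question above (none of these names exist in ABFS): the manager owns its own executor and is released with the filesystem instance, instead of a single static pool with hard-coded buffer size and count.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical per-filesystem read buffer manager; not the current ABFS design.
class PerInstanceReadBufferManager implements Closeable {
  private final ExecutorService readAheadPool;

  PerInstanceReadBufferManager(int threads) {
    this.readAheadPool = Executors.newFixedThreadPool(threads);
  }

  void queueReadAhead(Runnable readAheadTask) {
    readAheadPool.submit(readAheadTask);
  }

  @Override
  public void close() {
    // Released when the owning FileSystem instance is closed, so large JVMs
    // (e.g. Hive LLAP, Spark) with many instances do not pile up on one
    // global, fixed-size pool.
    readAheadPool.shutdownNow();
  }
}
```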

Contributor

@bilaharith bilaharith left a comment

nit

@steveloughran
Contributor

Things aren't building because the change made to the ABFS constructor is breaking it. Sorry; that refactoring was done to try to reduce change conflicts in future.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.6:site (default-site) on project hadoop-azure: failed to get report for org.apache.maven.plugins:maven-dependency-plugin: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project hadoop-azure: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1898/src/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:[72,34] error: constructor AbfsInputStream in class AbfsInputStream cannot be applied to given types;

@snvijaya
Contributor Author

Things aren't building because the change made to the ABFS constructor is breaking it. Sorry; that refactoring was done to try to reduce change conflicts in future.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.6:site (default-site) on project hadoop-azure: failed to get report for org.apache.maven.plugins:maven-dependency-plugin: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project hadoop-azure: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1898/src/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:[72,34] error: constructor AbfsInputStream in class AbfsInputStream cannot be applied to given types;

Have merged and made the test updates that were needed after the recent SAS updates.

@snvijaya
Contributor Author

Looked at it; the tests look good. There is a little bit of logic in the production code that I'm querying.

Is there any way to avoid a 30s delay on every test run for the timeouts? That will slow down the tests, and every change like this makes the tests slower and slower, costing us engineers time and our employers money, and reducing the likelihood that people run the tests at all.

Side issue: do those read buffer manager threads ever get released? And what happens in large JVM processes where you have many ABFS FS instances, e.g. Hive LLAP or Spark? Does this become a bottleneck, since the buffer size and count are hard-coded irrespective of the number of FS instances?

What I'm wondering here is whether the buffer manager should actually be something which belongs to a specific FS instance, uses its thread pool, and is released when the FS instance is destroyed.

The timeout sleep duration in the tests has been reduced to 3 seconds. For the other issues on buffer management in ReadBufferManager, I will investigate separately and create JIRAs for the improvement points.

@snvijaya
Contributor Author

Tests rerun:

HNS

[INFO] Tests run: 69, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 432, Failures: 0, Errors: 0, Skipped: 74
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 140

non-HNS
[INFO] Tests run: 69, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 432, Failures: 0, Errors: 0, Skipped: 248
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 140

@snvijaya
Contributor Author

@steveloughran - Could you please help complete the review and commit?

@steveloughran
Contributor

Are there plans to backport?
If you can cherry-pick onto branch-3.3 and do the test run, let me know and I will do the merge.

Comment on lines +261 to +265
// As failed ReadBuffers (bufferIndx = -1) are saved in completedReadList,
// avoid adding it to freeList.
if (buf.getBufferindex() != -1) {
freeList.push(buf.getBufferindex());
}
Contributor

Hi @snvijaya,
I am unable to understand the significance of this change. I couldn't find anywhere in the code where bufferIndex is set to -1 in case of a read failure, apart from the default value in the class. But when the buffers are initialised, they are always set to a value from 0 to 15.
I am trying to understand this for #3285, so please review that as well. Thanks.

Contributor Author

It's set to -1 when a read fails. You will find the diff for this in ReadBuffer.java line 110.
There is an issue with this commit though, for which a hotfix was made, in case it's relevant to your change: https://issues.apache.org/jira/browse/HADOOP-17301
Will check on your PR by EOW.
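For context, a self-contained sketch of the bookkeeping being discussed (simplified classes; only the eviction check mirrors the diff above, everything else is an assumption): on a failed read the buffer stays in completedReadList with its data slot detached (index -1), so eviction must not push -1 back onto the free list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

// Simplified sketch; not the actual ReadBufferManager/ReadBuffer classes.
class ReadBufferManagerSketch {
  static class Buffer {
    int bufferindex = -1;        // -1 also marks "no data slot", e.g. after a failed read
    IOException errException;    // kept so the foreground read can rethrow it
  }

  private final Stack<Integer> freeList = new Stack<>();
  private final List<Buffer> completedReadList = new ArrayList<>();

  /** On read failure: keep the buffer in completedReadList, but detach its data slot. */
  void markReadFailed(Buffer buf, IOException e) {
    buf.errException = e;
    buf.bufferindex = -1;        // as done in ReadBuffer.java per the discussion above
    completedReadList.add(buf);
  }

  /** On eviction: only real buffer slots go back to the free list (mirrors the diff above). */
  void evict(Buffer buf) {
    if (buf.bufferindex != -1) {
      freeList.push(buf.bufferindex);
    }
    completedReadList.remove(buf);
  }
}
```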

Contributor

Thanks @snvijaya

@apache apache deleted a comment from hadoop-yetus Aug 19, 2021