HDFS-16540. Data locality is lost when DataNode pod restarts in kubernetes (#4170) #4246
Conversation
💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
Looking at these test failures, let me try pushing a patch with no functional change and see what list of failures I get.
Force-pushed from 0f8ad00 to e9049bd.
💔 -1 overall
This message was automatically generated.
The node seems to be having issues: java.lang.OutOfMemoryError: unable to create new native thread. Let me try a new push.
Force-pushed from e9049bd to 306eb36.
💔 -1 overall
This message was automatically generated.
Force-pushed from 306eb36 to 2966e53.
💔 -1 overall
This message was automatically generated.
A no-op patch has one set of failures and the push with the backport has another; the one overlap is TestDataNodeRollingUpgrade. The others seem unrelated. Let me try a repush.
Force-pushed from 2966e53 to 53fdbf6.
💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
Removing a space had us run more tests, and 4 tests failed instead of the 44 on the previous run. The tests that failed on this run do not overlap at all with the failures in the previous run. I also see branch-mvninstall-root failed with "unable to create new native thread". Let me try one more run shifting a space and see what I get. I don't think this patch has any relation to the failures I'm seeing; I'll merge unless I see an obvious relation in the next run.
💔 -1 overall
This message was automatically generated.
Two failures, including TestBPOfferService.testMissBlocksWhenReregister. They come up often enough. Let me try again; in the meantime I'm running them locally.
Force-pushed from 53773ea to 007c9e8.
💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
HDFS-16540. Data locality is lost when DataNode pod restarts in kubernetes (apache#4170). Cherry-pick of 9ed8d60.
Force-pushed from 21686a2 to a509522.
💔 -1 overall
This message was automatically generated.
I ran the two tests below in loops locally. TestBPOfferService.testMissBlocksWhenReregister failed once out of ten cycles, both with the patch in place and without it (which jibes with what we see here in the test runs, where it sometimes fails but not always). TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks shows up consistently, but when I run it locally over multiple cycles it passes whether the patch is applied or not. In the last full branch-3.3 run, back on May 5th (https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/54/), it failed for the same reason (the May 12th run was incomplete). That test is about block replication, whereas this PR is a minor adjustment in NN node accounting. Unrelated, I'd say. I'll push the backport in the morning.
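As an aside, one simple way to loop a single JUnit test locally (a sketch of one possible approach, not necessarily how it was done here) is a small JUnit 4 rule that re-runs the test body a fixed number of times; the RepeatRule name and the count are illustrative and not part of Hadoop's test utilities:

```java
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

/** Illustrative JUnit 4 rule: re-runs each test body a fixed number of times to surface flakiness. */
public class RepeatRule implements TestRule {
  private final int times;

  public RepeatRule(int times) {
    this.times = times;
  }

  @Override
  public Statement apply(Statement base, Description description) {
    return new Statement() {
      @Override
      public void evaluate() throws Throwable {
        // Re-run the wrapped test; the loop stops and rethrows on the first failure.
        for (int i = 0; i < times; i++) {
          base.evaluate();
        }
      }
    };
  }
}
```

It can be attached to a test class with `@Rule public RepeatRule repeat = new RepeatRule(10);`, or the single test can instead be re-run in a shell loop with Maven's `-Dtest=TestBPOfferService#testMissBlocksWhenReregister` selector.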
Thanks a lot, @saintstack!

I don't get what is changing here. Maybe I missed something? Thanks.
I think the wrong change was accidentally committed. I also made a comment on the JIRA just now: https://issues.apache.org/jira/browse/HDFS-16540?focusedCommentId=17767163
Description of PR
Cherry-pick of 9ed8d60
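For context on the shape of the problem: when a DataNode pod restarts in Kubernetes it typically comes back under a new IP, and if the NameNode's host-to-node index still points at the old address, locality lookups by client host can stop matching. The sketch below is purely illustrative of that kind of accounting; the class and method names are hypothetical and this is not the actual HDFS-16540 change (see the JIRA and the cherry-picked commit for the real one).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch only, not the HDFS-16540 patch: keeps a host-to-node
 * index consistent when a node re-registers under a new address, so that
 * locality lookups by host keep resolving to the same logical node.
 */
class HostToNodeIndex {
  // current host/IP -> node UUID (stand-in for a DatanodeDescriptor lookup)
  private final Map<String, String> nodeByHost = new ConcurrentHashMap<>();
  // node UUID -> last address it registered with
  private final Map<String, String> lastHostByNode = new ConcurrentHashMap<>();

  /** Called on (re)registration; drops the stale mapping if the address changed. */
  synchronized void register(String nodeUuid, String host) {
    String previous = lastHostByNode.put(nodeUuid, host);
    if (previous != null && !previous.equals(host)) {
      nodeByHost.remove(previous);   // forget the old pod address
    }
    nodeByHost.put(host, nodeUuid);  // index the node under its current address
  }

  /** Locality lookup: which node, if any, is on this client's host? */
  String getNodeOnHost(String host) {
    return nodeByHost.get(host);
  }
}
```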
How was this patch tested?
For code changes:
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?