@TaoYang526
Contributor

Description of PR

For details, please refer to YARN-11732.
This PR adds sanity checks before calling internal methods of reservedContainer.

How was this patch tested?

Not necessary.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 18m 16s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 49m 25s | | trunk passed |
| +1 💚 | compile | 1m 4s | | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 💚 | compile | 0m 56s | | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 💚 | checkstyle | 0m 56s | | trunk passed |
| +1 💚 | mvnsite | 1m 0s | | trunk passed |
| +1 💚 | javadoc | 1m 0s | | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 💚 | javadoc | 0m 48s | | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 💚 | spotbugs | 2m 0s | | trunk passed |
| +1 💚 | shadedclient | 40m 33s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 50s | | the patch passed |
| +1 💚 | compile | 0m 55s | | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 💚 | javac | 0m 55s | | the patch passed |
| +1 💚 | compile | 0m 46s | | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 💚 | javac | 0m 46s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 45s | | the patch passed |
| +1 💚 | mvnsite | 0m 50s | | the patch passed |
| +1 💚 | javadoc | 0m 46s | | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 💚 | javadoc | 0m 42s | | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 💚 | spotbugs | 2m 0s | | the patch passed |
| +1 💚 | shadedclient | 41m 2s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 108m 50s | | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 💚 | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 273m 38s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7065/1/artifact/out/Dockerfile |
| GITHUB PR | #7065 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux f342e868de9d 5.15.0-119-generic #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 38b63ad |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7065/1/testReport/ |
| Max. process+thread count | 937 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7065/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@TaoYang526
Contributor Author

@szilard-nemeth @brumi1024 Could you please help to review this PR?

Contributor

@Hexiaoqiao Hexiaoqiao left a comment

@TaoYang526 Great catch! LGTM. Would you mind adding a unit test to cover this case? It seems easy to reproduce when unreserving resources for a single scheduler node? cc @slfan1989

  if (schedulerNode != null) {
    RMContainer resContainer = schedulerNode.getReservedContainer();
-   if (resContainer.getReservedSchedulerKey() != null) {
+   if (resContainer != null && resContainer.getReservedSchedulerKey() != null) {
Contributor

I recommend adding a try-catch block. I made this same change in our production environment before, but it still reported a NullPointerException.

Contributor Author

@TaoYang526 TaoYang526 Sep 29, 2024

Thanks @zeekling for the review.
I'm not sure why your environment still reported an NPE after that change. resContainer.getReservedSchedulerKey() can no longer throw an NPE, because the preceding non-null check short-circuits when resContainer is null. Could you please attach some details of the NPE and your changes?
For this change, I think the NPE should be fixed rather than caught in general.
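
To illustrate the short-circuit argument above, here is a minimal sketch. The class and method names (NullGuardSketch, Container, hasReservedKey) are hypothetical stand-ins and not the real YARN types; only the shape of the condition is taken from the diff.

```java
// Hypothetical sketch; Container is a stand-in, not the real RMContainer.
final class NullGuardSketch {
  static final class Container {
    private final String reservedSchedulerKey; // may be null

    Container(String reservedSchedulerKey) {
      this.reservedSchedulerKey = reservedSchedulerKey;
    }

    String getReservedSchedulerKey() {
      return reservedSchedulerKey;
    }
  }

  // Same shape as the patched condition: && evaluates the left operand first
  // and skips the getter call on the right when the container is null.
  static boolean hasReservedKey(Container resContainer) {
    return resContainer != null && resContainer.getReservedSchedulerKey() != null;
  }

  public static void main(String[] args) {
    System.out.println(hasReservedKey(null));                   // no container
    System.out.println(hasReservedKey(new Container(null)));    // container, no key
    System.out.println(hasReservedKey(new Container("key-1"))); // container with key
  }
}
```

With a single local reference, this condition cannot dereference null. An NPE seen after such a change would therefore have to come from a different statement, which is why the stack trace is needed.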

Contributor

My modification was the same as your change.

Contributor Author

I can't find any way to reproduce the NPE. Could you please explain in which case the NPE can still be thrown? If we can locate that risk point, I would like to fix it instead of catching it in a try block. Thanks.

@TaoYang526
Contributor Author

@Hexiaoqiao Thanks for the review.
I considered adding test cases, but found that all of these changes address race conditions inside a method (the method is either private or the race is entirely local to it), like this: if(node.getReservedContainer() != null){ LOG.info("... container="+ node.getReservedContainer().getContainerId()); }. Between the null check and the second call, another thread can clear the reservation. It's hard to reproduce the NPE in test cases.
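
The check-then-act pattern described above can be sketched as follows. Everything here is hypothetical: FlakyNode simulates a concurrent unreserve deterministically by returning the reservation on the first read and null afterwards, and reading the getter once into a local variable is one common way to close the window, not necessarily the exact form of the merged patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; these are stand-ins, not the real scheduler classes.
final class ReservedContainerRaceSketch {
  static final class Container {
    String getContainerId() {
      return "container_01";
    }
  }

  // Simulates another thread unreserving between two reads:
  // the first read sees the reservation, every later read sees null.
  static final class FlakyNode {
    private final AtomicInteger reads = new AtomicInteger();

    Container getReservedContainer() {
      return reads.getAndIncrement() == 0 ? new Container() : null;
    }
  }

  // Two reads: the reference that was checked is not the one that is used.
  static String describeRacy(FlakyNode node) {
    if (node.getReservedContainer() != null) {
      return "container=" + node.getReservedContainer().getContainerId();
    }
    return "no reservation";
  }

  // One read into a local: the checked reference is the one that is used.
  static String describeSafe(FlakyNode node) {
    Container reserved = node.getReservedContainer();
    if (reserved != null) {
      return "container=" + reserved.getContainerId();
    }
    return "no reservation";
  }

  public static void main(String[] args) {
    System.out.println(describeSafe(new FlakyNode()));
    try {
      describeRacy(new FlakyNode());
    } catch (NullPointerException e) {
      System.out.println("racy variant threw NullPointerException");
    }
  }
}
```

In a real scheduler the second read only returns null when another thread wins the race at exactly that point, which is why a unit test cannot trigger it reliably without a stub like this one.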

@Hexiaoqiao
Contributor

Got it. +1 from my side.

@TaoYang526
Contributor Author

Thanks @Hexiaoqiao for the review.
Do we need more reviewers before merging this PR?

@Hexiaoqiao
Contributor

Will commit in two working days if there are no more comments. cc @TaoYang526

@Hexiaoqiao Hexiaoqiao merged commit c63aafd into apache:trunk Oct 16, 2024
1 of 3 checks passed
@Hexiaoqiao
Contributor

Committed to trunk. Thanks @TaoYang526 for your work and @zeekling @shameersss1 for your reviews.

@TaoYang526
Contributor Author

Thanks @Hexiaoqiao @zeekling @shameersss1 for the review and commit!

TaoYang526 added a commit that referenced this pull request Oct 17, 2024
…ainer for CapacityScheduler (#7065). Contributed by Tao Yang.

Reviewed-by: Syed Shameerur Rahman <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
@zuston
Member

zuston commented Dec 24, 2024

Nice catch! I encountered the same problem. @TaoYang526

