-
Couldn't load subscription status.
- Fork 9.1k
YARN-11732. Fix potential NPE when calling SchedulerNode#reservedContainer for CapacityScheduler #7065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ainer for CapacityScheduler
|
💔 -1 overall
This message was automatically generated. |
|
@szilard-nemeth @brumi1024 Could you please help to review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TaoYang526 Great catch! LGTM. Would you mind to add some unit test to cover this case? Seems easy to reproduce when unreserve resource for one scheduler node? cc @slfan1989
| if (schedulerNode != null) { | ||
| RMContainer resContainer = schedulerNode.getReservedContainer(); | ||
| if (resContainer.getReservedSchedulerKey() != null) { | ||
| if (resContainer != null && resContainer.getReservedSchedulerKey() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is recommended to add a try catch module. I have made this change in the production environment before, but it still reports a null pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zeekling for the review.
I'm not sure why your environment still reported NPE after changing like that, resContainer.getReservedSchedulerKey() won't throw NPE any more with the previous not-null check: if resContainer is null. Could you please attach some details of the NPE and your changes?
For this change, I think NPE should be fixed instead of be caught in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified like your change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any chance to reproduce the NPE, could you please explain in which case NPE can be thrown again? If we can locate that risk point, I would like to fix it instead of catching it in try block. Thanks.
|
@Hexiaoqiao Thanks for the review. |
|
Got it. +1 from my side. |
|
Thanks @Hexiaoqiao for the review. |
|
Will commit if no more comments util two workdays later. cc @TaoYang526 |
|
Committed to trunk. Thanks @TaoYang526 for your works and @zeekling @shameersss1 for your reviews. |
|
Thanks @Hexiaoqiao @zeekling @shameersss1 for the review and commit! |
…ainer for CapacityScheduler (#7065). Contributed by Tao Yang. Reviewed-by: Syed Shameerur Rahman <[email protected]> Signed-off-by: He Xiaoqiao <[email protected]>
…ainer for CapacityScheduler (#7065). Contributed by Tao Yang. Reviewed-by: Syed Shameerur Rahman <[email protected]> Signed-off-by: He Xiaoqiao <[email protected]>
|
Nice catch! I encounter the same problem. @TaoYang526 |
Description of PR
Details please refer to YARN-11732.
Add sanity check before calling internal methods of reservedContainer.
How was this patch tested?
Not necessary.
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?