Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,9 @@ private NodeForPreemption getPreemptionCandidatesOnNode(
Map<ApplicationAttemptId, Set<RMContainer>> selectedCandidates,
Resource totalPreemptionAllowed, boolean readOnly) {
RMContainer reservedContainer = node.getReservedContainer();
if (reservedContainer == null) {
return null;
}
Resource available = Resources.clone(node.getUnallocatedResource());
Resource totalSelected = Resources.createResource(0);
List<RMContainer> sortedRunningContainers =
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -757,10 +757,9 @@ private void completeOustandingUpdatesWhichAreReserved(
RMContainer rmContainer, ContainerStatus containerStatus,
RMContainerEventType event) {
N schedulerNode = getSchedulerNode(rmContainer.getNodeId());
if (schedulerNode != null &&
schedulerNode.getReservedContainer() != null) {
if (schedulerNode != null) {
RMContainer resContainer = schedulerNode.getReservedContainer();
if (resContainer.getReservedSchedulerKey() != null) {
if (resContainer != null && resContainer.getReservedSchedulerKey() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to add a try catch module. I have made this change in the production environment before, but it still reports a null pointer.

Copy link
Contributor Author

@TaoYang526 TaoYang526 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zeekling for the review.
I'm not sure why your environment still reported NPE after changing like that, resContainer.getReservedSchedulerKey() won't throw NPE any more with the previous not-null check: if resContainer is null. Could you please attach some details of the NPE and your changes?
For this change, I think NPE should be fixed instead of be caught in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified like your change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any chance to reproduce the NPE, could you please explain in which case NPE can be thrown again? If we can locate that risk point, I would like to fix it instead of catching it in try block. Thanks.

ContainerId containerToUpdate = resContainer
.getReservedSchedulerKey().getContainerToUpdate();
if (containerToUpdate != null &&
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -858,12 +858,13 @@ private ContainerAllocation allocate(Resource clusterResource,
FiCaSchedulerNode node = iter.next();

// Do not schedule if there are any reservations to fulfill on the node
RMContainer nodeReservedContainer = node.getReservedContainer();
if (iter.hasNext() &&
node.getReservedContainer() != null &&
nodeReservedContainer != null &&
isSkipAllocateOnNodesWithReservedContainer()) {
LOG.debug("Skipping scheduling on node {} since it has already been"
+ " reserved by {}", node.getNodeID(),
node.getReservedContainer().getContainerId());
nodeReservedContainer.getContainerId());
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.NODE_HAS_BEEN_RESERVED, ActivityLevel.NODE);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -520,13 +520,13 @@ public boolean accept(Resource cluster,
// When reserve a resource (state == NEW is for new container,
// state == RUNNING is for increase container).
// Just check if the node is not already reserved by someone
if (schedulerContainer.getSchedulerNode().getReservedContainer()
!= null) {
RMContainer reservedContainer =
schedulerContainer.getSchedulerNode().getReservedContainer();
if (reservedContainer != null) {
if (LOG.isDebugEnabled()) {
LOG.debug("Try to reserve a container, but the node is "
+ "already reserved by another container="
+ schedulerContainer.getSchedulerNode()
.getReservedContainer().getContainerId());
+ reservedContainer.getContainerId());
}
return false;
}
Expand Down
Loading