Skip to content

Conversation

@snay2
Copy link
Contributor

@snay2 snay2 commented Sep 10, 2021

Issue #, if available: Fixes #386

Description of changes:

drainEvent.Description indicates the reason for needing to cordon and/or drain a node, as well as some other details like the time the event will occur. Currently, the following reasons are possible:

These changes pass the cordon reason down to each level where it may be needed for a log message but do not otherwise change the cordon/drain behavior.

An example log message, before the change:

2021/09/13 18:28:28 INF Node successfully cordoned and drained node_name=ip-192-168-0-1.ec2.internal

After:

2021/09/13 18:28:28 INF Node successfully cordoned and drained node_name=ip-192-168-0-1.ec2.internal reason="ASG Lifecycle Termination event received. Instance will be interrupted at 2021-09-13 18:28:16 +0000 UTC \n"

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@snay2 snay2 requested a review from bwagner5 September 10, 2021 22:13
Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like one of the unit tests needs updated with the new function signature.

pkg/node/node_test.go:64:28: not enough arguments in call to tNode.CordonAndDrain
	have (string)
	want (string, string)

return err
} else {
log.Info().Str("node_name", nodeName).Msg("Node successfully cordoned")
log.Info().Str("node_name", nodeName).Str("reason", drainEvent.Kind).Msg("Node successfully cordoned")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than using drainEvent.Kind can we use drainEvent.Description? On one hand, Kind seems like a minimal amount of information which is pretty nice. But the problem is that SQS_TERMINATION doesn't really have any meaning. When NTH is running in sqs-processor mode, all the reasons will be SQS_TERMINATION even when the SQS_TERMINATION event could actually be a SPOT_ITN, SPOT_RBR, ASG_LIFECYCLE_HOOK, etc.

wdut?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the SQS case specifically needs more detail. Description looks like a much better choice. Will update the PR to use that instead.

This gives more detail, especially about SQS events. Also update unit
tests.
Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

logLevel info should state the reason for cordoning

2 participants