
Conversation

@bwagner5 (Contributor) commented on Apr 8, 2021

Issue #, if available:
N/A

Description of changes:
Fix undefined log levels:

BEFORE:

2021/04/02 15:44:08 ??? Trying to get token from IMDSv2
2021/04/02 15:44:15 ??? Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Put \"http://169.254.169.254/latest/api/token\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2021/04/02 15:44:18 ??? Request failed. Attempts remaining: 2
2021/04/02 15:44:18 ??? Sleep for 2.973889705s seconds
2021/04/02 15:44:23 ??? Request failed. Attempts remaining: 1
2021/04/02 15:44:23 ??? Sleep for 8.84038228s seconds
2021/04/02 15:44:34 ??? Unable to fetch metadata from IMDS error="Unable to parse metadata response: Unable to get a response from IMDS: Get \"http://169.254.169.254/latest/dynamic/instance-identity/document\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2021/04/02 15:44:34 ??? aws-node-termination-handler arguments: 

AFTER:

build/node-termination-handler --dry-run --node-name hi
2021/04/08 15:37:37 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:37 ERR Unable to fetch metadata from IMDS error="Metadata request received http status code: 403"
2021/04/08 15:37:37 INF aws-node-termination-handler arguments:
	dry-run: true,
	node-name: hi,
	metadata-url: http://169.254.169.254,
	kubernetes-service-host: ,
	kubernetes-service-port: ,
	delete-local-data: true,
	ignore-daemon-sets: true,
	pod-termination-grace-period: -1,
	node-termination-grace-period: 120,
	enable-scheduled-event-draining: false,
	enable-spot-interruption-draining: true,
	enable-sqs-termination-draining: false,
	enable-rebalance-monitoring: false,
	metadata-tries: 3,
	cordon-only: false,
	taint-node: false,
	json-logging: false,
	log-level: INFO,
	webhook-proxy: ,
	webhook-headers: <not-displayed>,
	webhook-url: ,
	webhook-template: <not-displayed>,
	uptime-from-file: ,
	enable-prometheus-server: false,
	prometheus-server-port: 9092,
	aws-region: us-east-1,
	queue-url: ,
	check-asg-tag-before-draining: true,
	managed-asg-tag: aws-node-termination-handler/managed,
	aws-endpoint: ,

2021/04/08 15:37:37 INF Started watching for interruption events
2021/04/08 15:37:37 INF Kubernetes AWS Node Termination Handler has started successfully!
2021/04/08 15:37:37 INF Started watching for event cancellations
2021/04/08 15:37:37 INF Started monitoring for events event_type=SPOT_ITN
2021/04/08 15:37:39 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:39 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:41 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:41 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:43 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:43 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:45 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:45 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:45 WRN Stopping NTH - Duplicate Error Threshold hit.
panic: There was a problem checking for spot ITNs: Metadata request received http status code: 403
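
For context, the "???" in the BEFORE output is what zerolog's console writer prints for an event that has no level attached. A minimal sketch of the difference, assuming a console writer similar to the one producing the output above (the handler's actual logger setup may differ):

package main

import (
	"os"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

func main() {
	// Assumed setup for illustration only; NTH's real configuration may differ.
	log.Logger = log.Output(zerolog.ConsoleWriter{Out: os.Stderr})

	// Unleveled event: the console writer has no level to render, so the
	// line comes out with the "???" placeholder seen in the BEFORE logs.
	log.Log().Msg("Trying to get token from IMDSv2")

	// Leveled events render as INF/WRN/ERR, matching the AFTER logs.
	log.Info().Msg("Started watching for interruption events")
	log.Warn().Msg("There was a problem monitoring for events")
	log.Error().Msg("Unable to fetch metadata from IMDS")
}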

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@bwagner5 requested review from brycahta and haugenj on April 8, 2021 at 20:39
@brycahta (Contributor) left a comment

Looks good! Had some minor comments/questions

err = json.NewDecoder(strings.NewReader(identityDoc)).Decode(&metadata)
if err != nil {
-	log.Log().Msg("Unable to fetch instance identity document from ec2 metadata")
+	log.Warn().Msg("Unable to fetch instance identity document from ec2 metadata")
Contributor

log.Err(err) ?

  • Both comments are nits for sure, but I'm thinking that if there's a log within if err != nil, it should be error level for consistency.

Contributor Author

I don't think we can make a strict rule like that. We have a fallback for this error, so I think it can stay a Warn.

Contributor

Not sure I agree that all logging within err != nil should be error level, but we should come up with a consistent understanding of what these levels mean.

Contributor

Agreed the rule can't be that strict, but if having a fallback doesn't qualify as an error, then what about:

ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1

IMDSv1 is the fallback in this case? Are there any formal docs for log levels, similar to semantic versioning or something? Agreed we need some kind of consistency.

Contributor Author

Yeah, I changed that one to WARN.

Contributor Author

I don't think there's anything formal for log levels... I think we probably log way too much in NTH, so maybe we should consider what actually makes sense to log so that someone can view the logs and be confident things are working and have enough information to debug certain error scenarios.
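
As a reference point for the log.Err suggestion above, a minimal zerolog sketch (assuming the project's github.com/rs/zerolog/log package) contrasting the explicit Warn used in this PR with the Err helper, which logs at error level when the error is non-nil and at info level when it is nil:

package main

import (
	"errors"

	"github.com/rs/zerolog/log"
)

func main() {
	// Hypothetical error standing in for the IMDS failure discussed in this thread.
	err := errors.New("Metadata request received http status code: 403")

	// The change in this PR: explicit warning level, since there is an
	// IMDSv1 fallback for this path.
	log.Warn().Msg("Unable to fetch instance identity document from ec2 metadata")

	// Reviewer's suggestion: log.Err(err) selects error level when err is
	// non-nil (info when nil) and attaches err as a field.
	log.Err(err).Msg("Unable to fetch instance identity document from ec2 metadata")
}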

}
if duplicateErrCount >= duplicateErrThreshold {
-	log.Log().Msg("Stopping NTH - Duplicate Error Threshold hit.")
+	log.Warn().Msg("Stopping NTH - Duplicate Error Threshold hit.")
Contributor

log.Panic() since we're panicking?

Contributor Author

We could, but IMHO it's clearer when they're separate. I wouldn't expect a log call to panic if I were reading the code and didn't know about that quirk.
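
For reference, a minimal sketch of the two options discussed here, using zerolog's global logger (messages mirror this thread, not the exact NTH code): log.Panic() logs at panic level and then calls panic() from Msg, while the approach kept in this PR leaves the warning log and the panic separate.

package main

import "github.com/rs/zerolog/log"

func main() {
	// Reviewer's suggestion: one call that logs at panic level and then
	// panics when Msg is invoked. Commented out so the second option below
	// is reached in this sketch.
	// log.Panic().Msg("Stopping NTH - Duplicate Error Threshold hit.")

	// Approach kept in this PR: the warning log and the panic stay separate,
	// so a reader does not have to know that log.Panic also panics.
	log.Warn().Msg("Stopping NTH - Duplicate Error Threshold hit.")
	panic("There was a problem checking for spot ITNs")
}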

@bwagner5 requested review from brycahta and haugenj on April 8, 2021 at 21:22
@brycahta (Contributor) left a comment

🚀
