Description
If the DescribeInstances call made by the EC2 Discovery plugin fails for any reason, the code just returns an empty list of nodes. This is bad for two reasons: the code uses the empty list of nodes immediately, and it currently caches that list until the refresh interval expires. The code then tries the call again on the next get, potentially without any retry back-off.
~~With the default refresh interval of 10s this is sometimes not catastrophic; however, if throttling happens a lot, it can potentially prevent the masters from communicating with one another and lead to cluster instability.~~
Also, because of this bug, increasing the refresh interval is dangerous: the empty result list stays cached until the refresh interval expires. When it is being throttled, the code should probably not return an empty list and should instead continue to use the list from the last successful call, or possibly retry more, with exponential back-off for throttling exceptions.
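A minimal sketch of the suggested behavior, in plain Java with no AWS SDK dependency. `LastKnownGoodHostsCache`, the `fetcher` supplier (standing in for the DescribeInstances call), the injectable clock, and the back-off parameters are all hypothetical names for illustration; this is not the plugin's actual code. It treats any `RuntimeException` as a failed call, which is broader than throttling alone.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: serves the host list from the last successful fetch and,
 * when a fetch fails, keeps serving that list while retrying with exponential back-off.
 */
final class LastKnownGoodHostsCache {
    private final Supplier<List<String>> fetcher; // stands in for the DescribeInstances call
    private final LongSupplier clockMillis;       // injectable clock so the logic is testable
    private final long refreshIntervalMillis;
    private final long initialBackoffMillis;
    private final long maxBackoffMillis;

    private List<String> lastGood = Collections.emptyList(); // empty only until the first success
    private long nextAttemptAtMillis = Long.MIN_VALUE;       // forces a fetch on the first get()
    private long currentBackoffMillis;

    LastKnownGoodHostsCache(Supplier<List<String>> fetcher, LongSupplier clockMillis,
                            long refreshIntervalMillis, long initialBackoffMillis,
                            long maxBackoffMillis) {
        this.fetcher = fetcher;
        this.clockMillis = clockMillis;
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.initialBackoffMillis = initialBackoffMillis;
        this.maxBackoffMillis = maxBackoffMillis;
        this.currentBackoffMillis = initialBackoffMillis;
    }

    synchronized List<String> get() {
        long now = clockMillis.getAsLong();
        if (now < nextAttemptAtMillis) {
            return lastGood; // still inside the refresh window or the back-off delay
        }
        try {
            List<String> fresh = fetcher.get();
            lastGood = List.copyOf(fresh);
            currentBackoffMillis = initialBackoffMillis; // success resets the back-off
            nextAttemptAtMillis = now + refreshIntervalMillis;
        } catch (RuntimeException e) {
            // Failure (e.g. throttling): keep the previous list instead of caching an empty one,
            // and delay the next attempt by a back-off that doubles on each consecutive failure.
            nextAttemptAtMillis = now + currentBackoffMillis;
            currentBackoffMillis = Math.min(currentBackoffMillis * 2, maxBackoffMillis);
        }
        return lastGood;
    }
}
```

Combining both suggestions means a failed call neither replaces good data with an empty list nor waits the full refresh interval before the next attempt. A real fix would likely restrict the back-off path to throttling errors and log other failures.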