Skip to content

Conversation

leblancd
Copy link

@leblancd leblancd commented Nov 5, 2018

What type of PR is this?
/kind bug

What this PR does / why we need it:
Kubeadm init with an IPv6-only configuration is currently failing to bring up a cluster. In this failure condition, kubeadm is waiting forever for the control plane to come up. This failure is described in kubernetes/kubeadm Issue # 1212. The root cause is that kubeadm is generating an erroneous etcd.yaml manifest that has entries of the form:

   - --advertise-client-urls=https://fd00:20::2:2379

And this is causing the etcd server to crash.
For IPv6 addresses, this URL should use square brackets around the IPv6 address, e.g.:

   - --advertise-client-urls=https://[fd00:20::2]:2379

The fix is to use net.JoinHostPort() to generate URLs such as the above.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes kubernetes/kubeadm # 1212

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
NONE

NONE

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 5, 2018
@leblancd
Copy link
Author

leblancd commented Nov 5, 2018

/area ipv6

@k8s-ci-robot k8s-ci-robot added area/ipv6 release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 5, 2018
Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leblancd thanks for your contribution! Looks OK to me.

One tiny nit though. It looks to me, that we have too many times code in the spirit of:

fmt.Sprintf("https://%s", net.JoinHostPort(cfg.APIEndpoint.AdvertiseAddress, strconv.Itoa(kubeadmconstants.EtcdListenClientPort)))

Perhaps we should add an utility function to k8s.io/kubernetes/cmd/kubeadm/app/util/etcd that takes AdvertiseAddress as an argument?

@rosti
Copy link
Contributor

rosti commented Nov 5, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 5, 2018
Copy link
Contributor

@pmichali pmichali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm Looks like that'll resolve the regeression.

@leblancd
Copy link
Author

leblancd commented Nov 5, 2018

@rosti - I can create a utility function for generating the URLs. This would need to be passed the port as well as the IP address / host. Any preference for name, e.g. etcdutil.URL() or etcdutil.genURL()?
BTW, looks like a github outage is causing test failures?

Copy link
Contributor

@pmichali pmichali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I see there is a TestGetEtcdCommand() that at least tests some of the URLs generated. Could you add some test cases there, for IPv6? Can you check the other areas to see if there are existing tests that could be adapted to test IPv6 variants?

It looks like there are only two ports used. It would be useful to have a pair of helper methods to generate the URL strings, and use them, to reduce duplication.

@rosti
Copy link
Contributor

rosti commented Nov 5, 2018

@leblancd you can add a couple of tiny utility funcs in there, named GetClientURL and GetPeerURL, so that we keep code more understandable and not have to pass the port constants.

@leblancd
Copy link
Author

leblancd commented Nov 5, 2018

@rosti - Roger that, I understand your earlier comment now, thanks.

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leblancd thank you for this bug fix.
the _test.go files accompanying the touched files would need some updates too.
i.e. we need to add test coverage for IPv6 where applicable.

@neolit123
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 5, 2018
@neolit123
Copy link
Member

@pmichali your PR here is newer and seems to be conflicting with this one on the kubeadm side.

@pmichali and @leblancd please resolve which PR is going to touch kubeadm. we can make this the kubeadm change and the other one the kubelet change.
thanks.

@leblancd leblancd force-pushed the kubeadm_etcd_v6_fix branch from 9b63f43 to d28e564 Compare November 5, 2018 23:02
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 5, 2018
@leblancd
Copy link
Author

leblancd commented Nov 5, 2018

@neolit123 - @pmichali 's PR #70659 is temporarily including my changes because his fix depends on my fix being in place. This change (PR #70633) should be considered the fix for IPv6 errors in kubeadm etcd.yaml .

@leblancd
Copy link
Author

leblancd commented Nov 5, 2018

@neolit123 , @pmichali - I've added unit test cases for local.go and etcd.go.

@leblancd leblancd force-pushed the kubeadm_etcd_v6_fix branch from d28e564 to b3821fb Compare November 5, 2018 23:20
@neolit123
Copy link
Member

thanks for the update.
/retest

Copy link
Contributor

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like one of the intel folks to review in the broader context, b/c they've done most of the IPv6 work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this really needs to live here, but I don't think it's blocking for me.

/approve
/assign @kad
/cc @bart0sh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do simpler "https://" + net.JoinHostPort(...) instead of calling that function, but this not critical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leblancd please just remove the utility as per the above comments then LGTM.
thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leblancd, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 6, 2018
When 'kubeadm init ...' is used with an IPv6 kubeadm configuration,
kubeadm currently generates an etcd.yaml manifest that uses IP:port
combinatins where the IP is an IPv6 address, but it is not enclosed
in square brackets, e.g.:
    - --advertise-client-urls=https://fd00:20::2:2379
For IPv6 advertise addresses, this should be of the form:
    - --advertise-client-urls=https://[fd00:20::2]:2379

The lack of brackets around IPv6 addresses in cases like this is
causing failures to bring up IPv6-only clusters with Kubeadm as
described in kubernetes/kubeadm Issues kubernetes#1212.

This format error is fixed by using net.JoinHostPort() to generate
URLs as shown above.

Fixes kubernetes/kubeadm Issue kubernetes#1212
@leblancd leblancd force-pushed the kubeadm_etcd_v6_fix branch from b3821fb to 9988771 Compare November 19, 2018 02:50
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2018
Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @leblancd ! Looks good!

// GetClientURL creates an HTTPS URL that uses the configured advertise
// address and client port for the API controller
func GetClientURL(cfg *kubeadmapi.InitConfiguration) string {
return "https://" + net.JoinHostPort(cfg.LocalAPIEndpoint.AdvertiseAddress, strconv.Itoa(constants.EtcdListenClientPort))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have used GetClientURLByIP here, but it's not bad the way it is either.

@rosti
Copy link
Contributor

rosti commented Nov 19, 2018

/test pull-kubernetes-e2e-gce-100-performance

@kad
Copy link
Member

kad commented Nov 20, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 20, 2018
@timothysc timothysc added this to the v1.13 milestone Nov 20, 2018
@timothysc timothysc added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Nov 20, 2018
@k8s-ci-robot k8s-ci-robot merged commit f8983a8 into kubernetes:master Nov 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipv6 area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kubeadm init failing for IPv6-only configuration
7 participants