Skip to content

Conversation

pjohnst5
Copy link
Contributor

Basically, CNS will need to be able to create the MTPNC on the apiserver with a new feature coming up in #4002

In preparation for that, I'm extracting out the controller-runtime manager's kubernetes client, from InitializeCRDState(), for it to be used later on in CNS

Other consideration taken and why I did not go with it:

One other way I could think of doing this was to have a separate function makeManager() called before InitializeCRDState(), that would do this:

func makeManager() (ctrlmgr.Manager, error) {
    // build default clientset.
    kubeConfig, err := ctrl.GetConfig()
    if err != nil {
        logger.Errorf("[Azure CNS] Failed to get kubeconfig for request controller: %v", err)
        return nil, errors.Wrap(err, "failed to get kubeconfig")
    }
    kubeConfig.UserAgent = fmt.Sprintf("azure-cns-%s", version)

    managerOpts := ctrlmgr.Options{
        Scheme:  scheme,
        Metrics: ctrlmetrics.Options{BindAddress: "0"},
        Cache:   cacheOpts,
        Logger:  zapr.NewLogger(z),
    }

    manager, err := ctrl.NewManager(kubeConfig, managerOpts)
    if err != nil {
        return nil, errors.Wrap(err, "failed to create manager")
    }
    return manager, nil
}

So then, we'd give that manager, into InitializeCRDState() as a param, and pass it around too to other methods:

manager, err := makeManager()
..
config.Client = manager.GetClient()
...
InitializeCRDState(manager, ...)

Thing about this though, is it's dangerous! 😱

Because, the manager client, is not ready to use until InitializeCRDState() finishes:

// Start the Manager which starts the reconcile loop.
	// The Reconciler will send an initial NodeNetworkConfig update to the PoolMonitor, starting the
	// Monitor's internal loop.
	go func() {
		logger.Printf("Starting controller-manager.")
		for {
			if err := manager.Start(ctx); err != nil {
				logger.Errorf("Failed to start controller-manager: %v", err)
				// retry to start the request controller
				// inc the managerStartFailures metric for failure tracking
				managerStartFailures.Inc()
			} else {
				logger.Printf("Stopped controller-manager.")
				return
			}
			time.Sleep(time.Second) // TODO(rbtr): make this exponential backoff
		}
	}()

So I don't want people to call makeManger(), thinking "Oh this client is ready to go", when in reality, both cnsConfig.ChannelMode needs to be CRD, which in turns runs InitializeCRDState(), and then it's finally ready

So I think it makes more sense to just pass it out of InitializeCRDState() (already initialized, ready to be used), and move the http service logic after that

Thoughts?

@pjohnst5 pjohnst5 requested a review from a team as a code owner September 10, 2025 22:21
@pjohnst5 pjohnst5 changed the title Extrace controller-runtime's manager.GetClient() for other methods in CNS to use Extract controller-runtime's manager.GetClient() for other methods in CNS to use Sep 10, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extracts the controller-runtime manager's Kubernetes client from the InitializeCRDState() function to make it available for other methods in CNS. This prepares for an upcoming feature (#4002) that will require CNS to create MTPNC resources on the API server.

  • Modifies InitializeCRDState() to return the controller-runtime client alongside the existing error
  • Updates the common service structs to include a client field for storing the extracted client
  • Refactors the main service initialization flow to capture and store the client before initializing the HTTP service

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cns/service/main.go Returns client from InitializeCRDState(), moves CRD initialization before HTTP service setup, and adds client storage
cns/common/service.go Adds client field to Service and ServiceConfig structs to support client storage
Comments suppressed due to low confidence (2)

cns/service/main.go:806

  • The ARP MAC address setup and Direct mode logic has been moved before the CRD initialization block but this logic appears to be unrelated to CRD state and should remain after the HTTP service initialization where it was originally placed.
	// Setting the remote ARP MAC address to 12-34-56-78-9a-bc on windows for external traffic if HNS is enabled
	err = platform.SetSdnRemoteArpMacAddress(rootCtx)
	if err != nil {
		logger.Errorf("Failed to set remote ARP MAC address: %v", err)
		return
	}

	// We are only setting the PriorityVLANTag in 'cns.Direct' mode, because it neatly maps today, to 'isUsingMultitenancy'

cns/service/main.go:854

  • The client is being assigned to config.Client even when an error occurs. The assignment should be moved after the error check to prevent storing a nil client when InitializeCRDState fails.
		client, err := InitializeCRDState(rootCtx, z, httpRemoteRestService, cnsconfig)
		config.Client = client
		if err != nil {
			logger.Errorf("Failed to start CRD Controller, err:%v.\n", err)
			return
		}

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@pjohnst5 pjohnst5 requested a review from rbtr September 10, 2025 22:22
@rbtr
Copy link
Collaborator

rbtr commented Sep 10, 2025

I think this is the right direction but I need to consider the specifics a little more. We kinda would want something more tightly scoped which creates the manager, sets up everything that needs to be set up before the manager is started, starts it, and returns it/the client.
Probably the InitCRD func is way too overloaded already. I have been thinking about splitting things up already (#3465)...

Copy link

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Sep 25, 2025
Copy link

github-actions bot commented Oct 2, 2025

Pull request closed due to inactivity.

@github-actions github-actions bot closed this Oct 2, 2025
@github-actions github-actions bot deleted the pjohnst5/move-out-manager-client branch October 2, 2025 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Stale due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants