Commit 4335656

xpumanager sidecar: remove HTTPS use without certificates
Add deployment that uses cert-manager to provide self-signed certificates
Add functionality to verify server endpoint in the sidecar

Signed-off-by: Tuomas Katila <[email protected]>
1 parent 404508a commit 4335656

9 files changed, +160 -58 lines changed

cmd/xpumanager_sidecar/README.md

Lines changed: 34 additions & 42 deletions
@@ -23,7 +23,7 @@ Intel GPUs can be interconnected via an XeLink. In some workloads it is benefici
 | -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
 | -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels. i.e. **gpu.intel.com**/xe-links |
 | -allow-subdeviceless-links | bool | false | Include xelinks also for devices that do not have subdevices |
-| -use-https | bool | false | Use HTTPS protocol when connecting to XPU Manager |
+| -cert | string | "" | Use HTTPS and verify server's endpoint |

 The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.

@@ -50,7 +50,7 @@ See [the development guide](../../DEVEL.md) for details if you want to deploy a
 Install XPU Manager daemonset with the XeLink sidecar

 ```bash
-$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>'
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http?ref=<RELEASE_VERSION>'
 ```

 Please see XPU Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).
@@ -60,7 +60,7 @@ Please see XPU Manager Kubernetes files for additional info on [installation](ht
 Use patch to add sidecar into the XPU Manager daemonset.

 ```bash
-$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref=<RELEASE_VERSION>'
+$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml?ref=<RELEASE_VERSION>'
 ```

 NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
@@ -76,7 +76,25 @@ master,0.0-1.0_0.1-1.1

 ### Use HTTPS with XPU Manager

-XPU Manager can be configured to use HTTPS on the metrics interface. For the gunicorn sidecar, cert and key files have to be added to the command:
+There is an alternative deployment that uses HTTPS instead of HTTP. The reference deployment requires `cert-manager` to provide a certificate for TLS. To deploy:
+
+```bash
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/cert-manager?ref=<RELEASE_VERSION>'
+```
+
+The deployment requests a certificate and key from `cert-manager`. They are then provided to the gunicorn container as secrets and are used in the HTTPS interface. The sidecar container uses the same certificate to verify the server.
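Using the server's own certificate as the only trusted root is a form of certificate pinning. The following is a minimal Go sketch of that idea, not the sidecar's code: it assumes a throwaway local HTTPS server from `net/http/httptest` standing in for XPU Manager, and `newVerifiedClient`, `fetch`, and `startTestServer` are hypothetical helper names introduced only for this illustration.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// newVerifiedClient builds an HTTP client that trusts only the certificates in
// pemCerts and checks the server certificate against serverName.
func newVerifiedClient(pemCerts []byte, serverName string) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemCerts) {
		return nil, errors.New("no certificates found in PEM data")
	}

	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion: tls.VersionTLS12,
				RootCAs:    pool,
				ServerName: serverName,
			},
		},
	}, nil
}

// fetch performs a GET request and returns the response body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)

	return string(body), err
}

// startTestServer starts a local HTTPS server (a stand-in for the real
// endpoint) and returns it together with its certificate in PEM form.
func startTestServer() (*httptest.Server, []byte) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw})

	return srv, certPEM
}

func main() {
	srv, certPEM := startTestServer()
	defer srv.Close()

	client, err := newVerifiedClient(certPEM, "127.0.0.1")
	if err != nil {
		fmt.Println("client setup failed:", err)
		return
	}

	body, err := fetch(client, srv.URL)
	fmt.Println("body:", body, "error:", err)
}
```

With this setup a request succeeds only when the server presents a certificate that chains to the pinned one and covers the expected name. A mismatched `serverName` or a different server certificate makes the TLS handshake fail instead of silently connecting.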
+
+> *NOTE*: The HTTPS deployment uses self-signed certificates. For production use, the certificates should be properly set up.
+
+<details>
+<summary>Enabling HTTPS manually</summary>
+
+If one doesn't want to use `cert-manager`, the same can be achieved manually by creating certificates with openssl and then adding it to the deployment. The steps are roughly:
+1) Create a certificate with [openssl](https://www.linode.com/docs/guides/create-a-self-signed-tls-certificate/)
+1) Create a secret from the [certificate & key](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/).
+1) Change the deployment:
+
+* Add certificate and key to gunicorn container:
 ```
 - command:
   - gunicorn
@@ -87,8 +105,7 @@ XPU Manager can be configured to use HTTPS on the metrics interface. For the gun
   - xpum_rest_main:main()
 ```

-The gunicorn container will also need the tls.crt and tls.key files within the container. For example:
-
+* Add secret mounting to the Pod:
 ```
 containers:
 - name: python-exporter
@@ -101,44 +118,19 @@ The gunicorn container will also need the tls.crt and tls.key files within the c
   secret:
     defaultMode: 420
     secretName: xpum-server-cert
-```
-
-In this case, the secret providing the certificate and key is called `xpum-server-cert`.
-
-The certificate and key can be [added manually to a secret](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/). Another way to achieve a secret is to leverage [cert-manager](https://cert-manager.io/).
-
-<details>
-<summary>Example for the Cert-manager objects</summary>
-
-Cert-manager will create a self-signed certificate and the private key, and store them into a secret called `xpum-server-cert`.
+```

+* Add use-https and cert to sidecar
 ```
-apiVersion: cert-manager.io/v1
-kind: Issuer
-metadata:
-  name: selfsigned-issuer
-spec:
-  selfSigned: {}
----
-apiVersion: cert-manager.io/v1
-kind: Certificate
-metadata:
-  name: serving-cert
-spec:
-  dnsNames:
-  - xpum.svc
-  - xpum.svc.cluster.local
-  issuerRef:
-    kind: Issuer
-    name: selfsigned-issuer
-  secretName: xpum-server-cert
+  name: xelink-sidecar
+  volumeMounts:
+  - mountPath: /certs
+    name: certs
+    readOnly: true
+  args:
+    ...
+    - --cert=/certs/tls.crt
+    ...
 ```

 </details>
-
-For the XPU Manager sidecar, `use-https` has to be added to the arguments. Then the sidecar will leverage HTTPS with the connection to the metrics interface.
-```
-args:
-  - -v=2
-  - -use-https
-```

cmd/xpumanager_sidecar/main.go

Lines changed: 28 additions & 8 deletions
@@ -19,6 +19,7 @@ import (
 	"bytes"
 	"context"
 	"crypto/tls"
+	"crypto/x509"
 	"flag"
 	"fmt"
 	"io"
@@ -61,12 +62,12 @@ type xpuManagerSidecar struct {
 	dstFilePath             string
 	labelNamespace          string
 	url                     string
+	certFile                string
 	interval                uint64
 	startDelay              uint64
 	xpumPort                uint64
 	laneCount               uint64
 	allowSubdevicelessLinks bool
-	useHTTPS                bool
 }

 func (e *invalidEntryErr) Error() string {
@@ -78,12 +79,30 @@ func (xms *xpuManagerSidecar) getMetricsDataFromXPUM() []byte {
 		Timeout: 5 * time.Second,
 	}

-	if xms.useHTTPS {
-		customTransport := http.DefaultTransport.(*http.Transport).Clone()
-		//#nosec
-		customTransport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
+	if len(xms.certFile) > 0 {
+		cert, err := os.ReadFile(xms.certFile)
+		if err != nil {
+			klog.Warning("Failed to read cert: ", err)
+
+			return nil
+		}

-		client.Transport = customTransport
+		certPool := x509.NewCertPool()
+		if !certPool.AppendCertsFromPEM(cert) {
+			klog.Warning("Adding server cert to pool failed")
+
+			return nil
+		}
+
+		tr := &http.Transport{
+			TLSClientConfig: &tls.Config{
+				MinVersion: tls.VersionTLS12,
+				RootCAs:    certPool,
+				ServerName: "127.0.0.1",
+			},
+		}
+
+		client.Transport = tr
 	}

 	ctx := context.Background()
@@ -380,7 +399,7 @@ func main() {
 	flag.Uint64Var(&xms.laneCount, "lane-count", 4, "minimum lane count for xelink")
 	flag.StringVar(&xms.labelNamespace, "label-namespace", "gpu.intel.com", "namespace for the labels")
 	flag.BoolVar(&xms.allowSubdevicelessLinks, "allow-subdeviceless-links", false, "allow xelinks that are not tied to subdevices (=1 tile GPUs)")
-	flag.BoolVar(&xms.useHTTPS, "use-https", false, "Use HTTPS protocol to connect to xpumanager")
+	flag.StringVar(&xms.certFile, "cert", "", "Use HTTPS and verify server's endpoint")
 	klog.InitFlags(nil)

 	flag.Parse()
@@ -390,7 +409,8 @@ func main() {
 	}

 	protocol := "http"
-	if xms.useHTTPS {
+
+	if len(xms.certFile) > 0 {
 		protocol = "https"
 	}

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+resources:
+  - https://github.com/intel/xpumanager/deployment/kubernetes/daemonset/base/?ref=V1.2.38

deployments/xpumanager_sidecar/kustomization.yaml

Lines changed: 0 additions & 7 deletions
This file was deleted.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+apiVersion: cert-manager.io/v1
+kind: Issuer
+metadata:
+  name: selfsigned-issuer
+spec:
+  selfSigned: {}
+---
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  name: serving-cert
+spec:
+  ipAddresses:
+    - "127.0.0.1"
+  privateKey:
+    rotationPolicy: Always
+  issuerRef:
+    kind: Issuer
+    name: selfsigned-issuer
+  secretName: xpum-server-cert
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+resources:
+  - ../../base
+  - certs.yaml
+namespace: monitoring
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+patches:
+  - path: xpumanager.yaml
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  labels:
+    app: intel-xpumanager
+  name: intel-xpumanager
+spec:
+  template:
+    spec:
+      volumes:
+        - name: features-d
+          hostPath:
+            path: "/etc/kubernetes/node-feature-discovery/features.d/"
+        - name: xpum-cert
+          secret:
+            secretName: xpum-server-cert
+      containers:
+        - name: python-exporter
+          volumeMounts:
+            - name: xpum-cert
+              mountPath: "/cert"
+          command:
+            - gunicorn
+            - --bind
+            - 0.0.0.0:29999
+            - --worker-connections
+            - "64"
+            - --worker-class
+            - gthread
+            - --workers
+            - "1"
+            - --threads
+            - "4"
+            - --keyfile=/cert/tls.key
+            - --certfile=/cert/tls.crt
+            - xpum_rest_main:main()
+          startupProbe:
+            httpGet:
+              scheme: HTTPS
+          livenessProbe:
+            httpGet:
+              scheme: HTTPS
+        - name: xelink-sidecar
+          image: intel/intel-xpumanager-sidecar:devel
+          imagePullPolicy: IfNotPresent
+          args:
+            - -v=2
+            - --cert=/cert/tls.crt
+          volumeMounts:
+            - name: features-d
+              mountPath: "/etc/kubernetes/node-feature-discovery/features.d/"
+            - name: xpum-cert
+              mountPath: "/cert"
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
+            readOnlyRootFilesystem: true
+            runAsUser: 0
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+resources:
+  - ../../base
+namespace: monitoring
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+patches:
+  - path: xpumanager.yaml

deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml renamed to deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ spec:
       containers:
         - name: xelink-sidecar
           image: intel/intel-xpumanager-sidecar:devel
-          imagePullPolicy: Always
+          imagePullPolicy: IfNotPresent
           args:
             - -v=2
           volumeMounts:
