Commit 31dcafd

Publish system logs best practices for KKP 2.29 (#2024)
Signed-off-by: Waleed Malik <[email protected]>
1 parent 0c4c5f5 commit 31dcafd

2 files changed: +138 -0 lines changed

content/kubeone/main/security/_index.en.md

Lines changed: 5 additions & 0 deletions
@@ -6,3 +6,8 @@ chapter = true
+++

# Security

## Table of Contents

{{% children depth=5 %}}
{{% /children %}}
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+++
title = "Personally Identifiable Information Analysis: Kubernetes and KKP System Logs"
date = 2024-03-06T12:00:00+02:00
weight = 10
+++

This document analyzes the Personally Identifiable Information (PII) and personal data (indirect identifiers) that may appear in system logs from Kubernetes clusters deployed with Kubermatic Kubernetes Platform (KKP).

**Target Audience**: Platform operators, security teams, compliance officers

**Prerequisites**: Basic understanding of Kubernetes and KKP

KKP tries to avoid logging PII, but in some cases it is unavoidable and outside the control of the platform operator. The source may be a component that KKP ships or one of the underlying Kubernetes components.

## PII Categories (GDPR-Aligned)

System logs from Kubernetes clusters may contain the following types of PII:

### Direct Identifiers

* **Usernames**: Kubernetes usernames, system usernames, service account names
* **Email addresses**: From TLS certificate subjects (CN, O, OU), OIDC claims, audit logs, or user labels
* **IP addresses**: Client IPs

### Indirect Identifiers

* **Resource names**: Pod names, namespace names, deployment names containing user/org identifiers
  * Example: `webapp-john-deployment`, `john-doe-dev` namespace
* **Hostnames**: Node hostnames with user or organizational patterns
  * Example: `worker-john-prod-01.company.com`
* **Labels and annotations**: Custom metadata that may include user data
  * Example: `[email protected]`
* **Volume paths**: Mount paths revealing directory structures with usernames
  * Example: `/home/john/data:/data`

### Cloud Provider Identifiers

* **Account IDs**: AWS account IDs, Azure subscription IDs, GCP project IDs
* **Resource IDs**: Instance IDs, VPC IDs, volume IDs, subnet IDs, security group IDs
* **DNS names**: Load balancer DNS, instance DNS names
* **Geographic data**: Availability zones, regions
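
Several of these identifiers follow recognizable formats, which makes them candidates for pattern-based redaction. The following is a minimal sketch, not a complete catalogue: it assumes AWS account IDs appear inside ARNs, that AWS resource IDs use a short prefix followed by 8 or 17 hexadecimal characters, and that Azure subscription IDs appear as GUIDs in `/subscriptions/...` paths. The function name `redact_cloud_ids` and the placeholder strings are hypothetical. GCP project IDs are user-chosen strings and are not covered here.

```python
import re

# Illustrative patterns only; extend them for the providers you actually use.
CLOUD_ID_PATTERNS = [
    # AWS account ID inside an ARN: arn:aws:<service>:<region>:<account-id>:...
    (re.compile(r"(arn:aws[a-z-]*:[a-z0-9-]*:[a-z0-9-]*:)\d{12}(?=:)"),
     r"\1<AWS_ACCOUNT_ID>"),
    # AWS resource IDs: known prefix plus 8 or 17 hexadecimal characters
    (re.compile(r"\b(i|vpc|vol|subnet|sg)-(?:[0-9a-f]{17}|[0-9a-f]{8})\b"),
     r"\1-<REDACTED>"),
    # Azure subscription ID (a GUID) inside a resource path
    (re.compile(r"(/subscriptions/)[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"),
     r"\1<AZURE_SUBSCRIPTION_ID>"),
]


def redact_cloud_ids(line: str) -> str:
    """Replace cloud provider identifiers in one log line with placeholders."""
    for pattern, replacement in CLOUD_ID_PATTERNS:
        line = pattern.sub(replacement, line)
    return line
```

Keeping the prefix (`vol-`, `i-`, `/subscriptions/`) in the output preserves enough context for troubleshooting while removing the identifying part.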

### Operational Data That May Reveal Personal Data

* **DNS queries**: Service/pod names in DNS lookups
* **HTTP/gRPC metadata**: URLs, headers, cookies (if Layer 7 visibility is enabled in the CNI)
* **Error messages**: Often contain detailed context with resource IDs and user identifiers
* **Audit logs**: Comprehensive request/response data including full user context

## Risk Assessment Matrix

| Component | User Identity | IP Addresses | Credentials | Cloud IDs | Risk Level |
|-----------|---------------|--------------|-------------|-----------|------------|
| kube-apiserver | ✅ High | ✅ High | ✅ High | ❌ No | 🔴 **HIGH** |
| kubelet | ⚠️ Medium | ✅ High | ✅ High | ❌ No | 🔴 **HIGH** |
| etcd | ✅ High | ⚠️ Medium | ✅ High | ❌ No | 🔴 **HIGH** |
| Cloud Controller Managers | ❌ No | ✅ High | ✅ High | ✅ High | 🔴 **HIGH** |
| CSI Drivers | ❌ No | ⚠️ Medium | ✅ High | ✅ High | 🔴 **HIGH** |
| Cilium | ⚠️ Medium | ✅ High | ❌ No | ❌ No | 🟡 **MEDIUM-HIGH** |
| kube-controller-manager | ⚠️ Low | ⚠️ Medium | ⚠️ Medium | ⚠️ Medium | 🟡 **MEDIUM** |
| kube-scheduler | ⚠️ Low | ❌ No | ❌ No | ❌ No | 🟡 **MEDIUM** |
| kube-proxy | ❌ No | ✅ High | ❌ No | ❌ No | 🟡 **MEDIUM** |
| CoreDNS | ⚠️ Low | ⚠️ Medium | ❌ No | ❌ No | 🟡 **MEDIUM** |
| cluster-autoscaler | ⚠️ Low | ⚠️ Low | ⚠️ Low | ✅ High | 🟡 **MEDIUM** |
| NodeLocalDNS | ⚠️ Low | ⚠️ Medium | ❌ No | ❌ No | 🟡 **MEDIUM** |
| metrics-server | ⚠️ Low | ❌ No | ❌ No | ❌ No | 🟢 **LOW-MEDIUM** |
| machine-controller | ⚠️ Low | ❌ No | ⚠️ Low | ✅ High | 🟢 **LOW** |
| operating-system-manager | ⚠️ Low | ❌ No | ❌ No | ⚠️ Low | 🟢 **LOW** |

**Legend**:

* ✅ High: Frequent and detailed PII exposure
* ⚠️ Medium: Moderate PII exposure
* ⚠️ Low: Limited or occasional PII exposure
* ❌ No: Minimal or no PII exposure

### Understanding Risk Context

The risk matrix gives a helpful overview of potential PII exposure, but the risk is not always proportional to the exposure. For example, a component rated low-risk can have high exposure when it is combined with a high-risk component.

One such case is a component that logs a full Kubernetes resource when validation fails. The log statement does not refer to the fields that might contain personal data, but because the entire resource is logged, any PII it contains ends up in the logs. Always review and sanitize logs before sharing them anywhere.
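
When such log entries are structured, one way to sanitize them after the fact is to mask whole subtrees of the embedded resource rather than matching individual values. The sketch below assumes JSON-formatted log lines; the key list in `SENSITIVE_KEYS` and the function names are hypothetical and should be adapted to the resources your components log.

```python
import json

# Assumption: values under these keys are masked wherever they appear.
SENSITIVE_KEYS = {"labels", "annotations"}


def mask(value):
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: "<REDACTED>" if key in SENSITIVE_KEYS else mask(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask(item) for item in value]
    return value


def sanitize_json_line(line: str) -> str:
    """Mask one JSON-formatted log line; return non-JSON lines unchanged."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return line
    return json.dumps(mask(entry))
```

Lines that are not valid JSON pass through untouched, so they still need pattern-based redaction and manual review.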

## Log Filtering and Sanitization

### Automated PII Filtering

Implement automated filtering in your log aggregation pipeline to remove PII and personal data from the logs.

#### Use External Tools for PII Redaction

* [Presidio](https://microsoft.github.io/presidio/) - A set of tools for data protection and privacy
* [Microsoft Purview](https://learn.microsoft.com/en-us/purview/information-protection) (formerly Azure Purview) - A cloud-based data governance service that helps you manage and protect your sensitive data

### Manual PII Filtering - Common patterns to filter

```regex
# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# IPv4 addresses
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b

# Basic Auth in URLs (user and password must not span whitespace or slashes)
https?://[^\s:/@]+:[^\s@/]+@
```
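
As an illustration, these patterns could be applied line by line in a small script. This is a minimal sketch under a few assumptions: the function name `redact_line` and the placeholder strings are hypothetical, and the URL pattern is a tightened variant that stops at whitespace and slashes so it cannot match across unrelated parts of a line.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
IPV4 = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")
BASIC_AUTH = re.compile(r"(https?://)[^\s:/@]+:[^\s@/]+@")


def redact_line(line: str) -> str:
    """Replace credentials, email addresses, and IPv4 addresses with placeholders."""
    # Credentials go first so "user:pass@host" is not partly consumed by the email pattern.
    line = BASIC_AUTH.sub(r"\1<REDACTED>@", line)
    line = EMAIL.sub("<EMAIL>", line)
    line = IPV4.sub("<IP>", line)
    return line


print(redact_line("user john@example.com from 10.0.0.1 pulled https://admin:s3cret@registry.example.com/v2/"))
# prints: user <EMAIL> from <IP> pulled https://<REDACTED>@registry.example.com/v2/
```

The IPv4 pattern deliberately over-matches (it also accepts out-of-range octets such as `999.999.999.999`), which is usually the safer failure mode for redaction.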

## Best Practices

### Before sharing logs with Kubermatic Support

1. Identify the time range needed (minimize data exposure)
2. Export only the relevant namespaces and components
3. Run a PII redaction tool or script
4. Manually review the first 100 lines to verify redaction
5. Obtain approval from the data protection officer (if required)
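
The first step can be sketched as a small filter that keeps only the lines inside the requested window. This example assumes each line starts with an RFC 3339 timestamp in UTC, as produced for instance by `kubectl logs --timestamps`; the function name `in_time_range` is hypothetical, and comparison is done at second precision.

```python
from datetime import datetime


def in_time_range(lines, start, end):
    """Yield lines whose leading timestamp lies in the half-open range [start, end)."""
    for line in lines:
        try:
            # Only "YYYY-MM-DDTHH:MM:SS" is parsed; fractional seconds and the
            # zone suffix are ignored, so timestamps are assumed to be UTC.
            stamp = datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue  # no parseable timestamp: drop the line rather than share it unchecked
        if start <= stamp < end:
            yield line
```

Dropping lines without a parseable timestamp is a conservative choice; multi-line entries such as stack traces would need extra handling if they must be preserved.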

## Conclusion

### Key Points

1. Kubernetes logs contain significant PII, especially from kube-apiserver, kubelet, etcd, and all cloud provider components
2. Higher log verbosity (v=4-5) dramatically increases PII exposure
3. Cloud provider account identifiers are prevalent in Cloud Controller Managers (CCMs) and CSI drivers
4. Automated filtering tools are essential for safe log sharing at scale
5. Manual review is still necessary to catch context-specific PII

### Best Practice for Support

## Additional Resources

### GDPR and Privacy

* [GDPR Official Text](https://gdpr-info.eu/)
* [Article 29 Working Party Opinion on Personal Data](https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/index_en.htm)
