8 changes: 4 additions & 4 deletions .typos.toml
@@ -1,9 +1,9 @@
[default]
locale = "en-au"
locale = "en-us"

[default.extend-identifiers]
center = "center" # Due to CSS usage
authorization = "authorization" # due to rbac.authorization.k8s.io usage
[default.extend-words]
# Pydantic uses British spelling for this method name
customise = "customise"

[files]
ignore-vcs = true
169 changes: 169 additions & 0 deletions AGENTS.md
@@ -0,0 +1,169 @@
## Project Overview

Netchecks is a cloud native tool for testing network conditions and asserting that they meet expectations. The repository contains two main components:

1. **Netcheck CLI/Python Library** - Command line tool and Python library for running network checks (DNS, HTTP) with customizable validation rules written in CEL (Common Expression Language)
2. **Netchecks Operator** - Kubernetes Operator that schedules and runs network checks, reporting results as PolicyReport resources

## Architecture

### Netcheck CLI (Python Library)
- **Entry point**: `netcheck/cli.py` - Uses Typer for the CLI interface, providing the `run`, `http`, and `dns` commands
- **Core logic**: `netcheck/runner.py` - Contains `run_from_config()` for running multiple assertions and `check_individual_assertion()` for individual tests
- **Check implementations**: `netcheck/checks/` - Separate modules for DNS (`dns.py`), HTTP (`http.py`), and internal checks
- **Validation**: `netcheck/validation.py` - CEL expression evaluation for custom validation rules
- **Context system**: `netcheck/context.py` - Template replacement using external data from files, inline data, or directories (with lazy loading via `LazyFileLoadingDict`)

Test results include `spec` (test configuration), `data` (results), and `status` (pass/fail). Custom validation rules can reference both `data` and `spec` in CEL expressions.
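
To make that concrete, here is a hedged sketch of the shape a single DNS result might take (the `spec`/`data`/`status` split is from above; the nested field names are assumptions informed by the default CEL rules later in this document):

```python
# Illustrative sketch only; the exact fields vary by check type and netcheck version.
example_dns_result = {
    "spec": {                 # the test configuration that was requested
        "type": "dns",
        "host": "github.com",             # assumed example value
    },
    "data": {                 # what the probe actually observed
        "response-code": "NOERROR",
        "A": ["140.82.112.3"],              # assumed example value
        "startTimestamp": "2024-01-01T00:00:00.000000",
        "endTimestamp": "2024-01-01T00:00:00.150000",
    },
    "status": "pass",         # pass/fail after the validation rule is applied
}
```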

### Netchecks Operator (Kubernetes)
- **Main operator**: `operator/netchecks_operator/main.py` - Kopf-based operator with handlers for NetworkAssertion CRD lifecycle
- **Configuration**: `operator/netchecks_operator/config.py` - Settings loaded from environment variables
- **Flow**:
1. NetworkAssertion CRD created → operator creates ConfigMap with rules + Job/CronJob with probe pod
2. Probe pod runs netcheck CLI with mounted config
3. Operator daemon monitors probe pod completion
4. Results extracted from pod logs and transformed into PolicyReport CRD
5. Prometheus metrics updated with test duration and results
- **Helm chart**: `operator/charts/netchecks/` - Includes NetworkAssertion and PolicyReport CRDs

Key transformation: K8s concepts (ConfigMap/Secret contexts) are mapped to the CLI format (directory/file/inline contexts) via `transform_context_for_config_file()` at `operator/netchecks_operator/main.py:208`.
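
A rough illustration of that mapping (the Kubernetes-side field names here are assumptions, not the operator's verified schema; the CLI-side shape follows the context handling in `netcheck/runner.py`):

```python
# Hypothetical example: a NetworkAssertion context backed by a ConfigMap...
k8s_context = {"name": "my-context", "configMap": {"name": "some-config-map"}}  # assumed shape

# ...is rewritten into the CLI context format, where the mounted ConfigMap
# becomes a "directory" context whose files netcheck loads on demand.
cli_context = {"name": "my-context", "type": "directory", "path": "/mnt/context/my-context"}  # assumed mount path
```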

## Development Setup

The CLI/library uses **uv** for Python dependency management (not pip or Poetry).

### CLI/Library Development

Install dependencies:
```bash
uv sync
```

Run tests with coverage:
```bash
uv run pytest tests --cov netcheck --cov-report=lcov --cov-report=term
```

Run a single test:
```bash
uv run pytest tests/test_cli.py::test_name -v
```

Run the CLI locally:
```bash
uv run netcheck dns --host github.com -v
uv run netcheck http --url https://github.com/status -v
uv run netcheck run --config example-config.json -v
```
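
The file passed to `--config` is JSON; the sketch below (written as a Python dict for illustration) shows the general shape, but the field names are assumptions and should be checked against the docs and `tests/` rather than taken from here:

```python
# Hypothetical config sketch; verify field names against the real examples in the repo.
example_config = {
    "assertions": [
        {
            "name": "github-should-resolve",
            "rules": [
                {"type": "dns", "host": "github.com"},   # assumed rule shape
            ],
        }
    ],
}
```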

### Operator Development

The operator uses **Poetry** for dependency management (separate from the CLI).

Install operator dependencies:
```bash
cd operator
poetry install --with dev
```

Run operator tests (requires a running Kubernetes cluster):
```bash
cd operator
pytest -v
```

Integration tests require:
- Kind cluster with Cilium CNI (see `.github/workflows/ci.yaml:269-282`)
- PolicyReport CRD installed
- Netcheck operator and probe images loaded into cluster

Run operator locally (outside cluster):
```bash
cd operator
poetry run kopf run netchecks_operator/main.py --liveness=http://0.0.0.0:8080/healthz
```

### Docker Build

Build probe image:
```bash
docker build -t ghcr.io/hardbyte/netchecks:main .
```

Build operator image:
```bash
docker build -t ghcr.io/hardbyte/netchecks-operator:main operator/
```

### Code Quality

Format code with ruff (CLI uses uv, operator uses poetry):
```bash
# CLI
uv run ruff format .

# Operator
cd operator
poetry run ruff format .
```

Lint with ruff:
```bash
# CLI
uv run ruff check .

# Operator
cd operator
poetry run ruff check .
```

## Testing Philosophy

- **Unit tests** in `tests/` test the CLI and library functions
- **Integration tests** in `operator/tests/` deploy NetworkAssertion resources to a real Kubernetes cluster and verify PolicyReport results
- CI runs tests on ubuntu-latest, windows-latest, and macos-13 with Python 3.11 and 3.12

## Key Configuration Files

- `pyproject.toml` - CLI package metadata and dependencies (using uv)
- `operator/pyproject.toml` - Operator package metadata and dependencies (using poetry)
- `operator/charts/netchecks/values.yaml` - Helm chart configuration
- `operator/manifests/deploy.yaml` - Static Kubernetes manifests

## Release Process

1. Update version in `pyproject.toml` (CLI) and `operator/pyproject.toml` (operator)
2. Push to main branch
3. Create GitHub release
4. CI automatically:
- Publishes package to PyPI
- Builds and pushes Docker images to ghcr.io
- Runs integration tests with Kind + Cilium

## CEL Validation Examples

Default DNS validation rule:
```cel
data['response-code'] == 'NOERROR' &&
size(data['A']) >= 1 &&
(timestamp(data['endTimestamp']) - timestamp(data['startTimestamp']) < duration('10s'))
```

Default HTTP validation rule:
```cel
data['status-code'] in [200, 201]
```

Custom validation with JSON parsing:
```cel
parse_json(data.body).headers['X-Header'] == 'expected-value'
```
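
These rules are evaluated by `evaluate_cel_with_context()` in `netcheck/validation.py`, which also registers the helper functions `parse_json`, `parse_yaml`, `b64decode`, and `b64encode`. A hedged sketch of calling it directly (the context values are made up for illustration):

```python
from netcheck.validation import evaluate_cel_with_context

# Made-up context; in real runs this is built from the check's spec and data.
context = {
    "data": {
        "status-code": 200,
        "body": '{"headers": {"X-Header": "expected-value"}}',
    }
}

assert evaluate_cel_with_context(context, "data['status-code'] in [200, 201]")
assert evaluate_cel_with_context(
    context, "parse_json(data.body).headers['X-Header'] == 'expected-value'"
)
```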

## Important Notes

- The operator uses the Kopf framework to handle Kubernetes CRD lifecycle events
- Template strings in NetworkAssertion specs use `{{ variable }}` syntax and are replaced via `replace_template()` in `netcheck/context.py` (see the sketch after this list)
- Sensitive fields (headers) are redacted from output unless the `--disable-redaction` flag is used
- The PolicyReport CRD must be installed before the operator (it comes from the wg-policy-prototypes project)
- Operator metrics are exposed on port 9090 (configurable) using OpenTelemetry + Prometheus
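
A hedged sketch of the templating behaviour mentioned above (example values are invented; `replace_template()` is called with `(data, context)` in `netcheck/runner.py`, but confirm the supported input types in `netcheck/context.py`):

```python
from netcheck.context import replace_template

# Invented template and context for illustration.
template = {"url": "https://{{ host }}/status", "headers": {"Authorization": "Bearer {{ token }}"}}
context = {"host": "github.com", "token": "example-token"}

rendered = replace_template(template, context)
# Expected (assuming dict inputs are supported, as suggested by run_from_config):
# {"url": "https://github.com/status", "headers": {"Authorization": "Bearer example-token"}}
```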
2 changes: 1 addition & 1 deletion Dockerfile
@@ -12,7 +12,7 @@
uv sync --frozen --no-dev --no-install-project

# Copy the application code to the build stage
ADD . /app

Check failure on line 15 in Dockerfile (GitHub Actions / check-linters): DL3020 error: Use COPY instead of ADD for files and folders

RUN --mount=type=cache,target=/root/.cache \
uv sync --frozen --no-dev
@@ -24,7 +24,7 @@
# Set environment variables for Python
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
# VIRTUAL_ENV=/app/.venv \
VIRTUAL_ENV=/app/.venv \
PATH="/app/.venv/bin:$PATH" \
USERNAME=netchecks \
USER_UID=1000 \
15 changes: 3 additions & 12 deletions README.md
@@ -254,25 +254,16 @@ Kubernetes operator to inject data.
Update version in `pyproject.toml`, push to `main` and create a release on GitHub. The PyPI release will be carried
out by GitHub Actions.

Install dev dependencies with Poetry:
Install dev dependencies with `uv`:

```shell
poetry install --with dev
```

### Manual Release
To release manually, use Poetry:

```shell
poetry version patch
poetry build
poetry publish
uv sync
```

### Testing

Pytest is used for testing.

```shell
poetry run pytest
uv run pytest
```
2 changes: 1 addition & 1 deletion docs/README.md
@@ -19,7 +19,7 @@ npm run dev

Finally, open [http://localhost:3000](http://localhost:3000) in your browser to view the website.

## Customising
## Customizing

You can start editing this template by modifying the files in the `/src` folder. The site will auto-update as you edit these files.

2 changes: 1 addition & 1 deletion docs/src/pages/docs/alerting.md
@@ -51,7 +51,7 @@ policy reports a failure status. The alert includes details about the failing po

## Configure Alerting in Grafana

Grafana can visualise the alerts generated by Prometheus and also send notifications through various channels such as email, Slack, or PagerDuty. Grafana alert can be configured manually via the UI, or via a configuration file.
Grafana can visualize the alerts generated by Prometheus and also send notifications through various channels such as email, Slack, or PagerDuty. Grafana alerts can be configured manually via the UI, or via a configuration file.

### Configure Alerting via UI

2 changes: 1 addition & 1 deletion docs/src/pages/docs/dns.md
@@ -133,7 +133,7 @@ metadata:
name: cluster-dns-should-work
namespace: default
annotations:
description: Check cluster dns behaviour
description: Check cluster dns behavior
spec:
# Every 20 minutes
schedule: "*/20 * * * *"
2 changes: 1 addition & 1 deletion docs/src/pages/index.md
@@ -87,7 +87,7 @@ The `PolicyReport` contains information about the test run and the results of th

{% quick-link title="Architecture guide" icon="presets" href="/" description="Learn how the internals work and contribute." /%}

{% quick-link title="API reference" icon="theming" href="/" description="Learn to easily customise and modify your app's visual design to fit your brand." /%}
{% quick-link title="API reference" icon="theming" href="/" description="Learn to easily customize and modify your app's visual design to fit your brand." /%}

{% quick-link title="Examples" icon="plugins" href="/" description="See how others are using the library in their projects." /%}

29 changes: 26 additions & 3 deletions netcheck/context.py
@@ -80,14 +80,30 @@ def __init__(self, directory, *args, **kwargs):
self.directory = directory
super().__init__(*args, **kwargs)
# Pre-populate the dictionary with keys for each file in the directory
# Skip Kubernetes ConfigMap metadata files (symlinks starting with '..')
for filename in os.listdir(directory):
# Skip hidden files and Kubernetes ConfigMap symlinks (..data, ..2025_*, etc.)
if filename.startswith('.'):
continue
# We'll use None as a placeholder for the file contents
# We could strip filename extensions, but I think it is clearer not to
# os.path.splitext(filename)[0]
self[filename] = None

def __getitem__(self, key):
# Prevent path traversal attacks by checking if key contains path separators
if os.path.sep in key or (os.altsep and os.altsep in key) or key.startswith('.'):
raise KeyError(f"Invalid key: {key}. Path separators and relative paths are not allowed.")

filepath = os.path.join(self.directory, key)

# Additional safety check: ensure the resolved path is within the directory
try:
filepath = os.path.realpath(filepath)
directory = os.path.realpath(self.directory)
if not filepath.startswith(directory + os.path.sep):
raise KeyError(f"Path traversal detected: {key}")
except (OSError, ValueError) as e:
raise KeyError(f"Invalid path: {key}") from e

if super().__getitem__(key) is None and os.path.isfile(filepath):
# If the value is None (our placeholder), replace it with the actual file contents
with open(filepath, "rt") as f:
@@ -96,5 +96,12 @@ def __getitem__(self, key):

def items(self):
# Override items() to call __getitem__ for each key
# Required because CEL calls items() when converting to CEL Map type.
return [(key, self[key]) for key in self]

def materialize(self):
"""
Force load all lazy-loaded file contents and return a regular dict.
This is needed for compatibility with the Rust CEL library which doesn't
properly handle dict subclasses.
"""
return {key: self[key] for key in self}
8 changes: 5 additions & 3 deletions netcheck/runner.py
@@ -49,9 +49,11 @@ def run_from_config(
inline_context = replace_template(inline_context, context)
context[c["name"]] = inline_context
elif c["type"] == "directory":
# Return a Dict like object that lazy loads individual files
# from the directory (with caching) and add them to the context
context[c["name"]] = LazyFileLoadingDict(c["path"])
# Load a LazyFileLoadingDict and immediately materialize it to a regular dict
# The Rust CEL library doesn't properly handle dict subclasses, so we need
# to convert it to a regular dict before using it in CEL expressions
lazy_dict = LazyFileLoadingDict(c["path"])
context[c["name"]] = lazy_dict.materialize()
else:
logger.warning(f"Unknown context type '{c['type']}'")

53 changes: 27 additions & 26 deletions netcheck/validation.py
@@ -2,10 +2,10 @@
import json
import logging
from typing import Dict

import celpy
import yaml
from celpy import CELParseError, CELEvalError, json_to_cel

from cel import cel


logger = logging.getLogger("netcheck.validation")

@@ -31,35 +31,36 @@ def evaluate_cel_with_context(context: Dict, validation_rule: str):
Raises:
ValueError: If the CEL expression is invalid.
"""

env = celpy.Environment()

# Validate the CEL validation rule and compile to ast
try:
ast = env.compile(validation_rule)
except CELParseError:
print("Invalid CEL expression. Treating as error.")
raise ValueError("Invalid CEL expression")

# create the CEL program
functions = {
"parse_json": lambda s: json_to_cel(json.loads(s)),
"parse_yaml": lambda s: json_to_cel(yaml.safe_load(s)),
"parse_json": lambda s: json.loads(s),
"parse_yaml": lambda s: yaml.safe_load(s),
"b64decode": lambda s: base64.b64decode(s).decode("utf-8"),
"b64encode": lambda s: base64.b64encode(s.encode()).decode(),
}
prgm = env.program(ast, functions=functions)

# Set up the context
activation = celpy.json_to_cel(context)
env = cel.Context(
variables=context,
functions=functions,
)

# Evaluate the CEL expression
try:
context = prgm.evaluate(activation)
except CELEvalError:
# Note this can fail if the context is missing a key e.g. the probe
# failed to return a value for a key that the validation rule expects

result = cel.evaluate(validation_rule, env)
except ValueError as e:
error_msg = str(e)
# Distinguish between parse errors (config bugs) and execution errors (validation failures)
if "Failed to parse" in error_msg:
# Parse/syntax errors indicate invalid CEL configuration - raise to surface
logger.error(f"Invalid CEL expression syntax: {e}")
raise ValueError(f"Invalid CEL expression: {e}") from e
else:
# Execution errors (type mismatches, etc.) indicate validation failure
# These can happen with valid expressions that fail at runtime
logger.debug(f"CEL execution failed: {e}")
return False
except RuntimeError as e:
# Runtime errors (undefined variables) indicate validation failure
# This can happen if the probe failed to return expected values
logger.debug(f"CEL evaluation failed: {e}")
return False

return context
return result