This repository contains automated scripts for deploying Cortex (Prometheus long-term storage) on Google Cloud Platform (GCP) using Kubernetes clusters. The setup includes configurations for both a main cluster (the Cortex server) and worker clusters (Prometheus agents).
```
GOOGLE_GCS/
├── main_cluster/
│   ├── cortexMainCluster.sh   # Main deployment script
│   ├── env_variables.sh       # Environment configuration
│   ├── helper.sh              # Helper functions and YAML generation
│   └── key.json               # GCP service account key
└── worker_clusters/
    └── workerCluster.sh       # Worker cluster setup script
```
- Automated Cortex Deployment: Complete setup of Cortex with GCS backend storage
- Multi-cluster Support: Main cluster for Cortex and worker clusters for Prometheus agents
- GCP Integration: Uses Google Cloud Storage for blocks, rules, and alertmanager storage
- NGINX Authentication: Built-in basic authentication for secure access
- Prometheus Integration: Automatic Prometheus setup with remote write to Cortex
- Consul Service Discovery: Integrated Consul for service discovery
- Monitoring: ServiceMonitor configurations for complete observability
- kubectl - Kubernetes command-line tool
- helm - Kubernetes package manager
- kind - Kubernetes in Docker (for local development)
- Docker - Container runtime
- Google Cloud Platform account with billing enabled
- Service account with appropriate permissions
- GCS buckets for storage:
  - `cortex-blocks-dev` (blocks storage)
  - `cortex-alert-rules-dev` (ruler and alertmanager storage)
Your service account needs the following roles:
- `Storage Admin` or `Storage Object Admin`
- `Kubernetes Engine Admin` (if using GKE)
- Create a service account in Google Cloud Console
- Download the JSON key file
- Replace the content of `main_cluster/key.json` with your service account key
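The deployment script mounts this key into the cluster for Cortex to use. If you ever need to recreate the secret by hand, a sketch (the secret name `gcp-sa` matches the one referenced in the Troubleshooting section; adjust if your setup differs):

```shell
# Create the namespace (if it does not exist) and store the
# service account key as a Kubernetes secret named gcp-sa
kubectl create namespace cortex
kubectl create secret generic gcp-sa \
  --from-file=key.json=main_cluster/key.json \
  -n cortex
```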
Current Configuration:
- Project ID: `openuser-devops-labs`
- Service Account: `[email protected]`
Edit `main_cluster/env_variables.sh`:
```shell
# Namespace for Cortex deployment
NS=cortex

# GCP bucket names (update with your bucket names)
GCP_Blocks_BucketName=your-cortex-blocks-bucket
GCP_Ruler_BucketName=your-cortex-rules-bucket
GCP_Alert_BucketName=your-cortex-alert-bucket

# Basic auth credentials (base64 encoded)
prometheus_users=your-base64-username
prometheus_password=your-base64-password

# NGINX users (generate from https://wtools.io/generate-htpasswd-online).
# Use single quotes so the $ characters in the hash are not expanded.
keyValuePairs["username"]='$apr1$hash$encryptedpassword'
```
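The `prometheus_users` / `prometheus_password` values are plain base64 strings, which you can generate directly from the shell:

```shell
# Base64-encode the basic-auth username and password for env_variables.sh.
# printf '%s' avoids the trailing newline that echo would include.
prometheus_users=$(printf '%s' "openuser" | base64)
prometheus_password=$(printf '%s' "openuser" | base64)
echo "$prometheus_users"   # b3BlbnVzZXI=
```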
Default Values:
- Namespace: `cortex`
- Default buckets: `cortex-blocks-dev`, `cortex-alert-rules-dev`
- Default credentials: `openuser` (base64: `YmVycnlieXRlcw==`)
```shell
cd main_cluster/
chmod +x cortexMainCluster.sh
./cortexMainCluster.sh
```
The script will prompt you to choose between:
- Y: Local deployment (uses a Kind cluster named `openuser`)
- N: Server deployment (uses an existing Kubernetes cluster)
Main Cluster Components Deployed:
- Cortex: Main long-term storage system with GCS backend
- Consul: Service discovery (HashiCorp Consul)
- NGINX: Load balancer with basic authentication
- Prometheus: Monitoring with kube-prometheus-stack
```shell
cd worker_clusters/
chmod +x workerCluster.sh
./workerCluster.sh
```
The script will:
- Create a Kind cluster named `monitoring`
- Install Prometheus server using Helm
- Prompt for:
  - Cortex-nginx IP or domain (e.g., `http://192.168.1.100` or `http://cortex.example.com`)
  - Basic auth username and password
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   NGINX Proxy   │────│  Cortex Stack   │────│   GCS Storage   │
│  (LoadBalancer) │    │   (Namespace:   │    │    - Blocks     │
│    Port: 80     │    │     cortex)     │    │    - Rules      │
│   Basic Auth    │    │                 │    │    - Alerts     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │
         │             ┌─────────────────┐
         │             │  Consul Server  │
         │             │  Service Disc.  │
         │             │   Port: 8500    │
         └─────────────┴─────────────────┘
```
The system uses GCS storage with the following structure:
- Blocks Storage: `cortex-blocks-dev` - time series data chunks
- Ruler Storage: `cortex-alert-rules-dev` - recording and alerting rules
- Alertmanager Storage: `cortex-alert-rules-dev` - alert configuration and state
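Internally, these buckets map onto Cortex's storage configuration. A minimal sketch of the corresponding `cortex-values.yaml` fragment, using the default bucket names above (field names follow the Cortex config schema but may vary between chart versions, so treat this as illustrative rather than the exact generated output):

```yaml
config:
  blocks_storage:
    backend: gcs
    gcs:
      bucket_name: cortex-blocks-dev   # time series blocks
  ruler:
    storage:
      type: gcs
      gcs:
        bucket_name: cortex-alert-rules-dev   # recording/alerting rules
  alertmanager:
    storage:
      type: gcs
      gcs:
        bucket_name: cortex-alert-rules-dev   # alert config and state
```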
NGINX Basic Auth:
- Default user: `openuser`
- Password hash generated via https://wtools.io/generate-htpasswd-online
- Header injection: `X-Scope-OrgID: $remote_user`

Prometheus Remote Write:
- Endpoint: `http://cortex-nginx.cortex/api/prom/push`
- Uses basic auth with base64-encoded credentials
- Secret name: `openuser-secrets`
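Putting the basic auth and header injection together, the NGINX config generated by `helper.sh` amounts to something like the following sketch (the upstream service name and port here are placeholders, not the exact values the script emits):

```nginx
server {
    listen 80;

    location / {
        # Basic auth against the htpasswd entries from keyValuePairs
        auth_basic           "Cortex";
        auth_basic_user_file /etc/nginx/.htpasswd;

        # Forward the authenticated username as the Cortex tenant ID
        proxy_set_header X-Scope-OrgID $remote_user;

        # Placeholder upstream; the generated config targets the
        # actual Cortex service in the cortex namespace
        proxy_pass http://cortex.cortex.svc.cluster.local:8080;
    }
}
```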
```shell
# Get all services in cortex namespace
kubectl get services -n cortex

# Check NGINX service (LoadBalancer)
kubectl get service cortex-nginx -n cortex

# Check Prometheus service (LoadBalancer on port 9092)
kubectl get service prometheus-app

# Access Cortex API with authentication
curl -u openuser:openuser http://<nginx-ip>/api/prom/push
```
- NGINX: Port 80 (LoadBalancer with basic auth)
- Prometheus: Port 9092 (LoadBalancer)
- Consul: Port 8500 (Internal service discovery)
- Cortex Components: Various internal ports with ServiceMonitors
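If LoadBalancer IPs are not provisioned (common on a local Kind cluster), you can reach these services with port-forwarding instead; a sketch using the service names from this setup:

```shell
# NGINX (Cortex gateway) on localhost:8080
kubectl port-forward -n cortex service/cortex-nginx 8080:80 &

# Prometheus on localhost:9092
kubectl port-forward service/prometheus-app 9092:9092 &

# Consul API/UI on localhost:8500
kubectl port-forward -n cortex service/consul-server 8500:8500 &
```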
```shell
# Check worker cluster services
kubectl config use-context kind-monitoring
kubectl get services

# Prometheus server endpoint
kubectl get service prometheus-server
```
All Cortex components include a ServiceMonitor with the label `release: prom`:
- Distributor
- Ingester
- Querier
- Query Frontend
- Ruler
- Alertmanager
- Store Gateway
- Compactor
- NGINX
- Main Cluster: Uses kube-prometheus-stack
- Worker Clusters: Uses prometheus-community/prometheus
- Remote Write: Configured to send metrics to Cortex
- Retention: 10 days (configurable)
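On worker clusters, this remote write wiring boils down to a Prometheus config fragment like the following sketch (the worker script templates the URL and credentials from your prompted values; the placeholder below stands in for your actual IP or domain):

```yaml
remote_write:
  - url: http://<cortex-nginx-ip-or-domain>/api/prom/push
    basic_auth:
      username: openuser
      password: openuser
```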
1. Kind Cluster Issues

   ```shell
   # Delete and recreate Kind cluster
   kind delete cluster --name openuser
   kind create cluster --name openuser

   # For worker clusters
   kind delete cluster --name monitoring
   kind create cluster --name monitoring
   ```

2. GCS Permission Issues

   ```shell
   # Verify service account key
   kubectl get secret gcp-sa -n cortex -o yaml

   # Check pod logs for GCS errors
   kubectl logs -n cortex deployment/cortex-distributor
   ```

3. Authentication Issues

   ```shell
   # Check NGINX secret
   kubectl get secret nginx-user-secrets -n cortex -o yaml

   # Test basic auth
   echo -n "openuser:openuser" | base64
   ```

4. Consul Connectivity

   ```shell
   # Check Consul members
   kubectl exec -n cortex deployment/consul-server -- consul members

   # Test Consul API
   kubectl port-forward -n cortex service/consul-server 8500:8500
   curl http://localhost:8500/v1/status/leader
   ```
```shell
# Check all pods in cortex namespace
kubectl get pods -n cortex

# Describe problematic pods
kubectl describe pod <pod-name> -n cortex

# Check logs
kubectl logs -n cortex <pod-name> -f

# Check ConfigMaps
kubectl get configmap -n cortex cortex -o yaml

# Verify secrets
kubectl get secrets -n cortex
```
```shell
# 1. Check all services are running
kubectl get pods -n cortex
kubectl get services -n cortex

# 2. Test NGINX endpoint
NGINX_IP=$(kubectl get service cortex-nginx -n cortex -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -u openuser:openuser "http://$NGINX_IP/ready"

# 3. Test Prometheus access (quote the URL so the shell ignores ? and =)
PROM_IP=$(kubectl get service prometheus-app -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl "http://$PROM_IP:9092/api/v1/query?query=up"
```
```shell
# Switch to worker cluster context
kubectl config use-context kind-monitoring

# Check Prometheus is sending data
kubectl logs deployment/prometheus-server | grep "remote_write"
```
- Generate an htpasswd entry: https://wtools.io/generate-htpasswd-online
- Add it to `keyValuePairs` in `env_variables.sh` (single quotes keep the `$` characters literal): `keyValuePairs["newuser"]='$apr1$generated$hash'`
- Re-run the deployment
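If you prefer generating the hash locally instead of using the web tool, `openssl` can produce the same apr1 (htpasswd MD5) format; a sketch:

```shell
# Generate an apr1 htpasswd-style hash for the new user's password.
# The salt is random, so the output differs on every run; paste the
# resulting $apr1$... string into keyValuePairs in env_variables.sh.
openssl passwd -apr1 'newuser-password'
```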
Update the bucket names in `env_variables.sh`:

```shell
GCP_Blocks_BucketName=your-new-blocks-bucket
GCP_Ruler_BucketName=your-new-rules-bucket
GCP_Alert_BucketName=your-new-alerts-bucket
```
Modify the `cortex-values.yaml` generation in `helper.sh` to add replicas and resources:

```yaml
ingester:
  replicas: 3
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
```
- `cortexMainCluster.sh`: Main deployment script that orchestrates the entire setup
- `env_variables.sh`: Contains all configuration variables and user credentials
- `helper.sh`: Generates Kubernetes YAML files and applies configurations
- `key.json`: GCP service account credentials
- `workerCluster.sh`: Sets up worker clusters with Prometheus remote write to Cortex
- Monitor GCS Usage: Check bucket sizes and costs regularly
- Update Helm Charts: Keep Cortex and Prometheus charts updated
- Rotate Credentials: Update service account keys and passwords periodically
- Check Logs: Monitor component logs for errors
```shell
# Backup configurations
cp -r main_cluster/ backup/main_cluster_$(date +%Y%m%d)/

# Export Kubernetes resources
kubectl get all -n cortex -o yaml > cortex-backup.yaml
```
```shell
# Delete Kind clusters
kind delete cluster --name openuser
kind delete cluster --name monitoring

# Delete Kubernetes resources
kubectl delete namespace cortex
helm uninstall cortex consul stable -n cortex
```
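To also remove the stored data in GCS, a sketch assuming the `gsutil` CLI and the default bucket names (this deletes the objects and the buckets themselves, and is irreversible):

```shell
# Remove Cortex blocks and rules data from GCS
gsutil -m rm -r gs://cortex-blocks-dev
gsutil -m rm -r gs://cortex-alert-rules-dev
```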
- Fork the repository
- Create a feature branch: `git checkout -b feature/new-feature`
- Test thoroughly in a development environment
- Submit a pull request with detailed description
- Service Account Key: Never commit actual GCP service account keys to version control
- Passwords: Use strong passwords and rotate them regularly
- Network Security: Consider using private clusters and VPC firewalls
- RBAC: Review and minimize Kubernetes permissions
This project is licensed under [Your License] - see the LICENSE file for details.
For issues and questions:
- Create an issue in the repository
- Check logs using the troubleshooting commands above
- Review GCP and Kubernetes documentation