[CI] Add Terraform resources for daily CronJob that processes LLVM commits #495

Open: wants to merge 2 commits into base: main
8 changes: 8 additions & 0 deletions premerge/gke_cluster/main.tf
@@ -12,6 +12,10 @@ resource "google_container_cluster" "llvm_premerge" {
  # for adding windows nodes to the cluster.
  networking_mode = "VPC_NATIVE"
  ip_allocation_policy {}

  workload_identity_config {
    workload_pool = "llvm-premerge-checks.svc.id.goog"
  }
}

resource "google_container_node_pool" "llvm_premerge_linux_service" {
@@ -23,6 +27,10 @@ resource "google_container_node_pool" "llvm_premerge_linux_service" {

  node_config {
    machine_type = "e2-highcpu-4"

    workload_metadata_config {
      mode = "GKE_METADATA"
    }
    # Terraform wants to recreate the node pool every time when running
    # terraform apply unless we explicitly set this.
    # TODO(boomanaiden154): Look into why terraform is doing this so we do
81 changes: 81 additions & 0 deletions premerge/main.tf
@@ -190,3 +190,84 @@ resource "kubernetes_manifest" "metrics_deployment" {

  depends_on = [kubernetes_namespace.metrics, kubernetes_secret.metrics_secrets]
}

# Resources for collecting LLVM operational metrics data

# Service accounts and bindings to grant access to the
# BigQuery API for our cronjob
resource "google_service_account" "operational_metrics_gsa" {
  account_id   = "operational-metrics-gsa"
  display_name = "Operational Metrics GSA"
}

resource "google_project_iam_binding" "bigquery_jobuser_binding" {
  project = google_service_account.operational_metrics_gsa.project
  role    = "roles/bigquery.jobUser"

  members = [
    "serviceAccount:${google_service_account.operational_metrics_gsa.email}",
  ]

  depends_on = [google_service_account.operational_metrics_gsa]
}
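
Editorial note: google_project_iam_binding is authoritative for its role, so applying it sets the complete member list for roles/bigquery.jobUser in the project. If other principals already hold that role, a non-authoritative grant may be safer. The following is only a sketch of that alternative, not what the PR does; the resource name bigquery_jobuser_member is hypothetical.

```hcl
# Hypothetical alternative: grant the role to one member without
# managing the role's full member list.
resource "google_project_iam_member" "bigquery_jobuser_member" {
  project = google_service_account.operational_metrics_gsa.project
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.operational_metrics_gsa.email}"
}
```

The explicit depends_on is left out here because referencing the service account's attributes already gives Terraform the ordering.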

resource "kubernetes_namespace" "operational_metrics" {
  metadata {
    name = "operational-metrics"
  }
  provider = kubernetes.llvm-premerge-us-central
}

resource "kubernetes_service_account" "operational_metrics_ksa" {
  metadata {
    name      = "operational-metrics-ksa"
    namespace = "operational-metrics"
    annotations = {
      "iam.gke.io/gcp-service-account" = google_service_account.operational_metrics_gsa.email
    }
  }

  depends_on = [kubernetes_namespace.operational_metrics]
}

resource "google_service_account_iam_binding" "workload_identity_binding" {
  service_account_id = google_service_account.operational_metrics_gsa.name
  role               = "roles/iam.workloadIdentityUser"

  members = [
    "serviceAccount:${google_service_account.operational_metrics_gsa.project}.svc.id.goog[operational-metrics/operational-metrics-ksa]",
  ]

  depends_on = [
    google_service_account.operational_metrics_gsa,
    kubernetes_service_account.operational_metrics_ksa,
  ]
}
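
Editorial note: the Workload Identity member string above repeats the namespace and Kubernetes service account names as literals. One possible way to keep them in sync is to build the string from the resources themselves. This is a sketch under that assumption; the local name operational_metrics_wi_member is hypothetical and does not appear in the PR.

```hcl
locals {
  # Hypothetical local: builds "PROJECT.svc.id.goog[NAMESPACE/KSA]" from the
  # resources declared above instead of hard-coded names.
  operational_metrics_wi_member = format(
    "serviceAccount:%s.svc.id.goog[%s/%s]",
    google_service_account.operational_metrics_gsa.project,
    kubernetes_namespace.operational_metrics.metadata[0].name,
    kubernetes_service_account.operational_metrics_ksa.metadata[0].name,
  )
}
```

The binding could then use members = [local.operational_metrics_wi_member], and the references would give Terraform the same ordering that the explicit depends_on provides.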

resource "kubernetes_secret" "operational_metrics_secrets" {
Review comment (Contributor):

Why does this need a separate GitHub token instead of reusing one of the existing ones?

Reply (Contributor Author):

It's the same GitHub token, just under a separate secrets object to keep the premerge metrics and the operational metrics separate.

Although I'm not opposed to scrapping this and just reusing the metrics secrets if that's more appropriate.

Reply (Contributor):

I don't think this creates any tangible separation if they're the same token. You should reuse the metrics container secret, but probably rename the kubernetes_secret object and maybe the underlying GCP object. You'll need to use a Terraform moved block (https://developer.hashicorp.com/terraform/language/modules/develop/refactoring#moved-block-syntax) so that TF doesn't try to delete and recreate everything.

Reply (Contributor Author, @jriv01, Jul 18, 2025):

Looked into this, and since k8s secrets are namespace-scoped, the premerge metrics deployment and the operational metrics cronjob would need to share the same namespace. Is that a change we want to make? (i.e., just putting everything under metrics)

Both metrics containers are relatively similar in purpose and usage regardless, but I was under the assumption we'd like that separation wherever possible.
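
Editorial note: a minimal sketch of the moved block suggested in the thread, assuming the existing secret is declared as kubernetes_secret.metrics_secrets (the address referenced by the metrics deployment's depends_on) and is renamed to a hypothetical shared_metrics_secrets. Neither the new name nor this block is part of the PR.

```hcl
# Tells Terraform that the resource previously tracked at the old address
# is the same object as the one at the new address, so the plan shows a
# move rather than a destroy and create.
moved {
  from = kubernetes_secret.metrics_secrets
  to   = kubernetes_secret.shared_metrics_secrets # hypothetical new name
}
```

A moved block only changes the address Terraform tracks the resource under. It does not change the secret's namespace, so the namespace-scoping concern raised in the last reply still applies.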

  metadata {
    name      = "operational-metrics-secrets"
    namespace = "operational-metrics"
  }

  data = {
    "github-token"           = data.google_secret_manager_secret_version.metrics_github_pat.secret_data
    "grafana-api-key"        = data.google_secret_manager_secret_version.metrics_grafana_api_key.secret_data
    "grafana-metrics-userid" = data.google_secret_manager_secret_version.metrics_grafana_metrics_userid.secret_data
  }

  type       = "Opaque"
  provider   = kubernetes.llvm-premerge-us-central
  depends_on = [kubernetes_namespace.operational_metrics]
}

resource "kubernetes_manifest" "operational_metrics_cronjob" {
  manifest = yamldecode(file("operational_metrics_cronjob.yaml"))
  provider = kubernetes.llvm-premerge-us-central

  depends_on = [
    kubernetes_namespace.operational_metrics,
    kubernetes_secret.operational_metrics_secrets,
    kubernetes_service_account.operational_metrics_ksa,
  ]
}
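
Editorial note: file("operational_metrics_cronjob.yaml") resolves the path relative to the directory Terraform is run from. If the configuration is always applied from premerge/, that works as written. As a sketch of a variant that does not depend on the working directory, the path can be anchored with path.module; this is an assumption about how the configuration might be invoked, not a change made in the PR.

```hcl
resource "kubernetes_manifest" "operational_metrics_cronjob" {
  # path.module is the directory containing this .tf file, so the YAML is
  # found regardless of where terraform is invoked from.
  manifest = yamldecode(file("${path.module}/operational_metrics_cronjob.yaml"))
  provider = kubernetes.llvm-premerge-us-central

  depends_on = [
    kubernetes_namespace.operational_metrics,
    kubernetes_secret.operational_metrics_secrets,
    kubernetes_service_account.operational_metrics_ksa,
  ]
}
```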
45 changes: 45 additions & 0 deletions premerge/operational_metrics_cronjob.yaml
@@ -0,0 +1,45 @@
# operational_metrics_cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: operational-metrics-cronjob
  namespace: operational-metrics
spec:
  # 07:00 UTC, which is midnight PDT (23:00 PST during standard time)
  schedule: "0 7 * * *"
  timeZone: "Etc/UTC"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: operational-metrics-ksa
          nodeSelector:
            iam.gke.io/gke-metadata-server-enabled: "true"
          containers:
            - name: process-llvm-commits
              image: ghcr.io/llvm/operations-metrics:latest
              env:
                - name: GITHUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: operational-metrics-secrets
                      key: github-token
                - name: GRAFANA_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: operational-metrics-secrets
                      key: grafana-api-key
                - name: GRAFANA_METRICS_USERID
                  valueFrom:
                    secretKeyRef:
                      name: operational-metrics-secrets
                      key: grafana-metrics-userid
              resources:
                requests:
                  cpu: "250m"
                  memory: "256Mi"
                limits:
                  cpu: "1"
                  memory: "512Mi"
          restartPolicy: OnFailure