
Conversation

7ing
Contributor

@7ing 7ing commented Nov 22, 2024

certmanager_csi_certificate_request_expiration_timestamp_seconds
certmanager_csi_certificate_request_ready_status
certmanager_csi_certificate_request_renewal_timestamp_seconds
certmanager_csi_driver_issue_call_count_total
certmanager_csi_driver_issue_error_count_total
certmanager_csi_managed_certificate_count_total
certmanager_csi_managed_volume_count_total

fixes: #60

@cert-manager-prow cert-manager-prow bot added the dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. label Nov 22, 2024
@cert-manager-prow
Contributor

Hi @7ing. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@cert-manager-prow cert-manager-prow bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 22, 2024
@erikgb
Member

erikgb commented Nov 22, 2024

/ok-to-test

@cert-manager-prow cert-manager-prow bot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 22, 2024
Member

@munnerz munnerz left a comment

This looks great, thanks Jing 🙌 the integration and unit tests here make it far easier to review confidently!

My main questions/concerns are around the construction logic in the metrics subpackage, which I think we need to decouple from net.Listener (and allow more flexibility for projects that already have their own prometheus.Registry they'd like to re-use).

}

// NewServer registers Prometheus metrics and returns a new Prometheus metrics HTTP server.
func (m *Metrics) NewServer(ln net.Listener) *http.Server {
Member

This isn't called outside of test cases, and I guess that is by design as it is expected that the corresponding implementation should call NewServer on metrics.Metrics to register their Listener.

Could you possibly update the example/ implementation in the root of this repository to demonstrate how to actually add the /metrics endpoint? I also wonder if the Managed should be extended to be able to auto-serve this endpoint in cases where a user doesn't need to provide their own listener (but does want metrics to be served).

Member

What if we renamed this to Register(*prometheus.Registry) rather than tying the net.Listener logic into the registration?

We can always have/find some kind of csihelpers.Handle(*http.Server, *prometheus.Registry) function elsewhere then, which needn't be opinionated about csi-lib.
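
A minimal sketch of the decoupling suggested above, using only the Go standard library. The `gatherFunc` type and the `metricsHandler` and `scrape` helpers are hypothetical stand-ins (a real implementation would more likely pass a `*prometheus.Registry` to a Prometheus HTTP handler). The point is only that the handler is built without a `net.Listener`, so the caller chooses how to serve it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// gatherFunc is a hypothetical stand-in for a metrics registry: it returns
// the current metrics already rendered as text.
type gatherFunc func() string

// metricsHandler builds an http.Handler from a gather function. It never
// opens a listener, so the caller decides where and how it is served.
func metricsHandler(gather gatherFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		fmt.Fprint(w, gather())
	})
}

// scrape calls the handler in-process and returns the status code and body.
func scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// Illustrative sample value only.
	gather := func() string { return "certmanager_csi_managed_volume_count_total 1\n" }

	// The caller mounts the handler on its own mux or server.
	mux := http.NewServeMux()
	mux.Handle("/metrics", metricsHandler(gather))

	code, body := scrape(mux)
	fmt.Println(code)
	fmt.Print(body)
}
```

With this shape, a project that already runs its own HTTP server can mount the handler on an existing mux instead of handing a listener to the library.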

Contributor Author

> Could you possibly update the example/ implementation in the root of this repository to demonstrate how to actually add the /metrics endpoint?

Yes, it is there, and I have updated it to match the recent changes:
https://github.com/7ing/csi-lib/blob/abf15631238fa809d10d9c206178c414d9495b4f/test/integration/metrics_test.go#L83-L95

Contributor Author

Thanks for the suggestions. I have switched to a DefaultHandler instead, which can serve as a reference implementation for an HTTP handler.

@7ing
Contributor Author

7ing commented Feb 5, 2025

@munnerz Thank you for your valuable input. Sorry it took so long to make the change. I think this version addresses most of your concerns.

@cert-manager-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joshvanl for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 30, 2025
@wallrj wallrj self-requested a review June 6, 2025 20:26

// Should expose that CertificateRequest as ready with expiry and renewal time
// node="f56fd9f8b" is the hash value of "test-node" defined in driver_testing.go
expectedOutputTemplate := `# HELP certmanager_csi_certificate_request_expiration_timestamp_seconds The date after which the certificate request expires. Expressed as a Unix Epoch Time.
Member

Will certmanager_csi_... be the prefix for the metrics when exposed in https://github.com/cert-manager/csi-driver?
Can we make sure they are all using similar names? Does this happen automatically?
Could you provide an example of before and after for https://github.com/cert-manager/csi-driver with these changes applied?

Contributor Author

@7ing 7ing Jun 9, 2025

Thank you @inteon for your review.

Yes, certmanager_csi_... will be the prefix for the metrics when they are exposed in csi-driver. It is part of the definition in:
https://github.com/7ing/csi-lib/blob/b9186bad5b6f9af9bc93dbd28571d2d9219700e6/metrics/metrics.go#L27-L31

We chose this name based on the cert-manager definition: https://github.com/cert-manager/cert-manager/blob/5e09ef6c0552df0bde64746c735cb1ff324b6261/pkg/metrics/metrics.go#L44
All cert-manager controllers use the certmanager_... prefix.

https://github.com/cert-manager/csi-driver does not currently expose any cert-manager-related metrics. It only serves k8s component metrics such as CPU and memory, which is why this PR exists. The test files show the expected output for the cert-manager metrics (in addition to the k8s metrics, depending on driver configuration).
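
As a rough illustration of how such a prefix comes about, the sketch below joins a namespace, subsystem, and metric name with underscores. This is, to my understanding, how the Prometheus Go client composes fully qualified names. The `buildFQName` helper is a local stand-in, and the values `certmanager` and `csi` are assumptions inferred from the metric names in this PR, not copied from the linked source.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQName joins the non-empty parts with underscores. It is a local
// stand-in meant to mirror how a Prometheus client composes metric names.
func buildFQName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Assumed values: namespace "certmanager" and subsystem "csi".
	for _, name := range []string{
		"certificate_request_ready_status",
		"managed_volume_count_total",
	} {
		fmt.Println(buildFQName("certmanager", "csi", name))
	}
}
```

Because the prefix comes from shared constants rather than from each metric definition, every metric declared this way gets the same `certmanager_csi_` prefix automatically.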

@wallrj wallrj removed their request for review June 20, 2025 09:58
@erikgb
Member

erikgb commented Aug 29, 2025

@7ing We have made some major dependency upgrades in this module. Are you able to rebase your PR, preparing for another round of review? Sorry for the inconvenience and for the delays in review. 😒

@7ing
Contributor Author

7ing commented Aug 29, 2025

/retest

@7ing
Contributor Author

7ing commented Aug 29, 2025

@erikgb done with rebase

Member

@erikgb erikgb left a comment

This is great stuff! Thanks @7ing! I have now done my first pass on this PR, and I think I would prefer using a Prometheus collector, to avoid having to add and remove metrics that are derived from API resources. Please take a look and let me know what you think! It's not a blocker, but it is a pattern we are adopting in cert-manager projects nowadays.

@7ing 7ing requested review from erikgb, hjoshi123 and inteon September 12, 2025 18:42
@7ing
Contributor Author

7ing commented Sep 12, 2025

Updated the certificate request metrics implementation to use the collector pattern, to align with the cert-manager project.

Basically, the collector periodically scans the CertificateRequest shared informer (for expiry) and the host metadata files (for renewal time), then records the metrics.
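
The idea behind the collector pattern can be sketched with the standard library alone. In this simplified model, `store` stands in for the shared informer cache and `collect` for the collector. Both, along with the `request` type and the `name` label, are hypothetical and not taken from the PR. Samples are derived from the current state each time `collect` runs, so a deleted request simply stops appearing and needs no explicit metric cleanup.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// request is a hypothetical, reduced view of a CertificateRequest.
type request struct {
	Name   string
	Expiry int64 // Unix seconds
}

// store is a stand-in for the shared informer cache.
type store struct {
	mu    sync.RWMutex
	items map[string]request
}

func newStore() *store { return &store{items: map[string]request{}} }

func (s *store) set(r request) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[r.Name] = r
}

func (s *store) remove(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.items, name)
}

// collect derives one sample line per request from the store's current
// contents. Nothing is kept between calls.
func collect(s *store) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.items))
	for _, r := range s.items {
		out = append(out, fmt.Sprintf(
			"certmanager_csi_certificate_request_expiration_timestamp_seconds{name=%q} %d",
			r.Name, r.Expiry))
	}
	sort.Strings(out) // stable output regardless of map iteration order
	return out
}

func main() {
	s := newStore()
	s.set(request{Name: "volume-a", Expiry: 1767225600})
	s.set(request{Name: "volume-b", Expiry: 1769904000})
	fmt.Println(len(collect(s)), "samples")

	// Once the request is gone from the store, its sample is gone too.
	s.remove("volume-a")
	for _, line := range collect(s) {
		fmt.Println(line)
	}
}
```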

@erikgb erikgb requested a review from Copilot September 12, 2025 18:51
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds Prometheus metrics support to the CSI library to expose operational metrics about certificate management. The implementation includes metrics for certificate request expiration timestamps, ready status, renewal times, driver issue call counts, and managed volume/certificate counts.

Key changes:

  • Added comprehensive metrics collection system with Prometheus integration
  • Implemented certificate request collector for tracking certificate lifecycle metrics
  • Added metrics tracking to the manager for monitoring issue calls and errors
  • Created extensive test coverage for metrics functionality

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/util/testutil.go | Exported function to support metrics testing |
| test/integration/metrics_test.go | Added comprehensive integration test for metrics server functionality |
| test/driver/driver_testing.go | Added metrics support to test driver infrastructure |
| metrics/metrics.go | Core metrics implementation with Prometheus registry and HTTP handler |
| metrics/certificaterequest_test.go | Unit tests for certificate request metrics collection |
| metrics/certificaterequest_collector.go | Prometheus collector for certificate request lifecycle metrics |
| manager/manager.go | Integrated metrics tracking into certificate issuance operations |
| go.mod | Added prometheus client dependency |

@erikgb
Member

erikgb commented Sep 12, 2025

This looks really great, @7ing! Since the PR is large, I will need some time to review it thoroughly. I've already asked @hjoshi123 for some help in review. I would also appreciate it if @munnerz could take another look. Most maintainers from CyberArk are busy with other stuff right now, so our capacity to review PRs is rather low.

@hjoshi123

Yes @erikgb. Set myself a reminder to review it today.

}

// New creates a Metrics struct and populates it with prometheus metric types.
func New(logger *logr.Logger, registry *prometheus.Registry) *Metrics {

I do understand the need for the registry to come from the user, but shouldn't we call this New function to set up the metrics? We would also need to call SetupCertificateRequestCollector, otherwise the collector would never work. I am a bit confused now. @erikgb, if I understand correctly, the proposal was to take the registry as an input, right?

Member

I am also a bit confused now. It would have been interesting to see how this would actually be used in a real application. Are you planning to plug in this new feature into https://github.com/cert-manager/csi-driver, or into some other closed-source Apple csi-driver, @7ing? Maybe this will become clearer if the API metrics, obtained through the collector, are more clearly separated from the "normal" code metrics? 🤔

Contributor Author

I have added a sample implementation to the simple-csi example. The idea is to support the cert-manager/csi-driver project.
Yes, we could make SetupCertificateRequestCollector part of the New function if required.

Contributor Author

In the example, I create the registry and start the metrics server in the same function. These can also be separated, with the registry provided by the user as input. It really depends on the driver implementation.
@erikgb @hjoshi123 I hope this version answers your questions.

Hmm, I was looking at the example, @7ing. Are we saying that SetupCertificateRequestCollector is optional, meaning that those metrics are optional to expose? My concern is what happens if the user forgets to call the function. I understand the need not to create a new registry and handler, since those should come from the user, but I do feel the collector should be registered whenever the metrics server is running.

Contributor Author

@hjoshi123 I have removed the SetupCertificateRequestCollector function, so users now only need to call New(*) to initialize all metrics.

@cert-manager-prow cert-manager-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2025
Following metrics added:
  certmanager_csi_certificate_request_expiration_timestamp_seconds
  certmanager_csi_certificate_request_ready_status
  certmanager_csi_certificate_request_renewal_timestamp_seconds
  certmanager_csi_driver_issue_call_count
  certmanager_csi_driver_issue_error_count
  certmanager_csi_managed_certificate_count
  certmanager_csi_managed_volume_count

fixes: cert-manager#60
Signed-off-by: Jing Liu <[email protected]>
@cert-manager-prow cert-manager-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2025
@7ing
Contributor Author

7ing commented Sep 16, 2025

/retest-required

@7ing 7ing requested a review from hjoshi123 September 17, 2025 22:58
Labels
dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. ok-to-test size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support prometheus metrics
5 participants