Skip to content

[CI] Remove need for operational metrics persistent storage #501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jriv01
Copy link
Contributor

@jriv01 jriv01 commented Jul 15, 2025

Originally, the script for scraping LLVM commit info required persistent storage to keep track of the last commit we've seen. This was to prevent any overlap in our scrapes and avoid processing the same commits multiple times.

However, currently the script will only ever scrape a full day at a time, so there should never be any overlap. We no longer need persistent storage to maintain the most recently viewed commits since we don't expect the CronJob to be executed twice in the same day.

@jriv01
Copy link
Contributor Author

jriv01 commented Jul 15, 2025

@boomanaiden154

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look reasonable enough to me.

I'm not super confident in the read_past_commits implementation though. The commit date I'm assuming is the date set by the commit when you run git commit or when you manually override the date. I don't think that would work for this kind of query.

But it seems like you have done some testing and this works as expected?

@jriv01
Copy link
Contributor Author

jriv01 commented Jul 18, 2025

The changes look reasonable enough to me.

I'm not super confident in the read_past_commits implementation though. The commit date I'm assuming is the date set by the commit when you run git commit or when you manually override the date. I don't think that would work for this kind of query.

But it seems like you have done some testing and this works as expected?

Thanks for pointing that out, after looking into it seems like this is the date git commit was ran or when the commit was merged/rebased.

We might miss some commits when the commit date is set to a date the CronJob has already scraped, but I think that's okay for the purposes of this dashboard. From a quick check of recent history (since 01-01-2022), only about 1.4% of commits aren't chronologically consistent with their position in the branch's history.

To get the amount of pushes made on a particular day, we'd probably have to use the GitHub Events API which is something we want to lean away from.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants