Conversation


@krnowak krnowak commented Aug 21, 2025

The automation generates reports using emerge in two separate SDK containers: one with the old packages and one with the new ones. Both jobs create their reports in separate directories, so as long as we take care to print messages to the terminal without producing garbled text, and can discern which job produced which output, nothing prevents us from running them in parallel.

Parallelizing the handling of package updates was a bit more involved:

  • We don't want to spawn 400+ processes, one per package, but rather a small number of processes that are told to process packages in batches.
  • These jobs used to write information (like summary and changelog stubs) into the same files, so we give each job its own directory to write to and, after all packages are processed, merge the files into one.
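The merge from the second bullet can be sketched roughly like this (all paths and file names here are made up for illustration, not the actual report layout):

```shell
# Hypothetical sketch of the merge step: each job writes its stub files into
# its own directory, and the main process concatenates them afterwards.
tmp=$(mktemp -d)
for job in 0 1 2; do
    mkdir -p "${tmp}/job-${job}"
    printf 'summary stub from job %s\n' "${job}" >"${tmp}/job-${job}/summary"
done
# Merge the per-job files into one main report file.
cat "${tmp}"/job-*/summary >"${tmp}/summary"
merged_lines=$(grep -c . "${tmp}/summary")
echo "merged ${merged_lines} stubs"
rm -rf "${tmp}"
```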

To achieve the last point, I needed to refactor some of the code to take an output directory path instead of hardcoding it to a subdirectory of ${REPORTS_DIR}, and to split off the code that handles a package update, since this code will run inside a job instead of the main process.

I think it's best to review the PR commit by commit.

krnowak added 21 commits August 27, 2025 16:01
The library will be used for running emerge report and package update
report generation in separate processes to make them faster.

I initially wanted to use the relatively unknown bash feature of named
coprocs, but it was still unfinished as of bash 5.2, so I decided to
write my own instead.

The library is rather basic: it allows forking a subprocess that runs
some bash function, communicating with it through the subprocess's
standard input/output, and reaping the subprocess.

Signed-off-by: Krzesimir Nowak <[email protected]>
We can run report generation for old and new in parallel in two
separate processes. This ought to mean a bit less waiting.

This is more or less straightforward parallelization, since there are
only two jobs running. The only things that need taking care of are
forwarding each job's output to the terminal and handling job failures.

Signed-off-by: Krzesimir Nowak <[email protected]>
This will come in handy for spawning jobs for handling package
updates. Since we don't want to spawn as many jobs as there are
packages, limiting ourselves to a job count matching the processor or
core count sounds like a better idea.
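A common way to obtain that count in bash (assuming GNU coreutils' `nproc` is available, with POSIX `getconf` as a fallback):

```shell
# Pick the number of parallel jobs based on the available processors.
num_jobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "spawning ${num_jobs} jobs"
```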

Signed-off-by: Krzesimir Nowak <[email protected]>
The slots were only used to repeatedly generate the same path to a
directory where the package ebuild diff is saved. So instead, generate
the output paths somewhere in outer scope, put them into a struct and
pass that around. That means that:

- We pass one parameter fewer (the name of a struct instead of two
  slots).

- We make it easier to change the output directory later (changing
  it in a function like update_dir or update_dir_non_slot may affect
  locations we didn't want to change, whereas changing the value in
  the struct scopes the affected areas). This will come in handy
  later, when we put package update handling into jobs, where each
  job will have its own output directory.

This does not remove the repeated generation of the paths, but it is a
first step.
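Bash has no real structs; one way to emulate the pattern described above is an associative array holding the precomputed paths, passed to functions by name and dereferenced with a nameref (the identifiers below are illustrative, not the PR's actual names; namerefs need bash 4.3+):

```shell
#!/bin/bash
# Hypothetical sketch: a "struct" of precomputed output paths as an
# associative array, passed by name instead of passing individual slots.
REPORTS_DIR=${REPORTS_DIR:-/tmp/reports}

declare -A pkg_paths=(
    [diff_dir]="${REPORTS_DIR}/updates/foo-1.2.3"
    [summary_stub]="${REPORTS_DIR}/stubs/foo-1.2.3"
)

diff_dir_of() {
    # Receive the "struct" by name; -n makes p an alias for the caller's array.
    local -n p=${1}
    echo "${p[diff_dir]}"
}

dir=$(diff_dir_of pkg_paths)
echo "${dir}"
```

Changing one value in the array then affects exactly the callers that were handed that array, which is the scoping property the commit message describes.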

Signed-off-by: Krzesimir Nowak <[email protected]>
…dling

This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
…ling

This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is a continuation of passing the explicit location of an output
directory instead of hardcoding `${REPORTS_DIR}`.

Signed-off-by: Krzesimir Nowak <[email protected]>
These functions were either inlined in the few (one?) places they
were used or just replaced.

Signed-off-by: Krzesimir Nowak <[email protected]>
The purpose of this struct is to collect all the information that is
needed for handling package updates in one place. It is not really
used right now, but when the package handling is split off into a
separate function, it will come in handy as we can then pass a couple
of parameters to the new function instead of many.

Also, in the future, the struct will grow when we add ignoring of
irrelevant information in summary stubs or license filtering.

Signed-off-by: Krzesimir Nowak <[email protected]>
There is no functional change, other than the fact that the new
function now uses a bunch of maps to access some package
information. The split-off inches us closer towards running the
package handling in multiple jobs.

Signed-off-by: Krzesimir Nowak <[email protected]>
This is to fill the silent moment between report generation in the
SDKs and the beginning of package update handling. It also adds
missing info about handling non-package updates.

Signed-off-by: Krzesimir Nowak <[email protected]>
This spawns some jobs, where each waits for messages from the main
process. A message is either a number followed by that many packages
to handle (a batch), or a command to shut down when there are no
packages left to process. In the other direction, a job can send a
message to the main process saying that it is done with its batch and
ready for the next one. Any other message from a job is printed on
the terminal by the main process.

After the packages are processed, the main process will collect and
merge the job reports into the main one.
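The job's side of that protocol can be sketched like this (the wire format below is invented for illustration; the PR's actual messages may differ):

```shell
#!/bin/bash
# Hypothetical sketch of the job loop for the batch protocol: the main
# process sends a count followed by that many package names, or "quit";
# the job replies "done" after finishing each batch.

job_loop() {
    local count pkg i
    while read -r count; do
        [ "${count}" = quit ] && break
        for ((i = 0; i < count; i++)); do
            read -r pkg
            # Anything other than "done" would be forwarded to the
            # terminal by the main process.
            echo "log: handled ${pkg}"
        done
        echo 'done'
    done
}

# Drive one job through two batches, using a here-doc in place of a real pipe.
output=$(job_loop <<'EOF'
2
app-misc/foo
app-misc/bar
1
sys-apps/baz
quit
EOF
)
batches_done=$(grep -c '^done$' <<<"${output}")
echo "finished ${batches_done} batches"
```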

Signed-off-by: Krzesimir Nowak <[email protected]>
After the split-off and the addition of jobs, the comment was a bit
outdated and out of place, but still useful enough to keep, so it was
reworded and moved to a more relevant place.

Signed-off-by: Krzesimir Nowak <[email protected]>
Mostly to avoid repeating variable names when declaring and
initializing them.

Signed-off-by: Krzesimir Nowak <[email protected]>

krnowak commented Aug 27, 2025

Rebased, mostly for adding DCO to commits.
