Skip to content

Conversation

@cgwalters
Copy link
Collaborator

@cgwalters cgwalters commented Nov 19, 2025

See individual patches.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two valuable cleanups. First, it refactors the tmt test execution to directly discover and run tests instead of plans, which simplifies the configuration and removes duplication. This is a great improvement for maintainability. Second, it adds a command to show the tmt report on failure, which will significantly aid in debugging failing tests. I've noted a few minor inconsistencies in the updated logging and error messages where 'plan' was not fully replaced with 'test'. Please see my specific comments for details.

@henrywang
Copy link
Collaborator

The failure comes from test-26-examples-build in aarch64. This test has not been included into tmt plan before, so it only runs on github runner, but not on packit. I'm working on it to fix the issue.

@henrywang henrywang force-pushed the tmt-discover branch 2 times, most recently from 9b18985 to 03ff8a8 Compare November 20, 2025 14:42
@jeckersb
Copy link
Collaborator

xtask: Run tmt report on failure

Why tmt doesn't do this by default, I have no idea. It's also baffling that it takes three levels of verbose i.e. -vvv to get the actual test error.

This tmt behavior also drives me crazy. Glad to see this change 👍

--json pretty \
--output "/boot/$kver.efi"
# To support multi arch
systemd_boot_file=$(ls /usr/lib/systemd/boot/efi/*.efi | xargs -n 1 basename)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right though we're going to need the same fix in Dockerfile.cfsuki. Which I think again is a driving use case for having a higher level wrapper for this.

@cgwalters
Copy link
Collaborator Author

Strange, I don't understand why the soft reboot test is failing with testing-farm here but works in previous runs.

@henrywang
Copy link
Collaborator

henrywang commented Nov 20, 2025

Strange, I don't understand why the soft reboot test is failing with testing-farm here but works in previous runs.

Already worked on it. Not figure out the root cause in TF. Will debug it in pure TF to see what happens.

@cgwalters
Copy link
Collaborator Author

Wait a minute...does testing-farm+tmt run all tests from a single plan on the same provisioned environment? It's sure looking like that's the case, see e.g. this log from the run here.

It's quite impressive actually that it's just that test that fails because some of these tests are going to leak state.

Yeah this seems to be the case. OK so this PR needs a rework.

@cgwalters
Copy link
Collaborator Author

because some of these tests are going to leak state.

(Of course, we could optimize this if we relied on a factory reset; I'm sure we could do that for most tests)

@cgwalters cgwalters assigned cgwalters and unassigned henrywang Nov 21, 2025
@cgwalters
Copy link
Collaborator Author

I've come to the reluctant conclusion there's no sane option to fix the "triple test definition" problem we have other than going to generating the tmt stuff. Working on that.

@cgwalters cgwalters marked this pull request as draft November 21, 2025 14:25
@github-actions github-actions bot added the area/documentation Updates to the documentation label Nov 21, 2025
@cgwalters
Copy link
Collaborator Author

One thing I will say is I am so glad that AI exists to learn things like fmf for me

@cgwalters cgwalters force-pushed the tmt-discover branch 2 times, most recently from 95c8140 to ec438ac Compare November 21, 2025 20:29
@cgwalters cgwalters changed the title Two tmt cleanups Various tmt + testing improvements Nov 21, 2025
@cgwalters cgwalters marked this pull request as ready for review November 21, 2025 20:43
@cgwalters
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant improvements to the testing infrastructure, primarily by refactoring the tmt test execution into a dedicated xtask module. The changes make the test setup more robust and extensible, for example by adding support for synthetic upgrade tests.

My review focuses on a few key areas:

  • A critical issue in the newly added tmt.rs file where an outdated rand crate API is used, which will prevent compilation.
  • A couple of medium-severity suggestions to improve maintainability and clarity in the new test scripts and xtask code. One is about an inconsistency in a Dockerfile, and the other suggests refactoring a large loop for better readability and error handling.

Overall, the changes are a great step forward for the project's testing capabilities. Addressing the identified issues will make the new testing code even better.

Comment on lines +28 to +49
fn generate_random_suffix() -> String {
let mut rng = rand::rng();
const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
(0..8)
.map(|_| {
let idx = rng.random_range(0..CHARSET.len());
CHARSET[idx] as char
})
.collect()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The rand::rng() function and random_range method are from an older version of the rand crate. With rand version 0.9+, you should use rand::thread_rng() to get a thread-local random number generator, and the gen_range() method to generate a value in a given range. This change is necessary for the code to compile and aligns with current best practices for the rand crate.

Suggested change
fn generate_random_suffix() -> String {
let mut rng = rand::rng();
const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
(0..8)
.map(|_| {
let idx = rng.random_range(0..CHARSET.len());
CHARSET[idx] as char
})
.collect()
}
fn generate_random_suffix() -> String {
let mut rng = rand::thread_rng();
const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
(0..8)
.map(|_| {
let idx = rng.gen_range(0..CHARSET.len());
CHARSET[idx] as char
})
.collect()
}

Comment on lines 220 to 502
for plan in plans {
let plan_name = sanitize_plan_name(plan);
let vm_name = format!("bootc-tmt-{}-{}", random_suffix, plan_name);

println!("\n========================================");
println!("Running plan: {}", plan);
println!("VM name: {}", vm_name);
println!("========================================\n");

// Launch VM with bcvk

let launch_result = cmd!(
sh,
"bcvk libvirt run --name {vm_name} --detach {COMMON_INST_ARGS...} {image}"
)
.run()
.context("Launching VM with bcvk");

if let Err(e) = launch_result {
eprintln!("Failed to launch VM for plan {}: {:#}", plan, e);
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}

// Ensure VM cleanup happens even on error (unless --preserve-vm is set)
let cleanup_vm = || {
if preserve_vm {
return;
}
if let Err(e) = cmd!(sh, "bcvk libvirt rm --stop --force {vm_name}")
.ignore_stderr()
.ignore_status()
.run()
{
eprintln!("Warning: Failed to cleanup VM {}: {}", vm_name, e);
}
};

// Wait for VM to be ready and get SSH info
let vm_info = wait_for_vm_ready(sh, &vm_name);
let (ssh_port, ssh_key) = match vm_info {
Ok((port, key)) => (port, key),
Err(e) => {
eprintln!("Failed to get VM info for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
};

println!("VM ready, SSH port: {}", ssh_port);

// Save SSH private key to a temporary file
let key_file = tempfile::NamedTempFile::new().context("Creating temporary SSH key file");

let key_file = match key_file {
Ok(f) => f,
Err(e) => {
eprintln!("Failed to create SSH key file for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
};

let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf())
.context("Converting key path to UTF-8");

let key_path = match key_path {
Ok(p) => p,
Err(e) => {
eprintln!("Failed to convert key path for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
};

if let Err(e) = std::fs::write(&key_path, ssh_key) {
eprintln!("Failed to write SSH key for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}

// Set proper permissions on the key file (SSH requires 0600)
{
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o600);
if let Err(e) = std::fs::set_permissions(&key_path, perms) {
eprintln!("Failed to set key permissions for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
}

// Verify SSH connectivity
println!("Verifying SSH connectivity...");
if let Err(e) = verify_ssh_connectivity(sh, ssh_port, &key_path) {
eprintln!("SSH verification failed for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}

println!("SSH connectivity verified");

let ssh_port_str = ssh_port.to_string();

// Run tmt for this specific plan using connect provisioner
println!("Running tmt tests for plan {}...", plan);

// Generate a unique run ID for this test
// Use the VM name which already contains a random suffix for uniqueness
let run_id = vm_name.clone();

// Run tmt for this specific plan
// Note: provision must come before plan for connect to work properly
let context = context.clone();
let how = ["--how=connect", "--guest=localhost", "--user=root"];
let env = ["TMT_SCRIPTS_DIR=/var/lib/tmt/scripts", "BCVK_EXPORT=1"]
.into_iter()
.chain(args.env.iter().map(|v| v.as_str()))
.flat_map(|v| ["--environment", v]);
let test_result = cmd!(
sh,
"tmt {context...} run --id {run_id} --all {env...} provision {how...} --port {ssh_port_str} --key {key_path} plan --name {plan}"
)
.run();

// Clean up VM regardless of test result (unless --preserve-vm is set)
cleanup_vm();

match test_result {
Ok(_) => {
println!("Plan {} completed successfully", plan);
test_results.push((plan.to_string(), true, Some(run_id)));
}
Err(e) => {
eprintln!("Plan {} failed: {:#}", plan, e);
all_passed = false;
test_results.push((plan.to_string(), false, Some(run_id)));
}
}

// Print VM connection details if preserving
if preserve_vm {
// Copy SSH key to a persistent location
let persistent_key_path = Utf8Path::new("target").join(format!("{}.ssh-key", vm_name));
if let Err(e) = std::fs::copy(&key_path, &persistent_key_path) {
eprintln!("Warning: Failed to save persistent SSH key: {}", e);
} else {
println!("\n========================================");
println!("VM preserved for debugging:");
println!("========================================");
println!("VM name: {}", vm_name);
println!("SSH port: {}", ssh_port_str);
println!("SSH key: {}", persistent_key_path);
println!("\nTo connect via SSH:");
println!(
" ssh -i {} -p {} -o IdentitiesOnly=yes root@localhost",
persistent_key_path, ssh_port_str
);
println!("\nTo cleanup:");
println!(" bcvk libvirt rm --stop --force {}", vm_name);
println!("========================================\n");
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic inside this for loop is quite long and contains repetitive error handling boilerplate. This makes it hard to follow the main flow of setting up and running a test plan.

Consider refactoring the body of this loop into a separate function, for example run_plan_in_vm(sh: &Shell, plan: &str, ...) -> Result<String>.

This would allow you to:

  1. Use the ? operator for cleaner error propagation within the new function.
  2. Manage VM cleanup automatically using a guard struct that calls cleanup_vm() on Drop. This is a common pattern in Rust for resource management (RAII).

An example of a guard struct:

struct VmGuard<'a, 'b> {
    sh: &'a Shell,
    vm_name: &'b str,
    preserve: bool,
}

impl<'a, 'b> Drop for VmGuard<'a, 'b> {
    fn drop(&mut self) {
        if self.preserve {
            return;
        }
        if let Err(e) = cmd!(self.sh, "bcvk libvirt rm --stop --force {vm_name}", vm_name = self.vm_name)
            .ignore_stderr()
            .ignore_status()
            .run()
        {
            eprintln!("Warning: Failed to cleanup VM {}: {}", self.vm_name, e);
        }
    }
}

The main loop would then become much simpler, focusing on iterating through plans and reporting results.

@@ -0,0 +1,3 @@
# Just creates a file as a new layer for a synthetic upgrade test
FROM localhost/bootc-integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Justfile's _build-upgrade-image rule overrides this FROM instruction with localhost/bootc-integration-bin using podman build --from. For clarity and to avoid confusion, this FROM instruction should match what is used in the Justfile, or use a generic placeholder like scratch to indicate it's always overridden.

FROM localhost/bootc-integration-bin

@cgwalters
Copy link
Collaborator Author

    content: /bin/sh: error while loading shared libraries: /lib64/libc.so.6: cannot apply additional memory protection after relocation: Permission denied

Hmm, I saw that once locally but it went away and I didn't debug. Dang it...I think this is a complicated side effect of us exporting the host container storage via virtiofs - basically the host selinux policy is tripping over the virtiofs_t type. Hum...I have an idea

@cgwalters
Copy link
Collaborator Author

Hum...I have an idea

➡️ bootc-dev/bcvk#159

Haven't reproduced the bug yet locally though but it at least doesn't hurt

bootc status
let st = bootc status --json | from json
let is_composefs = ($st.status.booted.composefs? != null)
if is_composefs {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One failure comes from is_composefs. It should be $is_composefs.

@henrywang
Copy link
Collaborator

Another failure comes from podman build and podman run in image-pushpull-upgrade, logically-bound-switch, soft-reboot and custom-selinux-policy. I can reproduce this issue locally and the fix works in my local env.

diff --git i/tmt/tests/booted/test-image-pushpull-upgrade.nu w/tmt/tests/booted/test-image-pushpull-upgrade.nu
index 5367dcc8..eca7e9dd 100644
--- i/tmt/tests/booted/test-image-pushpull-upgrade.nu
+++ w/tmt/tests/booted/test-image-pushpull-upgrade.nu
@@ -49,9 +49,9 @@ COPY usr/ /usr/
 RUN echo test content > /usr/share/blah.txt
 " | save Dockerfile
     # Build it
-    podman build -t localhost/bootc-derived .
+    podman build --security-opt label=disable -t localhost/bootc-derived .
     # Just sanity check it
-    let v = podman run --rm localhost/bootc-derived cat /usr/share/blah.txt | str trim
+    let v = podman run --rm --security-opt label=disable localhost/bootc-derived cat /usr/share/blah.txt | str trim
     assert equal $v "test content"

     let orig_root_mtime = ls -Dl /ostree/bootc | get modified
@@ -158,7 +158,7 @@ COPY usr/ /usr/
 RUN echo test content2 > /usr/share/blah.txt
 " | save Dockerfile
     # Build it
-    podman build -t localhost/bootc-derived .
+    podman build --security-opt label=disable -t localhost/bootc-derived .
     let booted_digest = $booted.imageDigest
     print $"booted_digest = ($booted_digest)"
     # We should already be fetching updates from container storage

@henrywang henrywang force-pushed the tmt-discover branch 2 times, most recently from 242a527 to 28e4d5a Compare November 24, 2025 13:03
@cgwalters
Copy link
Collaborator Author

Thanks for picking this up! I hope that bootc-dev/bcvk#159 fixes the selinux issue. That said, I haven't yet reproduced the failure locally, and I'm not understanding why.

@henrywang
Copy link
Collaborator

henrywang commented Nov 24, 2025

cs10 flaky issue appears again. So my workaround sync and sleep does not work. Investigate this issue now, have to fix it.

@henrywang
Copy link
Collaborator

henrywang commented Nov 24, 2025

github test failures:

  1. test-24-image-upgrade-reboot failed on CS9 only:
content: local image push + pull + upgrade
content: Applying localhost/bootc-integration-upgrade
content: error: Switching: Switching (ostree): Preparing import: Fetching manifest: failed to invoke method OpenImage: reference "[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]localhost/bootc-integration-upgrade:latest" does not resolve to an image ID
  1. test-26-examples-build failed on Fedora 42 only:
content: + ./bootc internals cfs --repo tmp/sysroot/composefs --insecure oci pull containers-storage:17c41e1c378b4616c1f17ccbffa0ec55ebed6a6e9ff384ad99f81c0b5dda1d0a
content: error: Unable to pull container image containers-storage:17c41e1c378b4616c1f17ccbffa0ec55ebed6a6e9ff384ad99f81c0b5dda1d0a: Failed to pull config Descriptor { media_type: ImageConfig, digest: Digest { algorithm: Sha256, value: "sha256:17c41e1c378b4616c1f17ccbffa0ec55ebed6a6e9ff384ad99f81c0b5dda1d0a", split: 6 }, size: 11833, urls: None, annotations: None, platform: None, artifact_type: None, data: None }: failed to invoke method GetBlob: open /run/host-container-storage/overlay/7f7c23aa7156afa1e272fdfa3506ba2f1947fc7224e6f3d377c99365d678c47e/diff/sysroot/ostree/repo/objects/68/7b65d1fc247722ad67c6fb16bbef684d45e1ec46c4d535ff0ba430a26bffda.file-xattrs-link: too many open files

Testing farm/Packit test failure:

  1. test-23-usroverlay failed on CS10 only:
err: Call to Reboot failed: Access denied

@cgwalters
Copy link
Collaborator Author

cgwalters commented Nov 24, 2025

localhost/bootc-integration-upgrade:latest" does not resolve to an image ID

Hmmm...it might be older podman doesn't handle additional image stores in the same way? I'll take a look.

Nah it's really simple, --bind-storage-ro doesn't work with C9S because it's missing https://www.freedesktop.org/software/systemd/man/latest/systemd-debug-generator.html#systemd.extra-unit.*

It's pretty tempting to try to "backfill" this...

too many open files

I think this one is a flake...but if it keeps happening we'll have to debug.

@cgwalters
Copy link
Collaborator Author

Nah it's really simple, --bind-storage-ro doesn't work with C9S because it's missing https://www.freedesktop.org/software/systemd/man/latest/systemd-debug-generator.html#systemd.extra-unit.*

I mean, the problem was simple to understand...the fix is definitely uglier and more involved as it requires knowledge about the target environment capabilities "up front" in our tests. Anyways, done now and tested both cases locally.

cgwalters and others added 8 commits November 24, 2025 17:11
To fix SELinux issues.

Signed-off-by: Colin Walters <[email protected]>
We need to run most of our tests in a separate provisioned machine,
which means it needs an individual plan. And then we need a test
for that plan. And then we need the *actual test code*.

This "triplication" is a huge annoying pain.

TMT is soooo complicated, yet as far as I can tell it doesn't offer
us any tools to solve this. So we'll do it here, cut over to
generating the TMT stuff from metadata defined in the test file.

Hence adding a test is just:

- Write a new tests/booted/foo.nu
- `cargo xtask update-generated`

Signed-off-by: Colin Walters <[email protected]>
…tion

Move TMT test runner code from xtask.rs to tmt module:
- `run_tmt()` and `tmt_provision()` functions
- Helper functions for VM management and SSH connectivity
- Related constants

Also refactor `update_integration()` to use serde_yaml::Value for
building YAML structures instead of string concatenation.

Add detailed error reporting for failed TMT tests:
- Assign run IDs using `tmt run --id`
- Display verbose reports with `tmt run -i {id} report -vvv`

Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Colin Walters <[email protected]>
Otherwise we compile many dependencies twice unnecessarily.

Signed-off-by: Colin Walters <[email protected]>
To make it easier to do upgrade tests.

Signed-off-by: Colin Walters <[email protected]>
Co-authored-by: Xiaofeng Wang <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
This ensures it all can work much more elegantly/naturally
with sealed UKI builds - we don't want to do the build-on-target
thing.

Signed-off-by: Colin Walters <[email protected]>
CentOS 9 lacks systemd.extra-unit.* support which is required for
--bind-storage-ro to work with bcvk. This was causing test failures
on centos-9 while working fine on Fedora.

Change the approach so tests express intent via `extra.try_bind_storage: true`
metadata, and xtask handles the details:

- Detect distro by running the container image and parsing os-release
- Pass distro to tmt via --context=distro=<id>-<version>
- Only add --bind-storage-ro when test wants it AND distro supports it
- When bind storage is available, also set BOOTC_upgrade_image env var
- Tests can detect missing $env.BOOTC_upgrade_image and fall back to
  building the upgrade image locally

Add --upgrade-image CLI option to specify the upgrade image path,
replacing the old --env=BOOTC_upgrade_image approach.

Extract magic values to clear const declarations at the top of the file
for better maintainability.

Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Colin Walters <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/documentation Updates to the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants