@sebroy sebroy commented Jun 26, 2025

Once again, another conflict in .pre-commit-config.yaml (same conflict as last month that was addressed by #75).

This resolves the conflict by restoring the .pre-commit-config.yaml file but commenting out the one mypy check that doesn't run. This reduces the chance of future conflicts in the rest of the file.

ab-pre-push: https://selfservice-jenkins.eng-tools-prd.aws.delphixcloud.com/job/appliance-build-orchestrator-pre-push/12177/

@sebroy sebroy force-pushed the dlpx/pr/sebroy/cf484e30-86d4-4277-91d9-c1444e52d4e3 branch from 30a20bf to 6938fa8 Compare June 26, 2025 23:47
@sebroy sebroy changed the title Merge remote-tracking branch 'origin/upstreams/develop' into develop DLPX-94248 address drgn merge conflict with upstream Jun 26, 2025
@sebroy sebroy marked this pull request as ready for review June 26, 2025 23:51
osandov and others added 21 commits September 8, 2025 10:23
I'm getting "error: impossible constraint in 'asm'" build errors on
aarch64, which is apparently caused by compiling with -O0. We compile
drgn_test_kthread_fn* with -O0. Use more specific attributes and
barriers to achieve the same result instead.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
It'd be better to not use --no-warn-return-any on vmtest, but I'd rather
not run mypy twice.

Signed-off-by: Omar Sandoval <[email protected]>
For parallel vmtests, we want a more flexible interface than the
queue-based API of download(). Refactor it into a class with methods for
specific downloads.

Signed-off-by: Omar Sandoval <[email protected]>
We don't need two synchronous APIs, so use Downloader everywhere and
fold download() into download_thread(). Some vestigial uses of
DownloadCompiler/DownloadKernel remain.

Signed-off-by: Omar Sandoval <[email protected]>
Otherwise, building the test kmod will fail if the compiler hasn't been
downloaded before.

Fixes: 033510a ("vmtest.vm: add --{build,insert}-test-kmod options")
Signed-off-by: Omar Sandoval <[email protected]>
…nd KernelFlavor

Various parts of the vmtest code go through some trouble to key
Architecture and KernelFlavor on name to avoid hashing and comparing the
other fields. Instead, we can use a dataclass with eq=False so that
hashing and comparison are done by identity.

Signed-off-by: Omar Sandoval <[email protected]>
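
To illustrate the identity-based behavior described in the commit above, here is a minimal sketch. The `Architecture` class and its fields are hypothetical stand-ins, not the actual vmtest definitions. With `eq=False`, the dataclass decorator generates no `__eq__`, so the class keeps the default identity-based `__eq__` and `__hash__` inherited from `object`.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a vmtest class; the field names are assumptions.
@dataclass(eq=False)
class Architecture:
    name: str
    kernel_config: str


a = Architecture("x86_64", "CONFIG_FOO=y")
b = Architecture("x86_64", "CONFIG_FOO=y")

# No field-by-field comparison is generated: two distinct instances are
# unequal even when every field matches.
assert a != b
assert a == a

# Hashing is by identity too, so instances work directly as dict keys
# without keying on the name.
table = {a: "first"}
assert a in table
assert b not in table
```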
Normal dicts are guaranteed to be ordered since Python 3.7.

Signed-off-by: Omar Sandoval <[email protected]>
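
As a quick illustration of the ordering guarantee mentioned above, the sketch below shows that a plain `dict` iterates in insertion order. The keys and values are made up for the example.

```python
# Plain dicts preserve insertion order (a language guarantee since Python 3.7).
kernels = {}
for arch in ["x86_64", "aarch64", "ppc64"]:
    kernels[arch] = f"kernel-for-{arch}"

order = list(kernels)
assert order == ["x86_64", "aarch64", "ppc64"]

# Re-inserting a deleted key moves it to the end.
del kernels["x86_64"]
kernels["x86_64"] = "kernel-for-x86_64"
reordered = list(kernels)
assert reordered == ["aarch64", "ppc64", "x86_64"]
```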
This will be required for reliable parallel test runs. Even for serial
runs, the time to run tests is the same or slightly faster with fewer
CPUs, likely due to bottlenecking on 9pfs and less setup.

Signed-off-by: Stephen Brennan <[email protected]>
[Omar: expand commit message]
Signed-off-by: Omar Sandoval <[email protected]>
For running tests in parallel, we want to log to a file instead of
getting interleaved output.

Signed-off-by: Stephen Brennan <[email protected]>
[Omar: rebase, add rootfsbuild, remove main_thread argument superseded by pdeathsig]
Signed-off-by: Omar Sandoval <[email protected]>
The full test suite, including foreign architectures and alternative
kernel configurations, can take a long time to run. However it's mostly
work that can happen in parallel. Add a -j option to do this. By
default, everything still happens serially.

Closes osandov#489.

Co-authored-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
[Omar: rework threading model, various cleanups]
Signed-off-by: Omar Sandoval <[email protected]>
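
The shape of an opt-in `-j` option that stays serial by default might look like the sketch below. This is not the vmtest implementation: `parse_args`, `run_all`, the `--jobs` long option, and the use of `ThreadPoolExecutor` are all assumptions made for illustration.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical option definition; the default of 1 keeps runs serial.
    parser.add_argument("-j", "--jobs", type=int, default=1,
                        help="number of tasks to run in parallel")
    return parser.parse_args(argv)


def run_all(tasks, jobs):
    """Run zero-argument callables, returning results in submission order."""
    if jobs <= 1:
        # Serial path: identical behavior to a run without -j.
        return [task() for task in tasks]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Keeping the serial path separate means that a run without `-j` never touches the thread pool at all.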
There is a window between a process being flagged as stopped and it
actually descheduling. Various stack tracing tests have been flaky with
"cannot unwind stack of running task" errors due to catching the process
in this window.

Fix it by waiting for /proc/pid/syscall to not return "running" (which
is what we did before the fixes commit, but now we don't need to check
for a specific syscall number).

Fixes: bab4f43 ("tests: replace fork_and_sigwait() and fork_and_call() with fork_and_stop()")
Signed-off-by: Omar Sandoval <[email protected]>
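
The waiting strategy described above can be sketched as a simple polling loop. This is an illustration rather than the code from the drgn test suite: the function names, the timeout, and the injectable `read` parameter (added so the loop can be exercised without a real process) are assumptions. It relies on `/proc/<pid>/syscall` containing just the string `running` while the task is not blocked.

```python
import time
from pathlib import Path


def read_syscall(pid):
    # Reads /proc/<pid>/syscall; the content is "running" while the task
    # is not blocked.
    return Path(f"/proc/{pid}/syscall").read_text().strip()


def wait_until_descheduled(pid, read=read_syscall, timeout=5.0, interval=0.001):
    """Poll until the task no longer reports "running", or time out."""
    deadline = time.monotonic() + timeout
    while True:
        if read(pid) != "running":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pid {pid} still running after {timeout}s")
        time.sleep(interval)
```

Because any state other than `running` ends the wait, the loop does not need to check for a specific syscall number.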
There are two issues with the error margin we allow for the counters in
these tests:

* VmRSS is the sum of three counters, so its error margin should also be
  tripled.
* Before the switch to per-CPU counters, the error margin was
  nr_threads * 64 * (fault_around_bytes / PAGE_SIZE).

Signed-off-by: Omar Sandoval <[email protected]>
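
As a worked example of the arithmetic in the two points above, the sketch below computes the older per-counter margin and the tripled margin for VmRSS. The function names and the input values are illustrative assumptions, and integer division is assumed for `fault_around_bytes / PAGE_SIZE`.

```python
def counter_margin(nr_threads, fault_around_bytes, page_size):
    # Margin for a single counter, per the formula in the commit message:
    # nr_threads * 64 * (fault_around_bytes / PAGE_SIZE)
    return nr_threads * 64 * (fault_around_bytes // page_size)


def vmrss_margin(nr_threads, fault_around_bytes, page_size):
    # VmRSS is the sum of three counters, so its margin is tripled.
    return 3 * counter_margin(nr_threads, fault_around_bytes, page_size)


# Illustrative inputs: 4 threads, 64 KiB fault-around window, 4 KiB pages.
single = counter_margin(4, 65536, 4096)   # 4 * 64 * 16
total = vmrss_margin(4, 65536, 4096)
assert single == 4096
assert total == 12288
```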
Use typing.Deque instead.

Signed-off-by: Omar Sandoval <[email protected]>
…ntation

The semantics of this helper are really fuzzy because the underlying
timestamps are updated lazily, so let's do our best to explain it.

Signed-off-by: Omar Sandoval <[email protected]>
Especially when running vmtest in parallel, this test sometimes fails
because the rq clock hasn't been updated. Force it to update by forcing
the process to migrate CPUs.

Signed-off-by: Omar Sandoval <[email protected]>
Fixes: 91da9ac ("Migrate runq related helpers from drgn-tools")
Signed-off-by: Omar Sandoval <[email protected]>
It can be nice to modify or play around in the chroots after creation.
For example, to install new packages without rebuilding. While there is
a tool for running commands in a vmtest VM, the VMs have no network
access so they're less flexible. While the necessary command isn't
really all that complicated, it's nice to not have to think of it. Add a
new script to enter the rootfs.

Signed-off-by: Stephen Brennan <[email protected]>
The tests.linux_kernel.test_stack_trace.TestStackTrace.test_local_variable
test is failing on Arm on Linux 5.4 and 4.19. This was apparently fixed
by removing -fno-var-tracking-assignments from the compiler flags in
v5.10. Backport the patch.

Signed-off-by: Omar Sandoval <[email protected]>
In osandov#537 it was pointed out that the ability to pipe output produced by
executing Python statements would be very useful. Unfortunately the
shell redirection operators are part of the Python grammar as well, and
there are many cases of ambiguity, where a command could be split into
python code and shell pipeline in multiple valid ways.

However, these ambiguities may not be a dealbreaker. We can resolve them
by always splitting on the first shell operator that produces valid
Python code on the left-hand side. In cases where you want to force a
different interpretation, you can wrap your Python code in parentheses.
These ensure that any shell operator within the parentheses doesn't
introduce a pipeline, because the code prior to them is incomplete
without the closing parenthesis.

Signed-off-by: Stephen Brennan <[email protected]>
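
The disambiguation rule described above can be sketched as follows. This is an illustration, not the implementation from the commit: the function name, the operator set (`|` and `>`), and the use of `ast.parse` as the validity check are assumptions.

```python
import ast


def split_python_and_pipeline(line, operators=("|", ">")):
    """Split at the first shell operator whose left side is valid Python.

    Returns (python_code, shell_part); shell_part is None if no split applies.
    """
    for i, ch in enumerate(line):
        if ch not in operators:
            continue
        left = line[:i].strip()
        if not left:
            continue
        try:
            ast.parse(left)
        except SyntaxError:
            # Left side is incomplete (for example an unclosed parenthesis
            # or string literal), so this operator belongs to the Python code.
            continue
        return left, line[i:].strip()
    return line.strip(), None


# The first "|" already has valid Python to its left, so it starts the pipeline.
assert split_python_and_pipeline("a | b") == ("a", "| b")
# Parentheses force the first "|" to be treated as Python's bitwise OR.
assert split_python_and_pipeline("(a | b) | wc -l") == ("(a | b)", "| wc -l")
```

The parenthesized case works because `(a` on its own fails to parse, so the scan moves on to the next operator.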
We need to append to KBUILD_CFLAGS, not reassign it. This was causing
weird build failures on every architecture.

Fixes: 27f069e ("vmtest.kbuild: add patch to fix missing debug info on old Arm kernels")
Signed-off-by: Omar Sandoval <[email protected]>
qiyeliu and others added 2 commits September 10, 2025 12:46
Add ptov command for drgn.

Signed-off-by: Ye Liu <[email protected]>
Signed-off-by: Song Hu <[email protected]>
Between this PR being tested and merged, commit 3f47e27
("vmtest.vm: Reduce smp to 2") was merged, which broke a test case that
hard-coded CPU 2. Change to getting the list of all CPUs instead.

Signed-off-by: Omar Sandoval <[email protected]>
@sebroy sebroy force-pushed the dlpx/pr/sebroy/cf484e30-86d4-4277-91d9-c1444e52d4e3 branch from e86116a to f9e13dc Compare September 11, 2025 13:31
@sebroy sebroy force-pushed the dlpx/pr/sebroy/cf484e30-86d4-4277-91d9-c1444e52d4e3 branch from f9e13dc to 86fb304 Compare September 11, 2025 13:31
@sebroy sebroy requested a review from mmaybee September 11, 2025 13:38
@mmaybee mmaybee left a comment

Should we document somewhere/somehow that the --no-verify flag should be used when doing git review on PRs for this repo?

sebroy commented Sep 11, 2025

@mmaybee

> Should we document somewhere/somehow that the --no-verify flag should be used when doing git review on PRs for this repo?

Yes, good idea. I'll create a Systems Platform build monitoring page in Confluence where we can start accumulating tips like this (including the ZFS information you talked about today).

@sebroy sebroy enabled auto-merge September 17, 2025 01:28
@sebroy sebroy merged commit ebc1020 into develop Sep 17, 2025
2 of 9 checks passed
@sebroy sebroy deleted the dlpx/pr/sebroy/cf484e30-86d4-4277-91d9-c1444e52d4e3 branch September 17, 2025 13:26
7 participants