Skip to content

Conversation

@sagigrimberg
Copy link
Contributor

Some devices are "hidden" from userspace due to various reasons. One reason is native nvme multipathing, where the path devices do not present a device node to the user but only exposes the stacking device as the only nvme device. Still there are sysfs attributes for these hidden devices, but we should ignore them altogether because these are not "real" devices at all.

@k8s-ci-robot
Copy link
Collaborator

Hi @sagigrimberg. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobbypage
Copy link
Contributor

Thanks for the PR. What is the issue this is solving currently? Are these devices somehow reported currently?

@sagigrimberg
Copy link
Contributor Author

Kubelet fails to access these devices, preventing csi metrics collection.

kubelet[4192]: E0102 10:01:46.083009    4192 info.go:99] Failed to get disk map: open /sys/block/nvme0c1n1/dev: no such file or directory

@bobbypage
Copy link
Contributor

Got it. Is the /hidden a a common way that devices are reported hidden via sysfs? Is this documented somewhere?

@sagigrimberg
Copy link
Contributor Author

Yes, it is a common way. You can also refer to util-linux/lsblk.c that also does the same thing...

Unfortunately, we didn't add a proper documentation entry for it in sysfs/block ABI documentation
I can do that.

@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch from d1e25fc to 38d4b11 Compare March 3, 2023 08:30
@sagigrimberg
Copy link
Contributor Author

ping?

@sagigrimberg
Copy link
Contributor Author

How can I trigger tests on this PR?

@iwankgb
Copy link
Collaborator

iwankgb commented Mar 14, 2023

/ok-to-test

@sagigrimberg
Copy link
Contributor Author

Can I get a review on this?

@sagigrimberg
Copy link
Contributor Author

ping?

@sagigrimberg
Copy link
Contributor Author

@iwankgb @bobbypage ?

@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch from 38d4b11 to 1dfb815 Compare March 29, 2023 11:43
@sagigrimberg
Copy link
Contributor Author

Any reason why this PR was not merged already? @bobbypage @iwankgb ?

@sagigrimberg sagigrimberg requested a review from bobbypage May 28, 2023 11:13
@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch from 1dfb815 to 4149224 Compare June 11, 2023 21:14
@sagigrimberg sagigrimberg requested a review from sagi-lb June 11, 2023 21:14
@sagigrimberg
Copy link
Contributor Author

@bobbypage this is sitting here forever now, any reason why this is not taken?

@sagigrimberg
Copy link
Contributor Author

/retest

@sagigrimberg
Copy link
Contributor Author

retest

@sagigrimberg
Copy link
Contributor Author

@bobbypage can we get this in please? The failed test looks completely unrelated to the change?

This has been sitting here forever and I don't understand why this should be the case TBH. This PR has been primarily ignored while fixing a real issue in cadvisor.

@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch 2 times, most recently from ea1dac6 to 2393946 Compare June 26, 2023 12:09
@sagi-lb
Copy link

sagi-lb commented Jun 26, 2023

/retest

@iwankgb
Copy link
Collaborator

iwankgb commented Jun 26, 2023

@sagigrimberg, while I understand your sentiment regarding maintainers not reviewing code in timely manner, I have to say that majority of us are volunteers and we work on cAdvisor in our free time.
Take it into consideration when assigning blame for ignoring important changes.

@sagigrimberg
Copy link
Contributor Author

@iwankgb I was not assigning blame. I was just stating a fact. This PR was primarily ignored. This PR stands since March 2'nd, which since then cadvisor accumulated a few dosands of of PRs merged, while this one has gone, presumably ready to merge since about then (outside of breaking lately and me fixing it after a rebase).

If you don't have free time to work on the project, or you don't think that it is a suitable fix for the project, or if you think that there is risk associated it is fine, but it would be appreciated to get at least some feedback...

Glad this at least got some attention though...

// https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
// - c8487d854ba5 ("lsblk: Ignore hidden devices")
hidden, err := os.ReadFile(path.Join(blockDir, name, "/hidden"))
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-nil error that is not os.ErrNotExists must not be silently ignored. Returned wrapped error and handle it accordingly wherever this function is called, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense @iwankgb, fixing this now.

@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch from 2393946 to b7b05f8 Compare June 26, 2023 21:16
@sagigrimberg sagigrimberg requested a review from iwankgb June 26, 2023 21:17
return false, nil
}
if err != nil {
return false, fmt.Errorf("failed to read %s: %s", devHiddenPath, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather wrap the error:

Suggested change
return false, fmt.Errorf("failed to read %s: %s", devHiddenPath, err)
return false, fmt.Errorf("failed to read %s: %w", devHiddenPath, err)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some devices are "hidden" from userspace due to various
reasons. One reason is native nvme multipathing, where the
path devices do not present a device node to the user but
only exposes the stacking device as the only nvme device.
Still there are sysfs attributes for these hidden devices, but
we should ignore them altogether because these are not "real"
devices at all.

Signed-off-by: Sagi Grimberg <[email protected]>
@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch from b7b05f8 to 88bfa0d Compare June 27, 2023 20:00
@iwankgb iwankgb merged commit a9bf713 into google:master Aug 4, 2023
ringerc pushed a commit to ringerc/google-cadvisor-patches that referenced this pull request Jun 3, 2025
sysfs: trim spaces in device hidden check

(cherry picked from commit 96c346e)

Backports google#3260 from
cadvisor v0.51.0.

This is needed to fix container_fs_ metrics on systems with multipath
devices. Without it, errors like:

    Failed to get disk map: open /sys/block/nvme0c0n1/dev

will be logged, and cadvisor will emit nonsensical metrics with
the label device="" for filesystem I/O.

See google#3693 for full explanation.

Signed-off-by: Craig Ringer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants