-
Notifications
You must be signed in to change notification settings - Fork 23
RFC: Keep a cache of frequently accessed files #517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
take() gives ownership to the caller. object() does not.
- Try a Q_OBJECT macro for a template class... - Don't be a template class - Become Object-based, add contructor - Implement a constructor
|
One problem with evaluating this is getting a proper benchmark. Suggestions are highly welcome of a test scenario which can somewhat reliably So the whole thing is currently a "should theoretically improve things" thing. |
|
A comment about the logic. Without this change, the communication goes something like this:
With this change, the part of the Daemon becomes:
As the cache holds a list of unpatched files, it will continue to grow. we QCache also has logic to evaluate which entries are useful to keep in the This way, it should over time become a list of "hot", i.e. frequently accessed See the QCache docs about those internals. |
|
Thanks for the detailed write-up and for starting working on this! Indeed, I remember people having races at initialization that only happen when patchmanager is installed so this is indeed of interest for the whole system. In this phase, since I did not look at the code yet, my only questions are:
Thanks! |
The idea is to have a list of "hot false positives", i.e. files that are accessed by almost all processes (like libc, /etc/passwd, and so on) but are usually not patched, and return early (quickly, in minimal time) for those. And QCache with its auto-eviction logic (as well as a Bloom filter with its "maybe in set/definitely not in set" characteristic) isn't suitable for a list of patched files I think. Such a list would have to be persistent and managed by us.
Not at the moment, but this could (really should) be done, every time the "list of active patches" changes. |
|
Oh, and don't get scared by the long list of commits, the actual changes are not that many. |
b100dian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, thanks for trying out this in-memory-do-not-touch-the-storage cache.
I have a few comments on the code, and one suggestion for testing:
That is, a simple find /usr -type f -exec head -n1 {} \; could probably be run a couple of times with FSCache on and off to see if there are differences.
If /usr is too big for the cache then maybe some narrower path.
Also, I would be interested in the memory usage - since its a number of hashed keys times one allocated object - what would that mean to fragmentation in the end (that last part) ?
| payload = fakePath.toLatin1(); | ||
| bool passAsIs = true; | ||
| if ( | ||
| (!m_failed) // return unaltered for failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy, looking up where m_failed is set reminds me we should rewrite patchmanager in a simpler way :))
Lipstick crash? Journal match ? 😮
| insert(entry, new QObject(), HOTCACHE_COST_STRONG); | ||
| } | ||
| } | ||
| // they may be wrong, so use a higher cost than default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow why do we pre-load a set of sonames if we're going to make them be the first out anyways.
| QString list; | ||
| list + "Filter Stats:" | ||
| + "\n===========================" | ||
| + "\n Hotcache entries:: .............." + size() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really curious some stats after a booted system and after some days of usage, how do they look like ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Statistics have been approved a bit in the latest commit, and you can now trigger logging (and a string reply) from DBus:
busctl call org.SfietKonstantin.patchmanager /org/SfietKonstantin/patchmanager org.SfietKonstantin.patchmanager statistics| xargs echo -e
Example output:
Patchmanager version: 3.2.12+obs.20251103104027.132.g5670c8e
Applied Patches: 20
Patched files: 97
Watched files: 50
Filter Stats:
===========================
Hotcache entries:: ..............126
Hotcache cost: ..................259/5000
Hotcache hit/miss: ..............1284/106 (0.00%)
===========================
Hotcache top entries: ...........
/usr/share/patchmanager/patches/patch-keyboard-no-vibration/patch.json
/usr/share/patchmanager/patches/patch-named-sims/patch.json
/usr/share/patchmanager/patches/patch-email-headers/unified_diff.patch
/usr/share/patchmanager/patches/patch-icon-filter/patch.json
/etc/passwd
/usr/lib64/libblkid.so.1.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! what happens if you run find / -type f -exec grep something {} \; with the cache?
(honest question, I do this all the times ;D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I should make my own build but I am away from any personal computer for the week)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! what happens if you run
find / -type f -exec grep something {} \;with the cache?(honest question, I do this all the times ;D)
You need plocate ;)
So it goes to something like this.
busctl call "org.SfietKonstantin.patchmanager" /org/SfietKonstantin/patc
hmanager org.SfietKonstantin.patchmanager statistics b true | xargs echo -e|head
s Patchmanager version: 3.2.12+obs.20251103192017.134.g49ef3e1
Applied Patches: 20
Patched files: 97
Watched files: 50
Filter Stats:
===========================
Hotcache entries:: ..............1512
Hotcache cost: ..................3033/5000
Hotcache hit/miss: ..............59162/1506 (0.98%)
===========================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, don't actually do find /, that can crash or hang the device (because of stuff in /proc or /sys or so).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, you MUST run find as non-root, otherwise the preload lib will do nothing :)
e94e2e6 to
ec2aae2
Compare
RFC enhancement: Have the daemon keep track of files which are asked about by the preload lib, but are not patched.
This is the overwhelming majority of requests in normal runtime scenarios.
The idea is that calls to QFileInfo::exists() are minimized, and file names or unpatched files returned as soon as possible.
(Open question: Can we be smarter than the QFileInfo implementation and the kernel fs cacche? ;) )
The original idea was presented in
#431 (comment)
which suggests a Bloom filter to handle these file paths.
I experimented with bloom filters, but found the resulting logic too hard for
my small brain to handle.
So this implementation abuses Qt's QCache,
but uses a class which means the actual cache could be replaced with something better later.
@b100dian if you have the time, maybe look over the idea and review it.