
Contributor

@nephros nephros commented Oct 29, 2025

RFC enhancement: Have the daemon keep track of files which are asked about by the preload lib, but are not patched.

This is the overwhelming majority of requests in normal runtime scenarios.

The idea is to minimize calls to QFileInfo::exists() and to return the file names of unpatched files as soon as possible.

(Open question: Can we be smarter than the QFileInfo implementation and the kernel fs cache? ;) )

The original idea was presented in
#431 (comment)
which suggests a Bloom filter to handle these file paths.

I experimented with bloom filters, but found the resulting logic too hard for
my small brain to handle.

So this implementation abuses Qt's QCache,
but wraps it in a class, which means the actual cache could be replaced with something better later.

@b100dian if you have the time, maybe look over the idea and review it.

  • Hotcache: Add a cache object to track nonexisting files
  • Hotcache: remove entry if it existed
  • Hotcache: simplify logic
  • Hotcache: store Object, retrieve one on lookup
  • Hotcache: remove log line
  • Hotcache: use object() not take()
  • Hotcache: move debugging messages after writing to the connection
  • Hotcache: set up the filter with some initial contents
  • Hotcache: Add a Settings switch
  • Hotcache: wire up settings switch
  • Hotcache: move into a class, rename to m_filter
  • Hotcache: Don't be a template class
  • Hotcache: Prepare for stats collecting
  • Hotcache: Static initializer lists
  • Hotcache: Implement a simpler stats method
  • Hotcache: Make the lists public
  • Hotcache: Use foreach again.
  • Hotcache: Missing braces
  • Hotcache: check for active filter in startReadingLocalServer
  • Hotcache: Use plain qstring for stats
  • Hotcache: Add some debug prints
  • Hotcache: Don't copy around strings, us a bool.
  • Print hotcache stats on PM exit
  • Resolve symlinks before adding to primed cache

Contributor Author

nephros commented Oct 29, 2025

One problem with evaluating this is getting a proper benchmark.

Suggestions for a test scenario that can somewhat reliably compare
"performance" are highly welcome.

So the whole thing is currently a "should theoretically improve things" thing.

Contributor Author

nephros commented Oct 29, 2025

A comment about the logic.

Without this change, the communication goes something like this:

  1. A process launches, preload library gets injected
  2. Process tries to open a file /path/to/foo
  3. Preload lib takes the file path string and writes it to the Socket
  4. PM Daemon reads the file name from the Socket
  5. PM Daemon checks whether the path exists in its tmp overlay, which means it is a patched file. This check uses QFileInfo::exists() which is potentially expensive.
  6. If it exists, it writes the path of the patched file version back to the Socket
  7. If not, it writes the original path name to the Socket
  8. Preload lib reads the path string, and passes the result of open/open64 on the path received to the original Process.

With this change, the part of the Daemon becomes:

  1. PM Daemon has a cache, which holds a list of unpatched files.
  2. PM Daemon reads the file name from the Socket
  3. PM Daemon checks whether the file name is contained in the cache (the cache notices this check/query).
  4. If yes, write the original file name back.
  5. If no, check that the file exists in the tmp overlay (same as above, but should happen more rarely).
  6. If it exists, it writes the path of the patched file version back to the Socket
  7. If it does not exist, register the path in the cache,

As the cache holds a list of unpatched files, it will continue to grow. We
rely on the QCache eviction logic (which is built in) to make sure it doesn't
grow too large.

QCache also decides which entries to keep: when it needs to make room, it
deletes the less recently accessed entries first, so recently used paths stay.

This way, it should over time become a list of "hot", i.e. frequently accessed
file paths, improving efficiency.

See the QCache docs about those internals.
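
The daemon-side steps above can be sketched as follows. This is a minimal illustration in plain C++, not the PR's Qt code: `NegativeCache` is a hypothetical stand-in for the QCache-backed class, `existsInOverlay` stands in for the QFileInfo::exists() check, and `overlayRoot` is a placeholder for the tmp overlay location.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the QCache-backed filter: a set of paths known to
// be unpatched, bounded by a total cost, evicting least recently used first.
class NegativeCache {
public:
    explicit NegativeCache(int maxCost) : m_maxCost(maxCost) { assert(maxCost > 0); }

    // Returns true if the path is cached; a hit marks it as recently used.
    bool contains(const std::string &path) {
        auto it = m_index.find(path);
        if (it == m_index.end())
            return false;
        m_order.splice(m_order.begin(), m_order, it->second);
        return true;
    }

    // Registers a path, evicting the least recently used entries if needed.
    void insert(const std::string &path, int cost = 1) {
        if (cost > m_maxCost || contains(path))
            return;
        while (m_totalCost + cost > m_maxCost) {
            m_totalCost -= m_order.back().cost;
            m_index.erase(m_order.back().path);
            m_order.pop_back();
        }
        m_order.push_front(Entry{path, cost});
        m_index[path] = m_order.begin();
        m_totalCost += cost;
    }

private:
    struct Entry { std::string path; int cost; };
    std::list<Entry> m_order;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> m_index;
    int m_maxCost;
    int m_totalCost = 0;
};

// Decides which path to write back to the socket for one request.
std::string resolve(NegativeCache &cache, const std::string &path,
                    const std::string &overlayRoot,
                    const std::function<bool(const std::string &)> &existsInOverlay)
{
    if (cache.contains(path))
        return path;               // known unpatched: no storage access needed
    const std::string patched = overlayRoot + path;
    if (existsInOverlay(patched))
        return patched;            // patched files are never cached
    cache.insert(path);            // remember the miss for next time
    return path;
}
```

In this sketch, a second request for the same unpatched path is answered without calling `existsInOverlay` again, and paths that turn out to be patched are never added to the cache.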

@nephros nephros changed the title Keep a cache of frequently accessed files RFC: Keep a cache of frequently accessed files Oct 31, 2025
@b100dian
Contributor

Thanks for the detailed write-up and for starting working on this! Indeed, I remember people having races at initialization that only happen when patchmanager is installed so this is indeed of interest for the whole system.

In this phase, since I did not look at the code yet, my only questions are:

  1. Why are you caching the NON-patched files, which are larger in number than the patched ones?
  2. Are you also evicting the cache in some patchmanager-related cases (such as a patch being installed)?

Thanks!

Contributor Author

nephros commented Oct 31, 2025

> Thanks for the detailed write-up and for starting working on this! Indeed, I remember people having races at initialization that only happen when patchmanager is installed so this is indeed of interest for the whole system.
>
> In this phase, since I did not look at the code yet, my only questions are:
>
> 1. Why are you caching the NON-patched files, which are larger in number than the patched ones?

The idea is to have a list of "hot false positives", i.e. files that are accessed by almost all processes (like libc, /etc/passwd, and so on) but are usually not patched, and return early (quickly, in minimal time) for those.

And QCache with its auto-eviction logic (as well as a Bloom filter with its "maybe in set/definitely not in set" characteristic) isn't suitable for a list of patched files, I think. Such a list would have to be persistent and managed by us.
But doing that reliably is hard: it would mean switching from "just check whether the file exists on the system" to monitoring things like app install/uninstall to maintain the list correctly.
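
To illustrate the "maybe in set/definitely not in set" characteristic mentioned above, here is a minimal Bloom filter sketch in plain C++. It is not what the PR implements; the class name, the bit count and the two salted hashes are arbitrary choices made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Minimal Bloom filter sketch. The size and the hash derivation are
// illustration choices, not tuned values.
class BloomFilter {
public:
    void add(const std::string &key) {
        m_bits.set(slot(key, 0));
        m_bits.set(slot(key, 1));
    }

    // false: the key was definitely never added.
    // true:  the key was possibly added (false positives can happen).
    bool maybeContains(const std::string &key) const {
        return m_bits.test(slot(key, 0)) && m_bits.test(slot(key, 1));
    }

private:
    static constexpr std::size_t kBits = 4096;

    // Derives the n-th bit position; the second hash salts the key.
    static std::size_t slot(const std::string &key, int n) {
        assert(n == 0 || n == 1);
        return std::hash<std::string>{}(n == 0 ? key : key + '\x01') % kBits;
    }

    std::bitset<kBits> m_bits;
};
```

A `true` answer would still need a real check before it could be trusted, and a plain Bloom filter like this one cannot remove individual entries.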

> 2. Are you also evicting the cache in some patchmanager-related cases (such as a patch being installed)?

Not at the moment, but this could (really should) be done, every time the "list of active patches" changes.
Actually the cache could be cleared on every activate/deactivate event, and would be re-populated by the next processes accessing files. If this caching is at all useful, it shouldn't be impacted much by eviction, as these events are relatively rare.
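
A minimal sketch of that invalidation, in plain C++: `UnpatchedPaths` and `onActivePatchesChanged` are hypothetical names, and with the QCache-based class this would presumably map to a call to QCache::clear() from whatever handles activate/deactivate.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical holder for the "known unpatched" paths; the PR uses a
// QCache-based class instead.
class UnpatchedPaths {
public:
    void remember(const std::string &path) {
        assert(!path.empty());
        m_paths.insert(path);
    }

    bool isKnownUnpatched(const std::string &path) const {
        return m_paths.count(path) != 0;
    }

    // Call whenever the list of active patches changes. An entry recorded
    // before the change may now refer to a patched file, so drop everything
    // and let later requests repopulate the set.
    void onActivePatchesChanged() { m_paths.clear(); }

private:
    std::unordered_set<std::string> m_paths;
};
```

Without the clear, a path cached as unpatched would keep being answered with its original name even after a patch for it was activated.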

Contributor Author

nephros commented Oct 31, 2025

Oh, and don't get scared by the long list of commits, the actual changes are not that many.

Contributor

@b100dian b100dian left a comment


Again, thanks for trying out this in-memory-do-not-touch-the-storage cache.

I have a few comments on the code, and one suggestion for testing:

That is, a simple find /usr -type f -exec head -n1 {} \; could probably be run a couple of times with FSCache on and off to see if there are differences.

If /usr is too big for the cache then maybe some narrower path.

Also, I would be interested in the memory usage: since it's a number of hashed keys times one allocated object each, what would that mean for fragmentation in the end (that last part)?

payload = fakePath.toLatin1();
bool passAsIs = true;
if (
(!m_failed) // return unaltered for failed
Contributor

Oh boy, looking up where m_failed is set reminds me we should rewrite patchmanager in a simpler way :))

Lipstick crash? Journal match ? 😮

insert(entry, new QObject(), HOTCACHE_COST_STRONG);
}
}
// they may be wrong, so use a higher cost than default
Contributor

Not sure I follow why we pre-load a set of sonames if we're going to make them the first out anyway.

QString list;
list + "Filter Stats:"
+ "\n==========================="
+ "\n Hotcache entries:: .............." + size()
Contributor

Really curious about some stats after a fresh boot and after some days of usage: what do they look like?

Contributor Author

@nephros nephros Nov 3, 2025

Statistics have been improved a bit in the latest commit, and you can now trigger logging (and a string reply) from DBus:

busctl call org.SfietKonstantin.patchmanager /org/SfietKonstantin/patchmanager org.SfietKonstantin.patchmanager statistics| xargs echo -e

Example output:

Patchmanager version: 3.2.12+obs.20251103104027.132.g5670c8e
Applied Patches: 20
Patched files: 97
Watched files: 50
Filter Stats:
===========================
  Hotcache entries:: ..............126
  Hotcache cost: ..................259/5000
  Hotcache hit/miss: ..............1284/106 (0.00%)
===========================
  Hotcache top entries: ...........
/usr/share/patchmanager/patches/patch-keyboard-no-vibration/patch.json
/usr/share/patchmanager/patches/patch-named-sims/patch.json
/usr/share/patchmanager/patches/patch-email-headers/unified_diff.patch
/usr/share/patchmanager/patches/patch-icon-filter/patch.json
/etc/passwd
/usr/lib64/libblkid.so.1.1.0

Contributor

Cool! what happens if you run find / -type f -exec grep something {} \; with the cache?

(honest question, I do this all the time ;D)

Contributor

(I should make my own build but I am away from any personal computer for the week)

Contributor Author

> Cool! what happens if you run find / -type f -exec grep something {} \; with the cache?
>
> (honest question, I do this all the time ;D)

You need plocate ;)

So it goes to something like this.

busctl call "org.SfietKonstantin.patchmanager" /org/SfietKonstantin/patchmanager org.SfietKonstantin.patchmanager statistics b true | xargs echo -e|head
s Patchmanager version: 3.2.12+obs.20251103192017.134.g49ef3e1
Applied Patches: 20
Patched files: 97
Watched files: 50
Filter Stats:
===========================
Hotcache entries:: ..............1512
Hotcache cost: ..................3033/5000
Hotcache hit/miss: ..............59162/1506 (0.98%)
===========================
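
As a cross-check of the hit/miss line: assuming the percentage is meant to be the share of lookups answered from the cache, i.e. hits / (hits + misses), the counters work out to roughly 97.5% for the sample above (59162 hits, 1506 misses) and roughly 92.4% for the earlier one (1284 hits, 106 misses). A minimal sketch of that calculation follows; `hitRatePercent` is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Share of lookups answered from the cache, in percent.
// Returns 0 when there were no lookups at all, to avoid dividing by zero.
double hitRatePercent(std::uint64_t hits, std::uint64_t misses)
{
    const std::uint64_t total = hits + misses;
    assert(total >= hits);  // guards against counter overflow
    if (total == 0)
        return 0.0;
    // Convert to double before dividing: integer division would truncate.
    return 100.0 * static_cast<double>(hits) / static_cast<double>(total);
}
```

If the printed value is instead the plain ratio, or comes from an integer division, that could explain figures such as 0.98% or 0.00%, but that is a guess without looking at the stats code.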

Contributor Author

Btw, don't actually do find /, that can crash or hang the device (because of stuff in /proc or /sys or so).

Contributor Author

By the way, you MUST run find as non-root, otherwise the preload lib will do nothing :)
