fix: fix progress bar for RDataFrame with Range limits #19294

Draft: MohamedElashri wants to merge 1 commit into master

MohamedElashri

This Pull request:

Changes or fixes:

This PR fixes the progress bar display for RDataFrame when using Range() transformations. Previously, the progress bar would always show the total entries from the original dataset, even when using Range() to limit the number of processed entries. Now the progress bar correctly shows the actual number of entries being processed.

This PR also adds a new tutorial, df108_ProgressRange.C, demonstrating correct usage with Range().

I still need to write a unit test and am working out how to do that, so this is a work-in-progress PR.

Checklist:

  • tested changes locally - Everything works!
  • updated the docs (if necessary) - Yes, if adding a new tutorial counts; I can also work on updating the ROOT docs themselves.

This PR fixes #15323

Technical Details:
- Add `RResultPtr<ULong64_t>` fTotalEntries member to ProgressHelper class
- Update `ProgressHelper` constructor to accept optional `totalEntries` parameter
- Modify `PrintStats`, `PrintStatsFinal`, and `PrintProgressBar` methods to use
  `fTotalEntries` when available and ready, falling back to `ComputeNEventsSoFar()`
- Update `AddProgressBar` to pass `node.Count()` result to ProgressHelper
- Add needed `#include <ROOT/RResultPtr.hxx>` directives
- Use `const_cast` to access `RResultPtr` value in const methods (workaround
  for `RResultPtr`'s lack of const accessors)
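
The "use the total when ready, otherwise fall back" rule from the list above can be sketched in plain C++. This is a toy model only: `ToyProgress`, `DisplayedTotal`, and `ExampleFallback` are hypothetical names introduced for illustration, `std::optional` stands in for a booked count that may or may not be ready, and nothing here is ROOT's actual `ProgressHelper` code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for the helper's state; hypothetical, not ROOT's ProgressHelper.
struct ToyProgress {
   std::optional<std::uint64_t> totalEntries; // set only once the booked count is ready
   std::uint64_t eventsSoFar = 0;             // what the existing fallback would report
};

// Total to display: the exact count if it is ready, otherwise the fallback.
inline std::uint64_t DisplayedTotal(const ToyProgress &p)
{
   return p.totalEntries.value_or(p.eventsSoFar);
}

inline void ExampleFallback()
{
   ToyProgress p;
   p.eventsSoFar = 1000000;
   assert(DisplayedTotal(p) == 1000000); // count not ready: fall back
   p.totalEntries = 500;
   assert(DisplayedTotal(p) == 500); // count ready: use it
}
```

In this sketch the display code only reads the count if it is already available; it never forces the count to be computed.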

New features:
- Progress bar now shows accurate total entries when `RResultPtr` is available
- Supports progress tracking for transformed datasets (e.g. after `Range()`)

Documentation:
- Add tutorial `df108_ProgressRange.C` demonstrating correct usage with `Range()`

github-actions bot commented Jul 6, 2025

Test Results

    20 files      20 suites   3d 10h 42m 23s ⏱️
 3 192 tests  3 191 ✅ 0 💤 1 ❌
62 196 runs  62 194 ✅ 1 💤 1 ❌

For more details on these failures, see this check.

Results for commit 03c994d.

@vepadulano
Member

Hi @MohamedElashri ,

Thank you for attempting to fix this issue, really appreciated! I tried your tutorial and noticed that, with the current changes, the progress bar is only displayed at the end of the event loop, which unfortunately defeats its purpose. You can see this with the following example, which I created by slightly modifying your tutorial:

#include <ROOT/RDataFrame.hxx>
#include <iostream>
#include <ROOT/RDFHelpers.hxx>
#include <ROOT/RLogger.hxx>

auto verbosity = ROOT::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::ELogLevel::kInfo);

void df108_ProgressRange()
{
   ROOT::RDataFrame df(1000000);
   auto ranged = df.Range(500);
   ROOT::RDF::Experimental::AddProgressBar(ranged);
   auto h = ranged.Define("x", []() { return double{42}; }).Histo1D<double>("x");
   std::cout << h->GetEntries() << std::endl;
}

int main()
{
   df108_ProgressRange();
}

The output is

Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/root/tree/dataframe/src/RLoopManager.cxx:900 in void ROOT::Detail::RDF::RLoopManager::Run(bool)>: Starting event loop number 0.
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/root/tree/dataframe/src/RLoopManager.cxx:859 in void ROOT::Detail::RDF::RLoopManager::Jit()>: Nothing to jit and execute.
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/root/tree/dataframe/src/RLoopManager.cxx:938 in void ROOT::Detail::RDF::RLoopManager::Run(bool)>: Finished event loop number 0 (0s CPU, 0.000498056s elapsed).
[Total elapsed time: 0:00m  processed files: 0 / 0  processed evts: 500 / 500] 

As you can see, the progress bar only appears at the end of the event loop. I believe there is some hidden trigger of the computation graph in your changes, which unfortunately makes them not fit for review at the moment.
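
As a rough illustration of what a "hidden trigger" could look like, here is a toy model of a lazily evaluated result whose first read runs the whole loop. It is a sketch under stated assumptions: `ToyLazyCount` and `ExampleLazyTrigger` are hypothetical names, the class is not ROOT's `RResultPtr`, and whether this is what actually happens in the PR has not been verified.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Toy model of a lazily evaluated result; an illustration only, not ROOT's RResultPtr.
class ToyLazyCount {
   std::function<std::uint64_t()> fRunLoop; // runs the whole (toy) event loop
   bool fReady = false;
   std::uint64_t fValue = 0;

public:
   explicit ToyLazyCount(std::function<std::uint64_t()> runLoop) : fRunLoop(std::move(runLoop)) {}

   bool IsReady() const { return fReady; }

   // The first access runs the entire loop before returning; later calls use the cache.
   std::uint64_t Get()
   {
      if (!fReady) {
         fValue = fRunLoop();
         fReady = true;
      }
      return fValue;
   }
};

// Hypothetical usage: count how many times the toy loop runs.
inline void ExampleLazyTrigger()
{
   int loopRuns = 0;
   ToyLazyCount count([&loopRuns] { ++loopRuns; return std::uint64_t{500}; });
   assert(!count.IsReady() && loopRuns == 0);   // booking the result does no work
   assert(count.Get() == 500 && loopRuns == 1); // first read runs the whole loop
   assert(count.Get() == 500 && loopRuns == 1); // later reads reuse the cached value
}
```

If progress-reporting code called something like `Get()` before the loop had started, the loop would run to completion first and the bar could only be printed afterwards, which would match the output shown above. Checking something like `IsReady()` before reading would avoid that in this toy model.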

@MohamedElashri
Author

Hi @vepadulano

Thanks for the feedback. This is my first contribution, so I wanted to keep it small and limited to the specific problem of the progress bar showing the wrong total with Range(). However, what you are describing is the current behavior of the RDataFrame progress bar, so it is not something this PR introduces.

That said, I agree that the current progress bar is useless. I'll work on a useful version, but I need some time to think about how to handle both large and small numbers of events/iterations.

Successfully merging this pull request may close these issues.

RDF Progress bar seems to ignore Range() calls