Edwardvaneechoud
Owner

This pull request refactors how execution location and worker offloading are handled throughout the codebase. It removes the "auto" execution location, clarifies the distinction between local and remote execution, and introduces utility functions for determining and validating execution locations based on global settings. Additionally, it updates related logic in data sampling, node execution, and flow graph management to align with these changes.

Execution Location and Offloading Refactor

  • Removed the "auto" option from ExecutionLocationsLiteral, restricting execution locations to "local" or "remote", and added utility functions (get_global_execution_location, is_valid_execution_location_in_current_global_settings, get_prio_execution_location) to determine and validate execution location based on global settings.
  • Updated configuration flags in settings.py to use MutableBool for SINGLE_FILE_MODE and OFFLOAD_TO_WORKER, and decoupled their logic for clearer control over worker offloading.
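The helpers named above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code from the PR: the function names and the two-value literal come from the description, but the signatures, the function bodies, the MutableBool interface, the default flag value, and the priority rule are all guesses made for illustration.

```python
from typing import Literal, get_args

# The PR restricts this literal to two values; "auto" is no longer allowed.
ExecutionLocationsLiteral = Literal["local", "remote"]


class MutableBool:
    """Hypothetical stand-in: a flag that can be flipped in place at runtime."""

    def __init__(self, value: bool) -> None:
        self.value = bool(value)

    def __bool__(self) -> bool:
        return self.value

    def set(self, value: bool) -> "MutableBool":
        self.value = bool(value)
        return self


# Assumed default; the real default in settings.py is not stated in the PR.
OFFLOAD_TO_WORKER = MutableBool(True)


def get_global_execution_location() -> str:
    # Assumption: offloading enabled means "remote", otherwise "local".
    return "remote" if OFFLOAD_TO_WORKER else "local"


def is_valid_execution_location_in_current_global_settings(location: str) -> bool:
    # Anything outside the literal (for example the removed "auto") is rejected.
    if location not in get_args(ExecutionLocationsLiteral):
        return False
    # Assumption: "remote" is only usable while worker offloading is enabled.
    return location == "local" or bool(OFFLOAD_TO_WORKER)


def get_prio_execution_location(requested: str, global_location: str) -> str:
    # Assumed priority rule: a global "local" setting overrides the request.
    if global_location == "local":
        return "local"
    return requested
```

Because the flag is an object rather than a plain bool, calling OFFLOAD_TO_WORKER.set(False) in this sketch changes what every importer of the flag sees, which is one plausible motivation for a mutable wrapper.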

Flow Graph and Node Execution Logic

  • Refactored FlowGraph to encapsulate flow_settings with getter/setter, ensuring the graph resets when execution location or mode changes, and updated the handling of execution location in methods and initialization.
  • Updated FlowNode logic to default to "worker" execution, removed checks for SINGLE_FILE_MODE in execution decisions, and improved error handling for remote worker connection issues.
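The getter/setter pattern described for flow_settings might look something like the sketch below. The FlowSettings fields, the execution_mode values, the reset method, and the reset_count counter are hypothetical and exist only to make the behaviour observable; the real FlowGraph is considerably larger.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlowSettings:
    # Hypothetical subset of the real settings object.
    execution_location: str = "remote"
    execution_mode: str = "Development"


class FlowGraph:
    def __init__(self, flow_settings: FlowSettings) -> None:
        self._flow_settings = flow_settings
        self.reset_count = 0  # illustration only: counts how often reset ran

    @property
    def flow_settings(self) -> FlowSettings:
        return self._flow_settings

    @flow_settings.setter
    def flow_settings(self, new_settings: FlowSettings) -> None:
        old = self._flow_settings
        needs_reset = (
            new_settings.execution_location != old.execution_location
            or new_settings.execution_mode != old.execution_mode
        )
        self._flow_settings = new_settings
        if needs_reset:
            # Cached results may no longer be valid for the new location or mode.
            self.reset()

    def reset(self) -> None:
        self.reset_count += 1


graph = FlowGraph(FlowSettings())
# Changing the execution location goes through the setter and triggers a reset.
graph.flow_settings = replace(graph.flow_settings, execution_location="local")
# Assigning equivalent settings again leaves the graph untouched.
graph.flow_settings = FlowSettings(execution_location="local")
```

Routing every change through the setter keeps the reset decision in one place, so callers cannot switch location or mode while stale node results remain cached.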

Data Engine and Sampling

  • Refactored FlowDataEngine.get_sample to accept an explicit execution_location parameter, use the new global execution location logic, and simplify how the number of records is determined for sampling.
  • Added get_number_of_records_in_process method to clearly distinguish between local and worker-based record counting, and updated related usages for clarity and correctness.
  • Simplified cross join and record counting logic to align with new execution location handling.
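To illustrate the split between in-process and worker-based counting, here is a toy sketch. Only the names get_sample, get_number_of_records_in_process, and the execution_location parameter come from the description; the DataEngineSketch class, the remote_counter callback, the get_number_of_records helper, and the list-backed storage are assumptions standing in for the real FlowDataEngine and its worker client.

```python
from typing import Callable, List, Optional


class DataEngineSketch:
    """Toy stand-in for a data engine; rows are held in a plain list."""

    def __init__(
        self,
        rows: List[dict],
        remote_counter: Optional[Callable[[List[dict]], int]] = None,
    ) -> None:
        self._rows = list(rows)
        # Hypothetical hook that represents asking a worker for the count.
        self._remote_counter = remote_counter

    def get_number_of_records_in_process(self) -> int:
        # Counts in the current process and never contacts a worker.
        return len(self._rows)

    def get_number_of_records(self, execution_location: str) -> int:
        # Hypothetical dispatcher: the caller states where counting happens.
        if execution_location == "remote":
            if self._remote_counter is None:
                raise RuntimeError("no worker available for remote counting")
            return self._remote_counter(self._rows)
        return self.get_number_of_records_in_process()

    def get_sample(self, n_rows: int = 100, execution_location: str = "local") -> List[dict]:
        # The location is an explicit argument instead of a hidden global lookup.
        total = self.get_number_of_records(execution_location)
        return self._rows[: min(n_rows, total)]


worker_calls = []


def fake_worker_count(rows: List[dict]) -> int:
    worker_calls.append(len(rows))
    return len(rows)


engine = DataEngineSketch([{"id": i} for i in range(5)], remote_counter=fake_worker_count)
local_sample = engine.get_sample(n_rows=3, execution_location="local")
remote_sample = engine.get_sample(n_rows=10, execution_location="remote")
```

In this sketch the local call never touches the worker hook, while the remote call goes through it exactly once.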

Backward Compatibility and Defaults

  • Changed the default execution location in compatibility enhancements from "auto" to "remote" to match the new allowed values.
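A compatibility step of this kind could be sketched as below. The function name, the dict-based settings, and the decision to also rewrite a stored "auto" value are assumptions for illustration; the description only states that the default changed from "auto" to "remote".

```python
ALLOWED_EXECUTION_LOCATIONS = ("local", "remote")


def ensure_execution_location(settings: dict) -> dict:
    """Return a copy of older flow settings with a valid execution location."""
    updated = dict(settings)
    location = updated.get("execution_location")
    # Missing values get the new default. Treating a stored "auto" the same way
    # is an assumption made here so that old files keep loading.
    if location is None or location == "auto":
        updated["execution_location"] = "remote"
    elif location not in ALLOWED_EXECUTION_LOCATIONS:
        raise ValueError(f"unsupported execution location: {location!r}")
    return updated
```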

Miscellaneous

  • Removed unused imports and parameters related to the old execution location logic, and improved error logging for connection issues.

@Edwardvaneechoud Edwardvaneechoud marked this pull request as ready for review August 23, 2025 09:09
@Edwardvaneechoud Edwardvaneechoud merged commit f30fccb into main Aug 23, 2025
12 checks passed
@Edwardvaneechoud Edwardvaneechoud deleted the feature/unify_execution_methods branch August 23, 2025 11:55
Bennylave pushed a commit to Bennylave/Flowfile that referenced this pull request Aug 26, 2025
* removing auto to improve maintainability

* Ensure the offload per worker is determined per graph and there is no dependency on a global variable.

* Small improvement in logging

* Removing global change in tests

* skipping test in docker