Contributor

@adriangb adriangb commented Sep 3, 2025

This PR simplifies try_swapping_with_projection. It:

  • Removes the need to copy parts of one DataSourceExec into another in DataSourceExec::try_swapping_with_projection.
  • Removes the abstraction leak where a DataSource creates a DataSourceExec (which in turn references the DataSource). DataSource now knows nothing about DataSourceExec, ProjectionExec, or ExecutionPlan.
  • Reduces overall LOC and, in my opinion, complexity.
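
A rough sketch of the layering described in the bullets above, using toy types rather than DataFusion's real traits. The names `DataSource`, `FileSource`, `SourceExec`, and `try_push_projection` here are illustrative assumptions, not the signatures in this PR. The point is only that the source returns another source, and the exec node is the one place that wraps it and derives its cached properties.

```rust
use std::sync::Arc;

/// Toy stand-in for a data source (NOT DataFusion's actual trait).
trait DataSource {
    fn columns(&self) -> Vec<String>;

    /// Hypothetical hook: try to absorb a projection given as column indices.
    /// It returns another data source, never an exec node, so the source
    /// stays unaware of the plan layer.
    fn try_push_projection(&self, indices: &[usize]) -> Option<Arc<dyn DataSource>>;
}

struct FileSource {
    columns: Vec<String>,
}

impl DataSource for FileSource {
    fn columns(&self) -> Vec<String> {
        self.columns.clone()
    }

    fn try_push_projection(&self, indices: &[usize]) -> Option<Arc<dyn DataSource>> {
        // Bail out with None if any index is out of range.
        let columns = indices
            .iter()
            .map(|&i| self.columns.get(i).cloned())
            .collect::<Option<Vec<String>>>()?;
        let pushed: Arc<dyn DataSource> = Arc::new(FileSource { columns });
        Some(pushed)
    }
}

/// Toy stand-in for the exec node that wraps a data source.
struct SourceExec {
    source: Arc<dyn DataSource>,
    /// Derived ("cached") property, always recomputed from the source.
    schema: Vec<String>,
}

impl SourceExec {
    fn new(source: Arc<dyn DataSource>) -> Self {
        let schema = source.columns();
        Self { source, schema }
    }

    /// The exec asks the source to absorb the projection and rebuilds itself
    /// from the result; nothing is copied over from the old exec.
    fn try_swapping_with_projection(&self, indices: &[usize]) -> Option<SourceExec> {
        self.source.try_push_projection(indices).map(SourceExec::new)
    }
}
```

In this toy model the derived `schema` is recomputed in `SourceExec::new`, so there is no second code path that has to keep cached properties in sync by hand.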

@github-actions github-actions bot added the datasource Changes to the datasource crate label Sep 3, 2025
@adriangb adriangb changed the title from "DRAFT: refactor DataSourceExec::try_swapping_with_projection to simplify and remove abstraction leakage" to "Refactor DataSourceExec::try_swapping_with_projection to simplify and remove abstraction leakage" Sep 3, 2025
Comment on lines -331 to -346
// Project the equivalence properties to the new schema
let projected_eq_properties = self
    .cache
    .eq_properties
    .project(&projection_mapping, new_data_source_exec.schema());

let preserved_exec = DataSourceExec {
    data_source: Arc::clone(&new_data_source_exec.data_source),
    cache: PlanProperties::new(
        projected_eq_properties,
        new_data_source_exec.cache.partitioning.clone(),
        new_data_source_exec.cache.emission_type,
        new_data_source_exec.cache.boundedness,
    )
    .with_scheduling_type(new_data_source_exec.cache.scheduling_type),
};
Contributor Author

We get to remove all of this complexity now 😄

Collaborator

For anyone wondering why: previously we had to store the cache information because it was lost on projection (#17077).

Now FileScanConfig::eq_properties works correctly (fixed by Adrian in #17323), so when the FileScanConfig is passed to DataSourceExec, the new information is used.

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Sep 3, 2025
@adriangb adriangb requested a review from blaginin September 3, 2025 20:35
.transpose()
.transpose()?;

Ok(res)
Collaborator

nit: return directly
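
As a generic illustration of this nit (not the code from this PR), the sketch below contrasts binding a value with `?` and re-wrapping it in `Ok` against returning the `Result` expression directly. The function names are made up for the example; `Option::transpose` is the standard-library method that turns an `Option<Result<T, E>>` into a `Result<Option<T>, E>`.

```rust
use std::num::ParseIntError;

// Verbose form: unwrap with `?`, bind to `res`, then re-wrap in `Ok`.
fn parse_optional_verbose(input: Option<&str>) -> Result<Option<i32>, ParseIntError> {
    let res = input.map(|s| s.parse::<i32>()).transpose()?;
    Ok(res)
}

// Direct form: the expression already has the function's return type.
fn parse_optional(input: Option<&str>) -> Result<Option<i32>, ParseIntError> {
    input.map(|s| s.parse::<i32>()).transpose()
}
```

The two forms are interchangeable only when the error type of the expression already matches the function's error type. If `?` is relied on for a `From` conversion of the error, the `let ... ?; Ok(...)` shape (or an explicit `map_err`) is still needed.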

Collaborator

@blaginin blaginin left a comment

much cleaner, thank you for that!

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 3, 2025
Contributor Author

adriangb commented Sep 3, 2025

Thanks for the review, @blaginin! I also added a note to the upgrade guide.

}
}

pub type ProjectionExpr = (Arc<dyn PhysicalExpr>, String);
Contributor Author

I recommend that a future PR turn this into a struct with named fields. That will be a breaking change, so I am not going to do it in this PR; it should be done in isolation.
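
One possible shape for that follow-up, as a self-contained sketch. The field names `expr` and `alias`, the `ProjectionExprTuple` alias, and the `From` impl are assumptions made for illustration. `PhysicalExpr` and `Column` below are minimal stand-ins so the snippet compiles on its own; they are not DataFusion's definitions.

```rust
use std::sync::Arc;

// Minimal stand-in for DataFusion's `PhysicalExpr` trait (assumption).
trait PhysicalExpr {
    fn describe(&self) -> String;
}

// Minimal stand-in expression type (assumption).
struct Column {
    name: String,
}

impl PhysicalExpr for Column {
    fn describe(&self) -> String {
        format!("col({})", self.name)
    }
}

// The tuple shape from the diff above, renamed here to avoid a name clash.
type ProjectionExprTuple = (Arc<dyn PhysicalExpr>, String);

// Hypothetical struct with named fields replacing the tuple alias.
struct ProjectionExpr {
    expr: Arc<dyn PhysicalExpr>,
    alias: String,
}

// A conversion like this could ease migration of existing tuple-based call sites.
impl From<ProjectionExprTuple> for ProjectionExpr {
    fn from((expr, alias): ProjectionExprTuple) -> Self {
        Self { expr, alias }
    }
}
```

With named fields, call sites read `p.expr` and `p.alias` instead of `p.0` and `p.1`, which is the readability gain; existing code that destructures the tuple would stop compiling, which is why the change is breaking.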

@adriangb adriangb merged commit e6c4f0d into apache:main Sep 4, 2025
29 checks passed
@adriangb adriangb deleted the simplify-push-down-expr branch September 4, 2025 00:06
destrex271 pushed a commit to destrex271/datafusion that referenced this pull request Sep 5, 2025