-
Notifications
You must be signed in to change notification settings - Fork 392
[Append Scan] Introduce an AbstractTableScan with default methods
#2230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Append Scan] Introduce an AbstractTableScan with default methods
#2230
Conversation
|
|
||
|
|
||
| S = TypeVar("S", bound="TableScan", covariant=True) | ||
| A = TypeVar("A", bound="AbstractTableScan", covariant=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub's diff here got messed up - TableScan and S remains, just below.
|
|
||
| return type(self)(**{**kwargs, **overrides}) | ||
|
|
||
| def to_pandas(self, **kwargs: Any) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting this change. I've added default implementations based on to_arrow() in this class, whereas before previously these methods were abstract on TableScan.
This technically changes the TableScan class in a user-facing way because it now has default implementations, but this felt fine to me. Subclasses can still override these methods if they wish.
The motivation here was to reduce duplication between DataScan and IncrementalAppendScan - by introducing these default implementations in the superclass, all table scans get those defaults for free.
| """ | ||
| return self.to_arrow().to_pandas(**kwargs) | ||
|
|
||
| def to_duckdb(self, table_name: str, connection: Optional[DuckDBPyConnection] = None) -> DuckDBPyConnection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting this also - to_duckdb and to_ray were previously only present on DataScan, not even on TableScan. With the change here moving them to AbstractTableScan, TableScan now has these additional methods. I felt like this was fine. Curious for reviewers' thoughts!
(Again, the motivation here was to reduce duplication)
|
|
||
| return current_schema.select(*self.selected_fields, case_sensitive=self.case_sensitive) | ||
|
|
||
| @abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointing out that these removed methods have just been moved to AbstractTableScan, so they still exist on TableScan that subclasses it. The goal of this PR is reorganising while not introducing a breaking change.
| S = TypeVar("S", bound="TableScan", covariant=True) | ||
|
|
||
|
|
||
| class TableScan(AbstractTableScan, ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methods on this class like use_ref and snapshot don't translate nicely into incremental scans. I understand that the motivation behind this class was an abstract class for table scans, like Java has, but I felt like a less restrictive superclass is better for us, hence AbstractTableScan.
|
@Fokko @kevinjqliu, as 0.10 has gone out now wondering if there's time to return to this work now? |
Rationale for this change
Split up from incremental append scan work - see #2031 (comment). PyIceberg doesn't support incremental reading of appended data between snapshots, like Spark does.
This PR introduces an
AbstractTableScanto reduce duplication between table scans and provide a superclass not specific to reads of a single snapshot.Are these changes tested?
N/A
Are there any user-facing changes?
Yes, see #2230 (comment) / #2230 (comment).