perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget #1937
Open
shuoweil wants to merge 15 commits into main from shuowei-anywidget-remove-len-call
15 commits
31d0b35 remove expensive len() call (shuoweil)
f7eca6b add testcase (shuoweil)
6827775 fix a typo (shuoweil)
c5e2baf change how row_count is updated (shuoweil)
77915f3 testcase stil fails, need to merged in 1888 (shuoweil)
8056b47 update the method of using PandasBatches.total_rows (shuoweil)
d1a0c44 change tests in read_gbq_colab (shuoweil)
cba67a0 polish comment (shuoweil)
61752cc fix a test (shuoweil)
01300e9 change code and update more testcase (shuoweil)
c31b112 remove unneeded except (shuoweil)
156c5ba add assert for total_rows (shuoweil)
6b87339 get actual row_counts (shuoweil)
b442fce avoid two query calls (shuoweil)
0caaa52 remove double query when display widget (shuoweil)
@@ -23,6 +23,7 @@
 import pandas as pd

 import bigframes
+import bigframes.core.blocks
 import bigframes.display.html

 # anywidget and traitlets are optional dependencies. We don't want the import of this
@@ -45,8 +46,10 @@


 class TableWidget(WIDGET_BASE):
-    """
-    An interactive, paginated table widget for BigFrames DataFrames.
+    """An interactive, paginated table widget for BigFrames DataFrames.
+
+    This widget provides a user-friendly way to display and navigate through
+    large BigQuery DataFrames within a Jupyter environment.
     """

     def __init__(self, dataframe: bigframes.dataframe.DataFrame):
@@ -60,32 +63,37 @@ def __init__(self, dataframe: bigframes.dataframe.DataFrame):
                 "Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use TableWidget."
             )

-        super().__init__()
         self._dataframe = dataframe
+        self._initializing = True
+        super().__init__()

-        # Initialize attributes that might be needed by observers FIRST
+        # Initialize attributes that might be needed by observers first
         self._table_id = str(uuid.uuid4())
         self._all_data_loaded = False
         self._batch_iter: Optional[Iterator[pd.DataFrame]] = None
         self._cached_batches: List[pd.DataFrame] = []

-        # respect display options for initial page size
+        # Respect display options for initial page size
         initial_page_size = bigframes.options.display.max_rows

-        # Initialize data fetching attributes.
-        self._batches = dataframe.to_pandas_batches(page_size=initial_page_size)
+        execute_result = dataframe._block.session._executor.execute(
+            dataframe._block.expr,
+            ordered=True,
+            use_explicit_destination=True,
+        )

-        # set traitlets properties that trigger observers
-        self.page_size = initial_page_size
+        # The query issued by `to_pandas_batches()` already contains metadata
+        # about how many results there were. Use that to avoid doing an extra
+        # COUNT(*) query that `len(...)` would do.
+        self.row_count = execute_result.total_rows or 0

Review comment on the line above: Why did you switch it to use the execute result instead of the PandasBatches object returned by

-        # len(dataframe) is expensive, since it will trigger a
-        # SELECT COUNT(*) query. It is a must have however.
-        # TODO(b/428238610): Start iterating over the result of `to_pandas_batches()`
-        # before we get here so that the count might already be cached.
-        self.row_count = len(dataframe)

shuoweil marked this conversation as resolved.

+        # Create pandas batches from the ExecuteResult
+        self._batches = execute_result.to_pandas_batches(page_size=initial_page_size)

+        self.page_size = initial_page_size

         # get the initial page
         self._set_table_html()
+        self._initializing = False

     @functools.cached_property
     def _esm(self):
@@ -167,8 +175,7 @@ def _get_next_batch(self) -> bool:
     @property
     def _batch_iterator(self) -> Iterator[pd.DataFrame]:
         """Lazily initializes and returns the batch iterator."""
-        if self._batch_iter is None:
-            self._batch_iter = iter(self._batches)
+        self._batch_iter = iter(self._batches)
         return self._batch_iter

     @property
@@ -180,7 +187,16 @@ def _cached_data(self) -> pd.DataFrame:

     def _reset_batches_for_new_page_size(self):
         """Reset the batch iterator when page size changes."""
-        self._batches = self._dataframe.to_pandas_batches(page_size=self.page_size)
+        # Execute with explicit destination for consistency with __init__
+        execute_result = self._dataframe._block.session._executor.execute(
+            self._dataframe._block.expr,
+            ordered=True,
+            use_explicit_destination=True,
+        )
+
+        # Create pandas batches from the ExecuteResult
+        self._batches = execute_result.to_pandas_batches(page_size=self.page_size)
+
         self._cached_batches = []
         self._batch_iter = None
         self._all_data_loaded = False
@@ -210,11 +226,15 @@ def _set_table_html(self):
     @traitlets.observe("page")
     def _page_changed(self, _change: Dict[str, Any]):
         """Handler for when the page number is changed from the frontend."""
+        if self._initializing:
+            return
         self._set_table_html()

     @traitlets.observe("page_size")
     def _page_size_changed(self, _change: Dict[str, Any]):
         """Handler for when the page size is changed from the frontend."""
+        if self._initializing:
+            return
         # Reset the page to 0 when page size changes to avoid invalid page states
         self.page = 0

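The `_initializing` flag added in this diff appears intended to keep the `page` and `page_size` handlers from doing work while `__init__` is still assigning values. A minimal, dependency-free sketch of that guard follows. `GuardedWidget`, `render_count`, and `_render` are hypothetical names for illustration, and a plain property setter stands in for a traitlets observer, so this is not the bigframes implementation.

```python
class GuardedWidget:
    """Hypothetical stand-in for a widget whose setters notify a handler."""

    def __init__(self, page_size: int) -> None:
        # Raise the flag before any assignment that could fire a handler.
        self._initializing = True
        self.render_count = 0
        self._page = 0
        self._page_size = 0
        self.page_size = page_size  # fires the handler, which returns early
        self._render()              # render the first page exactly once
        self._initializing = False

    @property
    def page_size(self) -> int:
        return self._page_size

    @page_size.setter
    def page_size(self, value: int) -> None:
        self._page_size = value
        self._page_size_changed()

    def _page_size_changed(self) -> None:
        # Skip work while the constructor is still running.
        if self._initializing:
            return
        self._page = 0
        self._render()

    def _render(self) -> None:
        # Stand-in for an expensive render that may fetch data.
        self.render_count += 1


widget = GuardedWidget(page_size=25)
renders_after_init = widget.render_count
widget.page_size = 50
renders_after_change = widget.render_count
```

Without the early return, the assignment inside `__init__` would trigger a second render before the widget is fully set up.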
I don't think we want to use an explicit destination here. It would create a BigQuery job every time, which is undesirable. We want to keep the faster, job-optional code paths available.
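One possible reading of this feedback is that the row count can come from the batches object itself, so that neither an explicit destination nor a separate `COUNT(*)` query is needed. The sketch below only models that idea with plain Python. `FakeSession` and `FakeBatches` are hypothetical stand-ins rather than bigframes classes, and only the `total_rows` attribute name is taken from the PR title.

```python
from typing import Iterator, List


class FakeBatches:
    """Hypothetical iterable of pages that also carries a total row count."""

    def __init__(self, rows: List[int], page_size: int) -> None:
        self._rows = rows
        self._page_size = page_size
        # Assumption: the count arrives with the query result metadata.
        self.total_rows = len(rows)

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, len(self._rows), self._page_size):
            yield self._rows[start:start + self._page_size]


class FakeSession:
    """Hypothetical backend that counts how many queries it runs."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = rows
        self.queries_run = 0

    def count_rows(self) -> int:
        # Models the extra SELECT COUNT(*) that len(dataframe) would trigger.
        self.queries_run += 1
        return len(self._rows)

    def run_batches(self, page_size: int) -> FakeBatches:
        # Models a single query whose result already knows its row count.
        self.queries_run += 1
        return FakeBatches(self._rows, page_size)


session = FakeSession(list(range(7)))
batches = session.run_batches(page_size=3)
row_count = batches.total_rows or 0  # no call to count_rows() needed
pages = list(batches)
```

In this model a widget would issue one query for both the count and the pages, where calling `count_rows()` as well would issue two.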