Closed
Labels
bug: Something isn't working
Description
Describe the bug
When I enable page index filtering, queries return incorrect answers.
NOTE: page index filtering is not enabled by default (as we are still working on it), so this issue is unlikely to affect users.
To Reproduce
- Download data from repro.zip
- Run datafusion CLI:
Expected behavior
The same answer should be produced with and without page index filtering enabled. However, the answers are different.
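The check described above can be scripted. The following is only a sketch: the helper names `last_count`, `run_with_page_index`, and `compare_counts` are hypothetical, and it assumes that the final count printed by `script.sql` is the value to compare, as in the runs shown in this issue.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read datafusion-cli output on stdin and print the last
# table cell that contains only an integer (for example "| 15963 |").
last_count() {
  grep -E '^\| *[0-9]+ *\|$' | tail -n 1 | tr -d '| '
}

# Hypothetical helper: run script.sql with page index filtering set to $1
# ("true" or "false") and print the final count.
run_with_page_index() {
  DATAFUSION_EXECUTION_PARQUET_ENABLE_PAGE_INDEX="$1" datafusion-cli -f script.sql | last_count
}

# Succeed only if the two counts are identical.
compare_counts() {
  if [ "$1" = "$2" ]; then
    echo "OK: both runs returned $1 rows"
  else
    echo "MISMATCH: without page index=$1, with page index=$2"
    return 1
  fi
}
```

With `datafusion-cli` on the `PATH` and `script.sql` in the current directory, the comparison would be run as `compare_counts "$(run_with_page_index false)" "$(run_with_page_index true)"`.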
Without page index filtering, 15963 rows are produced:
(arrow_dev) alamb@MacBook-Pro-8:~/Downloads$ DATAFUSION_EXECUTION_PARQUET_ENABLE_PAGE_INDEX=false datafusion-cli -f script.sql
DataFusion CLI v13.0.0
0 rows in set. Query took 0.001 seconds.
+-------------------------------------------------+---------+
| name | setting |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size | 8192 |
| datafusion.execution.coalesce_batches | true |
| datafusion.execution.coalesce_target_batch_size | 4096 |
| datafusion.execution.parquet.enable_page_index | false |
| datafusion.execution.parquet.pushdown_filters | false |
| datafusion.execution.parquet.reorder_filters | false |
| datafusion.execution.time_zone | UTC |
| datafusion.explain.logical_plan_only | false |
| datafusion.explain.physical_plan_only | false |
| datafusion.optimizer.filter_null_join_keys | false |
| datafusion.optimizer.max_passes | 3 |
| datafusion.optimizer.skip_failed_rules | true |
+-------------------------------------------------+---------+
12 rows in set. Query took 0.001 seconds.
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819 |
+-----------------+
1 row in set. Query took 0.002 seconds.
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 15963 |
+-----------------+
1 row in set. Query took 0.002 seconds.
With page index filtering, 0 rows are produced 😱:
(arrow_dev) alamb@MacBook-Pro-8:~/Downloads$ DATAFUSION_EXECUTION_PARQUET_ENABLE_PAGE_INDEX=true datafusion-cli -f script.sql
DataFusion CLI v13.0.0
0 rows in set. Query took 0.001 seconds.
+-------------------------------------------------+---------+
| name | setting |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size | 8192 |
| datafusion.execution.coalesce_batches | true |
| datafusion.execution.coalesce_target_batch_size | 4096 |
| datafusion.execution.parquet.enable_page_index | true |
| datafusion.execution.parquet.pushdown_filters | false |
| datafusion.execution.parquet.reorder_filters | false |
| datafusion.execution.time_zone | UTC |
| datafusion.explain.logical_plan_only | false |
| datafusion.explain.physical_plan_only | false |
| datafusion.optimizer.filter_null_join_keys | false |
| datafusion.optimizer.max_passes | 3 |
| datafusion.optimizer.skip_failed_rules | true |
+-------------------------------------------------+---------+
12 rows in set. Query took 0.001 seconds.
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819 |
+-----------------+
1 row in set. Query took 0.002 seconds.
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 0 |
+-----------------+
1 row in set. Query took 0.002 seconds.
Additional context
I found this issue and its reproducer while working on the integration test (#3976).
I suspect @Ted-Jiang is already working on this issue.
Ted-Jiang