Upgrade to arrow 56.1.0 #17275

alamb · 2025-08-21T18:42:00Z

Which issue does this PR close?

Related to Release arrow-rs / parquet Minor version 56.1.0 (August 2025) arrow-rs#7837

Rationale for this change

Upgrade to the latest arrow release

What changes are included in this PR?

Upgrade to 56.1.0 (preview in Prepare for 56.1.0 release arrow-rs#8202)
Update code to stop using deprecated APIs
Add new Parquet option to control the size of the predicate cache
Hook up new ArrowReaderMetrics to DataFusion's parquet metrics
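The new option bounds how much memory the predicate cache may use. Below is a minimal sketch of what a byte budget means in practice. It assumes a cache keyed by (row group, column) that simply stops caching once the budget is exhausted. `PredicateCache`, `max_size_bytes`, and the placeholder default are hypothetical and do not reflect the arrow-rs implementation or its real default.

```rust
use std::collections::HashMap;

/// Placeholder used when no limit is configured; NOT the real arrow-rs/DataFusion default.
const PLACEHOLDER_DEFAULT_LIMIT: usize = 1024 * 1024;

/// Toy predicate cache: decoded filter-column values keyed by (row group, column).
/// `max_size_bytes` is a hypothetical stand-in for the new size option.
struct PredicateCache {
    max_size_bytes: Option<usize>,
    used_bytes: usize,
    entries: HashMap<(usize, usize), Vec<i64>>,
}

impl PredicateCache {
    fn new(max_size_bytes: Option<usize>) -> Self {
        Self { max_size_bytes, used_bytes: 0, entries: HashMap::new() }
    }

    /// Effective byte budget: the configured value, or the placeholder default.
    fn limit(&self) -> usize {
        self.max_size_bytes.unwrap_or(PLACEHOLDER_DEFAULT_LIMIT)
    }

    /// Caches `values` if they fit in the remaining budget; returns whether they are cached.
    fn try_insert(&mut self, key: (usize, usize), values: Vec<i64>) -> bool {
        if self.entries.contains_key(&key) {
            return true; // already cached; keep the existing entry
        }
        let bytes = values.len() * std::mem::size_of::<i64>();
        if self.used_bytes + bytes > self.limit() {
            return false; // over budget: the caller must decode the column again later
        }
        self.used_bytes += bytes;
        self.entries.insert(key, values);
        true
    }

    /// Returns the cached values for `key`, if any.
    fn get(&self, key: (usize, usize)) -> Option<&[i64]> {
        self.entries.get(&key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut cache = PredicateCache::new(Some(16)); // room for two i64 values
    println!("{}", cache.try_insert((0, 0), vec![1, 2])); // true: 16 bytes fit
    println!("{}", cache.try_insert((0, 1), vec![3])); // false: 24 bytes > 16
    println!("{:?}", cache.get((0, 0)));
}
```

A smaller budget trades memory for repeated decoding: anything that does not fit is simply decoded again when it is needed for output.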

Are these changes tested?

Functionally, by CI
I will also run benchmarks against this branch

Follow-on issues:

Potential performance regression with parquet 56.1.0 / data ranges #17575

Are there any user-facing changes?

alamb · 2025-08-21T19:07:06Z

datafusion-cli/src/main.rs

        +-----------------------------------+-----------------+---------------------+------+------------------+
        | alltypes_plain.parquet            | 1851            | 10181               | 2    | page_index=false |
-        | alltypes_tiny_pages.parquet       | 454233          | 881634              | 2    | page_index=true  |
+        | alltypes_tiny_pages.parquet       | 454233          | 881418              | 2    | page_index=true  |


I don't really know why the in-memory size of the ParquetMetaData decreased, but it looks like a good improvement to me.

alamb · 2025-08-21T19:07:47Z

datafusion/datasource-parquet/src/opener.rs

            .unwrap_or_else(|e| e.as_ref().clone());
-        let mut reader =
-            ParquetMetaDataReader::new_with_metadata(m).with_page_indexes(true);
+        let mut reader = ParquetMetaDataReader::new_with_metadata(m)


This is due to the following change from @kczimm:

Optionally read parquet page indexes arrow-rs#8070
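The removed line called `.with_page_indexes(true)`; per its title, arrow-rs#8070 makes reading page indexes optional. The toy model below distinguishes three behaviours, assuming skip / optional / required semantics. `PageIndexPolicy` and `load_page_index` here are illustrative only, not the arrow-rs API.

```rust
/// Hypothetical policy enum modelling "optionally read page indexes";
/// the real arrow-rs type and its names may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PageIndexPolicy {
    Skip,     // never read the page index
    Optional, // read it if the file has one, otherwise continue without it
    Required, // error if the file has no page index
}

/// Placeholder for a decoded page index.
#[derive(Debug, PartialEq)]
struct PageIndex;

fn load_page_index(
    policy: PageIndexPolicy,
    file_has_index: bool,
) -> Result<Option<PageIndex>, String> {
    match (policy, file_has_index) {
        (PageIndexPolicy::Skip, _) => Ok(None),
        (PageIndexPolicy::Optional, false) => Ok(None),
        (PageIndexPolicy::Required, false) => {
            Err("page index required but not present".to_string())
        }
        (PageIndexPolicy::Optional | PageIndexPolicy::Required, true) => Ok(Some(PageIndex)),
    }
}

fn main() {
    for policy in [PageIndexPolicy::Skip, PageIndexPolicy::Optional, PageIndexPolicy::Required] {
        // A file without a page index: only `Required` turns that into an error.
        println!("{policy:?}: {:?}", load_page_index(policy, false));
    }
}
```

The "optional" case is what lets a reader work with files that lack a page index without failing.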

alamb · 2025-08-21T19:08:34Z

Cargo.toml

-datafusion-spark = { path = "datafusion/spark", version = "49.0.0" }
-datafusion-sql = { path = "datafusion/sql", version = "49.0.0" }
-datafusion-substrait = { path = "datafusion/substrait", version = "49.0.0" }
+datafusion = { path = "datafusion/core", version = "49.0.1", default-features = false }


Drive-by change: update all versions in Cargo.toml to the latest

alamb · 2025-08-21T19:12:00Z

datafusion/sqllogictest/test_files/explain_tree.slt

 12)│       DataSourceExec      ││       DataSourceExec      │
 13)│    --------------------   ││    --------------------   │
-14)│        bytes: 6040        ││        bytes: 6040        │
+14)│        bytes: 5932        ││        bytes: 5932        │


I believe the in-memory size decreased because of

Use Vec directly in builders arrow-rs#7984

since Vec doesn't have the same minimum alignment / size that the builders had
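To illustrate why the reported sizes could shrink: if a builder pads its capacity up to a multiple of 64 bytes (an assumption here about the builders before arrow-rs#7984), small buffers report extra slack that an exactly sized `Vec` does not. A minimal sketch; `round_up`, `padded_bytes`, and `vec_bytes` are hypothetical helpers, not arrow-rs functions.

```rust
/// Round `len` up to the next multiple of `align`, which must be a power of two.
fn round_up(len: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (len + align - 1) & !(align - 1)
}

/// Bytes reported by a buffer whose capacity is padded to 64 bytes (assumed old behaviour).
fn padded_bytes(num_values: usize, value_size: usize) -> usize {
    round_up(num_values * value_size, 64)
}

/// Bytes reported by a `Vec`, i.e. its capacity times the element size.
fn vec_bytes<T>(v: &Vec<T>) -> usize {
    v.capacity() * std::mem::size_of::<T>()
}

fn main() {
    let values: Vec<i32> = Vec::with_capacity(10);
    // 10 x i32 = 40 bytes of data; padding to 64 adds 24 bytes of slack.
    println!("padded: {} bytes", padded_bytes(10, std::mem::size_of::<i32>()));
    // Vec only guarantees capacity >= 10, which is typically exactly 40 bytes here.
    println!("vec:    {} bytes", vec_bytes(&values));
}
```

Summed over many small buffers, this kind of slack is enough to move the memory-size numbers asserted in tests by a few hundred bytes.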

alamb · 2025-08-21T19:12:45Z

datafusion/physical-plan/src/spill/mod.rs


        let size = get_record_batch_memory_size(&batch);
-        assert_eq!(size, 8320);
+        assert_eq!(size, 8208);


Also due to Use Vec directly in builders arrow-rs#7984
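For scale, the expected-size changes in this PR are small. A quick calculation using only the before/after numbers from the diffs in this thread (the `saving` helper is ours, for illustration):

```rust
/// (what, before, after) in bytes, copied from the test diffs in this PR.
const CHANGES: [(&str, u64, u64); 3] = [
    ("alltypes_tiny_pages.parquet metadata", 881_634, 881_418),
    ("explain_tree.slt DataSourceExec bytes", 6_040, 5_932),
    ("spill test record batch size", 8_320, 8_208),
];

/// Returns (bytes saved, percentage saved relative to `before`).
fn saving(before: u64, after: u64) -> (u64, f64) {
    let saved = before - after;
    (saved, saved as f64 * 100.0 / before as f64)
}

fn main() {
    for (what, before, after) in CHANGES {
        let (saved, pct) = saving(before, after);
        println!("{what}: {before} -> {after} bytes (-{saved}, -{pct:.2}%)");
    }
}
```

This reports savings of 216, 108 and 112 bytes, roughly 0.02%, 1.79% and 1.35% respectively.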

alamb · 2025-08-21T21:16:29Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: clickbench_pushdown
Results will be posted here when complete

alamb · 2025-08-21T21:46:38Z

🤖: Benchmark completed

Comparing HEAD and alamb_update_arrow
--------------------
Benchmark clickbench_pushdown.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.12 ms │            2.16 ms │     no change │
│ QQuery 1     │    53.64 ms │           53.34 ms │     no change │
│ QQuery 2     │   137.06 ms │          139.44 ms │     no change │
│ QQuery 3     │   162.87 ms │          166.12 ms │     no change │
│ QQuery 4     │  1064.08 ms │         1032.53 ms │     no change │
│ QQuery 5     │  1486.94 ms │         1487.45 ms │     no change │
│ QQuery 6     │     2.17 ms │            2.15 ms │     no change │
│ QQuery 7     │    76.67 ms │           73.51 ms │     no change │
│ QQuery 8     │  1433.99 ms │         1457.77 ms │     no change │
│ QQuery 9     │  1795.39 ms │         1791.48 ms │     no change │
│ QQuery 10    │   420.27 ms │          488.87 ms │  1.16x slower │
│ QQuery 11    │   500.34 ms │          555.10 ms │  1.11x slower │
│ QQuery 12    │  1789.42 ms │         1521.70 ms │ +1.18x faster │
│ QQuery 13    │  2698.81 ms │         2426.41 ms │ +1.11x faster │
│ QQuery 14    │  1898.26 ms │         1644.37 ms │ +1.15x faster │
│ QQuery 15    │  1211.08 ms │         1179.20 ms │     no change │
│ QQuery 16    │  2643.67 ms │         2615.41 ms │     no change │
│ QQuery 17    │  2617.48 ms │         2620.64 ms │     no change │
│ QQuery 18    │  5366.43 ms │         4887.05 ms │ +1.10x faster │
│ QQuery 19    │   125.17 ms │          149.12 ms │  1.19x slower │
│ QQuery 20    │  2109.40 ms │         1932.01 ms │ +1.09x faster │
│ QQuery 21    │  2442.36 ms │         2322.44 ms │     no change │
│ QQuery 22    │  5457.70 ms │         4063.91 ms │ +1.34x faster │
│ QQuery 23    │  2056.23 ms │         1470.65 ms │ +1.40x faster │
│ QQuery 24    │   291.14 ms │          252.50 ms │ +1.15x faster │
│ QQuery 25    │  1032.66 ms │          649.43 ms │ +1.59x faster │
│ QQuery 26    │   549.09 ms │          380.68 ms │ +1.44x faster │
│ QQuery 27    │  4127.01 ms │         2982.51 ms │ +1.38x faster │
│ QQuery 28    │ 26766.22 ms │        24180.99 ms │ +1.11x faster │
│ QQuery 29    │   971.85 ms │          956.54 ms │     no change │
│ QQuery 30    │  2164.88 ms │         2106.25 ms │     no change │
│ QQuery 31    │  2079.48 ms │         2061.44 ms │     no change │
│ QQuery 32    │  4410.45 ms │         4578.39 ms │     no change │
│ QQuery 33    │  5717.87 ms │         5584.02 ms │     no change │
│ QQuery 34    │  5719.25 ms │         5811.24 ms │     no change │
│ QQuery 35    │  1978.73 ms │         1989.30 ms │     no change │
│ QQuery 36    │    26.85 ms │           26.37 ms │     no change │
│ QQuery 37    │    25.78 ms │           26.11 ms │     no change │
│ QQuery 38    │    25.72 ms │           25.21 ms │     no change │
│ QQuery 39    │    25.86 ms │           25.17 ms │     no change │
│ QQuery 40    │    26.79 ms │           26.81 ms │     no change │
│ QQuery 41    │    26.15 ms │           25.77 ms │     no change │
│ QQuery 42    │    25.53 ms │           25.14 ms │     no change │
└──────────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 93542.84ms │
│ Total Time (alamb_update_arrow)   │ 85796.69ms │
│ Average Time (HEAD)               │  2175.41ms │
│ Average Time (alamb_update_arrow) │  1995.27ms │
│ Queries Faster                    │         12 │
│ Queries Slower                    │          3 │
│ Queries with No Change            │         28 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
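The "Change" column and the summary rows can be reproduced from the raw timings. In the sketch below, the 5% threshold is inferred from the rows in these tables rather than taken from the benchmark script. For example, in the clickbench_partitioned rerun, QQuery 9 at about 1.053x is reported as slower while QQuery 10 at about 1.047x is "no change".

```rust
/// Ratio beyond which a query is reported as faster/slower. Inferred from the
/// reported rows, not taken from the benchmark script itself.
const THRESHOLD: f64 = 1.05;

/// Classify one query given its HEAD and branch timings in milliseconds.
fn classify(head_ms: f64, branch_ms: f64) -> String {
    let speedup = head_ms / branch_ms;
    let slowdown = branch_ms / head_ms;
    if speedup > THRESHOLD {
        format!("+{speedup:.2}x faster")
    } else if slowdown > THRESHOLD {
        format!("{slowdown:.2}x slower")
    } else {
        "no change".to_string()
    }
}

/// Average time per query, as in the "Average Time" summary rows.
fn average_ms(total_ms: f64, num_queries: usize) -> f64 {
    total_ms / num_queries as f64
}

fn main() {
    println!("{}", classify(420.27, 488.87)); // QQuery 10
    println!("{}", classify(2056.23, 1470.65)); // QQuery 23
    println!("{}", classify(2.12, 2.16)); // QQuery 0
    println!("{:.2}", average_ms(93542.84, 43)); // 43 queries (0..=42)
}
```

Applied to the clickbench_pushdown rows above, this reproduces labels such as "1.16x slower", "+1.40x faster" and "no change", and the HEAD average of 2175.41 ms.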

alamb · 2025-08-21T21:46:41Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: tpch_topk
Results will be posted here when complete

alamb · 2025-08-21T21:46:44Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_update_arrow
Results will be posted here when complete

alamb · 2025-08-21T22:46:44Z

🤖: Benchmark completed

group                                         alamb_update_arrow                     main
-----                                         ------------------                     ----
logical_aggregate_with_join                   1.00    625.5±9.69µs        ? ?/sec    1.02    638.9±5.74µs        ? ?/sec
logical_select_all_from_1000                  1.01     11.4±0.08ms        ? ?/sec    1.00     11.2±0.05ms        ? ?/sec
logical_select_one_from_700                   1.00    410.3±2.69µs        ? ?/sec    1.03    422.5±1.78µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    369.5±1.82µs        ? ?/sec    1.03    380.9±3.30µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    353.6±1.68µs        ? ?/sec    1.03    365.4±3.88µs        ? ?/sec
physical_intersection                         1.00    825.2±3.33µs        ? ?/sec    1.02    839.6±4.30µs        ? ?/sec
physical_join_consider_sort                   1.00  1372.4±11.75µs        ? ?/sec    1.03  1408.5±14.20µs        ? ?/sec
physical_join_distinct                        1.00    344.1±0.94µs        ? ?/sec    1.03    356.1±4.54µs        ? ?/sec
physical_many_self_joins                      1.00     10.0±0.03ms        ? ?/sec    1.05     10.5±0.03ms        ? ?/sec
physical_plan_clickbench_all                  1.00    188.9±2.79ms        ? ?/sec    1.01    190.0±2.58ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q10                  1.03      3.5±0.20ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_clickbench_q11                  1.01      3.6±0.06ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q12                  1.01      3.8±0.05ms        ? ?/sec    1.00      3.7±0.03ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      3.4±0.03ms        ? ?/sec    1.00      3.4±0.10ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      3.6±0.05ms        ? ?/sec    1.00      3.6±0.08ms        ? ?/sec
physical_plan_clickbench_q15                  1.02      3.5±0.05ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.3±0.03ms        ? ?/sec    1.01      3.3±0.03ms        ? ?/sec
physical_plan_clickbench_q17                  1.01      3.4±0.03ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_clickbench_q18                  1.02      3.0±0.02ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      3.8±0.03ms        ? ?/sec    1.00      3.8±0.03ms        ? ?/sec
physical_plan_clickbench_q2                   1.01      3.0±0.03ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      2.6±0.02ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      3.0±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      3.9±0.04ms        ? ?/sec    1.01      3.9±0.02ms        ? ?/sec
physical_plan_clickbench_q24                  1.01      4.4±0.05ms        ? ?/sec    1.00      4.4±0.03ms        ? ?/sec
physical_plan_clickbench_q25                  1.01      3.2±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q26                  1.01      3.0±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q27                  1.01      3.2±0.04ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      3.9±0.03ms        ? ?/sec    1.00      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q29                  1.01      4.7±0.08ms        ? ?/sec    1.00      4.6±0.05ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      2.9±0.03ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q30                  1.01     13.2±0.25ms        ? ?/sec    1.00     13.2±0.09ms        ? ?/sec
physical_plan_clickbench_q31                  1.00      3.9±0.04ms        ? ?/sec    1.00      3.9±0.03ms        ? ?/sec
physical_plan_clickbench_q32                  1.00      3.9±0.05ms        ? ?/sec    1.01      3.9±0.04ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      3.4±0.03ms        ? ?/sec    1.01      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q34                  1.01      3.1±0.05ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q35                  1.00      3.2±0.02ms        ? ?/sec    1.00      3.2±0.03ms        ? ?/sec
physical_plan_clickbench_q36                  1.01      3.9±0.06ms        ? ?/sec    1.00      3.9±0.03ms        ? ?/sec
physical_plan_clickbench_q37                  1.00      3.9±0.04ms        ? ?/sec    1.03      4.0±0.10ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      3.9±0.03ms        ? ?/sec    1.01      4.0±0.06ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      3.7±0.04ms        ? ?/sec    1.01      3.8±0.09ms        ? ?/sec
physical_plan_clickbench_q4                   1.00      2.6±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q40                  1.00      4.4±0.03ms        ? ?/sec    1.02      4.4±0.07ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      4.0±0.06ms        ? ?/sec    1.01      4.0±0.04ms        ? ?/sec
physical_plan_clickbench_q42                  1.00      3.9±0.05ms        ? ?/sec    1.01      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q43                  1.01      4.2±0.07ms        ? ?/sec    1.00      4.2±0.04ms        ? ?/sec
physical_plan_clickbench_q44                  1.00      2.7±0.04ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q45                  1.00      2.7±0.03ms        ? ?/sec    1.02      2.8±0.05ms        ? ?/sec
physical_plan_clickbench_q46                  1.00      3.2±0.02ms        ? ?/sec    1.01      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q47                  1.00      3.8±0.05ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q48                  1.00      4.5±0.07ms        ? ?/sec    1.00      4.5±0.07ms        ? ?/sec
physical_plan_clickbench_q49                  1.01      4.8±0.09ms        ? ?/sec    1.00      4.8±0.08ms        ? ?/sec
physical_plan_clickbench_q5                   1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q50                  1.00      4.2±0.04ms        ? ?/sec    1.01      4.3±0.06ms        ? ?/sec
physical_plan_clickbench_q51                  1.00      3.3±0.03ms        ? ?/sec    1.00      3.3±0.04ms        ? ?/sec
physical_plan_clickbench_q6                   1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q7                   1.02      2.6±0.02ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q8                   1.01      3.5±0.05ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_clickbench_q9                   1.01      3.3±0.02ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_tpcds_all                       1.00   1020.7±4.89ms        ? ?/sec    1.00   1016.2±4.29ms        ? ?/sec
physical_plan_tpch_all                        1.00     62.1±0.18ms        ? ?/sec    1.00     61.8±0.27ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.0±0.03ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q10                        1.01      3.8±0.03ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.01      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q12                        1.00  1811.5±11.50µs        ? ?/sec    1.00  1811.6±19.56µs        ? ?/sec
physical_plan_tpch_q13                        1.00   1446.9±7.19µs        ? ?/sec    1.00   1442.2±8.85µs        ? ?/sec
physical_plan_tpch_q14                        1.00  1955.0±12.84µs        ? ?/sec    1.00  1952.8±11.32µs        ? ?/sec
physical_plan_tpch_q16                        1.02      2.5±0.06ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q17                        1.01      2.4±0.05ms        ? ?/sec    1.00      2.4±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.00      2.7±0.00ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q19                        1.01      3.2±0.04ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q2                         1.00      5.5±0.06ms        ? ?/sec    1.00      5.5±0.01ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.1±0.00ms        ? ?/sec    1.01      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      4.1±0.01ms        ? ?/sec    1.00      4.1±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.5±0.01ms        ? ?/sec    1.01      2.6±0.00ms        ? ?/sec
physical_plan_tpch_q4                         1.00   1501.8±2.72µs        ? ?/sec    1.01   1519.2±5.89µs        ? ?/sec
physical_plan_tpch_q5                         1.00      3.1±0.01ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q6                         1.01   868.0±11.63µs        ? ?/sec    1.00    863.3±3.82µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.3±0.01ms        ? ?/sec    1.01      4.3±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.01      5.1±0.01ms        ? ?/sec    1.00      5.1±0.01ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.1±0.01ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
physical_select_aggregates_from_200           1.01     16.8±0.06ms        ? ?/sec    1.00     16.7±0.03ms        ? ?/sec
physical_select_all_from_1000                 1.00     24.7±0.15ms        ? ?/sec    1.00     24.7±0.07ms        ? ?/sec
physical_select_one_from_700                  1.00   1053.0±5.55µs        ? ?/sec    1.06  1116.6±11.98µs        ? ?/sec
physical_sorted_union_orderby                 1.00     41.2±0.13ms        ? ?/sec    1.01     41.4±0.13ms        ? ?/sec
physical_theta_join_consider_sort             1.00  1744.8±74.52µs        ? ?/sec    1.01  1770.8±13.60µs        ? ?/sec
physical_unnest_to_join                       1.00   1289.4±3.11µs        ? ?/sec    1.02   1317.8±6.54µs        ? ?/sec
with_param_values_many_columns                1.00    142.8±1.14µs        ? ?/sec    1.00    143.4±1.66µs        ? ?/sec

alamb · 2025-08-21T22:46:47Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb · 2025-08-21T23:39:42Z

🤖: Benchmark completed

Comparing HEAD and alamb_update_arrow
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2665.48 ms │         2563.47 ms │ no change │
│ QQuery 1     │  1314.17 ms │         1287.20 ms │ no change │
│ QQuery 2     │  2542.19 ms │         2476.30 ms │ no change │
│ QQuery 3     │  1165.07 ms │         1191.78 ms │ no change │
│ QQuery 4     │  2216.35 ms │         2193.16 ms │ no change │
│ QQuery 5     │ 27207.42 ms │        27046.80 ms │ no change │
│ QQuery 6     │  4248.15 ms │         4133.04 ms │ no change │
│ QQuery 7     │  3311.04 ms │         3321.23 ms │ no change │
└──────────────┴─────────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 44669.87ms │
│ Total Time (alamb_update_arrow)   │ 44212.98ms │
│ Average Time (HEAD)               │  5583.73ms │
│ Average Time (alamb_update_arrow) │  5526.62ms │
│ Queries Faster                    │          0 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │          8 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.04 ms │            2.17 ms │  1.06x slower │
│ QQuery 1     │    49.04 ms │           48.92 ms │     no change │
│ QQuery 2     │   134.51 ms │          136.80 ms │     no change │
│ QQuery 3     │   153.85 ms │          167.46 ms │  1.09x slower │
│ QQuery 4     │   997.19 ms │         1019.02 ms │     no change │
│ QQuery 5     │  1496.89 ms │         1452.65 ms │     no change │
│ QQuery 6     │     2.10 ms │            2.09 ms │     no change │
│ QQuery 7     │    53.44 ms │           54.92 ms │     no change │
│ QQuery 8     │  1433.45 ms │         1439.05 ms │     no change │
│ QQuery 9     │  1815.52 ms │         1771.95 ms │     no change │
│ QQuery 10    │   398.01 ms │          375.42 ms │ +1.06x faster │
│ QQuery 11    │   452.48 ms │          423.68 ms │ +1.07x faster │
│ QQuery 12    │  1363.05 ms │         1349.15 ms │     no change │
│ QQuery 13    │  2141.34 ms │         2105.27 ms │     no change │
│ QQuery 14    │  1279.64 ms │         1225.10 ms │     no change │
│ QQuery 15    │  1166.89 ms │         1152.90 ms │     no change │
│ QQuery 16    │  2584.06 ms │         2609.31 ms │     no change │
│ QQuery 17    │  2565.90 ms │         2633.82 ms │     no change │
│ QQuery 18    │  4863.11 ms │         4815.32 ms │     no change │
│ QQuery 19    │   126.29 ms │          125.43 ms │     no change │
│ QQuery 20    │  2027.63 ms │         1948.73 ms │     no change │
│ QQuery 21    │  2346.63 ms │         2269.79 ms │     no change │
│ QQuery 22    │  4015.64 ms │         3892.22 ms │     no change │
│ QQuery 23    │ 14288.53 ms │        13655.02 ms │     no change │
│ QQuery 24    │   275.52 ms │          244.57 ms │ +1.13x faster │
│ QQuery 25    │   534.68 ms │          501.65 ms │ +1.07x faster │
│ QQuery 26    │   284.20 ms │          244.40 ms │ +1.16x faster │
│ QQuery 27    │  2855.71 ms │         2775.80 ms │     no change │
│ QQuery 28    │ 24690.09 ms │        22698.76 ms │ +1.09x faster │
│ QQuery 29    │   974.05 ms │          971.38 ms │     no change │
│ QQuery 30    │  1347.63 ms │         1284.08 ms │     no change │
│ QQuery 31    │  1344.82 ms │         1302.20 ms │     no change │
│ QQuery 32    │  4207.96 ms │         4407.11 ms │     no change │
│ QQuery 33    │  5550.52 ms │         5492.29 ms │     no change │
│ QQuery 34    │  5776.52 ms │         5820.94 ms │     no change │
│ QQuery 35    │  1970.81 ms │         2010.33 ms │     no change │
│ QQuery 36    │   124.23 ms │          120.61 ms │     no change │
│ QQuery 37    │    53.64 ms │           55.53 ms │     no change │
│ QQuery 38    │   119.37 ms │          121.20 ms │     no change │
│ QQuery 39    │   197.41 ms │          193.47 ms │     no change │
│ QQuery 40    │    41.85 ms │           41.69 ms │     no change │
│ QQuery 41    │    39.44 ms │           40.15 ms │     no change │
│ QQuery 42    │    32.61 ms │           31.27 ms │     no change │
└──────────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 96178.30ms │
│ Total Time (alamb_update_arrow)   │ 93033.61ms │
│ Average Time (HEAD)               │  2236.70ms │
│ Average Time (alamb_update_arrow) │  2163.57ms │
│ Queries Faster                    │          6 │
│ Queries Slower                    │          2 │
│ Queries with No Change            │         35 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 170.34 ms │          171.29 ms │ no change │
│ QQuery 2     │  25.47 ms │           26.72 ms │ no change │
│ QQuery 3     │  44.85 ms │           43.89 ms │ no change │
│ QQuery 4     │  26.37 ms │           26.51 ms │ no change │
│ QQuery 5     │  72.01 ms │           72.53 ms │ no change │
│ QQuery 6     │  19.48 ms │           19.69 ms │ no change │
│ QQuery 7     │ 144.55 ms │          143.25 ms │ no change │
│ QQuery 8     │  32.47 ms │           33.31 ms │ no change │
│ QQuery 9     │  83.75 ms │           82.22 ms │ no change │
│ QQuery 10    │  59.78 ms │           58.36 ms │ no change │
│ QQuery 11    │  40.66 ms │           41.71 ms │ no change │
│ QQuery 12    │  50.52 ms │           51.92 ms │ no change │
│ QQuery 13    │  44.99 ms │           45.74 ms │ no change │
│ QQuery 14    │  12.98 ms │           13.18 ms │ no change │
│ QQuery 15    │  23.66 ms │           24.09 ms │ no change │
│ QQuery 16    │  24.28 ms │           23.39 ms │ no change │
│ QQuery 17    │ 142.57 ms │          145.82 ms │ no change │
│ QQuery 18    │ 316.30 ms │          322.67 ms │ no change │
│ QQuery 19    │  36.37 ms │           35.75 ms │ no change │
│ QQuery 20    │  47.57 ms │           47.76 ms │ no change │
│ QQuery 21    │ 220.02 ms │          218.34 ms │ no change │
│ QQuery 22    │  19.60 ms │           18.80 ms │ no change │
└──────────────┴───────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 1658.59ms │
│ Total Time (alamb_update_arrow)   │ 1666.95ms │
│ Average Time (HEAD)               │   75.39ms │
│ Average Time (alamb_update_arrow) │   75.77ms │
│ Queries Faster                    │         0 │
│ Queries Slower                    │         0 │
│ Queries with No Change            │        22 │
│ Queries with Failure              │         0 │
└───────────────────────────────────┴───────────┘

alamb · 2025-08-22T13:12:59Z

Comparing HEAD and alamb_update_arrow

Benchmark clickbench_pushdown.json

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 10    │   420.27 ms │          488.87 ms │  1.16x slower │
│ QQuery 11    │   500.34 ms │          555.10 ms │  1.11x slower │
│ QQuery 12    │  1789.42 ms │         1521.70 ms │ +1.18x faster │
│ QQuery 13    │  2698.81 ms │         2426.41 ms │ +1.11x faster │
│ QQuery 14    │  1898.26 ms │         1644.37 ms │ +1.15x faster │
│ QQuery 18    │  5366.43 ms │         4887.05 ms │ +1.10x faster │
│ QQuery 19    │   125.17 ms │          149.12 ms │  1.19x slower │
│ QQuery 20    │  2109.40 ms │         1932.01 ms │ +1.09x faster │
│ QQuery 22    │  5457.70 ms │         4063.91 ms │ +1.34x faster │
│ QQuery 23    │  2056.23 ms │         1470.65 ms │ +1.40x faster │
│ QQuery 24    │   291.14 ms │          252.50 ms │ +1.15x faster │
│ QQuery 25    │  1032.66 ms │          649.43 ms │ +1.59x faster │
│ QQuery 26    │   549.09 ms │          380.68 ms │ +1.44x faster │
│ QQuery 27    │  4127.01 ms │         2982.51 ms │ +1.38x faster │
│ QQuery 28    │ 26766.22 ms │        24180.99 ms │ +1.11x faster │
└──────────────┴─────────────┴────────────────────┴───────────────┘

I believe these differences are directly attributable to the predicate caching @XiangpengHao added in Speed up Parquet filter pushdown with predicate cache arrow-rs#8203.
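Why a predicate cache can help: with filter pushdown, a column that is used in the predicate and also selected in the output can otherwise be decoded twice, once to evaluate the filter and once to produce the output rows. The toy sketch below assumes a single column and an in-memory stand-in for decoding. It is not the arrow-rs implementation, and the metric field names only loosely echo the reader metrics hooked up in this PR.

```rust
/// Stand-in for a Parquet column chunk; `decode` represents the costly
/// decompress + decode step that the cache is meant to avoid repeating.
struct ColumnChunk {
    values: Vec<i64>,
}

impl ColumnChunk {
    fn decode(&self) -> Vec<i64> {
        self.values.clone()
    }
}

/// Counters for records decoded from the underlying chunk vs served from the
/// cache; the field names are hypothetical.
#[derive(Debug, Default, PartialEq)]
struct Metrics {
    records_from_inner: usize,
    records_from_cache: usize,
}

/// Evaluates `value > threshold` on a column that is ALSO part of the output,
/// then returns the surviving values. With `use_cache`, the values decoded for
/// the predicate are reused for the output instead of being decoded again.
fn filter_then_project(col: &ColumnChunk, threshold: i64, use_cache: bool) -> (Vec<i64>, Metrics) {
    let mut metrics = Metrics::default();

    // Phase 1: decode the predicate column and build a selection mask.
    let for_predicate = col.decode();
    metrics.records_from_inner += for_predicate.len();
    let mask: Vec<bool> = for_predicate.iter().map(|v| *v > threshold).collect();

    // Phase 2: materialize the output column, from the cache or by decoding again.
    let for_output = if use_cache {
        metrics.records_from_cache += for_predicate.len();
        for_predicate
    } else {
        let decoded_again = col.decode();
        metrics.records_from_inner += decoded_again.len();
        decoded_again
    };

    // Keep only the rows selected by the predicate.
    let output: Vec<i64> = for_output
        .into_iter()
        .zip(mask)
        .filter(|(_, keep)| *keep)
        .map(|(v, _)| v)
        .collect();
    (output, metrics)
}

fn main() {
    let col = ColumnChunk { values: vec![1, 5, 10, 3] };
    println!("{:?}", filter_then_project(&col, 4, false));
    println!("{:?}", filter_then_project(&col, 4, true));
}
```

Both calls return `[5, 10]`; without the cache the chunk is decoded twice (8 records from the inner reader), while with the cache it is decoded once and the output reuses those 4 records.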

alamb · 2025-08-22T14:11:45Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb · 2025-08-22T15:08:15Z

🤖: Benchmark completed

Comparing HEAD and alamb_update_arrow
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2692.59 ms │         2666.68 ms │ no change │
│ QQuery 1     │  1321.98 ms │         1271.38 ms │ no change │
│ QQuery 2     │  2495.28 ms │         2458.95 ms │ no change │
│ QQuery 3     │  1172.25 ms │         1144.66 ms │ no change │
│ QQuery 4     │  2247.47 ms │         2245.24 ms │ no change │
│ QQuery 5     │ 27484.95 ms │        27616.36 ms │ no change │
│ QQuery 6     │  4282.47 ms │         4134.84 ms │ no change │
│ QQuery 7     │  3709.81 ms │         3637.91 ms │ no change │
└──────────────┴─────────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 45406.80ms │
│ Total Time (alamb_update_arrow)   │ 45176.03ms │
│ Average Time (HEAD)               │  5675.85ms │
│ Average Time (alamb_update_arrow) │  5647.00ms │
│ Queries Faster                    │          0 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │          8 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.07 ms │            2.17 ms │     no change │
│ QQuery 1     │    50.88 ms │           50.54 ms │     no change │
│ QQuery 2     │   134.35 ms │          135.69 ms │     no change │
│ QQuery 3     │   168.01 ms │          165.49 ms │     no change │
│ QQuery 4     │  1052.98 ms │         1098.17 ms │     no change │
│ QQuery 5     │  1503.44 ms │         1507.13 ms │     no change │
│ QQuery 6     │     2.16 ms │            2.16 ms │     no change │
│ QQuery 7     │    54.93 ms │           54.34 ms │     no change │
│ QQuery 8     │  1417.15 ms │         1511.29 ms │  1.07x slower │
│ QQuery 9     │  1793.97 ms │         1889.36 ms │  1.05x slower │
│ QQuery 10    │   394.83 ms │          377.25 ms │     no change │
│ QQuery 11    │   458.12 ms │          432.90 ms │ +1.06x faster │
│ QQuery 12    │  1332.79 ms │         1432.08 ms │  1.07x slower │
│ QQuery 13    │  2151.43 ms │         2141.78 ms │     no change │
│ QQuery 14    │  1300.96 ms │         1284.70 ms │     no change │
│ QQuery 15    │  1208.29 ms │         1285.02 ms │  1.06x slower │
│ QQuery 16    │  2688.10 ms │         2708.07 ms │     no change │
│ QQuery 17    │  2633.73 ms │         2690.02 ms │     no change │
│ QQuery 18    │  5156.57 ms │         5011.25 ms │     no change │
│ QQuery 19    │   130.35 ms │          125.98 ms │     no change │
│ QQuery 20    │  2067.29 ms │         1934.95 ms │ +1.07x faster │
│ QQuery 21    │  2362.03 ms │         2275.48 ms │     no change │
│ QQuery 22    │  4118.76 ms │         3881.38 ms │ +1.06x faster │
│ QQuery 23    │ 20288.59 ms │        13781.94 ms │ +1.47x faster │
│ QQuery 24    │   265.30 ms │          245.44 ms │ +1.08x faster │
│ QQuery 25    │   529.99 ms │          496.27 ms │ +1.07x faster │
│ QQuery 26    │   279.59 ms │          262.16 ms │ +1.07x faster │
│ QQuery 27    │  2934.27 ms │         2818.28 ms │     no change │
│ QQuery 28    │ 24861.76 ms │        22844.16 ms │ +1.09x faster │
│ QQuery 29    │   972.69 ms │          946.02 ms │     no change │
│ QQuery 30    │  1365.65 ms │         1328.64 ms │     no change │
│ QQuery 31    │  1387.62 ms │         1312.99 ms │ +1.06x faster │
│ QQuery 32    │  4623.95 ms │         4420.68 ms │     no change │
│ QQuery 33    │  5800.89 ms │         5702.78 ms │     no change │
│ QQuery 34    │  5839.49 ms │         5929.93 ms │     no change │
│ QQuery 35    │  2056.08 ms │         2097.38 ms │     no change │
│ QQuery 36    │   120.68 ms │          120.81 ms │     no change │
│ QQuery 37    │    52.93 ms │           54.11 ms │     no change │
│ QQuery 38    │   121.11 ms │          119.85 ms │     no change │
│ QQuery 39    │   200.98 ms │          199.32 ms │     no change │
│ QQuery 40    │    44.11 ms │           45.91 ms │     no change │
│ QQuery 41    │    40.86 ms │           37.86 ms │ +1.08x faster │
│ QQuery 42    │    33.09 ms │           33.24 ms │     no change │
└──────────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 104002.83ms │
│ Total Time (alamb_update_arrow)   │  94794.99ms │
│ Average Time (HEAD)               │   2418.67ms │
│ Average Time (alamb_update_arrow) │   2204.53ms │
│ Queries Faster                    │          10 │
│ Queries Slower                    │           4 │
│ Queries with No Change            │          29 │
│ Queries with Failure              │           0 │
└───────────────────────────────────┴─────────────┘
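
The Faster / Slower / No change counts in this summary can be re-derived from the per-query rows. The sketch below assumes a symmetric 5% noise band on the ratio `new / HEAD`. That threshold is inferred from the rows themselves, not taken from the benchmark comparison script: QQuery 4 at about 1.043x is "no change", while QQuery 9 at about 1.053x is "1.05x slower". `Change`, `classify`, and `summarize` are illustrative names.

```rust
/// Verdict for one query, mirroring the "Change" column above.
#[derive(Debug, PartialEq)]
enum Change {
    Faster(f64), // speedup factor: HEAD / new
    Slower(f64), // slowdown factor: new / HEAD
    NoChange,
}

/// Classify one query with a symmetric noise band around a ratio of 1.0.
/// ASSUMPTION: a 5% band, inferred from the table rather than from the benchmark script.
fn classify(head_ms: f64, new_ms: f64, noise: f64) -> Change {
    let ratio = new_ms / head_ms;
    if ratio > 1.0 + noise {
        Change::Slower(ratio)
    } else if ratio < 1.0 - noise {
        Change::Faster(head_ms / new_ms)
    } else {
        Change::NoChange
    }
}

/// (HEAD ms, alamb_update_arrow ms) for ClickBench QQuery 0..=42, copied from the table.
const CLICKBENCH: [(f64, f64); 43] = [
    (2.07, 2.17), (50.88, 50.54), (134.35, 135.69), (168.01, 165.49), // 0-3
    (1052.98, 1098.17), (1503.44, 1507.13), (2.16, 2.16), (54.93, 54.34), // 4-7
    (1417.15, 1511.29), (1793.97, 1889.36), (394.83, 377.25), (458.12, 432.90), // 8-11
    (1332.79, 1432.08), (2151.43, 2141.78), (1300.96, 1284.70), (1208.29, 1285.02), // 12-15
    (2688.10, 2708.07), (2633.73, 2690.02), (5156.57, 5011.25), (130.35, 125.98), // 16-19
    (2067.29, 1934.95), (2362.03, 2275.48), (4118.76, 3881.38), (20288.59, 13781.94), // 20-23
    (265.30, 245.44), (529.99, 496.27), (279.59, 262.16), (2934.27, 2818.28), // 24-27
    (24861.76, 22844.16), (972.69, 946.02), (1365.65, 1328.64), (1387.62, 1312.99), // 28-31
    (4623.95, 4420.68), (5800.89, 5702.78), (5839.49, 5929.93), (2056.08, 2097.38), // 32-35
    (120.68, 120.81), (52.93, 54.11), (121.11, 119.85), (200.98, 199.32), // 36-39
    (44.11, 45.91), (40.86, 37.86), (33.09, 33.24), // 40-42
];

/// Count (faster, slower, no change) across all rows.
fn summarize(rows: &[(f64, f64)], noise: f64) -> (usize, usize, usize) {
    let mut counts = (0, 0, 0);
    for &(head, new) in rows {
        match classify(head, new, noise) {
            Change::Faster(_) => counts.0 += 1,
            Change::Slower(_) => counts.1 += 1,
            Change::NoChange => counts.2 += 1,
        }
    }
    counts
}

fn main() {
    let (faster, slower, same) = summarize(&CLICKBENCH, 0.05);
    println!("faster={faster} slower={slower} no_change={same}"); // faster=10 slower=4 no_change=29
    // "Average Time" is "Total Time" divided by the number of queries.
    println!("{:.2}ms", 104002.83_f64 / CLICKBENCH.len() as f64); // 2418.67ms
}
```

Under that assumption, all 43 ClickBench rows land in the same buckets as the summary (10 faster, 4 slower, 29 no change).
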
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 167.49 ms │          168.72 ms │ no change │
│ QQuery 2     │  27.33 ms │           26.48 ms │ no change │
│ QQuery 3     │  44.74 ms │           44.68 ms │ no change │
│ QQuery 4     │  26.49 ms │           26.91 ms │ no change │
│ QQuery 5     │  73.09 ms │           73.93 ms │ no change │
│ QQuery 6     │  19.41 ms │           19.79 ms │ no change │
│ QQuery 7     │ 142.35 ms │          140.77 ms │ no change │
│ QQuery 8     │  32.60 ms │           31.96 ms │ no change │
│ QQuery 9     │  82.16 ms │           84.85 ms │ no change │
│ QQuery 10    │  57.53 ms │           57.88 ms │ no change │
│ QQuery 11    │  40.64 ms │           41.08 ms │ no change │
│ QQuery 12    │  50.99 ms │           51.21 ms │ no change │
│ QQuery 13    │  45.86 ms │           45.45 ms │ no change │
│ QQuery 14    │  13.10 ms │           13.49 ms │ no change │
│ QQuery 15    │  23.62 ms │           24.13 ms │ no change │
│ QQuery 16    │  23.29 ms │           23.84 ms │ no change │
│ QQuery 17    │ 143.40 ms │          144.07 ms │ no change │
│ QQuery 18    │ 324.45 ms │          313.69 ms │ no change │
│ QQuery 19    │  36.25 ms │           36.56 ms │ no change │
│ QQuery 20    │  48.43 ms │           48.81 ms │ no change │
│ QQuery 21    │ 220.49 ms │          221.06 ms │ no change │
│ QQuery 22    │  19.30 ms │           18.88 ms │ no change │
└──────────────┴───────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 1663.01ms │
│ Total Time (alamb_update_arrow)   │ 1658.24ms │
│ Average Time (HEAD)               │   75.59ms │
│ Average Time (alamb_update_arrow) │   75.37ms │
│ Queries Faster                    │         0 │
│ Queries Slower                    │         0 │
│ Queries with No Change            │        22 │
│ Queries with Failure              │         0 │
└───────────────────────────────────┴───────────┘

nuno-faria · 2025-09-03T10:58:42Z

Cargo.toml

 ] }
 unused_qualifications = "deny"
+
+## Temporary arrow-rs patch until 56.1.0 is released


arrow-rs 56.1.0 has been released, so this can now be updated.

alamb · 2025-09-04T15:52:08Z

Thanks @nuno-faria -- I'll try to polish this over the next few days if no one beats me to it.

nuno-faria · 2025-09-08T14:38:00Z

I found a potential performance regression with parquet 56.1.0. More data pages are now read when each page holds fewer rows than the execution batch size. For example:

use datafusion::error::Result;
use datafusion::prelude::{ParquetReadOptions, SessionConfig, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
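    // Use one target partition so the whole scan runs in a single partition
    let config = SessionConfig::new().with_target_partitions(1);
    let ctx = SessionContext::new_with_config(config);
    // Evaluate the filter inside the Parquet reader (filter pushdown)
    ctx.sql("set datafusion.execution.parquet.pushdown_filters = true")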
        .await?
        .collect()
        .await?;

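    // Write 1,000,000 sorted values: 100,000-row row groups, data pages of roughly 1,000 rows, no dictionary encoding
    ctx.sql(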
        "
        copy (
            select i as k
            from generate_series(1, 1000000) as t(i)
            order by k
        ) to 't.parquet'
        options (MAX_ROW_GROUP_SIZE 100000, DATA_PAGE_ROW_COUNT_LIMIT 1000, WRITE_BATCH_SIZE 1000, DICTIONARY_ENABLED FALSE);",
    )
    .await?
    .collect()
    .await?;

    ctx.register_parquet("t", "t.parquet", ParquetReadOptions::new())
        .await?;

    // A point lookup: ideally only the page containing k = 123456 needs to be read
    ctx.sql("explain analyze select k from t where k = 123456")
        .await?
        .show()
        .await?;

    Ok(())
}

With parquet 56.0.0:

metrics=[..., bytes_scanned=1273, ...]

# some debug info showing that a single page is retrieved
total=1273
ranges=[132974..134247]

With parquet 56.1.0:

metrics=[..., bytes_scanned=9929, ...]

# some debug info showing that multiple pages are retrieved
total=9929
ranges=[125400..126482, 126482..127564, 127564..128646, 128646..129728, 129728..130810, 130810..131892, 131892..132974, 132974..134247, 134247..135329]
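
The `total=` values are simply the sums of the half-open byte ranges, and they match `bytes_scanned` in the metrics. A small stdlib-only check (the helper names are illustrative) also shows that the nine 56.1.0 ranges are back-to-back, forming one contiguous 9,929-byte span that includes the 132974..134247 range 56.0.0 read on its own:

```rust
/// Byte ranges from the parquet 56.1.0 debug output above (half-open `start..end`).
const RANGES_56_1: [(u64, u64); 9] = [
    (125400, 126482), (126482, 127564), (127564, 128646),
    (128646, 129728), (129728, 130810), (130810, 131892),
    (131892, 132974), (132974, 134247), (134247, 135329),
];

/// Total bytes covered by a list of half-open ranges.
fn total_bytes(ranges: &[(u64, u64)]) -> u64 {
    ranges.iter().map(|&(start, end)| end - start).sum()
}

/// True when each range starts exactly where the previous one ends.
fn is_contiguous(ranges: &[(u64, u64)]) -> bool {
    ranges.windows(2).all(|pair| pair[0].1 == pair[1].0)
}

fn main() {
    println!("56.0.0 total = {}", total_bytes(&[(132974, 134247)])); // 1273
    println!("56.1.0 total = {}", total_bytes(&RANGES_56_1)); // 9929
    println!("contiguous = {}", is_contiguous(&RANGES_56_1)); // true
}
```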

I think this is a consequence of apache/arrow-rs#7850, more specifically https://github.com/apache/arrow-rs/blame/0c7cb2ac3f3132216a08fd557f9b1edc7f90060f/parquet/src/arrow/arrow_reader/selection.rs#L445.
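
One way to see how a point lookup can pull in about nine pages: if the reader widens the row selection to whole batches before choosing which pages to fetch, a one-row selection grows to a full `batch_size` window. The sketch below is a hypothetical model, not the arrow-rs code. It assumes the widening aligns to multiples of `batch_size` (our reading of the linked `selection.rs` line), DataFusion's default batch size of 8192, and idealized pages of exactly 1,000 rows (real page boundaries in `t.parquet` may differ). Row `k = 123456` sits at offset 23,455 of the second 100,000-row row group.

```rust
use std::ops::RangeInclusive;

/// Pages (0-based, within the row group) that overlap rows `start..end`,
/// ASSUMING every page holds exactly `page_rows` rows.
fn pages_for(start: usize, end: usize, page_rows: usize) -> RangeInclusive<usize> {
    (start / page_rows)..=((end - 1) / page_rows)
}

/// HYPOTHETICAL: widen `start..end` to whole `batch_size`-aligned batches,
/// clamped to the row group length.
fn expand_to_batches(start: usize, end: usize, batch_size: usize, total_rows: usize) -> (usize, usize) {
    let lo = start / batch_size * batch_size;
    let hi = (end.div_ceil(batch_size) * batch_size).min(total_rows);
    (lo, hi)
}

fn main() {
    // Reproducer layout: 100,000-row row groups, ~1,000-row pages,
    // and DataFusion's default batch size of 8192.
    let (row_group_rows, page_rows, batch_size) = (100_000, 1_000, 8_192);
    // k = 123456 is the 23,456th row of the second row group (offset 23,455).
    let (start, end) = (23_455, 23_456);

    let exact = pages_for(start, end, page_rows);
    let (lo, hi) = expand_to_batches(start, end, batch_size, row_group_rows);
    let widened = pages_for(lo, hi, page_rows);

    println!("exact selection: pages {exact:?}"); // pages 23..=23
    println!("widened {lo}..{hi}: pages {widened:?} ({} pages)", widened.clone().count());
    // widened 16384..24576: pages 16..=24 (9 pages)
}
```

In this idealized model the lookup needs page 23 alone without widening, but pages 16 through 24 (nine pages) with it. That lines up with the nine ranges above, which is suggestive rather than a confirmation of the mechanism.
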

alamb · 2025-09-08T20:29:25Z

I saw this, @nuno-faria. I hope to look at it tomorrow.

alamb · 2025-09-09T17:39:00Z

> I found a potential performance regression with parquet 56.1.0. More data pages are now read when each page holds fewer rows than the execution batch size. For example:

Thanks @nuno-faria -- this is a great find. @XiangpengHao and I purposely added a setting that allows disabling the cache for precisely this reason.

So what I think is needed here is a way to turn this setting off via a DataFusion setting as well, which is what I was trying to say with

> Add new Parquet option to control the size of the predicate cache

Let me give this a try and see if we can get it working better.

nuno-faria · 2025-09-09T18:13:02Z

> So what I think is needed here is a way to turn this setting off via a DataFusion setting as well, which is what I was trying to say with
>
> > Add new Parquet option to control the size of the predicate cache
>
> Let me give this a try and see if we can get it working better.

Thanks @alamb, a config in datafusion would be ideal.

alamb · 2025-09-09T20:37:07Z

I have one more test to write/fix, and then this will be ready. I will get it done tomorrow.

alamb · 2025-09-10T11:27:53Z

> > So what I think is needed here is a way to turn this setting off via a DataFusion setting as well, which is what I was trying to say with
> >
> > > Add new Parquet option to control the size of the predicate cache
> >
> > Let me give this a try and see if we can get it working better.
>
> Thanks @alamb, a config in datafusion would be ideal.

@nuno-faria -- I added a config flag.

Can you possibly test that, if you set

set datafusion.execution.parquet.max_predicate_cache_size = 0

the I/O goes back to what it was like in 56.0.0?
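
For intuition about why a limit of `0` should turn the cache off, here is a minimal model of a size-bounded cache. It assumes the option acts as a byte budget and that a batch is cached only if it fits in what remains; `BoundedCache` and `try_insert` are hypothetical names, not arrow-rs or DataFusion types. With a budget of 0, no non-empty batch is ever stored, so in this model the data would have to be decoded again when it is needed for output.

```rust
/// HYPOTHETICAL size-bounded cache; not an arrow-rs or DataFusion type.
struct BoundedCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: Vec<Vec<u8>>,
}

impl BoundedCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: Vec::new() }
    }

    /// Store `batch` only if it fits in the remaining budget; returns whether it was stored.
    /// A rejected batch would have to be decoded again when it is needed for output.
    fn try_insert(&mut self, batch: Vec<u8>) -> bool {
        if self.used_bytes + batch.len() > self.max_bytes {
            return false;
        }
        self.used_bytes += batch.len();
        self.entries.push(batch);
        true
    }
}

fn main() {
    let mut disabled = BoundedCache::new(0); // analogous to max_predicate_cache_size = 0
    let mut bounded = BoundedCache::new(32);
    println!("{}", disabled.try_insert(vec![0u8; 16])); // false
    println!("{}", bounded.try_insert(vec![0u8; 16])); // true
    println!("{}", bounded.try_insert(vec![0u8; 16])); // true
    println!("{}", bounded.try_insert(vec![0u8; 16])); // false: budget exhausted
}
```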

alamb · 2025-09-10T12:19:12Z

datafusion/core/tests/parquet/filter_pushdown.rs

+        // final output
+        expected_inner_records: 16,
+        // Expect this to be 0 records read as the cache is disabled. However, it is
+        // non zero due to https://github.com/apache/arrow-rs/issues/8307


I verified with the debugger that the cache is not being used. However, this metric is very confusing, so I filed a ticket to track it:

[Parquet] predicate cache over reports "cache read" metrics in some cases arrow-rs#8307

nuno-faria · 2025-09-10T13:21:43Z

> @nuno-faria -- I added a config flag.
>
> Can you possibly test that, if you set
> set datafusion.execution.parquet.max_predicate_cache_size = 0
> the I/O goes back to what it was like in 56.0.0?

Thanks. I tried with the latest commit but still see the same behavior.

❯ git log -1
commit 7a6ea93e7b995c216131cc304c34d55c7a2ed528 (HEAD -> alamb/update_arrow)
Author: Andrew Lamb <[email protected]>
Date:   Tue Sep 9 14:24:56 2025 -0400

    Thread through max_predicate_cache_size, add test

Here is a datafusion-cli test:

DataFusion CLI v50.0.0
> set datafusion.execution.parquet.pushdown_filters = true;
0 row(s) fetched.
Elapsed 0.003 seconds.

> set datafusion.execution.parquet.max_predicate_cache_size = 0;
0 row(s) fetched.
Elapsed 0.001 seconds.

> copy (
            select i as k
            from generate_series(1, 1000000) as t(i)
            order by k
        ) to 't.parquet'
        options (MAX_ROW_GROUP_SIZE 100000, DATA_PAGE_ROW_COUNT_LIMIT 1000, WRITE_BATCH_SIZE 1000, DICTIONARY_ENABLED FALSE);
+---------+
| count   |
+---------+
| 1000000 |
+---------+
1 row(s) fetched.
Elapsed 0.861 seconds.

> create external table t stored as parquet location 't.parquet';
0 row(s) fetched.
Elapsed 0.007 seconds.

> explain analyze select k from t where k = 123456;
total=9929
ranges=[125400..126482, 126482..127564, 127564..128646, 128646..129728, 129728..130810, 130810..131892, 131892..132974, 132974..134247, 134247..135329]
total=0
ranges=[]
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | DataSourceExec: file_groups={1 group: [[/t.parquet]]}, projection=[k], file_type=parquet, predicate=k@0 = 123456, pruning_predicate=k_null_count@2 != row_count@3 AND k_min@0 <= 123456 AND 123456 <= k_max@1, required_guarantees=[k in (123456)], metrics=[output_rows=1, elapsed_compute=1ns, batches_split=0, bytes_scanned=9929, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, page_index_rows_matched=1192, page_index_rows_pruned=98808, predicate_cache_inner_records=16384, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=1, pushdown_rows_pruned=1191, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=9, bloom_filter_eval_time=195.801µs, metadata_load_time=340.301µs, page_index_eval_time=233.801µs, row_pushdown_eval_time=57.201µs, statistics_eval_time=387.401µs, time_elapsed_opening=2.1128ms, time_elapsed_processing=6.9853ms, time_elapsed_scanning_total=5.324ms, time_elapsed_scanning_until_data=5.2613ms] |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.016 seconds.

alamb · 2025-09-15T18:43:20Z

I think we should proceed with reviewing and merging this PR.

Since @AdamGS was already looking at this and

Upgrade arrow/parquet to 56.1.0 #17571

I will file a follow-on ticket to track the inability to extend data pages, and we will have until DataFusion 51 to resolve the issue.

Also, since arrow-rs 56.1.0 is marked as compatible with previous versions of arrow, people can (and probably will) start using this release with DataFusion 50 anyway.
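
The compatibility point follows from Cargo's default version requirements: a requirement written as `"56.0.0"` is a caret requirement (`^56.0.0`) and accepts any later `56.x.y`. The sketch below models only that rule for major versions above 0 (pre-release tags and the stricter `0.x` rules are left out). `Version` and `caret_matches` are illustrative names, and it assumes DataFusion 50 declares its arrow dependency as a 56.x requirement, which is not checked here.

```rust
/// A `major.minor.patch` version; the derived ordering compares fields left to right.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version(u64, u64, u64);

/// Cargo's default (caret) rule for a requirement whose major version is > 0:
/// any version with the same major number that is >= the requirement matches.
/// 0.x requirements have stricter rules that this sketch does not model (it rejects them).
fn caret_matches(req: Version, candidate: Version) -> bool {
    req.0 > 0 && candidate.0 == req.0 && candidate >= req
}

fn main() {
    // HYPOTHETICAL manifest entry: arrow = "56.0.0" (i.e. ^56.0.0)
    let req = Version(56, 0, 0);
    println!("{}", caret_matches(req, Version(56, 1, 0))); // true: same major, newer minor
    println!("{}", caret_matches(req, Version(57, 0, 0))); // false: new major version
}
```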

AdamGS

non-binding LGTM

Jefffrey

👍

alamb · 2025-09-18T13:26:43Z

Thank you @Jefffrey and @AdamGS

* Update to arrow/parquet 56.1.0 * Adjust for new parquet sizes, update for deprecated API * Thread through max_predicate_cache_size, add test (cherry picked from commit 980c948)

* Update to arrow/parquet 56.1.0 * Adjust for new parquet sizes, update for deprecated API * Thread through max_predicate_cache_size, add test (cherry picked from commit 980c948) Co-authored-by: Andrew Lamb <[email protected]>

* Use `Display` formatting of `DataType`:s in error messages (#17565) * Use Display formatting for DataTypes where I could find them * fix * More places * Less Debug * Cargo fmt * More cleanup * Plural types as Display * Fixes * Update some more tests and error messages * Update test snapshot * last (?) fixes * update another slt * Update instructions on how to run the tests * Ignore pending snapshot files in .gitignore * Running all the tests is so slow * just a trailing space * Update another test * Fix markdown formatting * Improve Display for NativeType * Update code related to error reporting of NativeType * Revert some formatting * fixelyfix * Another snapshot update * docs: Move Google Summer of Code 2025 pages to a section (#17504) * Move GSOC content to its own section * Update to 20205 * feat: Add `OR REPLACE` to creating external tables (#17580) * feat: Add `OR REPLACE` to creating external tables * regen * fmt * make more explicit + add tests * clipy fix --------- Co-authored-by: Dmitrii Blaginin <[email protected]> * `avg(distinct)` support for decimal types (#17560) * chore: mv `DistinctSumAccumulator` to common * feat: add avg distinct support for float64 type * chore: fmt * refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module * feat: add avg distinct support for float64 type * feat: add avg distinct support for decimal * feat: more test for avg distinct in rust api * Remove DataFrame API tests for avg(distinct) * Remove proto test * Fix merge errors * Refactoring * Minor cleanup * Decimal slt tests for avg(distinct) * Fix state_fields for decimal distinct avg --------- Co-authored-by: YuNing Chen <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Dmitrii Blaginin <[email protected]> * chore(deps): bump taiki-e/install-action from 2.61.8 to 2.61.9 (#17640) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.8 to 2.61.9. 
- [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/2fdc5fd6ac805b0f8256893bd4c807bcb666af00...8ea32481661d5e04d602f215b94f17e4014b44f9) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.61.9 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump Swatinem/rust-cache from 2.8.0 to 2.8.1 (#17641) Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.8.0 to 2.8.1. - [Release notes](https://github.com/swatinem/rust-cache/releases) - [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md) - [Commits](https://github.com/swatinem/rust-cache/compare/98c8021b550208e191a6a3145459bfc9fb29c4c0...f13886b937689c021905a6b90929199931d60db1) --- updated-dependencies: - dependency-name: Swatinem/rust-cache dependency-version: 2.8.1 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Validate the memory consumption in SPM created by multi level merge (#17029) * use GreedyMemoryPool for sanity check * validate whether batch read from spill exceeds max_record_batch_mem * replace err with warn log * fix(SubqueryAlias): use maybe_project_redundant_column (#17478) * fix(SubqueryAlias): use maybe_project_redundant_column Fixes #17405 * chore: format * ci: retry * chore(SubqueryAlias): restructore duplicate detection and add tests * docs: add examples and context to the reproducer * minor: Ensure `datafusion-sql` package dependencies have `sql` flag (#17644) * optimizer: Rewrite `IS NOT DISTINCT FROM` joins as Hash Joins (#17319) * optimizer: Convert to Hash Join for join predicates like 'a IS NOT DISTINCT FROM b' * drop tables in slt * fix rust doc * Update datafusion/optimizer/src/extract_equijoin_predicate.rs Co-authored-by: Jonathan Chen <[email protected]> * Update datafusion/optimizer/src/extract_equijoin_predicate.rs * Update datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt Co-authored-by: Nga Tran <[email protected]> * review: more tests and better error message * review: improve doc --------- Co-authored-by: Jonathan Chen <[email protected]> Co-authored-by: Nga Tran <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> * Upgrade to arrow 56.1.0 (#17275) * Update to arrow/parquet 56.1.0 * Adjust for new parquet sizes, update for deprecated API * Thread through max_predicate_cache_size, add test * fix: Preserves field metadata when creating logical plan for VALUES expression (#17525) * [ISSUE 17425] Initial attempt to fix this problem * Add tests for the fix * Require that the metadata of values in VALUES clause must be identical * fix merge error --------- Co-authored-by: Andrew Lamb <[email protected]> * chore(deps): bump serde from 1.0.223 to 1.0.225 (#17614) Bumps 
[serde](https://github.com/serde-rs/serde) from 1.0.223 to 1.0.225. - [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.223...v1.0.225) --- updated-dependencies: - dependency-name: serde dependency-version: 1.0.225 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dmitrii Blaginin <[email protected]> * chore: Update dynamic filter formatting (#17647) * chore: update dynamic filter formatting to indicate expr is placeholder * update tests * update tests * chore(deps): bump taiki-e/install-action from 2.61.9 to 2.61.10 (#17660) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.9 to 2.61.10. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/8ea32481661d5e04d602f215b94f17e4014b44f9...0aa4f22591557b744fe31e55dbfcdfea74a073f7) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.61.10 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * proto: don't include parquet feature by default (#17577) * feat: add support for RightAnti and RightSemi join types (#17604) Closes #17603 * minor: Ensure `proto` crate has datetime & unicode expr flags in datafusion dev dependency (#17656) * minor: Ensure `proto` crate has datetime & unicode expr flags in datafusion dev dependency * toml formatting * chore(deps): bump indexmap from 2.11.3 to 2.11.4 (#17661) Bumps [indexmap](https://github.com/indexmap-rs/indexmap) from 2.11.3 to 2.11.4. 
- [Changelog](https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md) - [Commits](https://github.com/indexmap-rs/indexmap/compare/2.11.3...2.11.4) --- updated-dependencies: - dependency-name: indexmap dependency-version: 2.11.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * docs: add xorq to list of known users (#17668) * Introduce `TypeSignatureClass::Binary` to allow accepting arbitrarily sized `FixedSizeBinary` arguments (#17531) * Introduce wildcard const for FixedSizeBinary type signature * Add Binary to TypeSignatureClass * Remove FIXED_SIZE_BINARY_WILDCARD * docs: deduplicate links in `introduction.md` (#17669) * docs: deduplicate links in `introduction.md` * Further simplifications * Fix * Add explicit PMC/committers list to governance docs page (#17574) * Add committers explicitly to governance page, with script * add license header * Update Wes McKinney's affiliation in governance.md * Update adriangb's affiliation * Update affiliation * Andy Grove Affiliation * Update Qi Zhu affiliation * Updatd linwei's info * Update docs/source/contributor-guide/governance.md * Update docs/source/contributor-guide/governance.md * Apply suggestions from code review Co-authored-by: Oleks V <[email protected]> Co-authored-by: Liang-Chi Hsieh <[email protected]> * Apply suggestions from code review Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Yang Jiang <[email protected]> Co-authored-by: Yongting You <[email protected]> * Apply suggestions from code review Co-authored-by: Yijie Shen <[email protected]> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Brent Gardner <[email protected]> Co-authored-by: Dmitrii Blaginin <[email protected]> Co-authored-by: Jax Liu <[email protected]> Co-authored-by: Ifeanyi Ubah <[email protected]> * Apply 
suggestions from code review Co-authored-by: Will Jones <[email protected]> * Clarify what is updated in the script * Apply suggestions from code review Co-authored-by: Paddy Horan <[email protected]> Co-authored-by: Dan Harris <[email protected]> * Update docs/source/contributor-guide/governance.md * Update docs/source/contributor-guide/governance.md Co-authored-by: Parth Chandra <[email protected]> * Update docs/source/contributor-guide/governance.md * prettier --------- Co-authored-by: Wes McKinney <[email protected]> Co-authored-by: Adrian Garcia Badaracco <[email protected]> Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: Qi Zhu <[email protected]> Co-authored-by: 张林伟 <[email protected]> Co-authored-by: xudong.w <[email protected]> Co-authored-by: Oleks V <[email protected]> Co-authored-by: Liang-Chi Hsieh <[email protected]> Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Yang Jiang <[email protected]> Co-authored-by: Yongting You <[email protected]> Co-authored-by: Yijie Shen <[email protected]> Co-authored-by: Brent Gardner <[email protected]> Co-authored-by: Dmitrii Blaginin <[email protected]> Co-authored-by: Jax Liu <[email protected]> Co-authored-by: Ifeanyi Ubah <[email protected]> Co-authored-by: Will Jones <[email protected]> Co-authored-by: Paddy Horan <[email protected]> Co-authored-by: Dan Harris <[email protected]> Co-authored-by: Ruihang Xia <[email protected]> Co-authored-by: Parth Chandra <[email protected]> * fix: Ignore governance doc from typos (#17678) * Support Decimal32/64 types (#17501) * Support Decimal32/64 types * Fix bugs, tests, handle more aggregate functions and schema * Fill out more parts in expr,common and expr-common * Some stragglers and overlooked corners * Actually commit the avg_distinct support --------- Co-authored-by: Andrew Lamb <[email protected]> * minor: Improve hygiene for `datafusion-functions` macros (#17638) * feat(small): Display `NullEquality` in join executor's `EXPLAIN` 
output (#17664) * Clarify null-equal explain expectations * Format null equality display strings * fix test * review: more concise message * review: more concise message * Custom timestamp format for DuckDB (#17653) * feat(substrait): add time literal support (#17655) Adds support for `ScalarValue::Time64Microsecond` and `ScalarValue::Time64Nanosecond` to be converted to and from Substrait literals. This includes the `PrecisionTime` literal type and specific `TIME_64_TYPE_VARIATION_REF` for 6-digit (microseconds) and 9-digit (nanoseconds) precision. Co-authored-by: Bruno Volpato <[email protected]> * Support LargeList for array_sort (#17657) * Support FixedSizeList for array_except (#17658) * fix: null padding for `array_reverse` on `FixedSizeList` (#17673) * fix: array_reverse with null * update * update * chore: refactor array fn signatures & add more slt tests (#17672) * Support FixedSizeList for array_to_string (#17666) * fix: correct statistics for `NestedLoopJoinExec` (#17680) * fix: correct statistics for nestedloopexec * chore: update comment * minor: add SQLancer fuzzed SLT case for natural joins (#17683) * chore: Upgrade Rust version to 1.90.0 (#17677) * chore: bump workspace rust version to 1.90.0 * fix clippy errors * fix clippy errors * try using dedicate runner temp space * retrigger * inspect disk usage * split build/run * disable debug info in ci profile * revert ci changes * Support FixedSizeList for array_position (#17659) * chore(deps): bump the proto group with 2 updates (#16806) * chore(deps): bump the proto group with 2 updates Bumps the proto group with 2 updates: [pbjson-build](https://github.com/influxdata/pbjson) and [prost-build](https://github.com/tokio-rs/prost). 
Updates `pbjson-build` from 0.7.0 to 0.8.0 - [Commits](https://github.com/influxdata/pbjson/commits) Updates `prost-build` from 0.13.5 to 0.14.1 - [Release notes](https://github.com/tokio-rs/prost/releases) - [Changelog](https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/prost/compare/v0.13.5...v0.14.1) --- updated-dependencies: - dependency-name: pbjson-build dependency-version: 0.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: proto - dependency-name: prost-build dependency-version: 0.14.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: proto ... Signed-off-by: dependabot[bot] <[email protected]> * Regen protos --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jefffrey <[email protected]> * feat(spark): implement Spark `make_interval` function (#17424) * feat(spark): implement Spark make_interval function * fix name length * add doc * add doc and change test, need more test * fmt * add test and doc, need to work in overflow * clippy * empty params * test ok IntervalMonthDayNano::new(0, 0, 0) in unit test * line blank * fix doc table select * dont panic * update test and not panic fmt * review * review fix test failure * review fix test failure format simple string * test uncomment and link * return test (empty) * changes review * all overflow null * all overflow null fix fmt * changes review * changes review clippy * refactor move * fix error doc date_sub * clean slt * no space device * chore: Update READMEs of crates to be more consistent (#17691) * chore: Update READMEs of crates to be more consistent * Add some more Apache project links * Minor formatting * Formatting * Update datafusion/pruning/README.md Co-authored-by: Andrew Lamb <[email protected]> * suggestion * formatting * formatting --------- 
Co-authored-by: Andrew Lamb <[email protected]> * chore: update a bunch of dependencies (#17708) * chore: fix wasm-pack installation link in wasmtest README (#17704) * Support FixedSizeList for array_slice via coercion to List (#17667) * docs: Remove disclaimer that `datafusion` 50.0.0 is not released (#17695) * docs: Remove disclaimer that datafusion 50.0.0 is not released * Add section about 51.0.0 * chore(deps): bump taiki-e/install-action from 2.61.10 to 2.62.1 (#17710) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.10 to 2.62.1. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/0aa4f22591557b744fe31e55dbfcdfea74a073f7...d6912b47771be2c443ec90dbb3d28e023987e782) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.1 dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * perf: Improve the performance of WINDOW functions with many partitions (#17528) * perf: Improve the performance of WINDOW functions with many partitions * Improve variable name in calculate_n_out_row * fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct (#17706) * fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct * Update datafusion/common/src/dfschema.rs Co-authored-by: Andrew Lamb <[email protected]> * fmt --------- Co-authored-by: Andrew Lamb <[email protected]> * feat: expose `udafs` and `udwfs` methods on `FunctionRegistry` (#17650) * expose udafs and udwfs method on `FunctionRegistry` * fix doc test * add default implementations not to trigger backward incompatible change for others * Support remaining substrait time literal variations (#17707) * Bump MSRV to 1.87.0 (#17724) * Bump MSRV to 1.87.0 * automatic code fixes * Add upgrading entry * Avoid redundant Schema clones (#17643) * Collocate variants of From DFSchema to Schema * Remove duplicated logic for obtaining Schema from DFSchema * Remove Arc clone in hash_nested_array * Avoid redundant Schema clones * Avoid some Field clones * make arc clones explicit * retract the new From * empty: roll the dice 🎲 * Use github link instead of relative link to optimizer_rule.rs in query-optimizer.md (#17723) * Move misplaced upgrading entry about MSRV (#17727) * Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API (#17536) * Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API * Add to roundtrip proto tests * Support `WHERE`, `ORDER BY`, `LIMIT`, `SELECT`, `EXTEND` pipe operators (#17278) * support WHERE pipe operator * support order by * support limit * select pipe * extend support * document supported pipe operators in user guide * fmt * fix 
where pipe before extend * don't rebind * remove clone * move docs into select.md * avoid confusion by removing `>` in examples --------- Co-authored-by: Jeffrey Vo <[email protected]> * doc: add missing examples for multiple math functions (#17018) * Update Scalar_functions.md * pretier fix * Updated files * Updated Scalar functions * Update datafusion/functions/src/math/log.rs Co-authored-by: Jeffrey Vo <[email protected]> * Update datafusion/functions/src/math/monotonicity.rs Co-authored-by: Jeffrey Vo <[email protected]> * Update datafusion/functions/src/math/monotonicity.rs Co-authored-by: Jeffrey Vo <[email protected]> * Update datafusion/functions/src/math/nans.rs Co-authored-by: Jeffrey Vo <[email protected]> * Update datafusion/functions/src/math/nanvl.rs Co-authored-by: Jeffrey Vo <[email protected]> * Fix tanh example to be tanh not trunc * Run update_function_docs.sh --------- Co-authored-by: Jeffrey Vo <[email protected]> * feat: support for null, date, and timestamp types in approx_distinct (#17618) * feat: let approx_distinct handle null, date and timestamp types Signed-off-by: Dennis Zhuang <[email protected]> * chore: update testing submodule Signed-off-by: Dennis Zhuang <[email protected]> * feat: supports time type and refactor NullHLLAccumulator Signed-off-by: Dennis Zhuang <[email protected]> * bump arrow-testing submodule --------- Signed-off-by: Dennis Zhuang <[email protected]> Co-authored-by: Jefffrey <[email protected]> * fix(agg/corr): return NULL when variance is zero or samples < 2 (#17621) Signed-off-by: Dennis Zhuang <[email protected]> * chore(deps): bump taiki-e/install-action from 2.62.1 to 2.62.4 (#17739) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.1 to 2.62.4. 
- [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/d6912b47771be2c443ec90dbb3d28e023987e782...5597bc27da443ba8bf9a3bc4e5459ea59177de42) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump tempfile from 3.22.0 to 3.23.0 (#17741) Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.22.0 to 3.23.0. - [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md) - [Commits](https://github.com/Stebalien/tempfile/compare/v3.22.0...v3.23.0) --- updated-dependencies: - dependency-name: tempfile dependency-version: 3.23.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: make `LimitPushPastWindows` public (#17736) * fix: Remove parquet encryption feature from root deps (#17700) This fix relates to issue #16650 by completing #16649 . 
* fix: Remove datafusion-macros's dependency on datafusion-expr (#17688) * Remove datafusion-macros's dependency on datafusion-expr * Re-export * chore: remove homebrew publish instructions from release steps (#17735) * minor: create `OptimizerContext` with provided `ConfigOptions` (#17742) * Improve documentation for ordered set aggregate functions (#17744) * docs: fix sidebar overlapping table on configuration page on website (#17738) * solved bug * fix:modified css for table overlapping * Add support for calling async UDF as aggregation expression (#17620) * Add support for calling async UDF as aggregation expression Fixes https://github.com/apache/datafusion/issues/17619 * add explain plans * chore(deps): bump taiki-e/install-action from 2.62.4 to 2.62.5 (#17750) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.4 to 2.62.5. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/5597bc27da443ba8bf9a3bc4e5459ea59177de42...6f69ec9970ed0c500b1b76d648e05c4c7e0e5671) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.5 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * (fix): Lag function creates unwanted projection (#17630) (#17639) * fix: Not adding generated windown expr resulting column twice (#17630) * Making clippy happier * Support `LargeList` in `array_has` simplification to `InList` (#17732) * Support `LargeList` in `array_has` simplification to `InList` * refactoring * chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 (#17642) * chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.51 to 0.3.53. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: wasm-bindgen-test dependency-version: 0.3.53 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * testing setting WASM_BINDGEN_TEST_TIMEOUT * more testing * more testing * more testing * more testing * more testing * testing * testing * testing * testing * whoops * whoops * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * problem commit * please let this work * oops * test 0.3.53 * fix --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jeffrey Vo <[email protected]> * feat: support `Utf8View` for more args of `regexp_replace` (#17195) * Stash changes. * Signature cleanup, more test scenarios. * Minor test renaming. * Simplify signature. * Update tests. * Signature change for binary input support. * Return type changes for binary. * Stash. * Stash. * Stash. * Stash. * Fix regx bench. 
* Clippy. * Fix bench regx. * Refactor signature. I need to remove the match arms that aren't used anymore, update the .slt test for string_view.slt, and understand why String(3) and String(4) are not equivalent to this. * Remove unnecessary match arms. * Update string_view slt test. * Reduce diff by returning to single function with a match arm instead of two. * Simplify template args. * Fix benchmark compilation. * Address PR feedback. * feat(spark): implement Spark `map` function `map_from_arrays` (#17456) * feat(spark): implement Spark `map` function `map_from_arrays` * chore: add test with nested `map_from_arrays` calls, refactor map_deduplicate_keys to remove unnecessary variables and array slices * fix: clippy warning * fix: null and different size input lists treatment, chore: move common map funcs to utils.rs, add more tests * fix: typo * fix: clippy docstring warning * chore: move more helpers needed for multiple map functions to utils * chore: add multi-row tests * fix: null values treatment * fix: docstring warnings * chore(deps): bump object_store from 0.12.3 to 0.12.4 (#17753) Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.3 to 0.12.4. - [Changelog](https://github.com/apache/arrow-rs-object-store/blob/main/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs-object-store/compare/v0.12.3...v0.12.4) --- updated-dependencies: - dependency-name: object_store dependency-version: 0.12.4 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update `arrow` / `parquet` to 56.2.0 (#17631) * temp update to arrow 56.2.0 pin * Update to 56.2.0 * Use released arrow * Update cargo.lock * fix lock * chore(deps): bump taiki-e/install-action from 2.62.5 to 2.62.6 (#17766) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.5 to 2.62.6. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/6f69ec9970ed0c500b1b76d648e05c4c7e0e5671...4575ae687efd0e2c78240087f26013fb2484987f) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.6 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Keep aggregate udaf schema names unique when missing an order-by (#17731) * test: reproducer of bug * fix: make schema names unique for approx_percentile_cont * test: regression test is now resolved * feat : Display function alias in output column name (#17690) * display function's alias name in output column * Update function.rs * updated verbose name format * simplify alias logic and removing args clone * Support join cardinality estimation less conservatively (#17476) * Support join cardinality estimation if distinct_count is set Currently we require max and min to be set, as they might be used to estimate the distinct count. This is unnecessarily conservative if distinct_count has actually been provided, in which case max and min won't be used at all and the presence of max or min has no influence over how good of an estimate it is. 
* Update datafusion/physical-plan/src/joins/utils.rs Co-authored-by: Piotr Findeisen <[email protected]> * Update tests * Calculate cardinality even if distinct or min/max not provided --------- Co-authored-by: Piotr Findeisen <[email protected]> * chore(deps): bump libc from 0.2.175 to 0.2.176 (#17767) Bumps [libc](https://github.com/rust-lang/libc) from 0.2.175 to 0.2.176. - [Release notes](https://github.com/rust-lang/libc/releases) - [Changelog](https://github.com/rust-lang/libc/blob/0.2.176/CHANGELOG.md) - [Commits](https://github.com/rust-lang/libc/compare/0.2.175...0.2.176) --- updated-dependencies: - dependency-name: libc dependency-version: 0.2.176 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump postgres-types from 0.2.9 to 0.2.10 (#17768) Bumps [postgres-types](https://github.com/rust-postgres/rust-postgres) from 0.2.9 to 0.2.10. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](https://github.com/rust-postgres/rust-postgres/compare/postgres-types-v0.2.9...postgres-types-v0.2.10) --- updated-dependencies: - dependency-name: postgres-types dependency-version: 0.2.10 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Use `Expr::qualified_name()` and `Column::new()` to extract partition keys from window and aggregate operators (#17757) * Use `Expr::qualified_name()` and `Column::new()` to extract partition keys Using `Expr::schema_name()` and `Column::from_qualified_name()` could incorrectly parse the column name. 
* Use `Expr::qualified_name()` to extract group by keys * Retrain dataframe tests with filters and aggregates * Prevent exponential planning time for Window functions - v2 (#17684) * fix * Update mod.rs * Update mod.rs * Update mod.rs * tests copied from v1 pr * test case from review comment https://github.com/apache/datafusion/pull/17684#discussion_r2366146307 * one more test case * Update mod.rs * Update datafusion/physical-plan/src/windows/mod.rs Co-authored-by: Andrew Lamb <[email protected]> * Update datafusion/physical-plan/src/windows/mod.rs Co-authored-by: Andrew Lamb <[email protected]> * Update mod.rs * Update mod.rs --------- Co-authored-by: Piotr Findeisen <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> * docs: add Ballista link to landing page (#17746) (#17775) * docs: add Ballista link to landing page (#17746) This adds a link and description for DataFusion Ballista to the landing page, as suggested in issue #17746. Ballista is a distributed compute platform built on top of DataFusion. Closes: #17746 * fix(docs): update Ballista link * updated theory part * chore(deps): bump taiki-e/install-action from 2.62.6 to 2.62.8 (#17781) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.6 to 2.62.8. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/4575ae687efd0e2c78240087f26013fb2484987f...ea0eda622640ac23a17ba349cf09e2709d58f5e1) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.8 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump wasm-bindgen-test from 0.3.53 to 0.3.54 (#17784) Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.53 to 0.3.54. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: wasm-bindgen-test dependency-version: 0.3.54 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Action some old TODOs in github actions (#17694) * chore: Action some old TODOs in github actions * Update Cargo.toml * testing * Revert changing cli test runner to use container * Remove sccache * dev: Add benchmark for compilation profiles (#17754) * Add benchmark for compilation profiles * add apache header * add apache header * chore(deps): bump tokio-postgres from 0.7.13 to 0.7.14 (#17785) Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.13 to 0.7.14. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](https://github.com/rust-postgres/rust-postgres/compare/tokio-postgres-v0.7.13...tokio-postgres-v0.7.14) --- updated-dependencies: - dependency-name: tokio-postgres dependency-version: 0.7.14 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump serde from 1.0.226 to 1.0.227 (#17783) Bumps [serde](https://github.com/serde-rs/serde) from 1.0.226 to 1.0.227. 
- [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.226...v1.0.227) --- updated-dependencies: - dependency-name: serde dependency-version: 1.0.227 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump regex from 1.11.2 to 1.11.3 (#17782) Bumps [regex](https://github.com/rust-lang/regex) from 1.11.2 to 1.11.3. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-lang/regex/compare/1.11.2...1.11.3) --- updated-dependencies: - dependency-name: regex dependency-version: 1.11.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Support `CAST` from temporal to `Utf8View` (#17535) * Add case expr simplifiers for literal comparisons (#17743) * Add case expr simplifiers for literal comparisons * Update datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs Co-authored-by: Andrew Lamb <[email protected]> * Avoid expr clones --------- Co-authored-by: Andrew Lamb <[email protected]> * chore: dependabot to run weekly (#17797) * [DOCS] Add dbt Fusion engine and R2 Query Engine to "Known Users" (#17793) * Add dbt Fusion engine and R2 Query Engine * Update docs/source/user-guide/introduction.md * Update docs/source/user-guide/introduction.md * feat: change `datafusion-proto` to use `TaskContext` rather than `SessionContext` for physical plan serialization (#17601) * change session context to task context in physical proto ... 
* fix compilation issue * remove `RuntimeEnv` from a few function arguments * update upgrading guide * display window function's alias name in output (#17788) * docs: update wasmtest README with instructions for Apple silicon (#17755) * chore(deps): bump sysinfo from 0.37.0 to 0.37.1 (#17800) Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.37.0 to 0.37.1. - [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md) - [Commits](https://github.com/GuillaumeGomez/sysinfo/compare/v0.37.0...v0.37.1) --- updated-dependencies: - dependency-name: sysinfo dependency-version: 0.37.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump taiki-e/install-action from 2.62.8 to 2.62.9 (#17799) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.8 to 2.62.9. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/ea0eda622640ac23a17ba349cf09e2709d58f5e1...71d339ebf191fcbc3d49cd04b9484a4261f29975) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.9 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(spark): implement Spark `make_dt_interval` function (#17728) * feat(spark): implement Spark make_dt_interval function * fmt * delete pub * test slt * fmt * overflow -> null * suggested changes * fmt * only res in slt * null not void type * explain types * explain types fix url * better comment * Fix potential overflow when we print verbose physical plan (#17798) * change debug to trace for potential overflow * fix comments. * fix * Add SedonaDB as known user to Apache DataFusion (#17806) * Extend datatype semantic equality check to include timestamps (#17777) * Extend datatype semantic equality to include timestamps * test * Respond to comments * cargo fmt --------- Co-authored-by: Shiv Bhatia <[email protected]> * fix: Filter out nulls properly in approx_percentile_cont_with_weight (#17780) * chore: refactor usage of `reassign_predicate_columns` (#17703) * chore: refactor usage of `reassign_predicate_columns` * chore: Address PR comments --------- Co-authored-by: Andrew Lamb <[email protected]> * dev: Add Apache license check to the lint script (#17787) * Add license checker ci script * fix the deliberately added bad license header * review: use dev profile and pin the version * Fix: common_sub_expression_eliminate optimizer rule failed (#16066) The common_sub_expression_eliminate rule failed with the error `SchemaError(FieldNotFound {field: <name>, valid_fields: []})` because the schema was changed by the second application of `find_common_exprs`. As I understand it, the problem came from calling `find_common_exprs` twice in sequence. The first call returned the original names as `aggr_expr` and the changed names as `new_aggr_expr`. The second call only considers `new_aggr_expr`, so if the names were already changed by the first call, it returns those changed names as `aggr_expr` (which should hold the original ones) and passes them into the Projection logic.
I used NamePreserver mechanism to restore original schema names and generate Projection with original name at the end of aggregate optimization. Co-authored-by: Andrew Lamb <[email protected]> * feat: support multi-threaded writing of Parquet files with modular encryption (#16738) * Initial commit diff --git c/Cargo.lock i/Cargo.lock index 749971532..f0b9d0a5f 100644 --- c/Cargo.lock +++ i/Cargo.lock @@ -246,52 +246,62 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" [[package]] name = "arrow" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "fd798aea3553913a5986813e9c6ad31a2d2b04e931fe8ea4a37155eb541cebb5" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-arith", - "arrow-array", - "arrow-buffer", - "arrow-cast", + "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-csv", - "arrow-data", - "arrow-ipc", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-json", - "arrow-ord", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-pyarrow", - "arrow-row", - "arrow-schema", - "arrow-select", - "arrow-string", + "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "half", "rand 0.9.2", ] [[package]] name = "arrow-arith" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "508dafb53e5804a238cab7fd97a59ddcbfab20cc4d9814b1ab5465b9fa147f2e" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "chrono", + "num", +] + +[[package]] +name = "arrow-arith" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "chrono", "num", ] [[package]] name = "arrow-array" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e2730bc045d62bb2e53ef8395b7d4242f5c8102f41ceac15e8395b9ac3d08461" +version = "55.2.0" +source = 
"git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "chrono-tz", "half", @@ -299,11 +309,35 @@ dependencies = [ "num", ] +[[package]] +name = "arrow-array" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "ahash 0.8.12", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "chrono", + "half", + "hashbrown 0.15.4", + "num", +] + [[package]] name = "arrow-buffer" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "54295b93beb702ee9a6f6fbced08ad7f4d76ec1c297952d4b83cf68755421d1d" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "bytes", + "half", + "num", +] + +[[package]] +name = "arrow-buffer" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ "bytes", "half", @@ -312,15 +346,14 @@ dependencies = [ [[package]] name = "arrow-cast" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "67e8bcb7dc971d779a7280593a1bf0c2743533b8028909073e804552e85e75b5" +version = "55.2.0" +source = 
"git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "atoi", "base64 0.22.1", "chrono", @@ -332,14 +365,32 @@ dependencies = [ ] [[package]] -name = "arrow-csv" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "673fd2b5fb57a1754fdbfac425efd7cf54c947ac9950c1cce86b14e248f1c458" +name = "arrow-cast" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-array", - "arrow-cast", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "atoi", + "base64 0.22.1", + "chrono", + "half", + "lexical-core", + "num", + "ryu", +] + +[[package]] +name = "arrow-csv" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "csv", "csv-core", @@ -348,33 +399,42 @@ dependencies = [ [[package]] name = "arrow-data" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "97c22fe3da840039c69e9f61f81e78092ea36d57037b4900151f063615a2f6b4" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-buffer", - "arrow-schema", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "half", + "num", +] + +[[package]] +name = "arrow-data" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "half", "num", ] [[package]] name = "arrow-flight" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6808d235786b721e49e228c44dd94242f2e8b46b7e95b233b0733c46e758bfee" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-arith", - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-ipc", - "arrow-ord", - "arrow-row", - "arrow-schema", - "arrow-select", - "arrow-string", + "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + 
"arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "base64 0.22.1", "bytes", "futures", @@ -382,35 +442,45 @@ dependencies = [ "paste", "prost", "prost-types", - "tonic", + "tonic 0.12.3", ] [[package]] name = "arrow-ipc" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "778de14c5a69aedb27359e3dd06dd5f9c481d5f6ee9fbae912dba332fd64636b" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "flatbuffers", "lz4_flex", "zstd", ] [[package]] -name = "arrow-json" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3860db334fe7b19fcf81f6b56f8d9d95053f3839ffe443d56b5436f7a29a1794" +name = "arrow-ipc" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + 
"arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "flatbuffers", +] + +[[package]] +name = "arrow-json" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "half", "indexmap 2.10.0", @@ -424,78 +494,130 @@ dependencies = [ [[package]] name = "arrow-ord" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "425fa0b42a39d3ff55160832e7c25553e7f012c3f187def3d70313e7a29ba5d9" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", +] + +[[package]] +name = "arrow-ord" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", ] [[package]] name = "arrow-pyarrow" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d944d8ae9b77230124e6570865b570416c33a5809f32c4136c679bbe774e45c9" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "pyo3", ] [[package]] name = "arrow-row" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "df9c9423c9e71abd1b08a7f788fcd203ba2698ac8e72a1f236f1faa1a06a7414" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 
55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "half", +] + +[[package]] +name = "arrow-row" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "half", ] [[package]] name = "arrow-schema" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "85fa1babc4a45fdc64a92175ef51ff00eba5ebbc0007962fecf8022ac1c6ce28" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "bitflags 2.9.1", "serde", "serde_json", ] +[[package]] +name = "arrow-schema" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" + [[package]] name = "arrow-select" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d8854d15f1cf5005b4b358abeb60adea17091ff5bdd094dca5d3f73787d81170" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + 
"arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "num", +] + +[[package]] +name = "arrow-select" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "ahash 0.8.12", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "num", ] [[package]] name = "arrow-string" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2c477e8b89e1213d5927a2a84a72c384a9bf4dd0dbf15f9fd66d821aafd9e95e" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "memchr", + "num", + "regex", + "regex-syntax", +] + +[[package]] +name = "arrow-string" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + 
"arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "memchr", "num", "regex", @@ -567,6 +689,28 @@ dependencies = [ "syn 2.0.106", ] +[[package]] +name = "async-stream" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b5a71a6f37880a80d1d7f19efd781e4b5de42c88f0722cc13bcb6cc2cfe8476" +dependencies = [ + "async-stream-impl", + "futures-core", + "pin-project-lite", +] + +[[package]] +name = "async-stream-impl" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.104", +] + [[package]] name = "async-trait" version = "0.1.89" @@ -827,7 +971,7 @@ dependencies = [ "rustls-native-certs", "rustls-pki-types", "tokio", - "tower", + "tower 0.5.2", "tracing", ] @@ -948,18 +1092,19 @@ dependencies = [ [[package]] name = "axum" -version = "0.8.4" +version = "0.7.9" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5" +checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f" dependencies = [ - "axum-core", + "async-trait", + "axum-core 0.4.5", "bytes", "futures-util", "http 1.3.1", "http-body 1.0.1", "http-body-util", "itoa", - "matchit", + "matchit 0.7.3", "memchr", "mime", "percent-encoding", @@ -967,7 +1112,53 @@ dependencies = [ "rustversion", "serde", "sync_wrapper", - "tower", + "tower 0.5.2", + "tower-layer", + "tower-service", +] + +[[package]] +name = "axum" +version = "0.8.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5" +dependencies = [ + "axum-core 0.5.2", + "bytes", + "futures-util", + "http 1.3.1", + "http-body 1.0.1", + "http-body-util", + "itoa", + "matchit 0.8.4", 
+ "memchr", + "mime", + "percent-encoding", + "pin-project-lite", + "rustversion", + "serde", + "sync_wrapper", + "tower 0.5.2", + "tower-layer", + "tower-service", +] + +[[package]] +name = "axum-core" +version = "0.4.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199" +dependencies = [ + "async-trait", + "bytes", + "futures-util", + "http 1.3.1", + "http-body 1.0.1", + "http-body-util", + "mime", + "pin-project-lite", + "rustversion", + "sync_wrapper", "tower-layer", "tower-service", ] @@ -1818,8 +2009,8 @@ name = "datafusion" version = "49.0.1" dependencies = [ "arrow", - "arrow-ipc", - "arrow-schema", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "bytes", "bzip2 0.6.0", @@ -1996,7 +2187,7 @@ dependencies = [ "ahash 0.8.12", "apache-avro", "arrow", - "arrow-ipc", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "base64 0.22.1", "chrono", "half", @@ -2176,7 +2367,7 @@ version = "49.0.1" dependencies = [ "arrow", "arrow-flight", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "base64 0.22.1", "bytes", @@ -2197,7 +2388,7 @@ dependencies = [ "tempfile", "test-utils", "tokio", - "tonic", + "tonic 0.13.1", "tracing", "tracing-subscriber", "url", @@ -2264,7 +2455,7 @@ version = "49.0.1" dependencies = [ "abi_stable", "arrow", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-ffi", "async-trait", "datafusion", @@ -2284,7 +2475,7 @@ name = "datafusion-functions" version = "49.0.1" dependencies = [ "arrow", - "arrow-buffer", + "arrow-buffer 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "base64 0.22.1", "blake2", "blake3", @@ -2347,7 +2538,7 @@ name = "datafusion-functions-nested" version = "49.0.1" dependencies = [ "arrow", - "arrow-ord", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "criterion", "datafusion-common", "datafusion-doc", @@ -2517,8 +2708,8 @@ version = "49.0.1" dependencies = [ "ahash 0.8.12", "arrow", - "arrow-ord", - "arrow-schema", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "chrono", "criterion", @@ -2589,7 +2780,7 @@ name = "datafusion-pruning" version = "49.0.1" dependencies = [ "arrow", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "datafusion-common", "datafusion-datasource", "datafusion-expr", @@ -4157,6 +4348,12 @@ dependencies = [ "pkg-config", ] +[[package]] +name = "matchit" +version = "0.7.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94" + [[package]] name = "matchit" version = "0.8.4" @@ -4529,18 +4726,17 @@ dependencies = [ [[package]] name = "parquet" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c7288a07e…
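
In the `Cargo.lock` diff above, dependency entries that used to be bare names (for example `"arrow-array"`) become fully qualified strings such as `"arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)"`. This happens because the lock file now pins two copies of each arrow crate at the same version 55.2.0: one from the `multi-threaded_encrypted_writing` branch and one from the default branch of `rok/arrow-rs`. Name and version alone therefore no longer identify a package. Crates that merely have two versions locked (`tonic 0.12.3` / `tonic 0.13.1`, `matchit 0.7.3` / `matchit 0.8.4`) only gain a version suffix. The Rust sketch below models that rule as it appears in this diff. It is a simplified illustration, not Cargo's own code, and `LockedPackage`, `dependency_ref`, `without_commit`, and `sample_packages` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified model of one `[[package]]` entry in Cargo.lock.
#[derive(Debug)]
struct LockedPackage {
    name: String,
    version: String,
    /// `None` for path dependencies such as workspace members.
    source: Option<String>,
}

impl LockedPackage {
    fn new(name: &str, version: &str, source: Option<&str>) -> Self {
        LockedPackage {
            name: name.to_string(),
            version: version.to_string(),
            source: source.map(String::from),
        }
    }
}

/// Drop the `#<commit>` suffix: the dependency strings in the diff omit it.
fn without_commit(source: &str) -> &str {
    source.split('#').next().unwrap_or(source)
}

/// Hypothetical helper: the string used to refer to `pkg` from another
/// package's `dependencies = [...]` list, qualified only as far as needed.
fn dependency_ref(pkg: &LockedPackage, all: &[LockedPackage]) -> String {
    let same_name: Vec<&LockedPackage> = all.iter().filter(|p| p.name == pkg.name).collect();
    let same_name_and_version = same_name.iter().filter(|p| p.version == pkg.version).count();
    let distinct_versions: BTreeSet<String> =
        same_name.iter().map(|p| p.version.clone()).collect();

    if same_name_and_version > 1 {
        // Same name AND version from different sources: the source is required.
        match &pkg.source {
            Some(src) => format!("{} {} ({})", pkg.name, pkg.version, without_commit(src)),
            None => format!("{} {}", pkg.name, pkg.version),
        }
    } else if distinct_versions.len() > 1 {
        // Several versions of one crate: the version is enough.
        format!("{} {}", pkg.name, pkg.version)
    } else {
        // Unique name: the bare name is enough.
        pkg.name.clone()
    }
}

/// Sample entries built from names, versions, and source strings in the diff above.
fn sample_packages() -> Vec<LockedPackage> {
    let branch = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8";
    let default_branch =
        "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58";
    let registry = "registry+https://github.com/rust-lang/crates.io-index";
    vec![
        LockedPackage::new("arrow-array", "55.2.0", Some(branch)),
        LockedPackage::new("arrow-array", "55.2.0", Some(default_branch)),
        LockedPackage::new("tonic", "0.12.3", Some(registry)),
        LockedPackage::new("tonic", "0.13.1", Some(registry)),
        LockedPackage::new("async-stream-impl", "0.3.6", Some(registry)),
    ]
}

fn main() {
    let pkgs = sample_packages();
    for p in &pkgs {
        println!("{}", dependency_ref(p, &pkgs));
    }
}
```

Running `main` prints one reference per sample package. The two `arrow-array` lines match the dependency strings in the diff, while `async-stream-impl` stays a bare name because only one copy of it is locked.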

github-actions bot added sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate labels Aug 21, 2025

alamb mentioned this pull request Aug 21, 2025

Release arrow-rs / parquet Minor version 56.1.0 (August 2025) apache/arrow-rs#7837

Closed

alamb commented Aug 21, 2025

github-actions bot added the physical-plan Changes to the physical-plan crate label Aug 21, 2025

alamb commented Aug 21, 2025

alamb changed the title ~~WIP: Test upgrade to arrow 56.1.0~~ WIP: Uupgrade to arrow 56.1.0 Aug 21, 2025

alamb changed the title ~~WIP: Uupgrade to arrow 56.1.0~~ WIP: Upgrade to arrow 56.1.0 Aug 21, 2025

alamb mentioned this pull request Aug 22, 2025

Use the upstream arrow-rs coalesce kernel #17193

Merged

alamb mentioned this pull request Sep 2, 2025

chore(deps): bump the arrow-parquet group with 7 updates #17335

Closed

nuno-faria reviewed Sep 3, 2025

alamb mentioned this pull request Sep 4, 2025

chore(deps): bump the arrow-parquet group with 7 updates #17396

Closed

alamb force-pushed the alamb/update_arrow branch from 75c255e to 26d94a4 September 9, 2025 17:50

github-actions bot added common Related to common crate proto Related to proto crate labels Sep 9, 2025

github-actions bot added the core Core DataFusion crate label Sep 9, 2025

alamb mentioned this pull request Sep 10, 2025

[Parquet] predicate cache over reports "cache read" metrics in some cases apache/arrow-rs#8307

Open

alamb force-pushed the alamb/update_arrow branch from 7f69441 to ceade58 September 10, 2025 11:21

alamb commented Sep 10, 2025

alamb added 3 commits September 10, 2025 08:19

Update to arrow/parquet 56.1.0

7e1b3cf

Adjust for new parquet sizes, update for deprecated API

f7e2e86

Thread through max_predicate_cache_size, add test

7a6ea93

alamb force-pushed the alamb/update_arrow branch from ea466a5 to 7a6ea93 September 10, 2025 12:19

alamb mentioned this pull request Sep 15, 2025

Upgrade arrow/parquet to 56.1.0 #17571

Closed

AdamGS mentioned this pull request Sep 15, 2025

Support Decimal32/64 types #17501

Merged

alamb marked this pull request as ready for review September 15, 2025 18:41

Merge branch 'main' into alamb/update_arrow

9b59020

github-actions bot added the documentation Improvements or additions to documentation label Sep 15, 2025

alamb mentioned this pull request Sep 15, 2025

Potential performance regression with parquet 56.1.0 / data ranges #17575

Open

alamb changed the title ~~WIP: Upgrade to arrow 56.1.0~~ Upgrade to arrow 56.1.0 Sep 15, 2025

alamb mentioned this pull request Sep 17, 2025

Update arrow / parquet to 56.2.0 #17631

Merged

AdamGS approved these changes Sep 17, 2025

Jefffrey approved these changes Sep 18, 2025

alamb added this pull request to the merge queue Sep 18, 2025

Merged via the queue into apache:main with commit 980c948 Sep 18, 2025
33 checks passed

Conversation

alamb commented Aug 21, 2025 • edited

alamb Aug 21, 2025

alamb Aug 21, 2025

alamb Aug 21, 2025

alamb Aug 21, 2025

alamb Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 21, 2025

alamb commented Aug 22, 2025

Comparing HEAD and alamb_update_arrow

Benchmark clickbench_pushdown.json

alamb commented Aug 22, 2025

alamb commented Aug 22, 2025

nuno-faria Sep 3, 2025

alamb commented Sep 4, 2025

nuno-faria commented Sep 8, 2025

alamb commented Sep 8, 2025

alamb commented Sep 9, 2025

nuno-faria commented Sep 9, 2025

alamb commented Sep 9, 2025

alamb commented Sep 10, 2025

alamb Sep 10, 2025

nuno-faria commented Sep 10, 2025

alamb commented Sep 15, 2025

AdamGS left a comment

Jefffrey left a comment

alamb commented Sep 18, 2025 • edited