docs/source/user-guide/configuration.rst
Lines changed: 48 additions & 1 deletion
@@ -95,5 +95,52 @@ control:
result = df.collect()
-You can read more about available :py:class:`~datafusion.context.SessionConfig` options in the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
+Benchmark Example
+^^^^^^^^^^^^^^^^^
+
+The repository includes a benchmark script that demonstrates how to maximize CPU usage
+with DataFusion. The :code:`benchmarks/max_cpu_usage.py` script shows a practical example
+of configuring DataFusion for optimal parallelism.
+
+You can run the benchmark script to see the impact of different configuration settings:
Processed 10000000 rows using 10 partitions in 0.038s
+
+This example demonstrates a roughly 2.8x speedup (0.107s vs 0.038s) when using
+10 partitions instead of 1, showing how proper partitioning can significantly improve
+CPU utilization and query performance.
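As a quick check of the figures quoted above, the speedup and the per-partition efficiency can be computed directly from the two timings. This is a minimal sketch: the helper names ``speedup`` and ``parallel_efficiency`` are illustrative and not part of the benchmark script, and the efficiency figure assumes each of the 10 partitions could run on its own core.

```python
def speedup(baseline_s: float, parallel_s: float) -> float:
    """Ratio of the single-partition time to the multi-partition time."""
    return baseline_s / parallel_s


def parallel_efficiency(baseline_s: float, parallel_s: float, partitions: int) -> float:
    """Speedup divided by the partition count (1.0 would be ideal linear scaling)."""
    return speedup(baseline_s, parallel_s) / partitions


# Timings and row count quoted in the benchmark output above.
rows = 10_000_000
t_one_partition = 0.107
t_ten_partitions = 0.038

print(f"speedup:    {speedup(t_one_partition, t_ten_partitions):.2f}x")
print(f"efficiency: {parallel_efficiency(t_one_partition, t_ten_partitions, 10):.0%}")
print(f"throughput: {rows / t_one_partition / 1e6:.1f}M -> "
      f"{rows / t_ten_partitions / 1e6:.1f}M rows/s")
```

An efficiency well below 100% is expected for a query this short, since fixed costs such as planning and repartitioning do not shrink as partitions are added.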
+
+The script demonstrates several key optimization techniques:
+
+1. **Higher target partition count**: Uses :code:`with_target_partitions()` to set the number of concurrent partitions
+2. **Automatic repartitioning**: Enables repartitioning for joins, aggregations, and window functions
+3. **Manual repartitioning**: Uses :code:`repartition()` to ensure all partitions are utilized
+4. **CPU-intensive operations**: Performs aggregations that can benefit from parallelization
+
+The benchmark creates synthetic data and measures the time taken to perform a sum aggregation
+across the specified number of partitions. This helps you understand how partition configuration
+affects performance on your specific hardware.
+
+For more information about available :py:class:`~datafusion.context.SessionConfig` options, see the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
and about :code:`RuntimeEnvBuilder` options in the rust `online API documentation <https://docs.rs/datafusion/latest/datafusion/execution/runtime_env/struct.RuntimeEnvBuilder.html>`_.