docs: enhance benchmark example for maximizing CPU usage in DataFusion

kosiew · kosiew · commit ba11206d8a95 · 2025-08-29T20:45:27.000+08:00
diff --git a/docs/source/user-guide/configuration.rst b/docs/source/user-guide/configuration.rst
@@ -95,5 +95,52 @@ control:
     result = df.collect()
 
 
-You can read more about available :py:class:`~datafusion.context.SessionConfig` options in the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
+Benchmark Example
+^^^^^^^^^^^^^^^^^
+
+The repository includes a benchmark script, :code:`benchmarks/max_cpu_usage.py`, that
+demonstrates how to configure DataFusion so that a single query keeps as many CPU
+cores busy as possible.
+
+You can run the benchmark script to see the impact of different configuration settings:
+
+.. code-block:: bash
+
+    # Run with default settings (uses all CPU cores)
+    python benchmarks/max_cpu_usage.py
+
+    # Run with a specific number of rows and partitions
+    python benchmarks/max_cpu_usage.py --rows 5000000 --partitions 16
+
+    # See all available options
+    python benchmarks/max_cpu_usage.py --help
+
+Here's an example showing the performance difference between single and multiple partitions:
+
+.. code-block:: bash
+
+    # Single partition - slower processing
+    $ python benchmarks/max_cpu_usage.py --rows 10000000 --partitions 1
+    Processed 10000000 rows using 1 partitions in 0.107s
+
+    # Multiple partitions - faster processing
+    $ python benchmarks/max_cpu_usage.py --rows 10000000 --partitions 10
+    Processed 10000000 rows using 10 partitions in 0.038s
+
+In this run, 10 partitions finish about 2.8x faster than 1 partition (0.038s vs 0.107s),
+which shows how spreading the work across partitions can improve CPU utilization
+and query performance.
+
+The script demonstrates several key optimization techniques:
+
+1. **Higher target partition count**: Uses :code:`with_target_partitions()` to set the number of concurrent partitions
+2. **Automatic repartitioning**: Enables repartitioning for joins, aggregations, and window functions
+3. **Manual repartitioning**: Uses :code:`repartition()` to ensure all partitions are utilized
+4. **CPU-intensive operations**: Performs aggregations that can benefit from parallelization
+
+The benchmark creates synthetic data and measures the time taken to perform a sum aggregation
+across the specified number of partitions. This helps you understand how partition configuration
+affects performance on your specific hardware.
+
+For more information about available :py:class:`~datafusion.context.SessionConfig` options, see the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
 and about :code:`RuntimeEnvBuilder` options in the rust `online API documentation <https://docs.rs/datafusion/latest/datafusion/execution/runtime_env/struct.RuntimeEnvBuilder.html>`_.
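
A note on the quoted timings: the improvement factor can be checked with a few lines of arithmetic. This is a minimal sketch using only the two timings printed in the example output above. The helper names `speedup` and `parallel_efficiency` are illustrative and are not part of `benchmarks/max_cpu_usage.py`.

```python
def speedup(baseline_s: float, parallel_s: float) -> float:
    """How many times faster the parallel run is than the baseline run."""
    return baseline_s / parallel_s


def parallel_efficiency(baseline_s: float, parallel_s: float, partitions: int) -> float:
    """Speedup divided by the partition count (1.0 would be perfect scaling)."""
    return speedup(baseline_s, parallel_s) / partitions


# Timings copied from the example output in the diff above.
single_partition_s = 0.107
ten_partitions_s = 0.038

print(f"speedup: {speedup(single_partition_s, ten_partitions_s):.2f}x")
print(f"efficiency: {parallel_efficiency(single_partition_s, ten_partitions_s, 10):.0%}")
# prints:
# speedup: 2.82x
# efficiency: 28%
```

The speedup is well below the partition count, so these numbers are best read as one sample from one machine rather than as a scaling guarantee. Running the script with larger `--rows` values on your own hardware gives a more representative picture.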