
Commit 294af6e

[SPARK-49680][PYTHON] Limit Sphinx build parallelism to 4 by default
### What changes were proposed in this pull request?

This PR aims to limit `Sphinx` build parallelism to 4 by default, with the following goals.
- This preserves the same speed in the GitHub Actions environment.
- This prevents excessive `SparkSubmit` invocations on large machines like `c6i.24xlarge`.
- The user can still override it by providing `SPHINXOPTS`.

### Why are the changes needed?

The `Sphinx` parallelism feature was added via the following PR on 2024-01-10.
- #44680

Unfortunately, this breaks Python API doc generation on large machines because the `Sphinx` parallelism determines the number of parallel `SparkSubmit` invocations of PySpark. In addition, given that each `PySpark` is currently launched with `local[*]`, this ends up with `N * N` `pyspark.daemon`s.

In other words, as of today, the default setting, `auto`, seems to work on low-core machines like `GitHub Action` runners (4 cores). For example, it breaks the `Python` documentation build even in an M3 Max environment, and it is worse on large EC2 machines (c7i.24xlarge). You can see the failure locally like this.

```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

### Does this PR introduce _any_ user-facing change?

No, this is a dev-only change.

### How was this patch tested?

Pass the CIs and do manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48129 from dongjoon-hyun/SPARK-49680.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
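
The `N * N` growth described in the commit message can be illustrated with a small calculation. This is only a sketch: the core count of 96 is an assumed value (it matches the vCPU count of a 24xlarge EC2 instance), and the variable names are hypothetical. It assumes one Sphinx worker per core under `-j auto` and `N` Python workers per `local[*]` PySpark session.

```shell
#!/bin/sh
# Assumed core count of the build machine (hypothetical value for illustration).
cores=96

# "-j auto": one Sphinx worker per core, each launching PySpark with local[*],
# so the worker count grows as N * N.
auto_workers=$((cores * cores))

# "-j 4": four Sphinx workers, each still using local[*], so growth is 4 * N.
capped_workers=$((4 * cores))

echo "auto: $auto_workers, capped: $capped_workers"
```

On a 4-core runner the two settings coincide (4 * 4 = 16 either way), which is consistent with the commit message's note that the CI speed is preserved.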
1 parent 8d78f5b commit 294af6e

1 file changed: +1 -1 lines changed

python/docs/Makefile

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 # Minimal makefile for Sphinx documentation

 # You can set these variables from the command line.
-SPHINXOPTS ?= "-W" "-j" "auto"
+SPHINXOPTS ?= "-W" "-j" "4"
 SPHINXBUILD ?= sphinx-build
 SOURCEDIR ?= source
 BUILDDIR ?= build
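
Because `SPHINXOPTS` is assigned with `?=`, the Makefile default applies only when the variable is not already set, so the cap can be overridden per invocation. The commands below are a hedged sketch of how that might look. The value `8` is an arbitrary illustration, and the commands assume they are run from the root of a Spark checkout that has already been built.

```shell
cd python/docs

# Use the new default (-W -j 4).
make html

# Override on the make command line; 8 is an arbitrary example value.
make html SPHINXOPTS="-W -j 8"

# Or restore the previous behavior through the environment.
SPHINXOPTS="-W -j auto" make html
```

Note that an override replaces the whole default, so `-W` has to be repeated if warnings should still be treated as errors.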
