Commit c462271 ("Address comments"), 1 parent: 2dfe36f

File tree: 2 files changed, +2 -2 lines

docs/sql-performance-tuning.md (1 addition, 1 deletion)

@@ -119,7 +119,7 @@ that these options will be deprecated in future release as more optimizations ar
       <td>32</td>
       <td>
         Configures the threshold to enable parallel listing for job input paths. If the number of
-        input paths is larger than this threshold, Spark will use parallel listing on the driver side.
+        input paths is larger than this threshold, Spark will list the files by using Spark distributed job.
         Otherwise, it will fallback to sequential listing. This configuration is only effective when
         using file-based data sources such as Parquet, ORC and JSON.
       </td>
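The passage above describes `spark.sql.sources.parallelPartitionDiscovery.threshold` (default 32, per the `<td>32</td>` cell): above the threshold, file listing runs as a distributed Spark job; otherwise it falls back to sequential listing. A minimal Python sketch of that documented decision rule — the function name is hypothetical, not a Spark API:

```python
# Default of spark.sql.sources.parallelPartitionDiscovery.threshold, per the doc table.
THRESHOLD = 32

def listing_strategy(num_input_paths, threshold=THRESHOLD):
    """Sketch of the documented behavior: above the threshold, Spark lists
    files via a distributed job; otherwise it lists sequentially."""
    return "distributed" if num_input_paths > threshold else "sequential"

print(listing_strategy(10))   # sequential
print(listing_strategy(100))  # distributed
```

Raising the threshold keeps listing on the driver for jobs with many input paths; lowering it offloads listing to executors sooner, which helps most against slow object stores.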

docs/tuning.md (1 addition, 1 deletion)

@@ -264,7 +264,7 @@ parent RDD's number of partitions. You can pass the level of parallelism as a se
 or set the config property `spark.default.parallelism` to change the default.
 In general, we recommend 2-3 tasks per CPU core in your cluster.

-# Parallel Listing on Input Paths
+## Parallel Listing on Input Paths

 Sometimes you may also need to increase directory listing parallelism when job input has large number of directories,
 otherwise the process could take a very long time, especially when against object store like S3.
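The tuning.md context above recommends 2-3 tasks per CPU core when setting `spark.default.parallelism`. A small sketch of that rule of thumb (the helper name is hypothetical, not part of Spark):

```python
def recommended_parallelism(total_cores, tasks_per_core=2):
    """Rule of thumb from the tuning guide: 2-3 tasks per CPU core
    across the cluster."""
    return total_cores * tasks_per_core

# e.g. a hypothetical cluster of 4 executors with 8 cores each
print(recommended_parallelism(32))     # 64  (lower bound, 2 tasks/core)
print(recommended_parallelism(32, 3))  # 96  (upper bound, 3 tasks/core)
```

The resulting value would then be passed as `spark.default.parallelism`, either in `spark-defaults.conf` or via `--conf` on `spark-submit`.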

0 commit comments
