2 files changed: +2 -2 lines changed
@@ -119,7 +119,7 @@ that these options will be deprecated in future release as more optimizations ar
  <td>32</td>
  <td>
  Configures the threshold to enable parallel listing for job input paths. If the number of
- input paths is larger than this threshold, Spark will use parallel listing on the driver side.
+ input paths is larger than this threshold, Spark will list the files by using Spark distributed job.
  Otherwise, it will fallback to sequential listing. This configuration is only effective when
  using file-based data sources such as Parquet, ORC and JSON.
  </td>
@@ -264,7 +264,7 @@ parent RDD's number of partitions. You can pass the level of parallelism as a se
  or set the config property `spark.default.parallelism` to change the default.
  In general, we recommend 2-3 tasks per CPU core in your cluster.

- # Parallel Listing on Input Paths
+ ## Parallel Listing on Input Paths

  Sometimes you may also need to increase directory listing parallelism when job input has large number of directories,
  otherwise the process could take a very long time, especially when against object store like S3.
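
The threshold rule reworded in the first hunk can be sketched as a small decision function. This is only an illustrative model of the documented behavior, not Spark's implementation: the names `DEFAULT_THRESHOLD` and `choose_listing_strategy` are hypothetical, and the default of 32 is taken from the `<td>32</td>` cell in the hunk.

```python
# Illustrative model of the documented rule. These names are hypothetical
# and are not part of any Spark API.

DEFAULT_THRESHOLD = 32  # default value shown in the table row of the first hunk


def choose_listing_strategy(num_input_paths: int,
                            threshold: int = DEFAULT_THRESHOLD) -> str:
    """Return which listing mode the documented rule would select.

    Per the updated wording: if the number of input paths is larger than
    the threshold, files are listed with a distributed job; otherwise
    listing falls back to sequential.
    """
    if num_input_paths > threshold:
        return "distributed"
    return "sequential"


if __name__ == "__main__":
    # Show the mode selected on either side of the default threshold.
    for n in (1, 32, 33, 1000):
        print(n, choose_listing_strategy(n))
```

Note that the comparison is strict: a job with exactly as many input paths as the threshold is still listed sequentially under the wording in the diff.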