Commit edc87d7
[SPARK-20107][DOC] Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md
## What changes were proposed in this pull request?
Add `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` option to `configuration.md`.
Set `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` can speed up [HadoopMapReduceCommitProtocol.commitJob](https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L121) for many output files.
All cloudera's hadoop 2.6.0-cdh5.4.0 or higher versions(see: https://github.com/cloudera/hadoop-common/commit/1c1236182304d4075276c00c4592358f428bc433 and https://github.com/cloudera/hadoop-common/commit/16b2de27321db7ce2395c08baccfdec5562017f0) and apache's hadoop 2.7.0 or higher versions support this improvement.
More see:
1. [MAPREDUCE-4815](https://issues.apache.org/jira/browse/MAPREDUCE-4815): Speed up FileOutputCommitter#commitJob for many output files.
2. [MAPREDUCE-6406](https://issues.apache.org/jira/browse/MAPREDUCE-6406): Update the default version for the property mapreduce.fileoutputcommitter.algorithm.version to 2.
## How was this patch tested?
Manual test and exist tests.
Author: Yuming Wang <[email protected]>
Closes #17442 from wangyum/SPARK-20107.1 parent 471de5d commit edc87d7
1 file changed
+9
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1137 | 1137 | | |
1138 | 1138 | | |
1139 | 1139 | | |
| 1140 | + | |
| 1141 | + | |
| 1142 | + | |
| 1143 | + | |
| 1144 | + | |
| 1145 | + | |
| 1146 | + | |
| 1147 | + | |
| 1148 | + | |
1140 | 1149 | | |
1141 | 1150 | | |
1142 | 1151 | | |
| |||
0 commit comments