Commit edc87d7

wangyum authored and srowen committed
[SPARK-20107][DOC] Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md
## What changes were proposed in this pull request?

Add the `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` option to `configuration.md`.

Setting `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` can speed up [HadoopMapReduceCommitProtocol.commitJob](https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L121) when a job writes many output files.

This improvement is supported by Cloudera's Hadoop 2.6.0-cdh5.4.0 and higher (see https://github.com/cloudera/hadoop-common/commit/1c1236182304d4075276c00c4592358f428bc433 and https://github.com/cloudera/hadoop-common/commit/16b2de27321db7ce2395c08baccfdec5562017f0) and by Apache Hadoop 2.7.0 and higher.

For more details, see:

1. [MAPREDUCE-4815](https://issues.apache.org/jira/browse/MAPREDUCE-4815): Speed up FileOutputCommitter#commitJob for many output files.
2. [MAPREDUCE-6406](https://issues.apache.org/jira/browse/MAPREDUCE-6406): Update the default version for the property mapreduce.fileoutputcommitter.algorithm.version to 2.

## How was this patch tested?

Manual tests and existing tests.

Author: Yuming Wang <[email protected]>

Closes #17442 from wangyum/SPARK-20107.
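As an illustration of the option described in the commit message, it could be set in `conf/spark-defaults.conf` as shown below. This is only a sketch: it assumes the cluster runs a Hadoop version that supports algorithm version 2, and whether the setting is appropriate depends on the failure-handling trade-off noted in MAPREDUCE-4815.

```
# Hypothetical entry in conf/spark-defaults.conf (key and value separated by whitespace)
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```

The same key/value pair can also be passed per application with `spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`.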
1 parent 471de5d commit edc87d7

1 file changed: +9 -0 lines changed

docs/configuration.md

Lines changed: 9 additions & 0 deletions
@@ -1137,6 +1137,15 @@ Apart from these, the following properties are also available, and may be useful
 mapping has high overhead for blocks close to or below the page size of the operating system.
 </td>
 </tr>
+<tr>
+<td><code>spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version</code></td>
+<td>1</td>
+<td>
+The file output committer algorithm version, valid algorithm version number: 1 or 2.
+Version 2 may have better performance, but version 1 may handle failures better in certain situations,
+as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
+</td>
+</tr>
 </table>
 
 ### Networking
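The trade-off stated in the added table row (version 2 may be faster, version 1 may handle failures better) can be illustrated with a toy model of the two commit strategies. The sketch below is not Hadoop's or Spark's code: it uses the local filesystem, and the function names (`write_task_output`, `move_files`, `run_job`) and directory names are hypothetical. It assumes the behavior described in MAPREDUCE-4815, where version 1 moves every task's files into the final directory during job commit, while version 2 moves them at task commit so that job commit has almost nothing left to do.

```python
import os
import shutil


def write_task_output(attempt_dir, task_id):
    """Write one fake part file into a task attempt directory."""
    os.makedirs(attempt_dir, exist_ok=True)
    with open(os.path.join(attempt_dir, f"part-{task_id:05d}"), "w") as f:
        f.write(f"data from task {task_id}\n")


def move_files(src_dir, dst_dir):
    """Move every file from src_dir into dst_dir; return how many were moved."""
    os.makedirs(dst_dir, exist_ok=True)
    moved = 0
    for name in sorted(os.listdir(src_dir)):
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        moved += 1
    return moved


def run_job(output_dir, num_tasks, version):
    """Simulate a job; return (files visible before job commit, moves done in job commit)."""
    os.makedirs(output_dir, exist_ok=True)
    temp = os.path.join(output_dir, "_temporary")
    for t in range(num_tasks):
        attempt = os.path.join(temp, f"attempt_{t}")
        write_task_output(attempt, t)
        # Task commit.
        if version == 1:
            # v1: park the output in a job-private "committed" area.
            move_files(attempt, os.path.join(temp, f"committed_{t}"))
        else:
            # v2: publish the output straight into the final directory.
            move_files(attempt, output_dir)

    # Output a reader would already see if the job died right now.
    visible_before = [n for n in os.listdir(output_dir) if n.startswith("part-")]

    # Job commit.
    moves = 0
    if version == 1:
        # v1: one serial pass over all committed task directories.
        for t in range(num_tasks):
            moves += move_files(os.path.join(temp, f"committed_{t}"), output_dir)
    shutil.rmtree(temp, ignore_errors=True)
    return len(visible_before), moves


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        print(run_job(os.path.join(d, "v1"), 3, 1))  # prints (0, 3)
        print(run_job(os.path.join(d, "v2"), 3, 2))  # prints (3, 0)
```

In this model, version 1 does all of its file moves inside job commit, which is the step that becomes slow with many output files, but nothing is visible in the output directory until then. Version 2 leaves job commit with no moves, at the cost of exposing partial output if the job fails midway.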
