Commit 38f10c6
HADOOP-19039. Hadoop 3.4.0 Highlight big features and improvements. (#6462) Contributed by Shilun Fan.
Reviewed-by: He Xiaoqiao <[email protected]>
Signed-off-by: Shilun Fan <[email protected]>
1 parent caba9bb commit 38f10c6

1 file changed: +99 -61 lines

hadoop-project/src/site/markdown/index.md.vm
Apache Hadoop ${project.version}
================================

Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.

Overview of Changes
===================

Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.

S3A: Upgrade AWS SDK to V2
----------------------------------------

[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2

This release upgrades Hadoop's AWS connector, S3A, from the AWS SDK for Java V1 to the AWS SDK for Java V2.
This is a significant change which offers a number of new features, including the ability to work with Amazon S3 Express One Zone storage - the new high-performance, single-AZ storage class.
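
For most applications the upgrade is transparent, since S3A is accessed through the standard Hadoop `FileSystem` API. Below is a minimal sketch of such usage; the bucket and object names are illustrative assumptions, and credentials come from whatever the deployment already configures:

```java
// Minimal sketch: reading a file through S3A after the SDK upgrade.
// FileSystem-level usage is unchanged; the V1 -> V2 migration is internal
// to the connector. Bucket and object names are illustrative only.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
         FSDataInputStream in = fs.open(new Path("s3a://example-bucket/data/sample.txt"))) {
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      System.out.println("read " + n + " bytes");
    }
  }
}
```

Code that plugs custom V1 credential providers (`com.amazonaws.auth.AWSCredentialsProvider`) into `fs.s3a.aws.credentials.provider` will, however, generally need to move to V2 (`software.amazon.awssdk.auth.credentials.AwsCredentialsProvider`) implementations.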

HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
----------------------------------------

[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.

Throughput is one of the core performance measures of a DataNode instance.
However, DataNodes have not always reached their best throughput, especially in Federation deployments, despite various improvements,
because of the global coarse-grained lock.
This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429))
splits the global coarse-grained lock into fine-grained, two-level locks for block pool and volume,
improving throughput and avoiding lock contention between block pools and volumes.
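
As an illustration of the idea (a simplified sketch only, not the actual `FsDatasetImpl` code; the class and method names are invented for the example), a two-level scheme takes a shared lock on the block pool and an exclusive lock on one volume, so operations on different volumes no longer serialize on a single global lock:

```java
// Illustrative sketch (not the actual FsDatasetImpl code) of two-level
// locking: a read-write lock per block pool, plus a lock per volume within
// it, so work on different volumes does not contend on one global lock.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TwoLevelDatasetLockSketch {
  private final Map<String, ReadWriteLock> blockPoolLocks = new ConcurrentHashMap<>();
  private final Map<String, ReadWriteLock> volumeLocks = new ConcurrentHashMap<>();

  private static ReadWriteLock lockFor(Map<String, ReadWriteLock> locks, String key) {
    return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
  }

  /** Shared lock on the block pool, exclusive lock on one volume, then run. */
  public void runWithVolumeLock(String bpId, String volume, Runnable task) {
    ReadWriteLock bpLock = lockFor(blockPoolLocks, bpId);
    bpLock.readLock().lock();
    try {
      ReadWriteLock volLock = lockFor(volumeLocks, bpId + ":" + volume);
      volLock.writeLock().lock();
      try {
        task.run();
      } finally {
        volLock.writeLock().unlock();
      }
    } finally {
      bpLock.readLock().unlock();
    }
  }
}
```

Taking the block-pool lock in shared mode means writers on different volumes of the same block pool proceed concurrently, while block-pool-wide maintenance can still take the write side to exclude them all.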

YARN Federation improvements
----------------------------------------

[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.

We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows:
1. YARN Router now boasts a full implementation of all interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol (see the sketch after this list).
2. YARN Router now supports application cleanup and automatic offline mechanisms for subClusters.
3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
4. Audit logs and metrics for the Router received upgrades.
5. A boost in cluster security features was achieved, with the inclusion of Kerberos support.
6. The web UI of the Router has been enhanced.
7. A set of commands has been added to the Router side for operating on SubClusters and Policies.
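
Because the Router implements RMWebServiceProtocol, REST clients can be pointed at the Router instead of an individual ResourceManager. A minimal sketch (the Router host and web port are assumptions for the example):

```java
// Illustrative sketch: querying cluster info through the YARN Router's
// RM-compatible REST endpoint. Host and port are assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RouterRestProbe {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://router-host:8089/ws/v1/cluster/info");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader r = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        System.out.println(line); // cluster info JSON, served via the Router
      }
    }
  }
}
```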

HDFS RBF: Code Enhancements, New Features, and Bug Fixes
----------------------------------------

The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature
improvements, new functionality, and bug fixes.
Important features and improvements are as follows:

**Feature**

[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) The HDFS Federation balance tool introduces a tool to balance data across different namespaces.

**Improvement**

[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.

The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is
a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the
resolution of HDFS-17128.

[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.

SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL,
faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database.
This issue has been addressed by the resolution of HDFS-17148.
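
The shape of the fix is a periodic purge of expired rows from the shared token store. The following is an illustrative sketch only, not the actual `SQLDelegationTokenSecretManager` code; the table and column names are hypothetical:

```java
// Illustrative sketch: periodically delete tokens whose expiry has passed
// from the routers' shared SQL store. Table/column names are hypothetical.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class ExpiredTokenCleanerSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final DataSource dataSource; // assumed to point at the shared token database

  public ExpiredTokenCleanerSketch(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  /** Run the purge on a fixed interval. */
  public void start(long periodMinutes) {
    scheduler.scheduleAtFixedRate(this::purgeExpired, periodMinutes, periodMinutes,
        TimeUnit.MINUTES);
  }

  private void purgeExpired() {
    String sql = "DELETE FROM delegation_tokens WHERE token_expiry < ?";
    try (Connection c = dataSource.getConnection();
         PreparedStatement ps = c.prepareStatement(sql)) {
      ps.setLong(1, System.currentTimeMillis());
      int purged = ps.executeUpdate();
      System.out.println("purged " + purged + " expired tokens");
    } catch (Exception e) {
      e.printStackTrace(); // a real implementation would log and retry
    }
  }
}
```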

**Others**

Other changes to HDFS RBF include WebUI, command line, and other improvements. Please refer to the release document.

HDFS EC: Code Enhancements and Bug Fixes
----------------------------------------

HDFS EC has made code improvements and fixed some bugs.

Important improvements and bugs are as follows:

**Improvement**

[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.

In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. Unlike replicated blocks, which can be
copied from any DataNode holding a replica, EC blocks have to be replicated from the decommissioning DataNode itself.
The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit
the replication speed, but increasing them puts the whole cluster's network at risk. A new configuration was therefore
added to limit streams on the decommissioning DataNode, distinct from the cluster-wide max-streams limits.

[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) EC: Allow block reconstruction pending timeout refreshable to increase decommission performance.

In [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO
performance of a decommissioning DataNode that has a lot of EC blocks. Besides this, we also need to decrease the value of
`dfs.namenode.reconstruction.pending.timeout-sec` (default 5 minutes) to shorten the interval for checking
pendingReconstructions; otherwise the decommissioning node would sit idle waiting for copy tasks for most of those 5 minutes.
During decommissioning we may need to reconfigure these two parameters several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560),
`dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and
HDFS-16663 makes the `dfs.namenode.reconstruction.pending.timeout-sec` parameter dynamically reconfigurable as well.
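
As a sketch of how such a runtime reconfiguration is applied (assuming the new value has already been set in `hdfs-site.xml` on the NameNode; the host and RPC port below are illustrative), the `hdfs dfsadmin -reconfig` facility can also be driven programmatically:

```java
// Minimal sketch: trigger and poll a NameNode reconfiguration, equivalent to
// `hdfs dfsadmin -reconfig namenode nn-host:8020 start` followed by
// `... status`. The host:port is an illustrative assumption.
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class ReconfigureNameNodeSketch {
  public static void main(String[] args) throws Exception {
    HdfsConfiguration conf = new HdfsConfiguration();
    // Start an async reconfiguration from the current hdfs-site.xml.
    ToolRunner.run(conf, new DFSAdmin(conf),
        new String[] {"-reconfig", "namenode", "nn-host:8020", "start"});
    // Poll for the result of the reconfiguration task.
    ToolRunner.run(conf, new DFSAdmin(conf),
        new String[] {"-reconfig", "namenode", "nn-host:8020", "status"});
  }
}
```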

**Bug**

[HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommissioning a rack with only one DataNode will fail when the number of racks equals the replication width.

In the following scenario, decommissioning will fail with the `TOO_MANY_NODES_ON_RACK` reason:
- An EC policy, such as RS-6-3-1024k, is enabled.
- The number of racks in the cluster is less than or equal to the replication width (for RS-6-3, 6 data + 3 parity = 9 internal blocks).
- A rack has only one DataNode, and that DataNode is decommissioned.

This issue has been addressed by the resolution of HDFS-16456.

[HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes.

During block recovery, the `RecoveryTaskStriped` in the DataNode expects a one-to-one correspondence between
`rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat,
this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices
array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect
internal block ID, leading to a failure in the recovery process as the corresponding DataNode cannot locate the replica.
This issue has been addressed by the resolution of HDFS-17094.

[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.

Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration
parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks
being sent to the DataNodes, consequently occupying too much of their memory.

This issue has been addressed by the resolution of HDFS-17284.
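
As a generic illustration of this class of bug (a sketch, not the actual NameNode code; the names are invented), multiplying two ints can silently wrap before a limit is applied, while widening to long preserves it:

```java
// Illustrative sketch of the bug class fixed by HDFS-17284: computing a
// task budget with int arithmetic can overflow and bypass a configured cap.
public class TaskBudgetSketch {
  /** Buggy pattern: maxStreams * nodeCount can exceed Integer.MAX_VALUE. */
  static int buggyBudget(int maxStreamsHardLimit, int nodeCount) {
    return maxStreamsHardLimit * nodeCount; // silently wraps on overflow
  }

  /** Safer pattern: widen to long, then clamp before narrowing. */
  static int safeBudget(int maxStreamsHardLimit, int nodeCount) {
    long budget = (long) maxStreamsHardLimit * nodeCount;
    return (int) Math.min(budget, Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println(buggyBudget(1 << 20, 4096)); // overflows to 0
    System.out.println(safeBudget(1 << 20, 4096));  // clamps to Integer.MAX_VALUE
  }
}
```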

**Others**

Other improvements and fixes were made to HDFS EC; please refer to the release document.

Transitive CVE fixes
--------------------

A lot of dependencies have been upgraded to address recent CVEs.
Many of the CVEs were not actually exploitable through Hadoop,
so much of this work is just due diligence.
However, applications which have these libraries on their classpath may
be vulnerable, and the upgrades should also reduce the number of false
positives security scanners report.

We have not been able to upgrade every single dependency to the latest

[…] can, with care, keep data and computing resources private.

1. Physical cluster: *configure Hadoop security*, usually bonded to the
   enterprise Kerberos/Active Directory systems.
   Good.
2. Cloud: transient or persistent single or multiple user/tenant cluster
   with private VLAN *and security*.
   Good.
   Consider [Apache Knox](https://knox.apache.org/) for managing remote
   access to the cluster.
3. Cloud: transient single user/tenant cluster with private VLAN
   *and no security at all*.
   Requires careful network configuration as this is the sole
   means of securing the cluster.
