|
15 | 15 | Apache Hadoop ${project.version} |
16 | 16 | ================================ |
17 | 17 |
|
18 | | -Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch. |
| 18 | +Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch. |
19 | 19 |
|
20 | 20 | Overview of Changes |
21 | 21 | =================== |
22 | 22 |
|
23 | 23 | Users are encouraged to read the full set of release notes. |
24 | 24 | This page provides an overview of the major changes. |
25 | 25 |
|
26 | | -Azure ABFS: Critical Stream Prefetch Fix |
| 26 | +S3A: Upgrade AWS SDK to V2 |
27 | 27 | ---------------------------------------- |
28 | 28 |
|
29 | | -The abfs has a critical bug fix |
30 | | -[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546). |
31 | | -*ABFS. Disable purging list of in-progress reads in abfs stream close().* |
| 29 | +[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2 |
32 | 30 |
|
33 | | -All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade |
34 | | -or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0` |
| 31 | +This release upgrades Hadoop's AWS connector S3A from AWS SDK for Java V1 to AWS SDK for Java V2.
| 32 | +This is a significant change which offers a number of new features, including the ability to work with Amazon S3 Express One Zone storage, the new high-performance, single-AZ storage class.
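One user-visible consequence of the V1-to-V2 move is that custom credential providers must implement the V2 interface. As a hedged illustration (the property name is the standard S3A key, and the value is the simple provider shipped with hadoop-aws):

```xml
<!-- core-site.xml fragment: with SDK V2, any custom class named here must
     implement software.amazon.awssdk.auth.credentials.AwsCredentialsProvider
     rather than V1's com.amazonaws.auth.AWSCredentialsProvider. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
</property>
```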
35 | 33 |
|
36 | | -Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521) |
37 | | -*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests* |
38 | | -for root cause analysis, details on what is affected, and mitigations. |
| 34 | +HDFS DataNode: Split one FsDatasetImpl lock into volume-grained locks
| 35 | +---------------------------------------- |
| 36 | + |
| 37 | +[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks. |
| 38 | + |
| 39 | +Throughput is one of the core performance metrics for a DataNode instance.
| 40 | +However, because of the global coarse-grained lock, DataNodes have often failed to reach their best possible performance despite various improvements,
| 41 | +especially in Federation deployments.
| 42 | +This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429))
| 43 | +splits the global coarse-grained lock into fine-grained, two-level locks for block pool and volume,
| 44 | +improving throughput and avoiding lock contention between block pools and volumes.
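The two-level scheme can be sketched as follows. This is a hypothetical illustration (class and method names are invented, not Hadoop's actual FsDatasetImpl API) of per-(block pool, volume) read/write locks replacing one global lock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Hadoop code: a lock table keyed first by block
// pool and then by volume, so operations on different volumes (or different
// block pools) no longer serialize on a single dataset-wide lock.
public class TwoLevelLocks {
    private final ConcurrentHashMap<String, ConcurrentHashMap<String, ReentrantReadWriteLock>>
            locks = new ConcurrentHashMap<>();

    // Lazily create the lock guarding one (blockPool, volume) pair.
    public ReentrantReadWriteLock lockFor(String blockPool, String volume) {
        return locks.computeIfAbsent(blockPool, bp -> new ConcurrentHashMap<>())
                    .computeIfAbsent(volume, v -> new ReentrantReadWriteLock());
    }

    // Acquire a read lock; the returned handle releases it on close().
    public AutoCloseable readLock(String blockPool, String volume) {
        ReentrantReadWriteLock l = lockFor(blockPool, volume);
        l.readLock().lock();
        return () -> l.readLock().unlock();
    }
}
```

Writers would take the write side of the same lock; the point is that the lock instance, and hence any contention, is scoped to a single block pool and volume.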
| 45 | + |
| 46 | +YARN Federation improvements |
| 47 | +---------------------------------------- |
| 48 | + |
| 49 | +[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements. |
| 50 | + |
| 51 | +We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows: |
| 52 | +1. YARN Router now boasts a full implementation of all interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol. |
| 53 | +2. YARN Router now supports application cleanup and automatic offline mechanisms for subClusters.
| 54 | +3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
| 55 | +4. Audit logs and metrics for the Router have been upgraded.
| 56 | +5. Cluster security was strengthened with the inclusion of Kerberos support.
| 57 | +6. The Router's web pages have been enhanced.
| 58 | +7. A set of commands has been added to the Router side for operating on SubClusters and Policies. |
| 59 | + |
| 60 | +HDFS RBF: Code Enhancements, New Features, and Bug Fixes |
| 61 | +---------------------------------------- |
| 62 | + |
| 63 | +The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature |
| 64 | +improvements, new functionalities, and bug fixes. |
| 65 | +Important features and improvements are as follows: |
| 66 | + |
| 67 | +**Feature** |
| 68 | + |
| 69 | +[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) HDFS Federation balance tool introduces a tool to balance data across different namespaces.
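As a sketch of its use (the host names and paths below are placeholders; `submit` is the documented entry point of the fedbalance tool):

```
# Balance data under /foo from namespace nn0 to namespace nn1.
# nn0:8020 / nn1:8020 and the paths are placeholders for real addresses.
hadoop fedbalance submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst
```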
39 | 70 |
|
| 71 | +**Improvement** |
40 | 72 |
|
41 | | -Vectored IO API |
42 | | ---------------- |
| 73 | +[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers. |
43 | 74 |
|
44 | | -[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103). |
45 | | -*High performance vectored read API in Hadoop* |
| 75 | +The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is |
| 76 | +a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the |
| 77 | +resolution of HDFS-17128. |
46 | 78 |
|
47 | | -The `PositionedReadable` interface has now added an operation for |
48 | | -Vectored IO (also known as Scatter/Gather IO): |
| 79 | +[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL. |
49 | 80 |
|
50 | | -```java |
51 | | -void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate) |
52 | | -``` |
| 81 | +SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL, |
| 82 | +faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database. |
| 83 | +This issue has been addressed by the resolution of HDFS-17148. |
| 84 | + |
| 85 | +**Others** |
| 86 | + |
| 87 | +Other changes to HDFS RBF include WebUI, command-line, and other improvements. Please refer to the release document.
| 88 | + |
| 89 | +HDFS EC: Code Enhancements and Bug Fixes |
| 90 | +---------------------------------------- |
53 | 91 |
|
54 | | -All the requested ranges will be retrieved into the supplied byte buffers -possibly asynchronously, |
55 | | -possibly in parallel, with results potentially coming in out-of-order. |
| 92 | +HDFS EC has received code improvements and bug fixes.
56 | 93 |
|
57 | | -1. The default implementation uses a series of `readFully()` calls, so delivers |
58 | | - equivalent performance. |
59 | | -2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`. |
60 | | -3. The S3A filesystem issues parallel HTTP GET requests in different threads. |
| 94 | +Important improvements and bug fixes are as follows:
61 | 95 |
|
62 | | -Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://` |
63 | | -show significant improvements in query performance. |
| 96 | +**Improvement** |
64 | 97 |
|
65 | | -Further Reading: |
66 | | -* [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html). |
67 | | -* [Hadoop Vectored IO: Your Data Just Got Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html) |
68 | | - Apachecon 2022 talk. |
| 98 | +[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks. |
69 | 99 |
|
70 | | -Mapreduce: Manifest Committer for Azure ABFS and google GCS |
71 | | ----------------------------------------------------------- |
| 100 | +In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. Unlike replicated blocks, which can be copied
| 101 | +from any DataNode holding a replica, an EC block has to be replicated from the decommissioning DataNode itself.
| 102 | +The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit
| 103 | +the replication speed, but raising them puts the whole cluster's network at risk. A new
| 104 | +configuration is therefore added to limit the streams on the decommissioning DataNode, distinct from the cluster-wide max-streams limit.
72 | 105 |
|
73 | | -The new _Intermediate Manifest Committer_ uses a manifest file |
74 | | -to commit the work of successful task attempts, rather than |
75 | | -renaming directories. |
76 | | -Job commit is matter of reading all the manifests, creating the |
77 | | -destination directories (parallelized) and renaming the files, |
78 | | -again in parallel. |
| 106 | +[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) EC: Allow block reconstruction pending timeout refreshable to increase decommission performance. |
79 | 107 |
|
80 | | -This is both fast and correct on Azure Storage and Google GCS, |
81 | | -and should be used there instead of the classic v1/v2 file |
82 | | -output committers. |
| 108 | +With [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO
| 109 | +performance of a decommissioning DataNode that holds many EC blocks. Besides this, we also need to decrease the value of
| 110 | +`dfs.namenode.reconstruction.pending.timeout-sec` (default: 5 minutes) to shorten the interval for checking
| 111 | +pendingReconstructions; otherwise the decommissioning node sits idle waiting for copy tasks for most of those 5 minutes.
| 112 | +During decommissioning we may need to reconfigure these two parameters several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560),
| 113 | +`dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and
| 114 | +this change makes `dfs.namenode.reconstruction.pending.timeout-sec` dynamically reconfigurable as well.
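Dynamic reconfiguration is driven through `hdfs dfsadmin -reconfig`; a sketch of the cycle (the NameNode address is a placeholder, and the new values are edited into hdfs-site.xml before starting the reconfiguration):

```
# Apply the edited hdfs-site.xml values without a NameNode restart;
# nn-host:8020 is a placeholder for the NameNode RPC address.
hdfs dfsadmin -reconfig namenode nn-host:8020 start
# Poll until the reconfiguration task reports completion.
hdfs dfsadmin -reconfig namenode nn-host:8020 status
```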
83 | 115 |
|
84 | | -It is also safe to use on HDFS, where it should be faster |
85 | | -than the v1 committer. It is however optimized for |
86 | | -cloud storage where list and rename operations are significantly |
87 | | -slower; the benefits may be less. |
| 116 | +**Bug** |
88 | 117 |
|
89 | | -More details are available in the |
90 | | -[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html). |
91 | | -documentation. |
| 118 | +[HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication. |
92 | 119 |
|
| 120 | +In the following scenario, decommissioning will fail with the `TOO_MANY_NODES_ON_RACK` reason:
| 121 | +- An EC policy such as RS-6-3-1024k is enabled.
| 122 | +- The number of racks in the cluster is equal to or less than the total replication number (9).
| 123 | +- A rack has only one DataNode, and that DataNode is decommissioned.
| 124 | +This issue has been addressed by the resolution of HDFS-16456.
93 | 125 |
|
94 | | -HDFS: Dynamic Datanode Reconfiguration |
95 | | --------------------------------------- |
| 126 | +[HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes. |
| 127 | +During block recovery, the `RecoveryTaskStriped` in the datanode expects a one-to-one correspondence between |
| 128 | +`rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat, |
| 129 | +this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices |
| 130 | +array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect |
| 131 | +internal block ID, leading to a failure in the recovery process as the corresponding datanode cannot locate the replica. |
| 132 | +This issue has been addressed by the resolution of HDFS-17094. |
96 | 133 |
|
97 | | -HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457. |
| 134 | +[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.
| 135 | +Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration |
| 136 | +parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks |
| 137 | +being sent to the DataNodes, consequently occupying too much of their memory. |
98 | 138 |
|
99 | | -A number of Datanode configuration options can be changed without having to restart |
100 | | -the datanode. This makes it possible to tune deployment configurations without |
101 | | -cluster-wide Datanode Restarts. |
| 139 | +This issue has been addressed by the resolution of HDFS-17284. |
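The failure mode is the classic 32-bit product wrap-around; a minimal sketch (not Hadoop's actual code, all names invented) of the bug and the widen-to-long fix:

```java
// Hypothetical sketch of the overflow: if a task count is computed as the
// product of two ints, the product can wrap negative before the hard limit
// is applied, so the limit never takes effect.
public class OverflowSketch {
    // Buggy variant: pending * perNode overflows int for large inputs.
    public static int tasksBuggy(int pending, int perNode, int hardLimit) {
        int total = pending * perNode;        // wraps negative past 2^31 - 1
        return Math.min(total, hardLimit);    // a negative total slips under the cap
    }

    // Fixed variant: widen to long before multiplying, then clamp.
    public static int tasksFixed(int pending, int perNode, int hardLimit) {
        long total = (long) pending * perNode;
        return (int) Math.min(total, (long) hardLimit);
    }
}
```

With `pending = 100000` and `perNode = 30000`, the buggy variant returns a negative count while the fixed one is correctly clamped to the hard limit.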
102 | 140 |
|
103 | | -See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361) |
104 | | -for the list of dynamically reconfigurable attributes. |
| 141 | +**Others** |
105 | 142 |
|
| 143 | +For other improvements and fixes for HDFS EC, please refer to the release document.
106 | 144 |
|
107 | 145 | Transitive CVE fixes |
108 | 146 | -------------------- |
109 | 147 |
|
110 | 148 | A lot of dependencies have been upgraded to address recent CVEs. |
111 | 149 | Many of the CVEs were not actually exploitable through Hadoop,
112 | 150 | so much of this work is just due diligence.
113 | | -However applications which have all the library is on a class path may |
114 | | -be vulnerable, and the ugprades should also reduce the number of false |
| 151 | +However, applications which have all these libraries on their classpath may
| 152 | +be vulnerable, and the upgrades should also reduce the number of false
115 | 153 | positives security scanners report. |
116 | 154 |
|
117 | 155 | We have not been able to upgrade every single dependency to the latest |
@@ -147,12 +185,12 @@ can, with care, keep data and computing resources private. |
147 | 185 | 1. Physical cluster: *configure Hadoop security*, usually bonded to the |
148 | 186 | enterprise Kerberos/Active Directory systems. |
149 | 187 | Good. |
150 | | -1. Cloud: transient or persistent single or multiple user/tenant cluster |
| 188 | +2. Cloud: transient or persistent single or multiple user/tenant cluster |
151 | 189 | with private VLAN *and security*. |
152 | 190 | Good. |
153 | 191 | Consider [Apache Knox](https://knox.apache.org/) for managing remote |
154 | 192 | access to the cluster. |
155 | | -1. Cloud: transient single user/tenant cluster with private VLAN |
| 193 | +3. Cloud: transient single user/tenant cluster with private VLAN |
156 | 194 | *and no security at all*. |
157 | 195 | Requires careful network configuration as this is the sole |
158 | 196 | means of securing the cluster.
|