HDFS-14617 - Improve fsimage load time by writing sub-sections to the fsimage index #1028
Conversation
IIUC, the number of threads should not be greater than the number of target sections; otherwise the remaining threads will not be used, or other issues could arise. Is it necessary to warn about this configuration limit?
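For illustration, the kind of guard this suggests could look roughly like the sketch below; the target-sections key/default names and the choice to only warn (rather than clamp) are assumptions, not part of the patch:

int threads = conf.getInt(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY,
    DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_DEFAULT);
int targetSections = conf.getInt(
    DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY,
    DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_DEFAULT);
if (threads > targetSections) {
  // Each thread works on whole sub-sections, so threads beyond the
  // sub-section count would simply sit idle.
  LOG.warn("{} is set to {} but only {} target sub-sections are written;" +
      " the extra threads will not be used.",
      DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, threads, targetSections);
}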
+1
The latest version I pushed removes this TODO and causes the load to fail if the number loaded != number expected.
Will some exceptions be invisible to users? It will not be easy to find the root cause if several exceptions occur but the core exception is not the first one.
All exceptions will be logged by the executor here:
public void run() {
  try {
    // Load this sub-section's inodes and add them to the running total.
    totalLoaded.addAndGet(loadINodesInSection(ins, null));
    prog.setCount(Phase.LOADING_FSIMAGE, currentStep,
        totalLoaded.get());
  } catch (Exception e) {
    // Every failure is logged here, then recorded for the caller to rethrow.
    LOG.error("An exception occurred loading INodes in parallel", e);
    exceptions.add(new IOException(e));
  } finally {
    ...
It is fairly likely that if there are problems, all the exceptions will share the same cause, so throwing just the first one, while ensuring all of them are logged, should make debugging possible.
The alternative is to create a wrapper exception that all the exceptions get stored into and then throw that, but it would need to be caught and each contained exception logged anyway; since those exceptions are already logged, that would just duplicate the information in the logs.
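For context, the overall pattern being described (collect every worker failure, log each one, then fail the load with the first) looks roughly like the self-contained sketch below; tasks, LOG and the pool size are assumed, and interrupt/shutdown handling is omitted:

ExecutorService pool = Executors.newFixedThreadPool(4);
List<IOException> exceptions = new CopyOnWriteArrayList<>();
CountDownLatch latch = new CountDownLatch(tasks.size());
for (Runnable task : tasks) {
  pool.execute(() -> {
    try {
      task.run();
    } catch (Exception e) {
      // Every failure is logged, so nothing is hidden from the user.
      LOG.error("An exception occurred loading INodes in parallel", e);
      exceptions.add(new IOException(e));
    } finally {
      latch.countDown();
    }
  });
}
latch.await();
if (!exceptions.isEmpty()) {
  // Fail the load with the first exception; the rest are already in the log.
  throw exceptions.get(0);
}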
Would you like to add some unit tests to cover serial loading and parallel loading for both the old and new format fsimage?
Yes, I need to figure out how to add some tests for this overall change.
you probably need to put a sample fsimage in old format in the resource directory (hadoop-hdfs-project/hadoop-hdfs/src/test/resources/), and load it in a unit test
I have a unit test which ensures a parallel image can be created and loaded. It would be fairly easy to create another test which generates a non-parallel image, validates it is non-parallel, and then attempts to load it with parallel loading enabled. Do you think that would cover what we need?
I also have a test that enables parallel and compression and then verifies the parallel part is not used, as compression disables it.
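For reference, a rough sketch of the shape the non-parallel-image test could take (a sketch only; toggling the flag via getConfiguration(0) before the restart is an assumption about how the test would be wired up):

@Test
public void testLoadNonParallelImageWithParallelLoadEnabled() throws Exception {
  Configuration conf = new Configuration();
  // First write an fsimage without sub-section entries.
  conf.setBoolean(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, false);
  MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
  try {
    DistributedFileSystem fs = cluster.getFileSystem();
    for (int i = 0; i < 10; i++) {
      fs.mkdirs(new Path("/dir" + i));
    }
    fs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
    fs.saveNamespace();   // image is written with no sub-sections
    fs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);

    // Restart with parallel loading enabled; the non-parallel image must
    // still load (serial fallback).
    cluster.getConfiguration(0).setBoolean(
        DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, true);
    cluster.restartNameNode();
    assertTrue(cluster.getFileSystem().exists(new Path("/dir0")));
  } finally {
    cluster.shutdown();
  }
}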
Thanks Stephen and Xiaoqiao.
Patch looks good to me and I am only able to make a few useless comments.
if (parallelEnabled) {
  if (compressionEnabled) {
    LOG.warn("Parallel Image loading is not supported when {} is set to" +
        " true. Parallel loading will be disabled.",
Please update hdfs-default.xml to include this caveat.
I have added a note to hdfs-default.xml for this.
boolean loadInParallel =
    conf.getBoolean(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY,
        DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_DEFAULT);
// TODO - check for compression and if enabled disable parallel
This is currently checked when saving the fsimage, but not when loading it.
I have addressed this TODO and also refactored the code a little to allow reuse between the saving and loading sections.
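Roughly, the shared check can take the shape below (a sketch; the helper name enableParallelLoad and the exact log wording are assumptions):

private static boolean enableParallelLoad(Configuration conf) {
  boolean loadInParallel = conf.getBoolean(
      DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY,
      DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_DEFAULT);
  boolean compressionEnabled = conf.getBoolean(
      DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY,
      DFSConfigKeys.DFS_IMAGE_COMPRESS_DEFAULT);
  if (loadInParallel && compressionEnabled) {
    // Compressed images cannot be split at arbitrary offsets, so parallel
    // handling is disabled on both the save and load paths.
    LOG.warn("Parallel Image loading is not supported when {} is set to" +
        " true. Parallel loading will be disabled.",
        DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY);
    return false;
  }
  return loadInParallel;
}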
And we should make sure the oiv tool works with this change. We can file another jira to address the oiv issue.
@jojochuang Thanks for your suggestions. The current patch @sodonnel submitted does not seem to include the OIV tools, and I agree that we should file another JIRA to track it. I would like to file it, and will help to track it if @sodonnel has no time. FYI.
if (child.isFile()) {
  inodeList.add(child);
}
if (inodeList.size() >= 1000) {
Please define the value as a constant (final static)
Done.
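For reference, the resulting shape is roughly the following (the constant name and the flush helper are placeholders, not necessarily the names used in the patch):

// Inodes are attached to their parent directories in batches of this size
// rather than one at a time, which reduces lock churn.
private static final int DIRECTORY_ENTRY_BATCH_SIZE = 1000;

if (child.isFile()) {
  inodeList.add(child);
}
if (inodeList.size() >= DIRECTORY_ENTRY_BATCH_SIZE) {
  addToCacheAndBlockMap(inodeList);   // placeholder for the batch flush
  inodeList.clear();
}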
I think the performance benefit of batch updates outweighs cache locality.
if (loadInParallel) {
  executorService = Executors.newFixedThreadPool(
      conf.getInt(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY,
          DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_DEFAULT));
nice to have: log an info message stating that parallel image loading is enabled and the number of threads used.
I added a log message here, and validated the thread count setting is not less than 1, otherwise I reset it to the default (4). I also pulled the code to do this check and to create the executor into a private method to reduce the noise in the already too long loadInternal method.
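Roughly, the extracted helper looks like the sketch below (the method name and log wording here are assumptions):

private ExecutorService getParallelExecutorService() {
  int threads = conf.getInt(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY,
      DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_DEFAULT);
  if (threads < 1) {
    LOG.warn("{} is set to {}. It must be at least 1. Falling back to the" +
        " default of {}",
        DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, threads,
        DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_DEFAULT);
    threads = DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_DEFAULT;
  }
  LOG.info("Parallel Image loading is enabled, using {} threads", threads);
  return Executors.newFixedThreadPool(threads);
}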
}
if (inodeThreshold <= 0) {
  LOG.warn("{} is set to {}. It must be greater than zero. Setting to" +
      "default of {}",
" default"
^^ missing an extra space
Fixed.
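i.e. the two literals now join with a space (the argument names below are illustrative):

LOG.warn("{} is set to {}. It must be greater than zero. Setting to " +
    "default of {}", key, configuredValue, defaultValue);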
I checked OIV, and it can load images with parallel sections in the image index with no problems, and it does not produce any warnings. The reason is that this change simply adds additional sections to the image index, so all of the original top-level sections are still present. This means that if a loader looks for certain sections, it does not matter which other sections are there, provided it ignores them.
In the case of OIV with the "delimited" processor, it loops over all the sections in the "FileSummary index" looking for the ones it wants (the INODE section, for example) and ignores all others.
In the case of the XML processor, which is probably the most important, it works in a very similar way to how the namenode loads the image. It loops over all sections and uses a case statement to process the sections it is interested in and skip the others. Note the default clause, which does nothing if it encounters a section name it does not expect.
I tested running the other processors, FileDistribution, DetectCorruption and Web, and they all worked with no issues.
There are a couple of future improvements we could do in new JIRAs.
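As a simplified illustration of that skip-unknown-sections pattern (the loop is modelled on the FileSummary index; the handler method names are placeholders):

for (FileSummary.Section section : summary.getSectionsList()) {
  switch (section.getName()) {
    case "INODE":
      loadINodeSection(in);            // placeholder handler
      break;
    case "INODE_DIR":
      loadINodeDirectorySection(in);   // placeholder handler
      break;
    default:
      // Unrecognised entries, such as the new sub-section records, are
      // simply skipped, which is why OIV keeps working unchanged.
      break;
  }
}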
manually trigger a precommit rebuild
@sodonnel For the OIV tools, based on branch-2.7 I got a different result, IIRC. Unfortunately I did not dig into it deeply at that time. I would like to test it again later and report back if I hit any exceptions.
+1. Thanks @sodonnel
+1 from me. I've reviewed it several times and I think this is good. Will let it sit for a few days for other folks to comment on.
+1 (non-binding) from me.
// Image transfer chunksize
public static final String DFS_IMAGE_TRANSFER_CHUNKSIZE_KEY = "dfs.image.transfer.chunksize";
public static final int DFS_IMAGE_TRANSFER_CHUNKSIZE_DEFAULT = 64 * 1024;
Can you add a comment annotation for the new keys? For example:
// NameNode fsimage start parallel
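Something along these lines, grouping the new keys under that annotation; apart from the two key names already quoted in this review, the key and default values below are assumptions from the discussion, not authoritative:

// NameNode fsimage start parallel
public static final String DFS_IMAGE_PARALLEL_LOAD_KEY =
    "dfs.image.parallel.load";
public static final boolean DFS_IMAGE_PARALLEL_LOAD_DEFAULT = true;

public static final String DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY =
    "dfs.image.parallel.target.sections";
public static final int DFS_IMAGE_PARALLEL_TARGET_SECTIONS_DEFAULT = 12;

public static final String DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY =
    "dfs.image.parallel.inode.threshold";
public static final int DFS_IMAGE_PARALLEL_INODE_THRESHOLD_DEFAULT = 1000000;

public static final String DFS_IMAGE_PARALLEL_THREADS_KEY =
    "dfs.image.parallel.threads";
public static final int DFS_IMAGE_PARALLEL_THREADS_DEFAULT = 4;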
Initial code I used to perform my benchmarks for the above change. There are two parts to the change:
1. The code used to write the sub-sections to the image, which is fairly simple.
2. The code used to process the sub-sections in parallel. As much as possible this reuses the original code in multiple threads, but the inodeDirectory code needed to be moved around a bit to avoid synchronization issues.
The only sections with parallel loading so far are inode and inodeDirectory.
So far, there are no new tests for this feature. That is something I need to look into further if we agree this is a good change to move forward with.
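To make the first part concrete, here is a simplified, self-contained sketch of how sub-section boundaries can be recorded in the image index; the class and method names are placeholders, and the real code works against the protobuf FileSummary rather than a plain list:

import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for one entry in the FileSummary index. */
class IndexSection {
  final String name;
  final long offset;
  final long length;

  IndexSection(String name, long offset, long length) {
    this.name = name;
    this.offset = offset;
    this.length = length;
  }
}

/** Records sub-section boundaries while a large section is being written. */
class SubSectionWriter {
  private final List<IndexSection> index = new ArrayList<>();
  private long subSectionStart;

  // Called every N inodes: close the current sub-section at the current
  // stream offset and start the next one. Readers that do not understand
  // these extra entries can simply ignore them.
  void commitSubSection(String name, long currentOffset) {
    index.add(new IndexSection(name, subSectionStart,
        currentOffset - subSectionStart));
    subSectionStart = currentOffset;
  }

  List<IndexSection> getIndex() {
    return index;
  }
}

The design point is that the existing top-level sections remain untouched; the sub-section entries are purely additive hints for a parallel loader.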