@ankitsol

ankitsol commented Dec 6, 2024

Enhance WALPlayer for restore of BulkLoad WAL entries

https://issues.apache.org/jira/browse/HBASE-28988

@ankitsol
Author

ankitsol commented Dec 6, 2024

This still needs to be updated for the newly suggested backup directory structure.

@vinayakphegde
Contributor

@ankitsol You need to run mvn spotless:apply to fix code style issues.

Contributor

vinayakphegde left a comment

In general, I found the following issues:

  • All the new log lines are at the INFO level, which may not be necessary. Consider reducing them to DEBUG or TRACE in some cases.
  • Javadoc/comments have not been updated to reflect the latest changes.
  • There are code style issues that need to be fixed so we can run the unit tests and update them if necessary.
  • New unit tests need to be added.

 */
 protected static class WALMapper
-  extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Mutation> {
+  extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Pair<Mutation, List<String>>> {
Contributor

Instead of Pair here, can we use a custom class, so that the exclusivity between Mutation and BulkLoadFiles is enforced programmatically?
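A minimal sketch of such a custom class, assuming a hypothetical name MutationOrBulkLoad and using a type parameter M in place of HBase's Mutation so that it compiles on its own. The private constructor and the two factory methods make it impossible to build an instance that holds both a mutation and bulk-load files.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical mapper output value: holds EITHER a mutation OR a list of bulk-loaded
 * file paths, never both. M stands in for HBase's Mutation in this sketch.
 */
final class MutationOrBulkLoad<M> {
  private final M mutation;                 // non-null only for the mutation variant
  private final List<String> bulkLoadFiles; // non-null only for the bulk-load variant

  private MutationOrBulkLoad(M mutation, List<String> bulkLoadFiles) {
    this.mutation = mutation;
    this.bulkLoadFiles = bulkLoadFiles;
  }

  /** Builds the mutation variant; the file list stays null. */
  static <M> MutationOrBulkLoad<M> ofMutation(M mutation) {
    return new MutationOrBulkLoad<>(Objects.requireNonNull(mutation, "mutation"), null);
  }

  /** Builds the bulk-load variant from a non-empty list, copied defensively. */
  static <M> MutationOrBulkLoad<M> ofBulkLoadFiles(List<String> files) {
    Objects.requireNonNull(files, "files");
    if (files.isEmpty()) {
      throw new IllegalArgumentException("bulk load file list must not be empty");
    }
    return new MutationOrBulkLoad<>(null, List.copyOf(files));
  }

  boolean isBulkLoad() {
    return bulkLoadFiles != null;
  }

  Optional<M> mutation() {
    return Optional.ofNullable(mutation);
  }

  List<String> bulkLoadFiles() {
    return bulkLoadFiles == null ? List.of() : bulkLoadFiles;
  }
}
```

A real mapper output value would also need a Hadoop serialization (for example, implementing org.apache.hadoop.io.Writable), which this sketch leaves out.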


// Retrieve configuration and set up file systems for backup and staging locations
Configuration conf = context.getConfiguration();
Path backupLocation = new Path(conf.get(BULKLOAD_BACKUP_LOCATION));
Contributor

Add a check for the case where backupLocation is not specified.
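One way to add the check, sketched with a Map standing in for Hadoop's Configuration (whose get(key) likewise returns null for an unset key). The class name is hypothetical, and the key reuses the wal.bulk.backup.location name proposed later in this review. Whether a missing value should skip bulk-load entries or fail the job is a design choice, so the sketch offers both an optional lookup and a fail-fast variant.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; a Map stands in for Hadoop's Configuration. */
final class BulkLoadLocationCheck {
  // Assumed key name (the one proposed in the review); the patch only names the constant.
  static final String BULKLOAD_BACKUP_LOCATION = "wal.bulk.backup.location";

  /** Returns the configured location, or empty if the key is unset or blank. */
  static Optional<String> backupLocation(Map<String, String> conf) {
    String value = conf.get(BULKLOAD_BACKUP_LOCATION);
    if (value == null || value.isBlank()) {
      return Optional.empty();
    }
    return Optional.of(value.trim());
  }

  /** Fails fast with a descriptive message instead of passing null on to new Path(...). */
  static String requireBackupLocation(Map<String, String> conf) {
    return backupLocation(conf).orElseThrow(() -> new IllegalStateException(
      BULKLOAD_BACKUP_LOCATION + " is not set; cannot restore bulk-loaded files"));
  }
}
```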


try {
  for (String file : bulkloadFilesWithFullPath) {
    // Full file path from S3
Contributor

Remove the hard-coded S3 reference here.

List<String> stagingPaths = new ArrayList<>();

try {
  for (String file : bulkloadFilesWithFullPath) {
Contributor

These are not full paths; they are relative paths starting from the namespace directory.
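Taken together with the S3 comment above, the idea can be sketched as resolving each relative path against a configurable backup root, whatever its scheme (s3a://, hdfs://, or none). This plain java.net.URI version is an illustration only: the class name and the example paths are hypothetical, and it assumes URI-safe path segments (no spaces or colons), as HBase namespace, table, region, family, and HFile names normally are.

```java
import java.net.URI;

/** Hypothetical resolver: joins a namespace-relative path onto a backup root of any scheme. */
final class BulkLoadPathResolver {
  static URI resolve(String backupLocation, String relativePath) {
    if (relativePath.isEmpty() || relativePath.startsWith("/")) {
      throw new IllegalArgumentException("expected a relative path: " + relativePath);
    }
    // A trailing slash makes URI.resolve append to the root instead of replacing its last segment.
    URI base = URI.create(backupLocation.endsWith("/") ? backupLocation : backupLocation + "/");
    URI relative = URI.create(relativePath);
    if (relative.isAbsolute()) {
      throw new IllegalArgumentException("path must not carry a scheme: " + relativePath);
    }
    // resolve() also normalizes the result, collapsing "." and ".." segments.
    URI resolved = base.resolve(relative);
    // Reject ".." segments that would escape the backup root.
    if (!resolved.getPath().startsWith(base.getPath())) {
      throw new IllegalArgumentException("path escapes backup root: " + relativePath);
    }
    return resolved;
  }
}
```

Inside the mapper, Hadoop's Path(String parent, String child) constructor and Path.getFileSystem(conf) already give this scheme-agnostic behavior, so the real fix may be as small as building the path from the configured backup location instead of an S3-specific prefix.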

for (ExtendedCell cell : WALEditInternalHelper.getExtendedCells(value)) {
  context.getCounter(Counter.CELLS_READ).increment(1);

  if (CellUtil.matchingQualifier(cell, WALEdit.BULK_LOAD)) {
Contributor

I believe the processing of bulk-loaded files can be simplified, and in some cases we could reduce the log level from INFO to DEBUG or TRACE.
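A minimal sketch of one possible simplification: a single pass that separates bulk-load marker cells from regular cells and logs per-cell detail only at a fine-grained level. In the real mapper the predicate would presumably be the patch's CellUtil.matchingQualifier(cell, WALEdit.BULK_LOAD) check and logging would go through HBase's SLF4J logger; here a generic cell type C and java.util.logging (where FINE is roughly comparable to DEBUG) keep the sketch self-contained, and the class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: splits one WAL edit's cells in a single pass. */
final class WalCellSplitter {
  private static final Logger LOG = Logger.getLogger(WalCellSplitter.class.getName());

  /** Result of the pass: regular cells to replay, and bulk-load markers to handle separately. */
  record Split<C>(List<C> regularCells, List<C> bulkLoadMarkers) {}

  static <C> Split<C> split(Iterable<C> cells, Predicate<C> isBulkLoadMarker) {
    List<C> regular = new ArrayList<>();
    List<C> markers = new ArrayList<>();
    for (C cell : cells) {
      if (isBulkLoadMarker.test(cell)) {
        markers.add(cell);
        // Per-cell detail only at FINE, so INFO-level logs stay quiet.
        LOG.log(Level.FINE, "bulk load marker: {0}", cell);
      } else {
        regular.add(cell);
      }
    }
    return new Split<>(regular, markers);
  }
}
```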

setupTime(conf, WALInputFormat.START_TIME_KEY);
setupTime(conf, WALInputFormat.END_TIME_KEY);
String inputDirs = args[0];
String walDir = new Path(inputDirs, "WALs").toString();
Contributor

I don't think this is correct: we are hard-coding the directory layout here.
We could instead introduce a new optional parameter that users can set when they have bulk-loaded files for us to process.
For example (generic -D options go before the positional arguments):
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer -Dwal.bulk.backup.location=/bulkload-files-dir /backuplogdir oldTable1,oldTable2 newTable1,newTable2

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957 Compile Tests _
+0 🆗 mvndep 0m 9s Maven dependency ordering for branch
+1 💚 mvninstall 2m 46s HBASE-28957 passed
+1 💚 compile 1m 6s HBASE-28957 passed
+1 💚 checkstyle 0m 23s HBASE-28957 passed
+1 💚 spotbugs 0m 59s HBASE-28957 passed
+1 💚 spotless 1m 7s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 3m 0s the patch passed
+1 💚 compile 1m 4s the patch passed
-0 ⚠️ javac 0m 33s /results-compile-javac-hbase-mapreduce.txt hbase-mapreduce generated 1 new + 197 unchanged - 1 fixed = 198 total (was 198)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 24s the patch passed
+1 💚 spotbugs 1m 6s the patch passed
+1 💚 hadoopcheck 9m 55s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 39s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 20s The patch does not generate ASF License warnings.
30m 9s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6523
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 68382c610f0a 5.4.0-200-generic #220-Ubuntu SMP Fri Sep 27 13:19:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / 12fdd6d
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-mapreduce hbase-it U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957 Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for branch
+1 💚 mvninstall 3m 34s HBASE-28957 passed
+1 💚 compile 0m 49s HBASE-28957 passed
+1 💚 javadoc 0m 31s HBASE-28957 passed
+1 💚 shadedjars 7m 27s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 4m 7s the patch passed
+1 💚 compile 0m 42s the patch passed
+1 💚 javac 0m 42s the patch passed
+1 💚 javadoc 0m 25s the patch passed
+1 💚 shadedjars 5m 37s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 27m 20s /patch-unit-hbase-mapreduce.txt hbase-mapreduce in the patch failed.
+1 💚 unit 0m 39s hbase-it in the patch passed.
53m 33s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6523
Optional Tests javac javadoc unit compile shadedjars
uname Linux b92f4b8e2b6c 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / 12fdd6d
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/2/testReport/
Max. process+thread count 3286 (vs. ulimit of 30000)
modules C: hbase-mapreduce hbase-it U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@anmolnar
Contributor

anmolnar commented Dec 19, 2024

As discussed internally, we need to add this enhancement outside the WALPlayer class. Perhaps we could extend WALPlayer and its mapper to create a new MR job that combines the two functionalities.

One thing I don't understand from reading this patch: how did the original code skip BulkLoad edits, and how does the new code enable them?
