-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-28988 Enhance WALPlayer for restore of BulkLoad #6523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: HBASE-28957
Are you sure you want to change the base?
Conversation
|
Need to update for newly suggested backup directory structure |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@ankitsol You need to run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I found the following issues:
- All the new log lines are at the INFO level, which may not be necessary. Consider reducing them to DEBUG or TRACE in some cases.
- Javadoc/comments have not been updated to reflect the latest changes.
- There are code style issues that need to be fixed so we can run the unit tests and update them if necessary.
- New unit tests need to be added.
| */ | ||
| protected static class WALMapper | ||
| extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Mutation> { | ||
| extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Pair<Mutation, List<String>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Pair here, can we use a Custom Class? So that The exclusivity between Mutation and BulkLoadFiles is enforced programmatically.
|
|
||
| // Retrieve configuration and set up file systems for backup and staging locations | ||
| Configuration conf = context.getConfiguration(); | ||
| Path backupLocation = new Path(conf.get(BULKLOAD_BACKUP_LOCATION)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for if backupLocation is not specified.
|
|
||
| try { | ||
| for (String file : bulkloadFilesWithFullPath) { | ||
| // Full file path from S3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the hardcoded S3 here
| List<String> stagingPaths = new ArrayList<>(); | ||
|
|
||
| try { | ||
| for (String file : bulkloadFilesWithFullPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not full paths, but the relative paths from namespace
| for (ExtendedCell cell : WALEditInternalHelper.getExtendedCells(value)) { | ||
| context.getCounter(Counter.CELLS_READ).increment(1); | ||
|
|
||
| if (CellUtil.matchingQualifier(cell, WALEdit.BULK_LOAD)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the processing of bulkloaded files can be simplified, and we could reduce the log level from INFO to DEBUG or TRACE in some cases.
| setupTime(conf, WALInputFormat.START_TIME_KEY); | ||
| setupTime(conf, WALInputFormat.END_TIME_KEY); | ||
| String inputDirs = args[0]; | ||
| String walDir = new Path(inputDirs, "WALs").toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. We are hard-coding the directories here.
We could introduce a new optional parameter that the user can specify if they have bulkloaded files for us to process.
For example:
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2 -Dwal.bulk.backup.location=/bulkload-files-dir
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
As discussed internally we need to add this enhancement outside of the One thing I don't understand reading this patch: how the original code skipped BulkLoad edits and how the new code enabled them? |
7eb6a76 to
71efe29
Compare
Enhance WALPlayer for restore of BulkLoad WAL entries
https://issues.apache.org/jira/browse/HBASE-28988