-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-28328 Add an option to count different types of Delete Markers in RowCounter #6435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Outdated
Show resolved
Hide resolved
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Outdated
Show resolved
Hide resolved
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Show resolved
Hide resolved
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Outdated
Show resolved
Hide resolved
|
BTW thanks @shubham-roy for your first PR in Apache HBase, I have added some review comments, please have a look and please let me know if you have any doubts or need any help! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| private boolean countDeleteMarkers; | ||
| private List<String> columns = new ArrayList<>(); | ||
|
|
||
| private Job job; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesnot look necessary, we have been validating counters with following logic for existing tests. Please update tests to take a similar approach:
* Run the RowCounter map reduce job and verify the row count.
* @param args the command line arguments to be used for rowcounter job.
* @param expectedCount the expected row count (result of map reduce job).
* @throws Exception in case of any unexpected error.
*/
private void runCreateSubmittableJobWithArgs(String[] args, int expectedCount) throws Exception {
Job job = RowCounter.createSubmittableJob(TEST_UTIL.getConfiguration(), args);
long start = EnvironmentEdgeManager.currentTime();
job.waitForCompletion(true);
long duration = EnvironmentEdgeManager.currentTime() - start;
LOG.debug("row count duration (ms): " + duration);
assertTrue(job.isSuccessful());
Counter counter = job.getCounters().findCounter(RowCounter.RowCounterMapper.Counters.ROWS);
assertEquals(expectedCount, counter.getValue());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NihalJain, the method runCreateSubmittableJobWithArgs internally calls RowCounter.createSubmittableJob(TEST_UTIL.getConfiguration(), args). However, the method createSubmittableJob is marked for deprecation - code link. So ideally, I believe (please correct me if I am wrong), we should not be making a change to that method. To use that method, we have to change the scan behaviour based on the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shubham-roy I mean we could rewriten a helper in tests similar to above example method runCreateSubmittableJobWithArgs and make assertions. IMO we should try to get rid of deprecated API as another task than mixing implementations and doing same thing in different ways at different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should try to get rid of deprecated API as another task than mixing implementations and doing same thing in different ways at different places.
@NihalJain , don't you think that the access to the job object via a getter method (which I exposed) could be a good starting point to getting rid of the deprecated method createSubmittableJob. I already used it in a way that could be easily extended to other use cases as well. LMK what do you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but I would prefer to do that as another cleanup task for separation of concerns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but I would prefer to do that as another cleanup task for separation of concerns
@NihalJain , I agree and I am also not touching any of the other tests. I just used whatever is needed for my testing in an extensible way. Fixing of remaining tests can be taken up as a separate cleanup task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will leave this upto others as i am still not convinced. +0 from me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@virajjasani , can you please have a look at this thread and let us know of your thoughts on the same?
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Outdated
Show resolved
Hide resolved
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Show resolved
Hide resolved
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@NihalJain A gentle reminder whenever you are ready to take another look! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| private boolean countDeleteMarkers; | ||
| private List<String> columns = new ArrayList<>(); | ||
|
|
||
| private Job job; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will leave this upto others as i am still not convinced. +0 from me.
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
Outdated
Show resolved
Hide resolved
|
Thanks for addressing the review comments @shubham-roy. LGTM ! Before merge, please ensure to keep jira and github PR title in sync. Also please add release notes in jira to explain the behaviour of the new flag and how to make use of it. Thanks for this nice feature. |
|
Thank you @NihalJain for all the review! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I think we are good to merge the PR then? |
@virajjasani , yes the PR is good to merge. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
…in RowCounter (#6435) Signed-off-by: Nihal Jain <[email protected]>
…in RowCounter (#6496) (#6435) Signed-off-by: Nihal Jain <[email protected]>
…in RowCounter (#6496) (#6435) Signed-off-by: Nihal Jain <[email protected]>
…in RowCounter (#6496) (#6435) Signed-off-by: Nihal Jain <[email protected]>
…in RowCounter (apache#6435) Signed-off-by: Nihal Jain <[email protected]>
…in RowCounter (apache#6496) (apache#6435) Signed-off-by: Nihal Jain <[email protected]>
…in RowCounter (apache#6496) (apache#6435) Signed-off-by: Nihal Jain <[email protected]>
A flag is introduced, which, when enabled, allows RowCounter to count the various types of Delete Markers - DELETE_COLUMN, DELETE_FAMILY, DELETE_FAMILY_VERSION. It will also calculate the number of rows having a delete marker.
To enable this the scan object was modified -> if flag is set, raw scan is performed without FirstKeyOnlyFilter.