@shubham-roy
Contributor

@shubham-roy shubham-roy commented Nov 4, 2024

This PR introduces a flag that, when enabled, makes RowCounter count the various types of delete markers (DELETE_COLUMN, DELETE_FAMILY, DELETE_FAMILY_VERSION). It also counts the number of rows that have a delete marker.

To support this, the scan setup was changed: if the flag is set, a raw scan is performed without the FirstKeyOnlyFilter. A raw scan is needed for delete markers to be returned at all, and FirstKeyOnlyFilter would return only the first cell of each row.
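
As a rough illustration of the counting described above, here is a minimal, self-contained sketch in plain Java. It deliberately does not use the HBase API: the `DeleteMarkerTally` class, its `CellType` enum, and the list-of-cell-types row representation are stand-ins invented for this example, and the PR's actual mapper code is not shown in this thread. It only models what such a mapper would tally once every cell of a row, including delete markers, is visible to it.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration only; this is not an HBase class. */
class DeleteMarkerTally {

  /** Simplified cell types (assumption): PUT stands for any cell that is not a delete marker. */
  enum CellType { PUT, DELETE_COLUMN, DELETE_FAMILY, DELETE_FAMILY_VERSION }

  private final Map<CellType, Long> markerCounts = new EnumMap<>(CellType.class);
  private long rows;
  private long rowsWithDeleteMarker;

  /** Tally one row, given the types of all its cells as a raw, unfiltered scan would expose them. */
  void addRow(List<CellType> cells) {
    rows++;
    boolean sawMarker = false;
    for (CellType type : cells) {
      if (type != CellType.PUT) {
        // Count each delete marker under its own type.
        markerCounts.merge(type, 1L, Long::sum);
        sawMarker = true;
      }
    }
    // A row is counted once here, no matter how many markers it carries.
    if (sawMarker) {
      rowsWithDeleteMarker++;
    }
  }

  long rows() {
    return rows;
  }

  long rowsWithDeleteMarker() {
    return rowsWithDeleteMarker;
  }

  long count(CellType type) {
    return markerCounts.getOrDefault(type, 0L);
  }

  public static void main(String[] args) {
    DeleteMarkerTally tally = new DeleteMarkerTally();
    tally.addRow(List.of(CellType.PUT, CellType.DELETE_COLUMN));
    tally.addRow(List.of(CellType.PUT));
    System.out.println("rows=" + tally.rows()
      + " rowsWithDeleteMarker=" + tally.rowsWithDeleteMarker());
  }
}
```

The sketch also shows why a first-cell-only filter does not fit this mode: if only the first cell of each row reached `addRow`, markers later in the row would never be tallied.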

@NihalJain
Contributor

BTW, thanks @shubham-roy for your first PR in Apache HBase! I have added some review comments; please have a look and let me know if you have any doubts or need any help.

@shubham-roy shubham-roy requested a review from NihalJain November 5, 2024 06:31

private boolean countDeleteMarkers;
private List<String> columns = new ArrayList<>();

private Job job;
Contributor

This does not look necessary. We have been validating counters with the following logic in the existing tests. Please update the tests to take a similar approach:

  /**
   * Run the RowCounter map reduce job and verify the row count.
   * @param args          the command line arguments to be used for rowcounter job.
   * @param expectedCount the expected row count (result of map reduce job).
   * @throws Exception in case of any unexpected error.
   */
  private void runCreateSubmittableJobWithArgs(String[] args, int expectedCount) throws Exception {
    Job job = RowCounter.createSubmittableJob(TEST_UTIL.getConfiguration(), args);
    long start = EnvironmentEdgeManager.currentTime();
    job.waitForCompletion(true);
    long duration = EnvironmentEdgeManager.currentTime() - start;
    LOG.debug("row count duration (ms): " + duration);
    assertTrue(job.isSuccessful());
    Counter counter = job.getCounters().findCounter(RowCounter.RowCounterMapper.Counters.ROWS);
    assertEquals(expectedCount, counter.getValue());
  }

Contributor Author

@NihalJain, the method runCreateSubmittableJobWithArgs internally calls RowCounter.createSubmittableJob(TEST_UTIL.getConfiguration(), args). However, createSubmittableJob is marked for deprecation (code link), so I believe (please correct me if I am wrong) we should ideally not be changing that method. To use that method, we would have to change the scan behaviour based on the flag.

Contributor

Hey @shubham-roy, I mean we could write a helper in the tests similar to the example method runCreateSubmittableJobWithArgs above and make the assertions there. IMO we should get rid of the deprecated API as a separate task, rather than mixing implementations and doing the same thing in different ways in different places.

Contributor Author

> IMO we should get rid of the deprecated API as a separate task, rather than mixing implementations and doing the same thing in different ways in different places.

@NihalJain, don't you think that access to the job object via a getter method (which I exposed) could be a good starting point for getting rid of the deprecated method createSubmittableJob? I have already used it in a way that can easily be extended to other use cases. Let me know what you think.

Contributor

@NihalJain Nov 15, 2024

Sure, but I would prefer to do that as a separate cleanup task, for separation of concerns.

Contributor Author

> Sure, but I would prefer to do that as a separate cleanup task, for separation of concerns.

@NihalJain, I agree, and I am not touching any of the other tests. I just used what my tests need, in an extensible way. Fixing the remaining tests can be taken up as a separate cleanup task.

Contributor

@NihalJain Nov 19, 2024

I will leave this up to others, as I am still not convinced. +0 from me.

Contributor Author

@virajjasani, could you please have a look at this thread and let us know your thoughts?

@shubham-roy shubham-roy requested a review from NihalJain November 7, 2024 13:27

@virajjasani
Contributor

@NihalJain A gentle reminder whenever you are ready to take another look!

@NihalJain
Contributor

Thanks for addressing the review comments, @shubham-roy. LGTM!

Before merging, please make sure the Jira and GitHub PR titles are in sync. Also, please add release notes in Jira explaining the behaviour of the new flag and how to use it.

Thanks for this nice feature.

@shubham-roy
Contributor Author

Thank you @NihalJain for all the review!

@shubham-roy changed the title from "HBASE-28328 Added feature to count cells and delete markers in RowCounter." to "HBASE-28328 Add an option to count different types of Delete Markers in RowCounter" on Nov 20, 2024

@virajjasani
Contributor

I think we are good to merge the PR then?
@shubham-roy could you also create a PR against branch-2?

@shubham-roy
Contributor Author

> I think we are good to merge the PR then?
> @shubham-roy could you also create a PR against branch-2?

@virajjasani, yes, the PR is good to merge.
PR against branch-2: #6496

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:----------|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 25s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 51s | | master passed |
| +1 💚 | compile | 0m 41s | | master passed |
| +1 💚 | checkstyle | 0m 13s | | master passed |
| +1 💚 | spotbugs | 0m 39s | | master passed |
| +1 💚 | spotless | 0m 50s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 16s | | the patch passed |
| +1 💚 | compile | 0m 37s | | the patch passed |
| +1 💚 | javac | 0m 37s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 12s | | the patch passed |
| +1 💚 | xmllint | 0m 0s | | No new issues. |
| +1 💚 | spotbugs | 0m 42s | | the patch passed |
| +1 💚 | hadoopcheck | 11m 52s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 46s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 10s | | The patch does not generate ASF License warnings. |
| | | 31m 34s | | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6435/10/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6435 |
| Optional Tests | dupname asflicense javac codespell detsecrets xmllint hadoopcheck spotless compile spotbugs checkstyle hbaseanti |
| uname | Linux f8c49ce6d4ba 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / f4a1b12 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 85 (vs. ulimit of 30000) |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6435/10/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 xmllint=20913 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:----------|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 43s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 2m 58s | | master passed |
| +1 💚 | compile | 0m 22s | | master passed |
| +1 💚 | javadoc | 0m 16s | | master passed |
| +1 💚 | shadedjars | 5m 21s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 2m 52s | | the patch passed |
| +1 💚 | compile | 0m 22s | | the patch passed |
| +1 💚 | javac | 0m 22s | | the patch passed |
| +1 💚 | javadoc | 0m 16s | | the patch passed |
| +1 💚 | shadedjars | 5m 21s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 19m 11s | | hbase-mapreduce in the patch passed. |
| | | 38m 53s | | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6435/10/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6435 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux b4e5c4e278bf 5.4.0-195-generic #215-Ubuntu SMP Fri Aug 2 18:28:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / f4a1b12 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6435/10/testReport/ |
| Max. process+thread count | 2416 (vs. ulimit of 30000) |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6435/10/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@virajjasani virajjasani merged commit 240bc3f into apache:master Dec 2, 2024
1 check passed
virajjasani pushed a commit that referenced this pull request Dec 2, 2024
virajjasani pushed a commit that referenced this pull request Dec 2, 2024
virajjasani pushed a commit that referenced this pull request Dec 2, 2024
virajjasani pushed a commit that referenced this pull request Dec 2, 2024
gvprathyusha6 pushed a commit to gvprathyusha6/hbase that referenced this pull request Dec 19, 2024
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025
sanjeet006py pushed a commit to sanjeet006py/hbase that referenced this pull request Sep 26, 2025