@gvprathyusha6
Contributor

Modifies the HStoreFile/StoreFileInfo constructors to take the StoreFileTracker (SFT) interface as a parameter.
Refactors direct creation of Reference/HFileLink files to go through the SFT interface. Also moves getStoreFiles/hasReferences from HRegionFS to the SFT impls.
Uses the SFT interface to list the files of a store everywhere, instead of using FS objects directly.

This POC PR is primarily meant to give a high-level overview of the list of changes, and is intended to be broken up into smaller PRs once the initial review is done.

Contributor

@Apache9 Apache9 left a comment

Checked the POC, overall good.

It is a pain that we need to touch the MOB related code.

Anyway, I think we could first do some refactoring to move the reference-file-related logic to StoreFileTracker, without changing any real logic. Then, another refactoring to move the HFileLink and back-reference-file logic to StoreFileTracker. And finally, we start changing the implementation of StoreFileTracker to add the special logic for FileBasedStoreFileTracker, and also consider how to migrate between different store file tracker implementations.

WDYT?

Thanks.

  int nbFiles = 0;
- final Map<String, Collection<StoreFileInfo>> files =
-   new HashMap<String, Collection<StoreFileInfo>>(htd.getColumnFamilyCount());
+ final Map<String, Pair<Collection<StoreFileInfo>, StoreFileTracker>> files =
Contributor

We need a separate StoreFileTracker for every StoreFile? Seems strange...

Contributor Author

@gvprathyusha6 gvprathyusha6 Apr 18, 2024

No, we have one StoreFileTracker per column family here. We refactored the current structure, which maps
[columnFamilyName] ---> [ Collection(StoreFileInfo) ]
to
[columnFamilyName] ---> [ Collection(StoreFileInfo) + sft ]
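
For illustration, the restructured map can be sketched as below. This is only a minimal sketch: `FileInfoStub`, `TrackerStub`, `FamilyFiles` and `PerFamilyTrackerSketch` are hypothetical stand-ins, not the real HBase classes, and the PR itself uses a `Pair` rather than a dedicated holder class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for StoreFileInfo; it only carries a file name.
final class FileInfoStub {
  final String name;

  FileInfoStub(String name) {
    this.name = name;
  }
}

// Hypothetical stand-in for StoreFileTracker; it only reports its family.
interface TrackerStub {
  String family();
}

// Holds the files of one column family together with that family's tracker.
final class FamilyFiles {
  final Collection<FileInfoStub> files;
  final TrackerStub tracker;

  FamilyFiles(Collection<FileInfoStub> files, TrackerStub tracker) {
    this.files = files;
    this.tracker = tracker;
  }
}

final class PerFamilyTrackerSketch {
  // Builds columnFamilyName -> (files + tracker): one tracker per family, not per file.
  static Map<String, FamilyFiles> group(Map<String, List<String>> fileNamesByFamily) {
    Map<String, FamilyFiles> result = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : fileNamesByFamily.entrySet()) {
      final String family = entry.getKey();
      TrackerStub tracker = () -> family;
      List<FileInfoStub> infos = new ArrayList<>();
      for (String fileName : entry.getValue()) {
        infos.add(new FileInfoStub(fileName));
      }
      result.put(family, new FamilyFiles(infos, tracker));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> input = new HashMap<>();
    input.put("cf1", List.of("hfile-a", "hfile-b"));
    Map<String, FamilyFiles> grouped = group(input);
    System.out.println(grouped.get("cf1").files.size() + " files tracked for "
      + grouped.get("cf1").tracker.family());
  }
}
```

Every file in a family shares the same tracker instance, which is the point of the reply above.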

  @Override
  public Pair<Path, Path> call() throws IOException {
-   return splitStoreFile(regionFs, family, sf);
+   return splitStoreFile(regionFs, tracker, family, sf);
Contributor

I think we need to abstract at a higher level here. If we use the file-based store file tracker, we do not need multi-threading. So we'd better add a method for splitting multiple store files to the store file tracker interface, and each implementation is then free to choose whether to use multi-threading.

Contributor Author

Yes, if we add this API to SFT as well, it will be easier to choose where we want multi-threading. Both impls might still need multi-threading when opening readers for all the parent store files to read the first/last key metadata, though.
But yes, we can look at it in detail as part of our last phase of implementation, where we want to commit all the refs/links in one go.
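
A rough sketch of the batch-split idea discussed above, under loud assumptions: `SplitCapableTracker`, `splitStoreFiles` and both implementations are hypothetical names, not part of the real StoreFileTracker API, and the "split" is reduced to deriving a reference name from each file name. It only shows how the threading decision could move behind the interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface: callers hand over all store files at once.
interface SplitCapableTracker {
  // Returns one reference name per input file, in input order.
  List<String> splitStoreFiles(List<String> storeFiles) throws Exception;
}

// Sequential variant, e.g. for a tracker that records all references in one update.
final class SequentialSplitTracker implements SplitCapableTracker {
  @Override
  public List<String> splitStoreFiles(List<String> storeFiles) {
    List<String> refs = new ArrayList<>();
    for (String file : storeFiles) {
      refs.add(file + ".ref");
    }
    return refs;
  }
}

// Thread pool variant, e.g. for a tracker that writes one file per reference.
final class ParallelSplitTracker implements SplitCapableTracker {
  private final int threads;

  ParallelSplitTracker(int threads) {
    this.threads = threads;
  }

  @Override
  public List<String> splitStoreFiles(List<String> storeFiles)
    throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String file : storeFiles) {
        // The lambda returns a value, so it is submitted as a Callable<String>.
        futures.add(pool.submit(() -> file + ".ref"));
      }
      List<String> refs = new ArrayList<>();
      for (Future<String> future : futures) {
        refs.add(future.get()); // collecting in submission order keeps input order
      }
      return refs;
    } finally {
      pool.shutdown();
    }
  }
}
```

The caller would invoke `splitStoreFiles` once per store and no longer needs its own executor. Whether reader initialization for the parent files still needs threads in both variants is left open, as noted in the reply.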

  public StoreFileInfo(final Configuration conf, final FileSystem fs, final Path initialPath,
-   final boolean primaryReplica) throws IOException {
-   this(conf, fs, null, initialPath, primaryReplica);
+   final boolean primaryReplica, final StoreFileTracker sft) throws IOException {
Contributor

I guess a better choice is to move these logics into StoreFileTracker, and make the constructor of StoreFileInfo simpler, so we do not need to pass so many parameters in...

Contributor Author

You mean we move the creation of StoreFileInfo itself to SFT, something like SFT.getStoreFileInfo(path, primaryReplica), right? Yes, this does look better to me.
We can keep a couple of trivial constructors like these in StoreFileInfo and move the rest to the SFT interface.
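
To make the suggestion concrete, here is a minimal sketch of a tracker-side factory. Everything in it is hypothetical and simplified: `InfoSketch` stands in for StoreFileInfo, `InfoFactoryTracker` for the tracker interface, and paths are plain strings. The method name `getStoreFileInfo(path, primaryReplica)` is taken from the comment above, and the final signature may differ.

```java
// Hypothetical, simplified stand-in for StoreFileInfo.
final class InfoSketch {
  final String path;
  final boolean primaryReplica;
  final String trackerName;

  InfoSketch(String path, boolean primaryReplica, String trackerName) {
    this.path = path;
    this.primaryReplica = primaryReplica;
    this.trackerName = trackerName;
  }
}

// Hypothetical tracker interface that owns the creation of file infos.
interface InfoFactoryTracker {
  String name();

  // The tracker supplies its own context, so callers pass only what varies per file.
  default InfoSketch getStoreFileInfo(String path, boolean primaryReplica) {
    return new InfoSketch(path, primaryReplica, name());
  }
}

final class InfoFactoryDemo {
  public static void main(String[] args) {
    InfoFactoryTracker tracker = () -> "default";
    InfoSketch info = tracker.getStoreFileInfo("/data/ns/table/region/cf/hfile1", true);
    System.out.println(info.path + " tracked by " + info.trackerName);
  }
}
```

With this shape, callers no longer thread the configuration, filesystem and tracker through every constructor call.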

Contributor Author

@Apache9 Should we also have the HStoreFile created using SFT apis then?

* @param familyName Column Family Name
* @return a set of {@link StoreFileInfo} for the specified family.
*/
public List<StoreFileInfo> getStoreFiles(final String familyName, final boolean validate)
Contributor

Why put this method here, and why is there no @Override annotation? Is it for DefaultStoreFileTracker only? Who will call it?

Contributor Author

This is only a helper method used by the SFT#load() impl of DefaultStoreFileTracker; it was moved here from HRegionFileSystem. It is not a new API we are adding.

Path backRefPath = null;
if (params.isCreateBackRef()) {
Path backRefssDir = HFileLink.getBackReferencesDir(archiveStoreDir, params.getHfileName());
params.getFs().mkdirs(backRefssDir);
Contributor

So for now, we just move the logic here, but still always create a file on filesystem?

}

@Override
public Reference createReference(Reference reference, Path path) throws IOException {
Contributor

OK, so just a placeholder right? Later we will implement different ways to create reference file and check whether there are references right?

Contributor Author

Yes, that's correct. We will move it to the respective impl classes once the implementation of virtual links is ready in FSFT.

@gvprathyusha6
Contributor Author

gvprathyusha6 commented Apr 18, 2024

Checked the POC, overall good.

It is a pain that we need to touch the MOB related code.

Anyway, I think we could first do some refactoring to move the reference-file-related logic to StoreFileTracker, without changing any real logic. Then, another refactoring to move the HFileLink and back-reference-file logic to StoreFileTracker. And finally, we start changing the implementation of StoreFileTracker to add the special logic for FileBasedStoreFileTracker, and also consider how to migrate between different store file tracker implementations.

WDYT?

Thanks.

Totally agree, we can target them one PR at a time:

  • first, move the reference-file-related logic to StoreFileTracker
  • then, another refactoring to move the HFileLink and hasReferences methods
  • then, start changing the implementation of StoreFileTracker to add the special logic for FileBasedStoreFileTracker to create virtual links
  • and finally, commit all the ref files created as part of a split in one go

Thanks a lot for the review :)

Also, regarding back references: we can support them as part of the API impl, but for the virtual links created as part of split/merge we don't need to create back references, right? The files are not moved to the archive until these link files are compacted, right?

@gvprathyusha6 gvprathyusha6 requested a review from Apache9 April 19, 2024 14:39
@gvprathyusha6 gvprathyusha6 force-pushed the HBASE-27826 branch 2 times, most recently from 5d18437 to 03596b9 Compare April 24, 2025 15:08
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 buf 0m 0s buf was not available.
+0 🆗 buf 0m 0s buf was not available.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for branch
+1 💚 mvninstall 3m 52s master passed
+1 💚 compile 4m 3s master passed
+1 💚 checkstyle 0m 49s master passed
+1 💚 spotbugs 4m 28s master passed
+1 💚 spotless 0m 53s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 3m 41s the patch passed
+1 💚 compile 4m 2s the patch passed
+1 💚 cc 4m 2s the patch passed
+1 💚 javac 4m 2s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 41s /results-checkstyle-hbase-server.txt hbase-server: The patch generated 2 new + 72 unchanged - 0 fixed = 74 total (was 72)
-1 ❌ spotbugs 2m 4s /new-spotbugs-hbase-server.html hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 hadoopcheck 15m 36s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 hbaseprotoc 1m 56s the patch passed
+1 💚 spotless 1m 8s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 27s The patch does not generate ASF License warnings.
56m 50s
Reason Tests
SpotBugs module:hbase-server
Dead store to infos in org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitMergedRegion(List, MasterProcedureEnv) At HRegionFileSystem.java:org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitMergedRegion(List, MasterProcedureEnv) At HRegionFileSystem.java:[line 773]
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5834/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5834
Optional Tests dupname asflicense cc buflint bufcompat codespell detsecrets hbaseprotoc spotless javac spotbugs checkstyle compile hadoopcheck hbaseanti
uname Linux 266b8a011966 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 5d18437
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5834/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 4m 0s master passed
+1 💚 compile 1m 47s master passed
+1 💚 javadoc 0m 46s master passed
+1 💚 shadedjars 6m 36s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 55s the patch passed
+1 💚 compile 2m 5s the patch passed
+1 💚 javac 2m 5s the patch passed
-0 ⚠️ javadoc 0m 44s /results-javadoc-javadoc-hbase-server.txt hbase-server generated 3 new + 63 unchanged - 0 fixed = 66 total (was 63)
+1 💚 shadedjars 9m 38s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 0m 48s hbase-protocol-shaded in the patch passed.
-1 ❌ unit 464m 57s /patch-unit-hbase-server.txt hbase-server in the patch failed.
524m 40s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5834/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #5834
Optional Tests unit javac javadoc compile shadedjars
uname Linux 172d7ceb769d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 5d18437
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5834/3/testReport/
Max. process+thread count 4618 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5834/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

3 participants