-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-27826 Refactor code to move creation of Ref files to SFT interface apis #5834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This POC PR is primarily to get a high level overview of the list of changes and is intended to be broken once the initial review is done. |
93935f8 to
03596b9
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked the POC, overall good.
It is a pain that we need to touch the MOB related code.
Anyway, I think first we could first do some refactorings, to move the reference file related logic to StoreFileTracker, without changing any real logic. And then, another refactoring to move HFileLink, and back reference files related logic to StoreFileTracker. And finally, we start to change the implementation of StoreFileTracker, to implement the special logic for FileBasedStoreFileTracker, and also consider how to migrate between different store file tracker implementions.
WDYT?
Thanks.
| int nbFiles = 0; | ||
| final Map<String, Collection<StoreFileInfo>> files = | ||
| new HashMap<String, Collection<StoreFileInfo>>(htd.getColumnFamilyCount()); | ||
| final Map<String, Pair<Collection<StoreFileInfo>, StoreFileTracker>> files = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a separate StoreFileTracker for every StoreFile? Seems strange...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we have a StoreFileTracker per column family here, we refactored current struct which contains [columnFamilyName] ---> [ Collection(StoreFileInfo) ] to
[columnFamilyName] ---> [ Collection(StoreFileInfo) + sft ]
| @Override | ||
| public Pair<Path, Path> call() throws IOException { | ||
| return splitStoreFile(regionFs, family, sf); | ||
| return splitStoreFile(regionFs, tracker, family, sf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we need to abstract at a higer level. As if we use file based store file tracker, we do not need multi threading. So we'd better abstract a method for splitting multiple store files in the store file tracker interface, and in the implementation, we are free to choose whether to use multi threading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if we can add this api as well in SFT, it will be easier to choose if where we want multi threading, but both impls might still need multi threading while initiating reader for all the parent store files while reading metadata for first/lastkeys.
But yes, we can get to it in detail as part of our last phase of implementation may be, where we want to commit all the ref/links in 1 go
| public StoreFileInfo(final Configuration conf, final FileSystem fs, final Path initialPath, | ||
| final boolean primaryReplica) throws IOException { | ||
| this(conf, fs, null, initialPath, primaryReplica); | ||
| final boolean primaryReplica, final StoreFileTracker sft) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a better choice is to move these logics into StoreFileTracker, and make the constructor of StoreFileInfo simpler, so we do not need to pass so many parameters in...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean, we move the creation of StoreFileInfo itself to SFT, like SFT.getStoreFileInfo(path, primaryReplica) something like that right? yes this does look better to me.
Couple of trivial constructors like these we can have in StoreFileInfo and rest we can move to SFT interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apache9 Should we also have the HStoreFile created using SFT apis then?
| * @param familyName Column Family Name | ||
| * @return a set of {@link StoreFileInfo} for the specified family. | ||
| */ | ||
| public List<StoreFileInfo> getStoreFiles(final String familyName, final boolean validate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why putting this method here and there is no @Override annotation? It is for DefaultStoreFileTracker only? Who will call it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only helper method used as part of SFT#load() impl of DefaultStoreFileTracker, this was moved from HRegionFileSystem to here. This is not a new API we are adding
| Path backRefPath = null; | ||
| if (params.isCreateBackRef()) { | ||
| Path backRefssDir = HFileLink.getBackReferencesDir(archiveStoreDir, params.getHfileName()); | ||
| params.getFs().mkdirs(backRefssDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for now, we just move the logic here, but still always create a file on filesystem?
| } | ||
|
|
||
| @Override | ||
| public Reference createReference(Reference reference, Path path) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so just a placeholder right? Later we will implement different ways to create reference file and check whether there are references right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thats correct, we will move it respective impl classes once we have the implementation of virtual links ready in FSFT
Totally agree, we can target them 1 pr at a time.
Thanks a lot for the review :) Also around back references, we can support it as part of the api impl, but for the virtual links created as part of Split/Merge we dont need to create back references right? there are not moved to archive till these link files are compacted know |
5d18437 to
03596b9
Compare
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
Modifies HStoreFile/StoreFileInfo constructors to take SFT interface as a parameter.
Refactors direct interactions of Reference/HFileLink creations to SFT interface. Also moves getStoreFiles/hasReferences from HRegionFS to SFT impls.
Use the SFT interface to list files of store everywhere instead of using FS objects directly