HBASE-28195 set start row as prefix if a scan with PrefixFilter #5514
base: master
| 💔 -1 overall 
 
 This message was automatically generated. | 
| There is a setStartStopRowForPrefixScan method for Scan already? I think it is exactly for the same purpose... | 
| Thanks for reviewing, Duo. Yes, the setStartStopRowForPrefixScan method works for prefix filtering, but it does not cover range filtering. Maybe the title was misleading. What I want to introduce here is similar to the query optimizer subsystem in an RDBMS: it narrows the scan range based on the filters the user sets. For example, if a user wants to scan data where rowkey > 'hhh' and rowkey < 'mmm', the optimizer can set the start row to 'hhh' and the stop row to 'mmm'. Compared with the default start and stop rows, EMPTY_START_ROW and EMPTY_STOP_ROW, this avoids reading rows outside the range and should speed up the scan request. | 
| Then let's change the title and post a simple design doc to discuss first? I think introducing a new mechanism is fine, but we need to discuss it first. At least, changing the Scan object passed in may break our users' code... | 
| OK. Thanks for your advice, Duo. Let me prepare the design doc first. | 
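For readers following the prefix-scan part of the discussion, here is a minimal sketch of how a row prefix can be turned into a start row and an exclusive stop row. It is not taken from the HBase source or from this PR; the class `PrefixRangeSketch` and its methods are hypothetical names, and the approach (drop trailing 0xFF bytes, then increment the last remaining byte) is an assumption about how such a derivation is commonly done.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: derives scan bounds from a row prefix.
final class PrefixRangeSketch {
    private PrefixRangeSketch() {}

    // An empty array stands for "no bound", in the spirit of EMPTY_STOP_ROW.
    static final byte[] NO_BOUND = new byte[0];

    // The prefix itself is the smallest row key that carries the prefix.
    static byte[] startRowForPrefix(byte[] prefix) {
        return prefix.clone();
    }

    // Smallest row key that sorts after every key starting with the prefix.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Empty or all-0xFF prefix: nothing sorts after it, scan to the end.
            return NO_BOUND;
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // byte overflow from 0x7F to 0x80 is the intended unsigned step
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(stopRowForPrefix(prefix), StandardCharsets.UTF_8));
    }
}
```

The stop row is treated as exclusive here, so every key beginning with the prefix falls inside the derived range.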
This PR introduces a ScanRangeOptimizer that tries to avoid reading unnecessary data by narrowing the scan range based on the filters the user sets.
For example, if a user wants to scan data where rowkey > 'hhh' and rowkey < 'mmm', the optimizer can set the start row to 'hhh' and the stop row to 'mmm'. Compared with the default start and stop rows, EMPTY_START_ROW and EMPTY_STOP_ROW, this should speed up the scan request.
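The range example can be sketched roughly as follows. This is an illustration only, not the ScanRangeOptimizer from this PR: the class `ScanRangeSketch` and its methods are hypothetical, and the sketch assumes both predicates are strict, so both derived bounds are exclusive. On a real `Scan`, bounds like these could presumably be applied with `withStartRow(byte[], boolean)` and `withStopRow(byte[], boolean)`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of a narrowed scan range; not the PR's implementation.
final class ScanRangeSketch {
    // An empty array means "unbounded", like EMPTY_START_ROW / EMPTY_STOP_ROW.
    static final byte[] UNBOUNDED = new byte[0];

    final byte[] startRow;
    final boolean startInclusive;
    final byte[] stopRow;
    final boolean stopInclusive;

    ScanRangeSketch(byte[] startRow, boolean startInclusive,
                    byte[] stopRow, boolean stopInclusive) {
        this.startRow = startRow;
        this.startInclusive = startInclusive;
        this.stopRow = stopRow;
        this.stopInclusive = stopInclusive;
    }

    // rowkey > greaterThan AND rowkey < lessThan becomes two exclusive bounds.
    static ScanRangeSketch fromExclusiveBounds(byte[] greaterThan, byte[] lessThan) {
        return new ScanRangeSketch(greaterThan, false, lessThan, false);
    }

    // Unsigned lexicographic comparison, the usual ordering for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // True if the row would be visited by a scan restricted to this range.
    boolean contains(byte[] row) {
        if (startRow.length > 0) {
            int c = compareUnsigned(row, startRow);
            if (c < 0 || (c == 0 && !startInclusive)) {
                return false;
            }
        }
        if (stopRow.length > 0) {
            int c = compareUnsigned(row, stopRow);
            if (c > 0 || (c == 0 && !stopInclusive)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ScanRangeSketch range = fromExclusiveBounds(
            "hhh".getBytes(StandardCharsets.UTF_8),
            "mmm".getBytes(StandardCharsets.UTF_8));
        System.out.println(range.contains("kkk".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With unbounded start and stop rows every row is visited and the filter has to reject the rest; with the narrowed range, rows at or before 'hhh' and at or after 'mmm' are never read in the first place.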