Add hooks for selecting the set of files for a table scan; also add an option for empty string -> null conversion #68
a6dd5a3 to
0adc99b
Compare
including a filesystem prefix.
Add doc strings for these new public methods.
Added.
6f39327 to
dcbe683
Compare
consider setHadoopFileSelector(hadoopFileSelector: HadoopFileSelector) and unsetHadoopFileSelector(): Unit { hadoopFileSelector = None }
Why? One method means less code to write and maintain.
It just looks a little odd to me to set using an Option -- i.e. to setHadoopFileSelector(maybeAHadoopFileSelector) -- instead of to set with an actual instance and to explicitly clear instead of to set to None. I guess what I am saying is that it makes sense for the underlying this.hadoopFileSelector to be an Option (maybe there, maybe not), but that when setting or removing the hadoopFileSelector the caller of the method(s) would naturally have a concrete idea of what should be done and wrapping that concreteness in a maybe doesn't make obvious sense or improve the readability at the callsite of the set/unset.
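The set/unset split suggested above can be sketched as follows. This is only an illustration under assumptions: the `HadoopFileSelector` trait body, its `selectFiles` method, the `ScanContext` class, and `filesToScan` are hypothetical stand-ins, since the actual signatures are not shown in this thread. The point is that the field stays an `Option` internally while callers pass a concrete instance or clear it explicitly.

```scala
// Hypothetical stand-in for the PR's file-selection hook; the real
// trait and its method signature are not shown in this conversation.
trait HadoopFileSelector {
  def selectFiles(tableName: String, candidates: Seq[String]): Seq[String]
}

// Hypothetical owner of the hook (the PR adds it to an existing class).
class ScanContext {
  // Internally the selector may or may not be present, so an Option fits here.
  private var hadoopFileSelector: Option[HadoopFileSelector] = None

  /** Installs a selector; the caller always passes a concrete instance. */
  def setHadoopFileSelector(selector: HadoopFileSelector): Unit = {
    hadoopFileSelector = Some(selector)
  }

  /** Removes any installed selector. */
  def unsetHadoopFileSelector(): Unit = {
    hadoopFileSelector = None
  }

  /** Returns the selector's choice, or all candidates when none is installed. */
  def filesToScan(tableName: String, candidates: Seq[String]): Seq[String] =
    hadoopFileSelector
      .map(_.selectFiles(tableName, candidates))
      .getOrElse(candidates)
}
```

At the call site this reads as `ctx.setHadoopFileSelector(mySelector)` and `ctx.unsetHadoopFileSelector()`, with no `Some(...)` or `None` wrapping required of the caller.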
Done.
You could also separate this into two cases, which may make the code maintenance with upstream changes a little easier.
case oi: HiveVarcharObjectInspector if emptyStringsAsNulls => ...
case oi: HiveVarcharObjectInspector =>
  (value: Any, row: MutableRow, ordinal: Int) =>
    row.setString(ordinal, oi.getPrimitiveJavaObject(value).getValue)
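For illustration, here is a self-contained sketch of the two-case split using a pattern guard. Everything in it is a hypothetical stand-in: `Inspector`, `VarcharInspector`, `Row`, and `Unwrappers` replace the Hive and Spark types so the snippet runs on its own, and the body of the guarded case (an empty string becomes a null cell) is an assumption, because the comment above elides it with `...`.

```scala
// Hypothetical stand-ins for Hive's object inspectors and Spark's MutableRow,
// so the sketch compiles without either dependency.
sealed trait Inspector { def read(value: Any): String }
final class VarcharInspector extends Inspector {
  def read(value: Any): String = value.toString
}

final class Row(size: Int) {
  private val cells = new Array[Any](size)
  def setString(ordinal: Int, s: String): Unit = { cells(ordinal) = s }
  def setNullAt(ordinal: Int): Unit = { cells(ordinal) = null }
  def get(ordinal: Int): Any = cells(ordinal)
}

object Unwrappers {
  type Unwrapper = (Any, Row, Int) => Unit

  def unwrapperFor(inspector: Inspector, emptyStringsAsNulls: Boolean): Unwrapper =
    inspector match {
      // Guarded case: only chosen when the option is on.
      // Assumed behavior: an empty string is written as null.
      case oi: VarcharInspector if emptyStringsAsNulls =>
        (value: Any, row: Row, ordinal: Int) => {
          val s = oi.read(value)
          if (s.isEmpty) row.setNullAt(ordinal) else row.setString(ordinal, s)
        }
      // Unguarded case: kept in the same shape as the upstream code,
      // which is what makes later merges easier to reconcile.
      case oi: VarcharInspector =>
        (value: Any, row: Row, ordinal: Int) =>
          row.setString(ordinal, oi.read(value))
    }
}
```

Because the option is checked once when the unwrapper is chosen, the per-row closure in the default case does no extra work.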
Fixed.
Add hooks for selecting the set of files for a table scan; also add an option for empty string -> null conversion
@markhamstra