-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17351] Refactor JDBCRDD to expose ResultSet -> Seq[Row] utility methods #14907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| */ | ||
| override def compute(thePart: Partition, context: TaskContext): Iterator[InternalRow] = | ||
| new Iterator[InternalRow] { | ||
| override def compute(thePart: Partition, context: TaskContext): Iterator[InternalRow] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While most of the changes in this block stem from moving the inner loop into a JdbcUtils method, there are one or two non-trivial changes that may impact cleanup in error situations. I'll comment on these changes below in order to help walk through them.
|
Test build #64745 has finished for PR 14907 at commit
|
|
I've merged #14911 so this should now be ready for review. |
| case _ => throw new IllegalArgumentException(s"Unsupported type ${dt.simpleString}") | ||
| } | ||
|
|
||
| private def nullSafeConvert[T](input: T, f: T => Any): Any = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be
private def nullSafeConvert[T, U](input: T, f: T => U): U
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is carried over from the old code. Good catch.
Changing this signature might actually improve performance in the ArrayType case because Scala should be able to statically determine that it can allocate primitive arrays once it knows the return type of nullSafeConvert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think that we can do this because I don't think that there's a way to set an upper type bound to say that U must be a nullable object, so you have to do a dangerous null.asInstanceOf cast. And since we're not working with primitives here there's no savings on boxing, etc. Therefore, I'd prefer to just leave this unchanged since it's a carryover from the old code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the U >: Null bound right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, forgot about using lower type bounds for this.
|
Test build #64816 has finished for PR 14907 at commit
|
|
LGTM. Merging to master. Thanks! |
|
We exposed |
This patch refactors the internals of the JDBC data source in order to allow some of its code to be re-used in an automated comparison testing harness. Here are the key changes:
ResultSetMetadatatoStructTypeconversion logic fromJDBCRDD.resolveTable()to theJdbcUtilsobject (as a newgetSchema(ResultSet, JdbcDialect)method), allowing it to be applied onResultSets that are created elsewhere.ResultSettoInternalRowconversion methods fromJDBCRDDtoJdbcUtils:JDBCValueGettertype andmakeGetterfunctions here given that their write-path counterparts (JDBCValueSetter) are already inJdbcUtils.resultSetToSparkInternalRowsmethod which takes aResultSetand schema and returns anIterator[InternalRow]. This effectively extracts the main loop ofJDBCRDDinto its own method.resultSetToRowsmethod toJdbcUtils, which wraps the minimal machinery aroundresultSetToSparkInternalRowsin order to allow it to be called from outside of a Spark job.JdbcDialect.getinto aDeveloperApi(JdbcDialectitself is already aDeveloperApi).Put together, these changes enable the following testing pattern: