
Conversation

@JoshRosen
Contributor

This patch refactors the internals of the JDBC data source in order to allow some of its code to be re-used in an automated comparison testing harness. Here are the key changes:

  • Move the JDBC ResultSetMetaData-to-StructType conversion logic from JDBCRDD.resolveTable() to the JdbcUtils object (as a new getSchema(ResultSet, JdbcDialect) method), allowing it to be applied to ResultSets that are created elsewhere.
  • Move the ResultSet to InternalRow conversion methods from JDBCRDD to JdbcUtils:
    • It makes sense to move the JDBCValueGetter type and makeGetter functions here given that their write-path counterparts (JDBCValueSetter) are already in JdbcUtils.
    • Add an internal resultSetToSparkInternalRows method which takes a ResultSet and schema and returns an Iterator[InternalRow]. This effectively extracts the main loop of JDBCRDD into its own method.
    • Add a public resultSetToRows method to JdbcUtils, which wraps the minimal machinery around resultSetToSparkInternalRows in order to allow it to be called from outside of a Spark job.
  • Make JdbcDialects.get into a DeveloperApi (JdbcDialects itself is already a DeveloperApi).
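
In sketch form, the relocated helpers look roughly like this (signatures inferred from the bullets above; the committed code may differ in visibility and extra parameters):

import java.sql.ResultSet
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.StructType

object JdbcUtils {
  // Builds a Spark schema from the ResultSet's metadata, applying the dialect's type mappings.
  def getSchema(resultSet: ResultSet, dialect: JdbcDialect): StructType = ???

  // Internal helper: the main conversion loop extracted from JDBCRDD.compute().
  def resultSetToSparkInternalRows(resultSet: ResultSet, schema: StructType): Iterator[InternalRow] = ???

  // Public wrapper that lets the conversion run outside of a Spark job.
  def resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row] = ???
}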

Put together, these changes enable the following testing pattern:

val jdbcResultSet: ResultSet = conn.prepareStatement(query).executeQuery()
val resultSchema: StructType = JdbcUtils.getSchema(jdbcResultSet, JdbcDialects.get("jdbc:postgresql"))
val jdbcRows: Seq[Row] = JdbcUtils.resultSetToRows(jdbcResultSet, resultSchema).toSeq
checkAnswer(sparkResult, jdbcRows) // in a test case

// JDBCRDD diff context referenced by the comment below:
- override def compute(thePart: Partition, context: TaskContext): Iterator[InternalRow] =
-   new Iterator[InternalRow] {
+ override def compute(thePart: Partition, context: TaskContext): Iterator[InternalRow] = {
Contributor Author

While most of the changes in this block stem from moving the inner loop into a JdbcUtils method, there are one or two non-trivial changes that may impact cleanup in error situations. I'll comment on these changes below in order to help walk through them.
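
For orientation, here is a rough sketch of that delegation; stmt, close(), and schema stand in for the RDD's existing statement handling, cleanup hook, and resolved schema rather than the literal code in the diff:

override def compute(thePart: Partition, context: TaskContext): Iterator[InternalRow] = {
  // ... open the connection, prepare the statement, and execute the query as before ...
  val rs: ResultSet = stmt.executeQuery()
  // Cleanup stays registered in the RDD so resources are released even if the task fails.
  context.addTaskCompletionListener { _ => close() }
  // The per-row ResultSet-to-InternalRow loop now lives in JdbcUtils.
  JdbcUtils.resultSetToSparkInternalRows(rs, schema)
}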

@SparkQA

SparkQA commented Sep 1, 2016

Test build #64745 has finished for PR 14907 at commit 43cbef6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Please merge #14911 ahead of this so that I can bring this up-to-date with that change. Merging in this order reduces the amount of work to backport #14911.

@JoshRosen
Contributor Author

I've merged #14911 so this should now be ready for review.

case _ => throw new IllegalArgumentException(s"Unsupported type ${dt.simpleString}")
}

private def nullSafeConvert[T](input: T, f: T => Any): Any = {
Contributor

@srinathshankar commented Sep 2, 2016

Can this be
private def nullSafeConvert[T, U](input: T, f: T => U): U

Contributor Author

This is carried over from the old code. Good catch.

Changing this signature might actually improve performance in the ArrayType case because Scala should be able to statically determine that it can allocate primitive arrays once it knows the return type of nullSafeConvert.

Contributor Author

Actually, I don't think we can do this: there's no upper type bound that says U must be a nullable object, so you'd have to do a dangerous null.asInstanceOf cast. And since we're not working with primitives here, there are no savings on boxing, etc. Therefore, I'd prefer to leave this unchanged since it's a carryover from the old code.

Contributor

You can use the U >: Null bound, right?

Contributor Author

Aha, forgot about using lower type bounds for this.
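
For reference, a minimal self-contained sketch of the lower-bounded signature being discussed (the enclosing object name is just a placeholder):

object NullSafeConvertSketch {
  // U >: Null lets the compiler accept null as a valid value of U,
  // avoiding the null.asInstanceOf cast an unconstrained U would require.
  def nullSafeConvert[T, U >: Null](input: T, f: T => U): U = {
    if (input == null) null else f(input)
  }
}

Note that U must still be a reference type (Null is only a subtype of reference types), so this does not change the boxing situation discussed above.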

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64816 has finished for PR 14907 at commit f09174b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

LGTM. Merging to master. Thanks!

@asfgit closed this in 6bcbf9b on Sep 2, 2016
@JoshRosen deleted the modularize-jdbc-internals branch on September 2, 2016 at 17:39
@gatorsmile
Member

We exposed JdbcUtils.resultSetToRows, but forgot to add a test case.
