Conversation

@sureshthalamati (Contributor) commented Dec 8, 2016

What changes were proposed in this pull request?

Currently the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. If users want to specify a different database data type for only some of the columns, there is no option available. In scenarios where the default mapping does not work, users are forced to create the table in the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns.
 
The solution is based on the existing Redshift connector (https://github.com/databricks/spark-redshift#setting-a-custom-column-type). It adds a new column metadata property, createTableColumnType, to the JDBC data source so that users can specify the database column type through column metadata.
 
Example:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

val nvarcharMd = new MetadataBuilder().putString("createTableColumnType", "NVARCHAR(123)").build()
val newDf = df.withColumn("name", col("name").as("name", nvarcharMd))
newDf.write.mode(SaveMode.Overwrite).jdbc(url, "TEST.USERDBTYPETEST", properties)
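
With this property in place, the CREATE TABLE statement generated on write would use NVARCHAR(123) for the name column instead of the type produced by the default dialect mapping.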

One restriction with this approach is that metadata modification is unsupported in the Python, SQL, and R language APIs. Users have to create a new DataFrame that carries the createTableColumnType property in its metadata, as sketched below.
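
For illustration, here is a minimal sketch of that workaround, rebuilding the schema so the metadata is attached to the name column. It assumes the createTableColumnType property proposed in this PR, plus the df, spark, url, and properties values from the example above:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.types.{MetadataBuilder, StructField, StructType}

// Metadata carrying the proposed createTableColumnType property.
val nvarcharMd = new MetadataBuilder()
  .putString("createTableColumnType", "NVARCHAR(123)")
  .build()

// Rebuild the schema, overriding the metadata of the "name" column only.
val newSchema = StructType(df.schema.map {
  case f @ StructField("name", _, _, _) => f.copy(metadata = nvarcharMd)
  case f => f
})

// Create a new DataFrame over the same rows with the amended schema.
val newDf = spark.createDataFrame(df.rdd, newSchema)
newDf.write.mode(SaveMode.Overwrite).jdbc(url, "TEST.USERDBTYPETEST", properties)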
 
An alternative approach is to add a JDBC data source option that lets users specify the database column type information as a JSON string. For more details, please refer to the JDBC option PR (#16209).
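
For reference, the option-based design is essentially what eventually shipped in Spark 2.2 as the createTableColumnTypes JDBC write option, taking CREATE TABLE-style column definitions rather than a JSON string. A minimal sketch, assuming the same df, url, and properties as above:

// Only the listed columns are overridden; the remaining columns
// keep the dialect's default type mapping.
df.write
  .option("createTableColumnTypes", "name NVARCHAR(123)")
  .mode(SaveMode.Overwrite)
  .jdbc(url, "TEST.USERDBTYPETEST", properties)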

TODO: Documentation for specifying the database column type

How was this patch tested?

Added a new test case to JDBCWriteSuite.

@SparkQA commented Dec 8, 2016

Test build #69849 has started for PR 16208 at commit 3834903.

@gatorsmile (Member) commented Dec 8, 2016

@rxin @JoshRosen @srowen Which solution is preferred for supporting customized column types: the table-level JDBC option, or the column metadata property? Thanks!

FYI: this PR is based on the column metadata property.

@gatorsmile (Member) commented

retest this please

@SparkQA commented Dec 9, 2016

Test build #69883 has finished for PR 16208 at commit 3834903.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mtrewartha commented

@sureshthalamati Are you still planning on trying to get this merged soon? This would be a hugely useful feature for us!

@sureshthalamati (Contributor, Author) commented

@miketrewartha Yes, I am hoping one of the fixes for this issue will get merged. I proposed two solutions: this PR and #16209. Waiting for feedback from the committers.

@gatorsmile (Member) commented

@sureshthalamati Could you resolve the conflicts in both PRs? Thanks!

@sureshthalamati force-pushed the jdbc_custom_dbtype-spark-10849 branch from 3834903 to 66c9e80 on January 15, 2017 (commit: "… users to specify database column type when creating table on write.")
@SparkQA commented Jan 15, 2017

Test build #71404 has finished for PR 16208 at commit 66c9e80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented

Can you please close it, since the alternative PR is already merged?

@sureshthalamati (Contributor, Author) commented

This issue is resolved by the alternative PR. Closing this PR.
