[WIP][SPARK-10849][SQL] Adds a new column metadata property to the jdbc data source for users to specify database column type using the metadata #16208
What changes were proposed in this pull request?
Currently, the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. Users who want to specify a different database data type for only some of the columns have no option to do so. In scenarios where the default mapping does not work, users are forced to create the table in the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns.
The solution is based on the approach of the existing Redshift connector (https://github.com/databricks/spark-redshift#setting-a-custom-column-type). We add a new column metadata property to the JDBC data source so that users can specify a column's database type through its metadata.
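To make the intended precedence concrete, here is a simplified, hypothetical model in plain Scala (no Spark dependency). It assumes that a type given under the proposed createTableColumnType metadata key wins, then the dialect's mapping, then the default mapping. Field, Dialect, resolveColumnType, and createTableDdl are illustrative names, not Spark internals, and the default mapping table is only an abbreviated stand-in loosely modeled on Spark's common JDBC types.

```scala
// Hypothetical, simplified model of per-column type resolution when a JDBC
// writer generates CREATE TABLE DDL. This is not Spark's actual implementation.
object ColumnTypeResolution {
  // Metadata key proposed in this PR.
  val ColumnTypeKey = "createTableColumnType"

  // Minimal stand-in for a Spark StructField: name, Catalyst type name, metadata.
  final case class Field(name: String, dataType: String, metadata: Map[String, String] = Map.empty)

  // Stand-in for a JDBC dialect: returns an override for a Catalyst type, if any.
  type Dialect = String => Option[String]

  // Abbreviated stand-in for the default (common) type mapping.
  val defaultMapping: Map[String, String] = Map(
    "IntegerType" -> "INTEGER",
    "LongType" -> "BIGINT",
    "StringType" -> "TEXT",
    "DoubleType" -> "DOUBLE PRECISION"
  )

  // Precedence: user-specified metadata, then dialect, then default mapping.
  def resolveColumnType(field: Field, dialect: Dialect): String =
    field.metadata.get(ColumnTypeKey)
      .orElse(dialect(field.dataType))
      .orElse(defaultMapping.get(field.dataType))
      .getOrElse(throw new IllegalArgumentException(
        s"Can't get JDBC type for ${field.dataType}"))

  // Builds a CREATE TABLE statement from the resolved column types.
  def createTableDdl(table: String, fields: Seq[Field], dialect: Dialect): String =
    fields
      .map(f => s"${f.name} ${resolveColumnType(f, dialect)}")
      .mkString(s"CREATE TABLE $table (", ", ", ")")

  def main(args: Array[String]): Unit = {
    val noOverrides: Dialect = _ => None
    val fields = Seq(
      Field("id", "IntegerType"),
      Field("name", "StringType", Map(ColumnTypeKey -> "VARCHAR(128)"))
    )
    // Only "name" carries a user-specified type; "id" falls back to the default.
    println(createTableDdl("people", fields, noOverrides))
  }
}
```

With no dialect overrides, the model yields CREATE TABLE people (id INTEGER, name VARCHAR(128)): only the annotated column deviates from the default mapping, which is the per-column behavior this PR is after.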
Example:
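A minimal sketch of how a user might attach the proposed property with the Scala API, assuming this PR's metadata key createTableColumnType. MetadataBuilder, Column.as(alias, metadata), and withColumn are existing Spark APIs; the helper name withDatabaseColumnType and the target type VARCHAR(128) are hypothetical. Without this PR, the JDBC writer simply ignores the metadata.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

object CustomColumnTypeExample {
  // Metadata key proposed in this PR (not part of released Spark behavior).
  val ColumnTypeKey = "createTableColumnType"

  // Hypothetical helper: returns a new DataFrame whose `column` carries the
  // requested database type in its metadata, keeping any existing metadata.
  def withDatabaseColumnType(df: DataFrame, column: String, dbType: String): DataFrame = {
    val metadata = new MetadataBuilder()
      .withMetadata(df.schema(column).metadata)
      .putString(ColumnTypeKey, dbType)
      .build()
    df.withColumn(column, col(column).as(column, metadata))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("jdbc-custom-column-type")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    val typed = withDatabaseColumnType(df, "name", "VARCHAR(128)")

    // Shows the metadata now attached to the "name" column.
    println(typed.schema("name").metadata.json)

    // With this PR applied, writing `typed` through the JDBC data source, e.g.
    //   typed.write.jdbc(url, "people", connectionProperties)
    // would create "name" as VARCHAR(128) and "id" with the default mapping.
    spark.stop()
  }
}
```

Because the metadata lives on the DataFrame's schema, a new DataFrame has to be derived before writing, which is why the approach depends on a language API that can set column metadata.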
One restriction of this approach is that metadata modification is unsupported in the Python, SQL, and R language APIs. Users have to create a new DataFrame to specify the metadata with the createTableColumnType property.
An alternative approach is to add a JDBC data source option that lets users specify the database column type information as a JSON string. For more details, please refer to the JDBC OPTION PR.
TODO: Add documentation on specifying the database column type.
How was this patch tested?
Added a new test case to JDBCWriteSuite.