Conversation

@sureshthalamati
Contributor

This patch allows users to override the default mapping of DataFrame field types to database column types when writing a DataFrame to a JDBC data source.

In some cases a user may want a specific database type mapping for certain fields, based on the database configuration (page size, type of table spaces, etc.), instead of the defaults. For example, a large varchar size for all columns may not fit within the row size limits, so the user may want a mix of varchar and clob types. The maximum decimal precision supported by some database systems may also be less than Spark's decimal precision; in such cases the user can use this option to adjust the decimal type's precision and scale to match the target database.

Added a new field metadata property named db.column.type. I am not sure what the convention is for these kinds of property names. Please let me know if it needs to be changed.
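The intended behavior can be sketched outside of Spark roughly as follows. This is an illustrative sketch only, not the patch's actual code: the function names, the simplified type map, and the dictionary-based metadata are all assumptions; only the `db.column.type` property name comes from the patch description.

```python
# Illustrative sketch (not the actual patch code): build CREATE TABLE DDL,
# letting per-field metadata override the default JDBC type mapping.

# Simplified default Spark-type -> database-type mapping (assumed values).
DEFAULT_TYPE_MAP = {
    "string": "TEXT",
    "integer": "INTEGER",
    "double": "DOUBLE PRECISION",
    "decimal(38,18)": "DECIMAL(38,18)",
}

def column_ddl(name, spark_type, metadata=None):
    """Use the db.column.type metadata property when present,
    otherwise fall back to the default mapping."""
    metadata = metadata or {}
    db_type = metadata.get("db.column.type", DEFAULT_TYPE_MAP[spark_type])
    return f"{name} {db_type}"

def create_table_ddl(table, fields):
    cols = ", ".join(column_ddl(*f) for f in fields)
    return f"CREATE TABLE {table} ({cols})"

fields = [
    ("name", "string", {"db.column.type": "VARCHAR(128)"}),          # override
    ("notes", "string", {"db.column.type": "CLOB"}),                 # override
    ("age", "integer", None),                                        # default
    ("amount", "decimal(38,18)", {"db.column.type": "DECIMAL(19,4)"}),
]
print(create_table_ddl("people", fields))
# -> CREATE TABLE people (name VARCHAR(128), notes CLOB, age INTEGER, amount DECIMAL(19,4))
```

This shows why the feature is useful: the `amount` column shrinks to a precision the target database can hold, and only `notes` pays the cost of a clob.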

@rxin @marmbrus

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Oct 29, 2015

Test build #44586 has finished for PR 9352 at commit 4048c2d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor Author

The failed test is test_trainOn_predictOn (main.StreamingKMeansTest). It seems to be unrelated to my changes. Can we retest this, please?

@sureshthalamati
Contributor Author

Thinking about this more, I realized the current version of the patch may introduce a SQL injection vulnerability. I will update the pull request with a new version of the fix.

@sureshthalamati
Contributor Author

Updated the patch to address the SQL injection issue by removing space characters from the input. Please review.

@marmbrus @rxin
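A minimal sketch of the kind of sanitization described above, stripping whitespace from the user-supplied type string before it is spliced into the generated DDL. This is an assumption-laden illustration, not the patch's actual code; the function name is hypothetical, and a real implementation would likely validate against a whitelist of known types as well.

```python
import re

def sanitize_db_type(user_type):
    """Illustrative only: strip all whitespace from a user-supplied column
    type so that a value like 'VARCHAR(20) ); DROP TABLE t' cannot use
    spaces to smuggle extra SQL tokens into the generated CREATE TABLE
    statement. The real patch's validation may differ."""
    return re.sub(r"\s+", "", user_type)

print(sanitize_db_type("VARCHAR (128)"))  # -> VARCHAR(128)
```

Removing spaces narrows the attack surface because SQL keywords generally need whitespace separators, though on its own it is a weaker defense than rejecting any string that does not match an allowed type pattern.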

@SparkQA

SparkQA commented Oct 30, 2015

Test build #44699 has finished for PR 9352 at commit ef26084.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor Author

Retest this please.

The test failure is unrelated to my changes. The failed test is org.apache.spark.sql.sources.JsonHadoopFsRelationSuite, "test all data types - TimestampType". It passes in my environment.

@rick-ibm
Contributor

rick-ibm commented Nov 4, 2015

Thanks for addressing the SQL injection concerns, Suresh. LGTM.

@sureshthalamati
Contributor Author

Jenkins, retest this please.

@tristanreid
Contributor

Does anyone know the status of this change? Is there anything blocking it, or was it superseded by something else? Thanks.

@AmplabJenkins

Can one of the admins verify this patch?

@sureshthalamati
Contributor Author

Opened two new [WIP] PRs to fix this issue using different approaches.
#16208
#16209

Closing this PR.

