Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Sep 30, 2020

What changes were proposed in this pull request?

  1. Override the default SQL strings in the Oracle Dialect for:
    • ALTER TABLE ADD COLUMN
    • ALTER TABLE UPDATE COLUMN TYPE
    • ALTER TABLE UPDATE COLUMN NULLABILITY
  2. Add new docker integration test suite jdbc/v2/OracleIntegrationSuite.scala

Why are the changes needed?

In SPARK-24907, we implemented JDBC v2 Table Catalog but it doesn't support some ALTER TABLE at the moment. This PR supports Oracle specific ALTER TABLE.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

By running new integration test suite:

$ ./build/sbt -Pdocker-integration-tests "test-only *.OracleIntegrationSuite"

@SparkQA
Copy link

SparkQA commented Sep 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33890/

@SparkQA
Copy link

SparkQA commented Sep 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33890/

@SparkQA
Copy link

SparkQA commented Sep 30, 2020

Test build #129273 has finished for PR 29912 at commit b7c4ea5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 30, 2020

@dongjoon-hyun @maropu @huaxingao May I ask you to review this PR.

* $ export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe
* $ cd $SPARK_HOME
* $ ./build/sbt -Pdocker-integration-tests
* "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: you mean "org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite", right?

@huaxingao
Copy link
Contributor

LGTM

import org.apache.spark.sql.jdbc.{DatabaseOnDocker, DockerJDBCIntegrationSuite}
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.types._
import org.apache.spark.tags.DockerTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz remove all the unused imports.

* It has been validated with 18.4.0 Express Edition.
*/
@DockerTest
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we run the existing tests of o.a.s.s.jdbc.OracleIntegrationSuite for the V2 JDBC path? I think it is okay to fix it in a separate PR though.

Copy link
Member Author

@MaxGekk MaxGekk Oct 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened the JIRA ticket for that https://issues.apache.org/jira/browse/SPARK-33066, let's do that separately. Probably, we will need to split the ticket per each supported dialect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, looks okay. Thanks for opening it, Max!

test("SPARK-33034: alter table ... add column") {
withTable("oracle.alt_table") {
sql("CREATE TABLE oracle.alt_table (ID STRING) USING _")
sql("ALTER TABLE oracle.alt_table ADD COLUMNS (C1 STRING, C2 STRING)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ALTER command always succeed? What if we add a new column having the same name with the existing column? Anyway, I think it is better to add some tests for error cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test here dialect specific changes:
ALTER TABLE ... ALTER COLUMN vs ALTER TABLE ... ADD
I believe the test for error handling should be added to JDBCTableCatalogSuite since error handling should be generic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added negative tests to the common tests #29945

test("SPARK-33034: alter table ... update column type") {
withTable("oracle.alt_table") {
sql("CREATE TABLE oracle.alt_table (ID INTEGER) USING _")
sql("ALTER TABLE oracle.alt_table ALTER COLUMN id TYPE STRING")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: We can alter a column from a string type to a int one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No:

rg.apache.spark.sql.AnalysisException: Cannot update alt_table field ID: string cannot be cast to int; line 1 pos 0;
[info] AlterTable org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog@3ebc40fe, alt_table, RelationV2[ID#25] alt_table, [org.apache.spark.sql.connector.catalog.TableChange$UpdateColumnType@ce035e83]
[info]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$31(CheckAnalysis.scala:528)
[info]   at scala.collection.immutable.List.foreach(List.scala:392)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:489)

I will add a check for that.

test("SPARK-33034: alter table ... update column nullability") {
withTable("oracle.alt_table") {
sql("CREATE TABLE oracle.alt_table (ID STRING NOT NULL) USING _")
sql("ALTER TABLE oracle.alt_table ALTER COLUMN ID DROP NOT NULL")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: What if we drop a non-existent column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add negative tests.

@SparkQA
Copy link

SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34015/

@SparkQA
Copy link

SparkQA commented Oct 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34015/

@SparkQA
Copy link

SparkQA commented Oct 5, 2020

Test build #129408 has finished for PR 29912 at commit d124fa7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34018/

@SparkQA
Copy link

SparkQA commented Oct 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34018/

@SparkQA
Copy link

SparkQA commented Oct 5, 2020

Test build #129411 has finished for PR 29912 at commit aa121d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 5, 2020

The last build failure is not related to the changes.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked that v2/OracleIntegrationSuite passed in my local env. cc: @dongjoon-hyun

override val connectionTimeout = timeout(7.minutes)
override def dataPreparation(conn: Connection): Unit = {}

test("SPARK-33034: alter table ... add column") {
Copy link
Contributor

@cloud-fan cloud-fan Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: upper case the sql keyword: ALTER TABLE ... add new columns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the SQL parser case agnostic one ? ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is. It's a convention that people upper case the sql keyword when writing sql queries.

expectedSchema = expectedSchema.add("C3", StringType)
assert(t.schema === expectedSchema)
// Add already existing column
intercept[AnalysisException] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check the error message?

}
}
// Add a column to not existing table
intercept[AnalysisException] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

val expectedSchema = new StructType().add("ID", StringType)
assert(t.schema === expectedSchema)
// Update column type from STRING to INTEGER
intercept[AnalysisException] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

sql("ALTER TABLE oracle.alt_table ALTER COLUMN bad_column TYPE DOUBLE")
}
// Update column to wrong type
intercept[AnalysisException] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be ParseException?

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except some minor comments in tests

@SparkQA
Copy link

SparkQA commented Oct 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34066/

@SparkQA
Copy link

SparkQA commented Oct 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34067/

@SparkQA
Copy link

SparkQA commented Oct 6, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34066/

@SparkQA
Copy link

SparkQA commented Oct 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34067/

@SparkQA
Copy link

SparkQA commented Oct 6, 2020

Test build #129459 has finished for PR 29912 at commit 300980a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2020

Test build #129460 has finished for PR 29912 at commit 5be3a7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants