This repository was archived by the owner on Feb 27, 2025. It is now read-only.

Conversation

@pp-akursar (Contributor)

Changes are similar to those of the previous upgrade to 3.3 in #197:

  • Bumped Spark to 3.4.0
  • Bumped the version of the library to 1.4.0
  • Updated the README accordingly
    • Added the 1.4.0 artifact to the version table

Fixes #227, specifically `java.lang.NoSuchMethodError: 'org.apache.spark.sql.types.StructType org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(java.sql.ResultSet, org.apache.spark.sql.jdbc.JdbcDialect, boolean)'`
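For context on the error above: the Spark 3.3 build of the connector was compiled against the three-argument `JdbcUtils.getSchema`, and Spark 3.4 appears to have added a fourth Boolean parameter (`isTimestampNTZ`, by my reading of the 3.4 source), which changes the method's binary signature; bytecode compiled against 3.3 therefore fails at runtime on 3.4 until the connector is rebuilt. A minimal sketch of the call as recompiled against 3.4, with the new parameter name treated as an assumption:

```scala
import java.sql.ResultSet

import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.StructType

// Spark 3.3 exposed getSchema(ResultSet, JdbcDialect, Boolean); Spark 3.4 adds a
// fourth Boolean (isTimestampNTZ here is an assumption from reading the 3.4 API),
// so a connector compiled against 3.3 hits NoSuchMethodError on a 3.4 runtime.
def schemaOf(rs: ResultSet, dialect: JdbcDialect): StructType =
  JdbcUtils.getSchema(rs, dialect, alwaysNullable = true, isTimestampNTZ = false)
```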

@pp-akursar (Contributor, Author)

@microsoft-github-policy-service agree company="PulsePoint"

@shivsood (Collaborator)

@pp-akursar The change looks good. Can you attach test results on 3.4.0?

@shivsood (Collaborator) left a comment

Please add test results for a 3.4 run. After that, this can be merged.

@SteffenMangold commented Sep 22, 2023

Please review and build! :)
It's highly needed.

@shivsood (Collaborator) commented Sep 22, 2023

Test results on Spark 3.4.0 - Pass.

Run:

```scala
/*
 * SparkConnTestMain
 * Main class for the test jar.
 * @arg0  : database user name
 * @arg1  : database user password
 * @arg2  : AD user principal
 * @arg3  : AD user keytab name
 * @arg4  : connector type to use ("JDBC" or "com.microsoft.sqlserver.jdbc.spark")
 * @arg5  : database name
 * @arg6  : data source for data pool
 * @arg7  : test suite to run - GCI(0), CI(1), Perf(2), All(9)
 * @arg8  : run datapool test (true or false)
 * @arg9  : sql server URL
 * @arg10 : sql server port
 * @arg11 : AD domain
 */
val test_obj = new SparkConnTest("connectoradmin", "", "", "", "com.microsoft.sqlserver.jdbc.spark", "testconn", "", "0", "false".toBoolean, "database.windows.net", 1433, "")
test_obj.test_sqlmaster()
```

Results: Passed
test_gci_twoPartName_owar : Entered
Tablename is mssqlspark.test_gci_twoPartName_owar
Operation Overwrite, append and read
test_gci_twoPartName_owar : Passed
test_gci_tbNameInBracket_owar : Entered
Table name is [test_gci_tbNameInBracket_owar]
Operation Overwrite, append and read
test_gci_tbNameInBracket_owar : Passed
test_gci_tabLock_write : Entered
test_gci_tabLock_write : Passed
test_gci_secureURL_write : Entered
test_gci_secureURL_write : Passed
test_gci_reordered_columns : Entered
test_gci_reordered_columns : Created table
test_gci_reordered_columns : Append succcessful
test_gci_reordered_columns : Read back table and confirmed data is added succcessful
test_gci_reordered_columns : Reordered Write overwrite with truncate
test_gci_reordered_columns : Reordered write append
test_gci_reordered_columns : Reordered Write overwrite without truncate Passed
test_write_parallel : Entered
test_write_parallel : Passed
test_gci_empty_dataframe : Entered
test_gci_empty_dataframe : Passed
test_gci_read_write : Entered
test_basic_read_write : Passed
test_gci_read_write : Passed
test_gci_null_values : Entered
test_gci_null_values : Passed
test_gci_append_rows : Entered
test_gci_append_rows : Passed
test_gci_truncate_table : Entered
test_gci_truncate_table : Passed
test_gci_case_sensitivity : Entered
test_gci_case_sensitivity : exit
test_gci_precision_scale : Entered
test_gci_precision_scale : exit
test_isolation_level : Entered
test_isolation_level : READ_UNCOMMITTED succeded
test_isolation_level : READ_COMMITTED succeded
test_isolation_level : REPEATABLE_READ succeded
test_isolation_level : SNAPShort write start
test_isolation_level : SNAPShort write done
test_isolation_level : SNAPShort read done
test_isolation_level : SNAPShort 5 5
Assert counts
test_isolation_level : SNAPSHOT succeded
test_isolation_level : isoLevel = ONE
test_isolation_level : isoLevel = NONE Exception
test_isolation_level : all done
test_isolation_level : exit
test_gci_limit_escape : Multiple read test Entered
test_gci_limit_escape : Passed
test_gci_threePartName_owar : Entered
Tablename is testconn.mssqlspark.test_gci_threePartName_owar
Operation Overwrite, append and read
test_gci_threePartName_owar : Passed

@shivsood (Collaborator)

Test Pass: Reliability Mode on.

Test:

```scala
// Same SparkConnTest arguments as in the run above.
test_obj.test_sqlmaster_reliable_connector()
```

Results: Pass
test_isolation_level : SNAPSHOT succeded
test_isolation_level : isoLevel = NONE
test_isolation_level : isoLevel = NONE Exception
test_isolation_level : all done
test_isolation_level : exit
test_gci_limit_escape : Multiple read test Entered
test_gci_limit_escape : Passed
test_gci_threePartName_owar : Entered
Tablename is testconn.mssqlspark.test_gci_threePartName_owar
Operation Overwrite, append and read
test_gci_threePartName_owar : Passed
test_gci_twoPartName_owar : Entered
Tablename is mssqlspark.test_gci_twoPartName_owar
Operation Overwrite, append and read
test_gci_twoPartName_owar : Passed
test_gci_tbNameInBracket_owar : Entered
Table name is [test_gci_tbNameInBracket_owar]
Operation Overwrite, append and read
test_gci_tbNameInBracket_owar : Passed
test_gci_tabLock_write : Entered
test_gci_tabLock_write : Passed
test_gci_secureURL_write : Entered
test_gci_secureURL_write : Passed
test_gci_reordered_columns : Entered
test_gci_reordered_columns : Created table
test_gci_reordered_columns : Append succcessful
test_gci_reordered_columns : Read back table and confirmed data is added succcessful
test_gci_reordered_columns : Reordered Write overwrite with truncate
test_gci_reordered_columns : Reordered write append
test_gci_reordered_columns : Reordered Write overwrite without truncate Passed
test_write_parallel : Entered
test_write_parallel : Passed
test_gci_empty_dataframe : Entered
test_gci_empty_dataframe : Passed
test_gci_read_write : Entered
test_basic_read_write : Passed
test_gci_read_write : Passed
test_gci_null_values : Entered
test_gci_null_values : Passed
test_gci_append_rows : Entered
test_gci_append_rows : Passed
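For readers wondering what "Reliability Mode" exercises: the connector documents a `reliabilityLevel` write option (`BEST_EFFORT` by default, `NO_DUPLICATES` for idempotent retries on executor restarts), and the reliable-connector suite presumably runs writes through it. A minimal sketch with placeholder connection details:

```scala
// Sketch: writing with the connector's NO_DUPLICATES reliability level.
// <server>, <db>, and df are placeholders; "reliabilityLevel" is the option
// name documented in this repo's README.
df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("append")
  .option("url", "jdbc:sqlserver://<server>:1433;databaseName=<db>")
  .option("dbtable", "dbo.example_table")
  .option("reliabilityLevel", "NO_DUPLICATES")
  .save()
```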

| Connector | Maven Coordinate | Scala Version |
| --- | --- | --- |
| Spark 3.0.x compatible connector | `com.microsoft.azure:spark-mssql-connector_2.12:1.1.0` | 2.12 |
| Spark 3.1.x compatible connector | `com.microsoft.azure:spark-mssql-connector_2.12:1.2.0` | 2.12 |
| Spark 3.3.x compatible connector | `com.microsoft.azure:spark-mssql-connector_2.12:1.3.0` | 2.12 |
| Spark 3.4.x compatible connector | `com.microsoft.azure:spark-mssql-connector_2.12:1.4.0` | 2.12 |
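For anyone consuming the table above, the coordinates plug into a build or a Spark session in the standard ways; a sketch using the 1.4.0 artifact (everything besides the coordinate itself is generic sbt/spark-submit usage):

```scala
// build.sbt (sketch): the Spark 3.4-compatible artifact from the table above.
libraryDependencies += "com.microsoft.azure" % "spark-mssql-connector_2.12" % "1.4.0"

// Or resolve it at submit time:
//   spark-submit --packages com.microsoft.azure:spark-mssql-connector_2.12:1.4.0 ...
```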
Collaborator:

Might also need to add Spark 3.4 under "Versions Supported".

Collaborator:

Yes, as we release the beta.

```diff
 <profiles>
   <profile>
-    <id>spark33</id>
+    <id>spark34</id>
```
@luxu1-ms (Collaborator) commented Sep 22, 2023

@shivsood Could you confirm that a Spark 3.4 DBR environment cannot use the 3.3 connector? If 3.4 DBR could still use the 3.3 connector, then maybe there is no need for a new release?

Collaborator:

The 1.3.0-BETA release (Spark 3.3) throws an exception for the get-schema call with DBR Spark 3.4.0.

@shivsood merged commit eb60462 into microsoft:master on Sep 22, 2023
@SteffenMangold

Nice! Thanks for reacting so quickly.

@RolandASc

@shivsood, can you tell us when the beta will be released and available through Maven? We are wondering whether we should revert to DBR 12.2 for now or whether we can just wait a couple more days and switch to the new connector.

And can we assume that for 3.3 it will stay 1.3.0-BETA?

Also, it would be nice if the README could be updated in these two places (even for the previous release):

  • There are three version sets of the connector available through Maven, a 2.4.x, a 3.0.x and a 3.1.x compatible version.
  • Current Releases

And 👍 for your work!

@JessicaBL

Hello,
I'm also wondering when the newest version, 1.4.0 for Spark 3.4, will be available on Maven Central. I can't see it on Databricks for download, and it's highly needed!

Thanks :)

@atul-delphix commented May 1, 2024

@pp-akursar / @shivsood When are we planning the GA release of version 1.4.0 to support Apache Spark 3.4? And when can we expect support for Apache Spark 3.5.x, since all previous versions have many vulnerabilities?

It's highly needed. You can also redirect me to someone who can better answer this.

@dbeavon commented Oct 8, 2024

@shivsood

Should Azure Databricks customers contact Azure support? It would be great to get GA versions of the connector. The default JDBC connector is not as fast (even with the batchsize option customized).

What team at Microsoft sponsors this project? Is it HDInsight, Synapse, or Azure Databricks?
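For context on the batchsize remark above: `batchsize` is the stock Spark JDBC writer option (default 1000 rows per batch), and tuning it is the usual first lever before switching to the dedicated connector. A sketch with placeholder connection details:

```scala
// Sketch: the built-in JDBC sink with a larger batch size, for comparison
// with the dedicated connector. <server>, <db>, and df are placeholders.
df.write
  .format("jdbc")
  .mode("append")
  .option("url", "jdbc:sqlserver://<server>:1433;databaseName=<db>")
  .option("dbtable", "dbo.example_table")
  .option("batchsize", "10000") // rows per JDBC batch insert; Spark's default is 1000
  .save()
```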

@grihabor mentioned this pull request Dec 12, 2024