@andrej-db
Contributor

What changes were proposed in this pull request?

This PR fixes the push-down of the ^ operator (bitwise XOR) for Postgres. Postgres uses ^ as the exponentiation operator rather than bitwise XOR.

The fix consists of overriding the SQLExpressionBuilder to replace the '^' character with '#', which is the bitwise XOR operator in Postgres.
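
To make the mismatch concrete, here is a small stand-alone Scala sketch. It is not part of the patch, and the object and method names are illustrative only. Scala's ^ on integers is bitwise XOR, which matches Spark's semantics, while math.pow stands in for what Postgres computes when it receives ^.

```scala
// Illustrative only: contrasts the two meanings of `^` discussed in this PR.
object XorVsPower {
  // Spark semantics: `^` is bitwise XOR (Postgres spells this operator `#`).
  def sparkCaret(a: Int, b: Int): Int = a ^ b

  // Postgres semantics: `^` is exponentiation.
  def postgresCaret(a: Int, b: Int): Double = math.pow(a.toDouble, b.toDouble)

  def main(args: Array[String]): Unit = {
    println(sparkCaret(6, 3))    // 6 = 110b, 3 = 011b, XOR = 101b = 5
    println(postgresCaret(6, 3)) // 6 * 6 * 6 = 216.0
  }
}
```

With the operator pushed down unchanged, a filter written against the first meaning would be evaluated with the second one.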

Why are the changes needed?

The result is incorrect: Postgres evaluates the pushed-down ^ as exponentiation instead of bitwise XOR.

Does this PR introduce any user-facing change?

Yes. The user will now have a proper translation of the ^ operator.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Sep 18, 2024
@urosstan-db
Contributor

Can we add some tests?

Member

@MaxGekk MaxGekk left a comment

It would be nice if you added an integration test, for instance in *PostgresExpressionPushdownSuite*.

@andrej-db
Contributor Author

I wanted to add tests, but didn't know where to put them... I don't know how OSS tests Postgres.

@urosstan-db
Contributor

> I wanted to add tests, but didn't know where to put them... I don't know how OSS tests Postgres.

V2JDBCTest is the base class, and PostgresIntegrationSuite is the derived class for Postgres.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-49695] Postgres fix xor push-down" to "[SPARK-49695][SQL] Postgres fix xor push-down" Sep 18, 2024
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @andrej-db . +1 for adding a test case definitely.

cc @huaxingao , too.

@andrej-db
Contributor Author

Added the test, let me know if this is in order.

testDatetime(s"$catalogAndNamespace.${caseConvert("datetime")}")
}

test("xor operator push-down") {
Contributor

Maybe you can run explain formatted and check whether the output string contains "id" # 3; that would make the test a little more robust.
Another option is a unit test that just invokes compilation of the XOR expression and checks that the result is col # constant.
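
The second suggestion can be sketched without a database. The following is a stand-alone model, not Spark code: the Expr types and compilePostgres are hypothetical names invented for illustration, and the real test would call the dialect's own expression compiler instead.

```scala
// Stand-alone model of the suggested unit test; none of these types are Spark classes.
object XorCompileCheck {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class IntLiteral(value: Int) extends Expr
  final case class BinaryOp(op: String, left: Expr, right: Expr) extends Expr

  // Hypothetical Postgres-flavoured compiler: quotes identifiers and
  // renames Spark's `^` (bitwise XOR) to Postgres's `#`.
  def compilePostgres(expr: Expr): String = expr match {
    case Column(name)      => "\"" + name + "\""
    case IntLiteral(value) => value.toString
    case BinaryOp(op, l, r) =>
      val pgOp = if (op == "^") "#" else op
      s"${compilePostgres(l)} $pgOp ${compilePostgres(r)}"
  }

  def main(args: Array[String]): Unit = {
    val compiled = compilePostgres(BinaryOp("^", Column("id"), IntLiteral(3)))
    // The check described above: the compiled text must use `#`, not `^`.
    assert(compiled == "\"id\" # 3", s"unexpected SQL: $compiled")
    println(compiled)
  }
}
```

Asserting on the compiled text catches a regression even when the query result happens to be the same under both operators.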

Contributor Author

neat, I like it

Contributor

@urosstan-db urosstan-db left a comment

Approving latest iteration (with tests)

PostgresIntegrationSuite: add test
Member

@MaxGekk MaxGekk left a comment

Please, fix your test:

[info] - SPARK-49695: Postgres fix xor push-down *** FAILED *** (10 milliseconds)
[info]   org.apache.spark.sql.catalyst.ExtendedAnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `bar` cannot be found. Verify the spelling and correctness of the schema and catalog.


override def compileExpression(expr: Expression): Option[String] = {
val builder = new PostgresSQLBuilder()
try {
Contributor

Perhaps it's better to override only visitBinaryArithmetic?

We have a similar problem with some of the functions, and we use dialectFunctionName to translate from Spark to the local dialect.

Contributor

+1 here, let's override the last possible method in the chain of execution.
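
One plausible reading of this advice is sketched below as a stand-alone model. The class and method names mirror those mentioned in the thread, but these are not Spark's classes, and replaceEverywhere is a hypothetical broad rewrite included only for contrast. Rewriting the operator where it is emitted leaves the rest of the SQL text untouched, whereas rewriting carets in the finished string can also change string literals.

```scala
// Stand-alone model; names mirror the thread but these are not Spark's classes.
object NarrowOverride {
  class BaseSQLBuilder {
    def visitLiteral(value: String): String = "'" + value + "'"
    def visitColumn(name: String): String = "\"" + name + "\""
    def visitBinaryArithmetic(op: String, l: String, r: String): String = s"$l $op $r"
    def visitEquals(l: String, r: String): String = s"($l) = ($r)"
  }

  // Narrow fix: only the arithmetic operator token is rewritten.
  class PostgresSQLBuilder extends BaseSQLBuilder {
    override def visitBinaryArithmetic(op: String, l: String, r: String): String =
      super.visitBinaryArithmetic(if (op == "^") "#" else op, l, r)
  }

  // Hypothetical broad alternative: rewrite every caret in the finished SQL text.
  def replaceEverywhere(sql: String): String = sql.replace('^', '#')

  def main(args: Array[String]): Unit = {
    val b = new PostgresSQLBuilder
    val xor = b.visitBinaryArithmetic("^", b.visitColumn("id"), "3")
    val literalCmp = b.visitEquals(b.visitColumn("name"), b.visitLiteral("a^b"))
    println(xor)                           // "id" # 3
    println(literalCmp)                    // ("name") = ('a^b')
    println(replaceEverywhere(literalCmp)) // ("name") = ('a#b'), the literal is changed
  }
}
```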

Contributor Author

done

Contributor

@milastdbx milastdbx left a comment

Consider refactoring this

Contributor

@urosstan-db urosstan-db left a comment

LGTM, but please make the test more assertive.

Member

@MaxGekk MaxGekk left a comment

@andrej-db Could you fix the build and retrigger the integration tests:

[error] /home/runner/work/spark/spark/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala:237:25: not found: type Filter
[error]       plan.isInstanceOf[Filter]
[error]                         ^
[error] one error found

Comment on lines +262 to +264
assert(rows.length == 1)
assert(rows(0).getInt(0) === 6)
assert(rows(0).getString(1) === "jen")
Member

Could you use checkAnswer, please.

Contributor Author

This file mainly uses these asserts; I wanted to keep it in line with the other tests.

…park/sql/jdbc/v2/PostgresIntegrationSuite.scala

Co-authored-by: Maxim Gekk <[email protected]>
@MaxGekk
Member

MaxGekk commented Dec 4, 2024

+1, LGTM. Merging to master/3.5/3.4.
Thank you, @andrej-db and @urosstan-db @RaleSapic @PetarVasiljevic-DB @milastdbx @dongjoon-hyun for review.

@MaxGekk MaxGekk closed this in 4248397 Dec 4, 2024
@MaxGekk
Member

MaxGekk commented Dec 4, 2024

@andrej-db Could you open separate PRs for 3.4 and 3.5 because your changes cause conflicts in the branches.

@dongjoon-hyun
Member

To @MaxGekk and @andrej-db , Apache Spark 3.4 reached the End-Of-Support.

Only branch-3.5 is open for backporting.

Also, cc @LuciferYang since he is the release manager for Apache Spark 3.5.4.

@LuciferYang
Contributor

Thanks @dongjoon-hyun

dongjoon-hyun pushed a commit that referenced this pull request Dec 7, 2024
### What changes were proposed in this pull request?
Backport of the #48144

This PR fixes the push-down of the ^ operator (bitwise XOR) for Postgres. Postgres uses ^ as the exponentiation operator rather than bitwise XOR.

The fix consists of overriding the SQLExpressionBuilder to replace the '^' character with '#', which is the bitwise XOR operator in Postgres.
### Why are the changes needed?
The result is incorrect: Postgres evaluates the pushed-down ^ as exponentiation instead of bitwise XOR.

### Does this PR introduce _any_ user-facing change?
Yes. The user will now have a proper translation of the ^ operator.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49071 from andrej-db/PGXORBackport.

Lead-authored-by: Andrej Gobeljić <[email protected]>
Co-authored-by: andrej-gobeljic_data <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
8 participants