[SPARK-49667][SQL] Disallowed CS_AI collators with expressions that use StringSearch #48121
Do you mind explaining, in the PR description, why they are disallowed?
left some more comments, but overall looking better to me
on another note - I think we should consider adding a new error class (instead of relying on DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE here)
one reason for this is that UNEXPECTED_INPUT_TYPE will give an error message that looks something like:
"Parameter ... of function ... requires the STRING type, however ... has the type STRING COLLATE UNICODE_AI."
which I think is rather confusing
another reason is that this is indeed a special case, and shouldn't use such a generic error condition - instead, we should add a new one that offers a better explanation to the user
123a2ff to
6292cd1
Compare
lgtm, changes in this PR are ensuring core functionality (block expressions for _CS_AI)
so I agree - if we decide to add a new error class, then we can do that in a follow-up
6292cd1 to
d8bf03f
Compare
thanks, merging to master!
What changes were proposed in this pull request?
In this PR, I propose to disallow CS_AI collated strings in expressions that use StringSearch in their implementation. These expressions are trim, startswith, endswith, locate, instr, str_to_map, contains, replace, split_part and substring_index.
Currently, these expressions support all possible collations; however, they do not work properly with CS_AI collators. This is because there is no support for CS_AI search in ICU's StringSearch class, which is used to implement these expressions. Therefore, the expressions do not behave correctly when used with CS_AI collators (e.g. currently startswith('hOtEl' collate unicode_ai, 'Hotel' collate unicode_ai) returns true).
Why are the changes needed?
The proposed changes are necessary in order to achieve correct behavior of the expressions mentioned above.
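To make the expected semantics concrete, here is a minimal sketch of what a case-sensitive, accent-insensitive (CS_AI) prefix check is supposed to return. It is an illustration only and not the code in this PR: the class and method names (CsAiSketch, stripAccents, startsWithCsAi) are hypothetical, and it approximates accent-insensitivity by stripping combining marks with the JDK's Normalizer, which is much simpler than real ICU collation.

```java
import java.text.Normalizer;

// Illustration only: approximates CS_AI matching with the JDK.
// This is NOT Spark's or ICU's implementation; all names here are hypothetical.
final class CsAiSketch {

    // Decompose to NFD, then drop combining marks so that accents are ignored.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Case-sensitive, accent-insensitive prefix test.
    static boolean startsWithCsAi(String text, String prefix) {
        return stripAccents(text).startsWith(stripAccents(prefix));
    }

    public static void main(String[] args) {
        // Accents are ignored: "hôtel" starts with "hot".
        System.out.println(startsWithCsAi("h\u00f4tel", "hot"));  // prints "true"
        // Case is respected: "hOtEl" does not start with "Hotel".
        System.out.println(startsWithCsAi("hOtEl", "Hotel"));     // prints "false"
    }
}
```

Under these semantics the startswith example from the description should yield false, which is why returning true with a CS_AI collator is treated as incorrect.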
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This patch was tested by adding a test in the CollationSuite.
Was this patch authored or co-authored using generative AI tooling?
No.