-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-8772][SQL] Implement implicit type cast for expressions that define input types. #7175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is based on #7174 |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #36327 has started for PR 7175 at commit |
|
Test build #36327 has finished for PR 7175 at commit
|
|
Merged build finished. Test FAILed. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #36332 has started for PR 7175 at commit |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #36336 has started for PR 7175 at commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the rule seems like to be:
DecimalType => DecimalType
FractionalType => DoubleType
LongType => LongType
IntegralType => IntegerType
|
Test build #36332 has finished for PR 7175 at commit
|
|
Merged build finished. Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will we use this method?
|
Test build #36336 has finished for PR 7175 at commit
|
|
Merged build finished. Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crc32 should be able to work with StringType, but StringType cannot be implicit casted BinaryType, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i need to think about whether we should support implicit casts from string to binary. sql server does support that. hive doesn't, but hive chose to make a lot of the udfs work against both types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we can always cast a string to binary correctly, as it produces different binary when specifying different encoder. It's actually the case accept multiple DataType for an expression.
And also for Length, which support both StringType and BinaryType.
We probably need another PR for this improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about having an AbstractDataType that's a TypeCollection, that expressions can put arbitrary types into it. Basically similar to the Seq[Any] idea, but with better type safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good idea for this, but it probably make thing more complicated for auto casting. (Which data type should be cast to?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for StringType -> BinaryType (UTF8 will be used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we'd better leave the casting (StringType -> BinaryType) to be done within the UDF Crc32 itself, not via the generic auto casting rule. From the user perspective, the UDF Crc32 will support both StringType and BinaryType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I just checked the code of Hive, it does convert the StringType => BinaryType (UTF8 bytes), just as the generic rule. @davies +1
|
Merged build triggered. |
|
I added an implicit type cast from String to Binary. |
|
Merged build started. |
|
Test build #36375 has started for PR 7175 at commit |
|
Test build #36375 has finished for PR 7175 at commit
|
|
Merged build finished. Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exist: why we add unapply for it? Is it same with Cast(child: NumericType, StringType)? It looks to me that we only need this object NumericType in ExpectsInputTypes when an expression need any kind of numeric input.
And, should we add object AtomicType too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
child is an expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry didn't see that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw this is old code. just got copied around.
No description provided.