-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-8221][SQL]Add pmod function #6783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #34760 has finished for PR 6783 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to support Decimal type as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Byte, Long etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I've just checked the Hive doc, seems only Int and Double types are accepted, let's do the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important that we should not generate a result different from hive, but I think we are free to extend when hive would throw an exception, only if we consider it a reasonable change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however I think it is ok to leave Decimal unsupported here.:P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to hive, I think we need to support Decimal...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's not the same as the doc described, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
The reason for this is conform to the BinaryMathExpression, which supposed to be DoubleType, just the like pow etc.
|
Test build #34900 has finished for PR 6783 at commit
|
|
Test build #34903 has finished for PR 6783 at commit
|
|
Test build #34912 has finished for PR 6783 at commit
|
|
Test build #34963 has finished for PR 6783 at commit
|
|
LGTM except the full type APIs |
|
@adrian-wang @chenghao-intel , wondering what type of api we should reserve regarding to this kind of api |
|
@zhichao-li if you are using |
|
hum... right . It just give us a convenient way to enable |
|
Adding API is easier, but remove the published API is difficult, let's keep the DF API simple as much as possible. |
|
Test build #34975 has finished for PR 6783 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about follow Remainder to use Integral.rem to do the computation? That will save a lot of code here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea , but seems like the clac result would be a little bit different with the expected result. pmod(7.2, 4.1), actual: 3.099999999999999, expected: 3.1000000000000005 . I'm keen on keeping it this way if not obvious downside.
private lazy val integral = dataType match {
case i: IntegralType => i.integral.asInstanceOf[Integral[Any]]
case i: FractionalType => i.asIntegral.asInstanceOf[Integral[Any]]
case i: DecimalType => i.asIntegral.asInstanceOf[Integral[Any]]
}
protected override def evalInternal(a: Any, n: Any) =
integral.rem(integral.plus(integral.rem(a, n), n), n)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok maybe you can send another PR to fix Divide in this way too, as it's result is different from hive's.
|
Test build #35116 has finished for PR 6783 at commit
|
|
Test build #35203 has finished for PR 6783 at commit
|
|
retest this please. |
1 similar comment
|
retest this please. |
|
Test build #35209 has finished for PR 6783 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need to override this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it's a must since symbol is abstract in BinaryOperator
|
retest this please. Fail by timeout after 1 minute |
|
Test build #9 has finished for PR 6783 at commit
|
|
Test build #11 has finished for PR 6783 at commit
|
|
retest this please. |
|
Test build #37208 has finished for PR 6783 at commit
|
|
Test build #37219 has finished for PR 6783 at commit
|
|
Test build #13 has finished for PR 6783 at commit
|
|
retest this please. |
|
Test build #18 has finished for PR 6783 at commit
|
|
Test build #37291 has finished for PR 6783 at commit
|
|
retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this -- and after #7348 is merged, you'd need to define inputType to be NumericType
|
@zhichao-li this looks pretty good now. Just need to update it to work with the latest type checking framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should override toString to be s"pmod($left, $right)"
|
Test build #37332 has finished for PR 6783 at commit
|
|
Jenkins, retest this please. |
|
Test build #20 has finished for PR 6783 at commit
|
|
Test build #37321 has finished for PR 6783 at commit
|
|
Test build #37334 has finished for PR 6783 at commit
|
|
Test build #37335 has finished for PR 6783 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this symbol is not needed as you have override toString, however we have to implement it here as it's defined abstract in BinaryOperator. We may need to fix it in the future.
|
lgtm |
|
Alright I'm merging this. Thanks a lot for working on it. |
| case dt: DecimalType => | ||
| val decimalAdd = "$plus" | ||
| s""" | ||
| ${ctx.javaType(dataType)} r = $eval1.remainder($eval2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi there is a bug here -- if we use pmod twice, this will fail codegen because r is not unique.
https://issues.apache.org/jira/browse/SPARK-8221
One concern is the result would be negative if the divisor is not positive( i.e pmod(7, -3) ), but the behavior is the same as hive.