[SPARK-10648] Proposed bug fix when oracle returns -127 as a scale to a numeric type #8780
Conversation
So, to make sure I understand: the goal of this patch is that if an invalid value is returned (e.g., a precision or scale <= 0), then the defaults are used, yes?
Yes, that is the intention. Is this the proper way to address that issue?
I'm not entirely sure about that; one question would be whether this is reasonable behavior for all databases or only Oracle.
I'm not sure if Oracle can be associated with anything reasonable, but sometimes you have to play the hand you are dealt. :) I can only answer your question with a question: would there ever be a use case in the Decimal() class where the precision and/or the scale would be set to a negative value? If not, then I'd have to imagine that this patch makes the intention of the code more accurate across the board. If yes, then I may have to explore an Oracle-only type of patch, which may or may not ever be committed back, depending on its usefulness to the community.

I'd have to assume that there isn't a use case for negative values, given the way precision and scale are used and defined, but you'll have to forgive any ignorance on my part, as I'm still fairly new to Scala. I hadn't even browsed the source for Spark until about one week ago. I'm still in the alpha stages of even testing Spark in general, so while this has seemingly solved the problem for me in my testing, I could easily be overlooking something.
Actually, scale can be negative. It just means the number of zeros to the left of the decimal point. For example, for the number 123 with precision = 2 and scale = -1, 123 would become 120.
(I actually don't know if Spark implements this correctly -- we should test it.)
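For what it's worth, the negative-scale semantics described above can be checked quickly with java.math.BigDecimal. This is a small REPL sketch illustrating the general semantics, not Spark's own Decimal implementation:

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

// Rounding 123 to scale -1 keeps digits only down to the tens place.
val d = new JBigDecimal("123")
val tens = d.setScale(-1, RoundingMode.HALF_UP)
println(tens.toPlainString)                        // 120

// Precision = 2 means "two significant digits", which gives the same result here.
println(d.round(new MathContext(2)).toPlainString) // 120
```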
That is exactly what I was afraid of. Would the patch make more sense if it only checked precision for a zero value? Does it ever make sense to have a precision of zero (or less than zero, for that matter)? Could we safely enforce defaults if precision is zero (or less), regardless of scale? That would still solve my problem, hopefully without compromising functionality for everyone else.
That should work.
Working on a new patch... Would it ever be possible to have a case where precision is 0 (essentially undefined), but scale is still intentionally set? Or is setting the precision required in order to set a scale?
They would all be null then. It doesn't make sense to have precision < scale.
But a negative scale is inherently less than a defined precision... or do you mean that precision should never be less than the absolute value of scale? Is that something that should be tested for, and overridden with defaults if true? This would also fix my problem without DB-specific hacks.
Oh, actually, let me correct that. If scale is positive, then precision needs to be >= scale. If scale is negative, then precision can be anything (> 0). I'm not sure if precision == 0 makes any sense, since that effectively means a null value.
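A tiny predicate capturing that rule, just for illustration (isValidPrecisionScale is a hypothetical helper, not anything in Spark):

```scala
// Valid iff precision is positive and, when scale is non-negative,
// scale does not exceed precision; any negative scale is acceptable.
def isValidPrecisionScale(precision: Int, scale: Int): Boolean =
  precision > 0 && (scale < 0 || scale <= precision)

// isValidPrecisionScale(10, 2)   == true
// isValidPrecisionScale(10, -20) == true   (negative scale is fine)
// isValidPrecisionScale(0, -127) == false  (Oracle's "undefined" metadata)
// isValidPrecisionScale(5, 7)    == false  (positive scale > precision)
```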
OK, after looking at this a little further, it seems that DecimalType.bounded() should be called in JDBCRDD.scala regardless of the precision and scale values, letting the bounded() method validate them. If it finds invalid values, it can return defaults or throw an exception, depending on which condition is met. I'm going to move in that direction for this. I'm still a little confused by the precision and scale rules that you defined, though. Wouldn't it be invalid to have precision = 10 but scale = -20? Wouldn't that always result in a null value, or am I still interpreting precision and scale incorrectly?
precision = 10 and scale = -20 should be fine.
…scala, and put them into bounded function
I'm making sure the new version builds, but here are the rules:

    private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
      if (precision <= 0)
        DecimalType.SYSTEM_DEFAULT
      else if (scale > precision)
        DecimalType(min(precision, MAX_PRECISION), min(precision, MAX_SCALE))
      else
        DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
    }

This applies for both Decimal and Numeric types in JDBCRDD.scala. Once I verify that it builds and runs, I will update the pull request. Does this look accurate?
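To make the behavior concrete, here is a small standalone sketch of the same rules with worked examples. The limits and the default pair below are placeholders standing in for Spark's MAX_PRECISION, MAX_SCALE, and SYSTEM_DEFAULT, so this is an illustration rather than the real code path:

```scala
import scala.math.min

object BoundedSketch {
  // Placeholder constants; Spark's actual values live in DecimalType.
  val MaxPrecision = 38
  val MaxScale = 38
  val SystemDefault = (38, 18)

  // Same branching as the bounded() rules quoted above, returning (precision, scale).
  def bounded(precision: Int, scale: Int): (Int, Int) =
    if (precision <= 0) SystemDefault
    else if (scale > precision) (min(precision, MaxPrecision), min(precision, MaxScale))
    else (min(precision, MaxPrecision), min(scale, MaxScale))

  def main(args: Array[String]): Unit = {
    println(bounded(0, -127)) // (38,18): Oracle's "undefined" NUMBER falls back to the default
    println(bounded(10, 2))   // (10,2):  a well-formed column keeps its declared bounds
    println(bounded(10, -20)) // (10,-20): a negative scale passes through untouched
  }
}
```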
So, any thoughts on merging this?
cc @davies to take a quick look
How about changing this to scale >= 0?
That wouldn't work, as demonstrated earlier in the thread. A negative scale is legal; it is used to reduce precision to the 10s place, 100s place, etc.
I met this problem before, and actually it's not Spark that detects them as 0 and -127, but JDBC. My solution was just to add a custom dialect. So if we want to support Oracle officially, we can add an OracleDialect.
The problem with Oracle is that you can define numbers without providing precision or scale (e.g., a column declared simply as NUMBER). I think the best solution is to handle this in an OracleDialect, since this is a quirk of Oracle. I've done that for my own code, but it would be nice to have a couple of supporting changes in Spark as well.

For now, I've handled this with my own OracleDialect.
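A rough sketch of what such a dialect can look like, written against Spark's JdbcDialect API (canHandle / getCatalystType). This is not the author's actual code; the size == 0 test and the (38, 10) fallback bounds are assumptions for illustration:

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

// When the Oracle driver reports a NUMBER column with no declared precision
// (size == 0), substitute a bounded decimal instead of the bogus (0, -127)
// metadata; otherwise return None to defer to Spark's built-in mapping.
object MyOracleDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] = {
    if (sqlType == Types.NUMERIC && size == 0) {
      Some(DecimalType(38, 10)) // assumed fallback bounds; tune for your data
    } else {
      None
    }
  }
}
```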
@cloud-fan @bdolbeare @davies I'm certainly open to doing this in an Oracle-specific way if that is what is required. I was simply hoping to solve my problem while simultaneously making the whole project more robust. I completely understand if you don't believe that it's the right direction. Thanks for looking into it with me!
@travishegner Looks like it is best to just do it in the Oracle dialect.
@travishegner Will you have time to continue your work? I think our resolution is to create an Oracle dialect and automatically register it (see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala).
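For illustration, wiring in a custom dialect before a JDBC read might look like the sketch below. It assumes the MyOracleDialect object sketched earlier and a SQLContext named sqlContext in scope; the URL and table name are placeholders:

```scala
import org.apache.spark.sql.jdbc.JdbcDialects

// Make the dialect visible to Spark's JDBC data source before reading;
// it will be consulted for any URL that its canHandle method accepts.
JdbcDialects.registerDialect(MyOracleDialect)

val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/service") // placeholder URL
  .option("dbtable", "SOME_SCHEMA.SOME_TABLE")              // placeholder table
  .load()
```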
Test build #1986 has finished for PR 8780.

Please see PR #9495 for the Oracle dialect solution proposed above.
This is the alternative, agreed-upon solution to PR #8780: creating an OracleDialect to handle the nonspecific numeric types that can be defined in Oracle. Author: Travis Hegner <[email protected]> Closes #9495 from travishegner/OracleDialect. (cherry picked from commit 14ee0f5) Signed-off-by: Yin Huai <[email protected]>
In my environment, the precision and scale are undefined in the Oracle database, but Spark detects them as 0 and -127 respectively (a direct JDBC metadata check is sketched below).

If I understand those two values correctly, they should never logically be defined as less than zero, so the proposed changes should correctly default the precision and scale instead of trying to use the erroneous values.

If there is a valid use case for a negative precision or scale, then I can rework this to test for the exact 0 and -127 case and handle it appropriately.
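For context, the symptom can be reproduced directly from the JDBC driver's metadata, outside of Spark. This is a sketch; the connection string, credentials, and table/column names are placeholders:

```scala
import java.sql.DriverManager

// Ask the Oracle driver what it reports for an unconstrained NUMBER column.
val conn = DriverManager.getConnection(
  "jdbc:oracle:thin:@//dbhost:1521/service", "user", "password")
try {
  val rs = conn.createStatement()
    .executeQuery("SELECT some_number_col FROM some_table WHERE 1 = 0")
  val meta = rs.getMetaData
  println(meta.getPrecision(1)) // 0    when no precision was declared
  println(meta.getScale(1))     // -127 when no scale was declared
} finally {
  conn.close()
}
```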