Conversation

@JoshRosen (Contributor)

Spark SQL's Decimal class should use Java's BigDecimal instead of Scala's, since this removes a layer of object allocation and works around an issue where Scala's BigDecimal.hashCode() can lead to OOMs (see https://issues.scala-lang.org/browse/SI-6173).
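For reference, here's a minimal sketch of the SI-6173 hazard (illustration only, not code from this patch; the exact expansion behavior is my reading of the linked ticket):

```scala
// scala.math.BigDecimal.hashCode may expand the value toward a BigInt so
// that equal BigInt/BigDecimal values hash alike; for a value with a huge
// exponent such as 1e1000000000, that expansion needs on the order of a
// billion digits and can exhaust the heap (SI-6173).
val scalaDec = scala.math.BigDecimal("1e1000000000")
// scalaDec.hashCode()   // can trigger an enormous allocation and OOM

// java.math.BigDecimal hashes its (unscaledValue, scale) pair directly,
// so the same value hashes cheaply:
val javaDec = new java.math.BigDecimal("1e1000000000")
javaDec.hashCode()       // cheap: the value is never expanded
```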

@JoshRosen (Author)

/cc @rxin @davies

Contributor

Should we remove this entirely?

@JoshRosen (Author)

Is it considered a public API?

Contributor

Decimal is not meant to be public. I think it just accidentally became public.

Contributor

I think user code should never touch Decimal.

@JoshRosen (Author)

If we're in agreement that it should be removed, then I'm glad to update this patch to do so (which involves adding a MiMa exclusion, I think). We could also just leave it and decide later.
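For context, such an exclusion would go in project/MimaExcludes.scala; a rough sketch is below (the fully qualified class name is my assumption for illustration, not taken from this PR):

```scala
// Hypothetical MiMa exclusion for removing a class from the public API;
// the package path for Decimal is assumed here.
import com.typesafe.tools.mima.core.{MissingClassProblem, ProblemFilters}

ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.types.Decimal")
```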

Contributor

We should remove it, but it's OK to leave it in for now, I think. Most likely this entire class will look very different soon...

@SparkQA commented Jul 24, 2015

Test build #38319 has finished for PR 7635 at commit d7a3535.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SequenceNumberRange(
    • case class SequenceNumberRanges(ranges: Array[SequenceNumberRange])
    • class KinesisBackedBlockRDDPartition(
    • class KinesisBackedBlockRDD(
    • class KinesisSequenceRangeIterator(

@JoshRosen (Author)

For my own reference, here's the list of failed tests:

  • org.apache.spark.sql.catalyst.expressions.CastSuite: "from decimal" (4 ms)
  • org.apache.spark.sql.catalyst.expressions.CastSuite: "cast from timestamp" (0.15 sec)
  • org.apache.spark.sql.catalyst.expressions.MathFunctionsSuite: "round" (0.64 sec)
  • org.apache.spark.sql.types.decimal.DecimalSuite: "hash code" (9 ms)
  • org.apache.spark.sql.types.decimal.DecimalSuite: "isZero"

@JoshRosen (Author)

Some of these tests are failing due to legitimate issues with Decimal hashCodes. Since this needs more design work to figure out the right semantics, I'm going to close this and un-assign the JIRA ticket.
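To make the failure mode concrete, here is one plausible cause (my reading of the failures, not something stated in the PR): java.math.BigDecimal ties equality and hashing to scale, while scala.math.BigDecimal compares numerically.

```scala
import java.math.{BigDecimal => JBigDecimal}

val a = new JBigDecimal("1.0")   // unscaled value 10, scale 1
val b = new JBigDecimal("1.00")  // unscaled value 100, scale 2
a.compareTo(b) == 0              // true: numerically equal
a.equals(b)                      // false: Java equality also compares scale
a.hashCode == b.hashCode         // false in general, so hash-based tests break

// scala.math.BigDecimal treats the two as equal and hashes accordingly:
scala.math.BigDecimal("1.0") == scala.math.BigDecimal("1.00")   // true

// A Decimal backed by java.math.BigDecimal would need to normalize before
// hashing, e.g. via stripTrailingZeros, to preserve the old semantics:
a.stripTrailingZeros.hashCode == b.stripTrailingZeros.hashCode  // true
```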
