Skip to content

Conversation

@juliuszsompolski
Copy link
Contributor

What changes were proposed in this pull request?

32bit Int was used for row rank.
That overflowed in a dataframe with more than 2B rows.

How was this patch tested?

Added test, but ignored, as it takes 4 minutes.

@SparkQA
Copy link

SparkQA commented Jan 4, 2018

Test build #85679 has finished for PR 20152 at commit 324218b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jan 4, 2018

Seems reasonable; @clockfly what do you think?
The serialized form changed here, but I'm not sure we'd guarantee compatibility there. This isn't something serialized to files for long-term storage is it?

@juliuszsompolski
Copy link
Contributor Author

If the serialized form change is a problem, that part can probably be reverted - it's far less likely that a single compressed stats chunk will overflow Int.
The bug I hit was in the global rank counter part, and I changed the other part just by reviewing the code around for other places that could conceivably use a Long instead of Int.

@SparkQA
Copy link

SparkQA commented Jan 4, 2018

Test build #85680 has finished for PR 20152 at commit e32f254.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Stats(value: Double, g: Long, delta: Long)

@rxin
Copy link
Contributor

rxin commented Jan 4, 2018

cc @gatorsmile @cloud-fan

@SparkQA
Copy link

SparkQA commented Jan 4, 2018

Test build #85684 has finished for PR 20152 at commit 448dcce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 5, 2018
## What changes were proposed in this pull request?

32bit Int was used for row rank.
That overflowed in a dataframe with more than 2B rows.

## How was this patch tested?

Added test, but ignored, as it takes 4 minutes.

Author: Juliusz Sompolski <[email protected]>

Closes #20152 from juliuszsompolski/SPARK-22957.

(cherry picked from commit df7fc3e)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in df7fc3e Jan 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants