@kanzhang
Contributor

... sequences

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

scala> dr.take(1)
res11: scala.collection.immutable.NumericRange[Double] = NumericRange(1.0)

scala> dr.take(2)
res12: scala.collection.immutable.NumericRange[Double] = NumericRange(1.0)

scala> dr.take(3)
res13: scala.collection.immutable.NumericRange[Double] = NumericRange(1.0, 1.2)

scala> dr.take(4)
res14: scala.collection.immutable.NumericRange[Double] = NumericRange(1.0, 1.2, 1.4, 1.5999999999999999)

scala> lr.take(1)
res15: scala.collection.immutable.NumericRange[Long] = NumericRange(1)

scala> lr.take(2)
res16: scala.collection.immutable.NumericRange[Long] = NumericRange(1, 4)

scala> lr.take(3)
res17: scala.collection.immutable.NumericRange[Long] = NumericRange(1, 4, 7)

scala> lr.take(4)
res18: scala.collection.immutable.NumericRange[Long] = NumericRange(1, 4, 7)

Why does (1D to 2D).by(0.2).take(2) return NumericRange(1.0)? Is this a bug?

Contributor Author

Looks like a bug to me. This, in turn, is causing problems in RDD.

scala> sc.parallelize((1D to 2D).by(0.2), 2).collectPartitions
res15: Array[Array[Double]] = Array(Array(1.0, 1.2), Array(1.6, 1.8))

Contributor Author

This issue has been reported to the Scala project and is still open: https://issues.scala-lang.org/browse/SI-8518

Contributor

Do you want to wait on this to be fixed by Scala, or do you want to work around it for now?

Contributor Author

I'd wait for Scala to fix it. That said, I'm open to a workaround (I just don't see one myself).

Contributor

Alright, maybe we should wait for Scala then. By the way, for your original use case, was the range you wanted always (0 to numElements)? If so you can also try RDD.zipWithIndex.
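
To make the suggestion concrete, here is a minimal sketch using plain Scala collections rather than an RDD, so it runs without Spark. The data and the object name `ZipWithIndexSketch` are made up for illustration. It only shows the local-collection analogue: when the range being zipped is just the element indexes, `zipWithIndex` yields the same pairs without building a separate range.

```scala
object ZipWithIndexSketch {
  def main(args: Array[String]): Unit = {
    val words = Seq("a", "b", "c") // illustrative data, not from the PR

    // Pair each element with an index taken from a separately built range.
    val viaRange = words.zip(0 until words.length)

    // Pair each element with its own position directly.
    val viaIndex = words.zipWithIndex

    println(viaRange == viaIndex)
  }
}
```

On an RDD, `zipWithIndex` pairs each element with a `Long` index instead of an `Int`.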

Contributor Author

@mateiz the use case wasn't mine; it came from the reporter of SPARK-1817. Btw, I think this PR can be committed independently of the Scala fix. It fixes the issue for other numeric ranges (e.g., Long), and will also work on Double once the Scala fix is in.

Contributor

Won't this patch make it lose numbers from Double ranges, whereas the current implementation works?

Contributor Author

@mateiz the current implementation would lose elements for all types of numeric ranges (including Long and Double) when we zip a numeric range with other sequences, because we partition numeric ranges differently from other sequences. This patch fixes it by partitioning numeric ranges at exactly the same indexes as we would on other sequences. However, we still depend on take and drop being implemented correctly on numeric ranges for things to work. The Scala bug affects take and drop on Double ranges, but not on other numeric ranges like Long (hence, the unit tests in this patch, which are based on Long ranges, are successful).
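
The idea of slicing numeric ranges at the same indexes as other sequences can be sketched in plain Scala. This is an illustration under assumptions, not the patch itself: `SliceSketch`, `positions`, and `sliceSeq` are hypothetical names, and the index formula is one plausible way to compute the boundaries.

```scala
object SliceSketch {
  // Hypothetical helper: (start, end) index pairs that split `length`
  // elements into `numSlices` contiguous slices.
  def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] =
    (0 until numSlices).iterator.map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      (start, end)
    }

  // Slice any sequence at those positions. A numeric range goes through
  // the same boundaries as an ordinary Seq, so the slices line up.
  def sliceSeq[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] =
    positions(seq.length, numSlices).map { case (start, end) =>
      seq.slice(start, end)
    }.toList

  def main(args: Array[String]): Unit = {
    val range = 1L to 10L by 3L // NumericRange[Long]: 1, 4, 7, 10
    val plain = Seq("a", "b", "c", "d")

    // Because both sides are cut at identical indexes, zipping the slices
    // pairs every element of one with exactly one element of the other.
    sliceSeq(range, 3).zip(sliceSeq(plain, 3)).foreach { case (r, p) =>
      println(s"${r.toList} <-> ${p.toList}")
    }
  }
}
```

The sketch uses a Long range on purpose: as noted above, correctness on Double ranges still depends on the underlying range operations behaving correctly.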

Contributor

Okay, that makes sense then; I didn't realize that we were already using drop and take. In that case we should merge this patch as is and maybe create a JIRA for Double ranges so people see it's a known issue. Made one other small comment on the patch.

@mateiz
Contributor

mateiz commented May 17, 2014

Jenkins, this is ok to test

@AmplabJenkins

Merged build finished. All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15060/

Contributor

This needs an explicit return type (e.g. : Seq[(Int, Int)])

Contributor

Actually it would be better if this returned an Iterator, so that it doesn't materialize the whole sequence. You can do (0 until numSlices).iterator.map(...).
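
The difference between the two return types can be seen in a small sketch. Everything here is hypothetical and only for illustration: `LazySketch`, `bounds`, `eager`, and `lazily` are made-up names, and `bounds` stands in for whatever per-slice computation the method under review performs. The counter shows that the `Iterator` version computes elements only as they are consumed.

```scala
object LazySketch {
  // Counts how many elements have actually been computed.
  var evaluated = 0

  // Hypothetical stand-in for the per-slice computation.
  def bounds(i: Int): (Int, Int) = {
    evaluated += 1
    (i * 10, (i + 1) * 10)
  }

  // Strict: mapping over the Range builds the whole sequence up front.
  def eager(numSlices: Int): Seq[(Int, Int)] =
    (0 until numSlices).map(bounds)

  // Lazy: elements are computed one at a time, on demand.
  def lazily(numSlices: Int): Iterator[(Int, Int)] =
    (0 until numSlices).iterator.map(bounds)

  def main(args: Array[String]): Unit = {
    val it = lazily(1000)
    println(s"computed after creating the iterator: $evaluated")
    it.take(2).foreach(println)
    println(s"computed after consuming two elements: $evaluated")
  }
}
```

Both methods also carry explicit return types, so the signature stays fixed even if the body changes.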

Contributor Author

> This needs an explicit return type (e.g. : Seq[(Int, Int)])

For binary compatibility?

Contributor

This is just our style throughout the code. It makes it easier to avoid compatibility-breaking changes.

@kanzhang
Contributor Author

kanzhang commented Jun 3, 2014

Updated patch based on @mateiz comments.

@AmplabJenkins

Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15392/

@mateiz
Contributor

mateiz commented Jun 3, 2014

Jenkins, retest this please

@AmplabJenkins

Merged build finished. All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15399/

@kanzhang
Contributor Author

@mateiz could you take another look at this when you get a chance? SPARK-1817 has been marked as resolved, but the fix for the original issue depends on this patch. Thx.

@mateiz
Contributor

mateiz commented Jun 14, 2014

Oh, sorry, I forgot to merge this after testing it. Jenkins, retest this please.

@AmplabJenkins

Merged build finished. All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15790/

@asfgit asfgit closed this in 7dd9fc6 Jun 14, 2014
@mateiz
Contributor

mateiz commented Jun 14, 2014

Alright, merged this. Thanks!

@kanzhang kanzhang deleted the SPARK-1837 branch June 16, 2014 01:01
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…her...

... sequences

Author: Kan Zhang <[email protected]>

Closes apache#776 from kanzhang/SPARK-1837 and squashes the following commits:

e48f018 [Kan Zhang] [SPARK-1837] code refactoring
67c33b5 [Kan Zhang] minor change
403f9b1 [Kan Zhang] [SPARK-1837] NumericRange should be partitioned in the same way as other sequences
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
### What changes were proposed in this pull request?

Currently, Spark pulls Gson 2.2.4 from `hive-exec`, which is pretty old and [vulnerable](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-25647), this PR proposes to upgrade it to the latest version 2.11.0.

### Why are the changes needed?

For security.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GHA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47627 from pan3793/SPARK-49120.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 9fb9cff)

Co-authored-by: Cheng Pan <[email protected]>