
Conversation

@srowen (Member) commented Jul 1, 2020

### What changes were proposed in this pull request?

The purpose of this PR is to partly resolve SPARK-29292, and fully resolve SPARK-30010, which should allow Spark to compile against Scala 2.13 from Spark Core up through GraphX (not SQL, Streaming, etc.).

Note that we are not trying to determine here whether this makes Spark work on 2.13 yet, just whether it compiles, as a prerequisite for assessing test outcomes. However, we do of course need to ensure that the change does not break 2.12.

The changes are, in the main, adding .toSeq and .toMap calls where mutable collections / maps are returned as Seq / Map, which are immutable by default in Scala 2.13. These calls should be no-ops for Scala 2.12 (where they return the collection itself) but are required for 2.13.
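
As an illustration of the pattern (a hypothetical method, not taken from the diff):

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical example of the pattern fixed throughout this PR:
def workerIds(): Seq[String] = {
  val buf = ArrayBuffer("w1", "w2")
  // In 2.12, scala.Seq aliases scala.collection.Seq, so a mutable buffer
  // conforms directly, and .toSeq returns the buffer itself (a no-op).
  // In 2.13, scala.Seq aliases scala.collection.immutable.Seq, so the
  // explicit conversion is required (and copies to an immutable Seq).
  buf.toSeq
}
```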

There are a few non-trivial changes, highlighted below.
In particular, to get Core to compile, we need to resolve SPARK-30010, which removes a deprecated SparkConf method.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

Yes: the deprecated SparkConf.setAll overload taking Traversable is removed, as it is no longer legal in Scala 2.13 (where Traversable is just a deprecated alias for Iterable, so the two overloads would collide).

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

@srowen srowen changed the title [SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13 [WIP][SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13 Jul 1, 2020

@srowen (Member, Author) commented Jul 1, 2020

Jenkins retest this please


@srowen (Member, Author) commented Jul 1, 2020

@shaneknapp I think we have the same corrupted .m2 repository issue again. Do I just keep retesting until it doesn't hit the worker?

@srowen (Member, Author) commented Jul 1, 2020

Jenkins retest this please


@srowen (Member, Author) commented Jul 1, 2020

Jenkins retest this please


@dongjoon-hyun (Member)

Thank you so much, @srowen !


```diff
 def stateValid(): Boolean = {
-  (workers.map(_.ip) -- liveWorkerIPs).isEmpty &&
+  workers.map(_.ip).forall(liveWorkerIPs.contains) &&
```
Contributor comment:

Nit: What about using diff here?
As far as I can see, diff is not deprecated: https://www.scala-lang.org/api/current/scala/collection/Seq.html#diff[B%3E:A](that:scala.collection.Seq[B]):C

Suggested change:
```diff
-  workers.map(_.ip).forall(liveWorkerIPs.contains) &&
+  workers.map(_.ip).diff(liveWorkerIPs).isEmpty &&
```

@srowen (Member, Author):

diff would work too, I think. It has multiset semantics, which I didn't think were necessary here. I went for what I thought was simpler, but I am not 100% sure.
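
For context, a quick illustration of the multiset point (assumed values, not from the PR):

```scala
// diff removes elements by multiplicity (multiset semantics):
Seq(1, 1, 2).diff(Seq(1, 2)).isEmpty     // false: the second 1 remains
// forall + contains only tests membership, so duplicates are irrelevant:
Seq(1, 1, 2).forall(Seq(1, 2).contains)  // true
// For a plain subset-of-IPs check, the two agree.
```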

```diff
 } else {
-  new Range(r.start + start * r.step, r.start + end * r.step, r.step)
+  new Range.Inclusive(r.start + start * r.step, r.start + end * r.step - 1, r.step)
```

Contributor comment:

What about Range.Exclusive?

Suggested change:
```diff
-  new Range.Inclusive(r.start + start * r.step, r.start + end * r.step - 1, r.step)
+  new Range.Exclusive(r.start + start * r.step, r.start + end * r.step, r.step)
```

@srowen (Member, Author):

Range.Exclusive doesn't exist in 2.12, and Range() (exclusive in 2.12) doesn't exist in 2.13. :( I tried that initially. Because these are integers, I think we can get away with an Inclusive range that ends at end - 1 instead.

Contributor reply:

I see. In this case this is totally fine.

Contributor comment:

One more question: what about using until and by?

@srowen (Member, Author):

It's probably equivalent, yeah. I was shooting for minimal change here, but there may be several solutions.
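
For reference, the `until`/`by` spelling would look roughly like this (reusing `r`, `start`, and `end` from the diff above):

```scala
// Exclusive range via until/by; for a nonzero integer step this is
// equivalent to the Inclusive form that ends one r.step earlier:
(r.start + start * r.step) until (r.start + end * r.step) by r.step
```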

@srowen (Member, Author) commented Jul 2, 2020

Jenkins test this please


@SparkQA commented Jul 2, 2020

Test build #124915 has finished for PR 28971 at commit ab62b32.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member, Author) commented Jul 2, 2020

Jenkins retest this please

@SparkQA commented Jul 2, 2020

Test build #124916 has finished for PR 28971 at commit ab62b32.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Retest this please.

@SparkQA commented Jul 9, 2020

Test build #125479 has started for PR 28971 at commit 89d19c6.

@srowen (Member, Author) commented Jul 9, 2020

Jenkins retest this please

@SparkQA commented Jul 9, 2020

Test build #125499 has started for PR 28971 at commit 89d19c6.

@dongjoon-hyun (Member)

Retest this please

@SparkQA commented Jul 11, 2020

Test build #125648 has finished for PR 28971 at commit 89d19c6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member, Author) commented Jul 11, 2020

Jenkins retest this please

@SparkQA commented Jul 11, 2020

Test build #125682 has finished for PR 28971 at commit 89d19c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen changed the title [WIP][SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13 [SPARK-29292][SPARK-30010][CORE] Let core compile for Scala 2.13 Jul 11, 2020

```diff
 } else {
-  new Range(r.start + start * r.step, r.start + end * r.step, r.step)
+  new Range.Inclusive(r.start + start * r.step, r.start + (end - 1) * r.step, r.step)
```

@srowen (Member, Author):

For previous reviewers: I fixed a bug from my initial change here. The inclusive end is not 1 less than the exclusive end, but one r.step less.
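
A concrete check with assumed values (a negative step is where the two forms differ):

```scala
val r = 10 until 0 by -2   // 10, 8, 6, 4, 2
val (start, end) = (1, 4)  // want the slice at indices 1 until 4: 8, 6, 4
// Initial version: subtract 1 from the exclusive end *value*:
new Range.Inclusive(r.start + start * r.step, r.start + end * r.step - 1, r.step)
// = Range.Inclusive(8, 1, -2) = 8, 6, 4, 2   <- one element too many
// Fixed version: step back one whole r.step:
new Range.Inclusive(r.start + start * r.step, r.start + (end - 1) * r.step, r.step)
// = Range.Inclusive(8, 4, -2) = 8, 6, 4
```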

```diff
     resourceProfileManager: ResourceProfileManager)
   extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
 
+  def this() = this(null, null, null, null)
```

@srowen (Member, Author):

I still have no idea how this wasn't required in Scala 2.12: the class is used with a no-arg constructor, but none existed?!


test("top with predefined ordering") {
val nums = Array.range(1, 100000)
val nums = Seq.range(1, 100000)
@srowen (Member, Author):

Side comment: generally speaking, Seq types have fewer weird generic-type problems than Arrays. This is a good example.
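
For instance, building a generic Array needs a ClassTag while Seq does not (a sketch, not from the PR):

```scala
import scala.reflect.ClassTag

def fillSeq[T](n: Int, x: T): Seq[T] = Seq.fill(n)(x)          // fine
// def fillArr[T](n: Int, x: T): Array[T] = Array.fill(n)(x)   // won't compile: no ClassTag[T]
def fillArr[T: ClassTag](n: Int, x: T): Array[T] = Array.fill(n)(x)
```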

@dongjoon-hyun (Member)

Great, @srowen !

@dongjoon-hyun (Member) commented Jul 11, 2020

Could you fix the following four instances together in this PR, @srowen ?

```
[ERROR] [Error] /Users/dongjoon/PRS/SPARK-PR-28971/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:23: object parallel is not a member of package collection
[ERROR] [Error] /Users/dongjoon/PRS/SPARK-PR-28971/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:24: object parallel is not a member of package collection
[ERROR] [Error] /Users/dongjoon/PRS/SPARK-PR-28971/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:64: not found: type ForkJoinTaskSupport
[ERROR] [Error] /Users/dongjoon/PRS/SPARK-PR-28971/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:79: not found: type ParVector
```

@srowen (Member, Author) commented Jul 11, 2020

Oh, where do you see that? I can't find it in the test logs. That should be fine on 2.12, and it's also working in my local 2.13 compilation.

@dongjoon-hyun (Member)

I built with 2.13.3 like the following to verify this PR.

```
$ dev/change-scala-version.sh 2.13

$ ...
```
```diff
-    <scala.version>2.12.10</scala.version>
-    <scala.binary.version>2.12</scala.binary.version>
+    <scala.version>2.13.3</scala.version>
+    <scala.binary.version>2.13</scala.binary.version>
```
```
$ build/mvn package -DskipTests -pl core -am
```

@srowen (Member, Author) commented Jul 11, 2020

Oh, you have to build with -Pscala-2.13 too
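
That is, the full local verification would be roughly:

```
$ dev/change-scala-version.sh 2.13
$ build/mvn package -DskipTests -Pscala-2.13 -pl core -am
```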

@dongjoon-hyun (Member)

Oh. Got it. Thanks!

@srowen (Member, Author) commented Jul 11, 2020

Oh, I did miss one necessary change: the root pom has to be updated to Scala 2.13.3, yes. Let me get that part of the change in here.

@dongjoon-hyun (Member) left a review comment:

+1, LGTM. Everything works. Thank you so much, @srowen .
Merged to master for Apache Spark 3.1.0.

@srowen (Member, Author) commented Jul 11, 2020

Oh, OK. I wanted to make sure everyone was OK with the approach, but I think so, as it's been the plan for a long time AFAICT. I will start making other similar PRs (this one does not resolve SPARK-29292 by itself).

@dongjoon-hyun (Member)

Sorry for the rush. If needed, we can still switch approaches during the Apache Spark 3.1.0 timeline. I believe a healthy core module will unlock Scala 2.13 progress, serving as a baseline for the other modules and for the Scala 2.13 testing stage.

@SparkQA commented Jul 12, 2020

Test build #125693 has finished for PR 28971 at commit 8f5af5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Jul 14, 2020
…piling for Scala 2.13

### What changes were proposed in this pull request?

Continuation of #28971 which lets streaming, catalyst and sql compile for 2.13. Same idea.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

Closes #29078 from srowen/SPARK-29292.2.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

```scala
   * Set multiple parameters together
   */
  @deprecated("Use setAll(Iterable) instead", "3.0.0")
  def setAll(settings: Traversable[(String, String)]): SparkConf = {
```

Member comment:

@srowen, BTW, it might be best to file a JIRA as a reminder to bring this API back if we can't make Scala 2.13 in Spark 3.1.

I believe it is legitimate and inevitable to remove this because of Scala 2.13, but it would be problematic if we can't make it in Spark 3.1 and end up with a release that only supports Scala 2.12.

@srowen (Member, Author):

Yeah if the whole thing doesn't make it for 3.1, I'd leave this method in 3.1.
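
For users who do hit the removal, migration should be direct, since concrete collections like Seq and Map are already Iterable; a minimal sketch:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
// Removed overload: setAll(settings: Traversable[(String, String)])
// Its replacement (per the deprecation note, since 3.0) takes Iterable,
// which the same concrete collections satisfy:
conf.setAll(Seq("spark.app.name" -> "demo", "spark.master" -> "local[2]"))
```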

dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2020
… for Scala 2.13 compilation

### What changes were proposed in this pull request?

Same as #29078 and #28971. This makes the rest of the default modules (i.e. those you get without specifying `-Pyarn`, etc.) compile under Scala 2.13. As a result, it does not close the JIRA. This also, of course, does not demonstrate that tests pass yet on 2.13.

Note, this does not fix the `repl` module; that's separate.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

Closes #29111 from srowen/SPARK-29292.3.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jul 18, 2020
…ing modules

### What changes were proposed in this pull request?

See again the related PRs, like #28971.
This completes fixing compilation for 2.13 for all modules but `repl`, which is a separate task.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

Closes #29147 from srowen/SPARK-29292.4.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@srowen srowen deleted the SPARK-29292.1 branch September 12, 2020 21:11