
Conversation

@LuciferYang LuciferYang commented Aug 31, 2022

What changes were proposed in this pull request?

The main changes of this PR are as follows:

  1. Make the MiMa check exclude private objects by default
  2. Bump MiMa's previousSparkVersion to 3.3.0 (see the sketch below)
  3. Add the missing ProblemFilters rules to MimaExcludes
  4. Clean up expired rules and the related case matches
  5. Correct the rules added by SPARK-38679 and SPARK-39506 that were misplaced
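
The version bump itself is essentially a one-line change in project/MimaBuild.scala; a rough sketch of that change (surrounding settings omitted, exact context may differ):

    // project/MimaBuild.scala (sketch; surrounding settings omitted)
    val previousSparkVersion = "3.3.0"  // previously "3.2.0"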

Why are the changes needed?

To ensure that MiMa checks cover new APIs added in Spark 3.3.0.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Scala 2.12

dev/mima -Pscala-2.12

Scala 2.13

dev/change-scala-version.sh 2.13
dev/mima -Pscala-2.13

@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] MiMa updates after 3.3.0 release: bump previousSparkVersion to 3.3.0 [WIP][SPARK-40283][INFRA] MiMa updates after 3.3.0 release: bump previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang marked this pull request as draft August 31, 2022 10:50
@github-actions github-actions bot added the BUILD label Aug 31, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-40283][INFRA] MiMa updates after 3.3.0 release: bump previousSparkVersion to 3.3.0 [WIP][SPARK-40283][INFRA] Update MiMa's previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-40283][INFRA] Update MiMa's previousSparkVersion to 3.3.0 [WIP][SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang marked this pull request as ready for review August 31, 2022 12:29

@dongjoon-hyun dongjoon-hyun left a comment


Maybe, do you think you can clean up more like SPARK-36004?


LuciferYang commented Sep 1, 2022

> Maybe, do you think you can clean up more like SPARK-36004?

like f2ed5d8?

@LuciferYang

> Maybe, do you think you can clean up more like SPARK-36004?
>
> like f2ed5d8?

Seems to have failed. Let me check again.

ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.SingleSpillShuffleMapOutputWriter.transferMapSpillFile"),
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.ShuffleMapOutputWriter.commitAllPartitions"),

// [SPARK-39506] In terms of 3 layer namespace effort, add currentCatalog, setCurrentCatalog and listCatalogs API to Catalog interface
Contributor Author:

wrong place, will correct it later

Contributor Author:
should be v34excludes

// [SPARK-36173][CORE] Support getting CPU number in TaskContext
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.cpus"),

// [SPARK-38679][CORE] Expose the number of partitions in a stage to TaskContext
Contributor Author:

Wrong place; it should be in v34excludes.

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.apply")
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.apply"),

// [SPARK-38679][CORE] Expose the number of partitions in a stage to TaskContext
@LuciferYang LuciferYang Sep 1, 2022


It was placed in v32excludes before, and it is moved to v34excludes in this PR.

cc @vkorukanti and @cloud-fan SPARK-38679:

// [SPARK-38679][CORE] Expose the number of partitions in a stage to TaskContext
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.numPartitions"),

// [SPARK-39506] In terms of 3 layer namespace effort, add currentCatalog, setCurrentCatalog and listCatalogs API to Catalog interface
@LuciferYang LuciferYang Sep 1, 2022

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.types.Decimal.fromStringANSI"),
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.types.Decimal.fromStringANSI$default$3"),

// [SPARK-39704][SQL] Implement createIndex & dropIndex & indexExists in JDBC (H2 dialect)

Contributor:

They should not be checked by MiMa... Does MiMa have a list to mark private APIs?

Contributor Author:

Sure, we can add org.apache.spark.sql.jdbc.* as defaultExcludes.

Contributor Author:

c2d9f37 adds org.apache.spark.sql.jdbc.* as a default exclude; is this ok?

ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.setCurrentCatalog"),
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.listCatalogs"),

// [SPARK-38929][SQL] Improve error messages for cast failures in ANSI
Contributor Author:

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.jdbc.TeradataDialect.dropIndex"),
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.jdbc.TeradataDialect.indexExists"),

// [SPARK-39759][SQL] Implement listIndexes in JDBC (H2 dialect)
Contributor Author:

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.jdbc.TeradataDialect.listIndexes"),

// [SPARK-36511][MINOR][SQL] Remove ColumnIOUtil
ProblemFilters.exclude[MissingClassProblem]("org.apache.parquet.io.ColumnIOUtil")
Contributor Author:

// Exclude rules for 3.2.x from 3.1.1
lazy val v32excludes = Seq(
// Default exclude rules
lazy val defaultExcludes = Seq(
Contributor Author:

@dongjoon-hyun

I keep

    // Spark Internals
    ProblemFilters.exclude[Problem]("org.apache.spark.rpc.*"),
    ProblemFilters.exclude[Problem]("org.spark-project.jetty.*"),
    ProblemFilters.exclude[Problem]("org.spark_project.jetty.*"),
    ProblemFilters.exclude[Problem]("org.sparkproject.jetty.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.internal.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.unused.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.unsafe.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.memory.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.util.collection.unsafe.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.catalyst.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.execution.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.internal.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.errors.*"),
    // DSv2 catalog and expression APIs are unstable yet. We should enable this back.
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.connector.catalog.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.connector.expressions.*"),
    // Avro source implementation is internal.
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.v2.avro.*"),

    (problem: Problem) => problem match {
      case MissingClassProblem(cls) => !cls.fullName.startsWith("org.sparkproject.jpmml") &&
          !cls.fullName.startsWith("org.sparkproject.dmg.pmml")
      case _ => true
    }

as defaultExcludes, is that right?

Member:

Looks good to me.

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for all clean-ups, @LuciferYang . It's really meaningful.


// Exclude rules for 3.3.x from 3.2.0
lazy val v33excludes = v32excludes ++ Seq(
lazy val v33excludes = defaultExcludes ++ Seq(
Member:

I thought we could remove v33excludes completely, except defaultExcludes, because it is a diff from 3.2.0. Do we still need it when we compare with v3.3.0?

@LuciferYang LuciferYang Sep 1, 2022

Do you mean val v34excludes = defaultExcludes ++ Seq(...)?

Contributor Author:

v33excludes should not be needed


def excludes(version: String) = version match {
case v if v.startsWith("3.4") => v34excludes
case v if v.startsWith("3.3") => v33excludes
Member:

Shall we remove this line too?

@LuciferYang LuciferYang Sep 1, 2022

org.apache.spark.sql.catalog.* was also added to the default excludes.

Contributor Author:

I've got the wrong idea

Contributor Author:

The case v if v.startsWith("3.3") => v33excludes line has been deleted.
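
For reference, a rough sketch of the simplified dispatch after this cleanup (assuming only v34excludes and the shared defaults remain; the exact code in project/MimaExcludes.scala may differ):

    def excludes(version: String) = version match {
      case v if v.startsWith("3.4") => v34excludes
      case _ => Seq()
    }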

ProblemFilters.exclude[Problem]("org.apache.spark.sql.internal.*"),
ProblemFilters.exclude[Problem]("org.apache.spark.sql.errors.*"),
// SPARK-40283: add jdbc as default excludes
ProblemFilters.exclude[Problem]("org.apache.spark.sql.jdbc.*"),
Contributor:

We do have developer APIs in this package (e.g. JdbcDialect), but dialect implementations are private (e.g. private object DB2Dialect extends JdbcDialect). I have no idea why mima still checks these dialect implementations...

Contributor Author:

I have added org.apache.spark.sql.jdbc.* to the default excludes list.

Contributor:

We have developer APIs there, so we can't simply exclude the entire package. If we can't figure out how to fix MiMa (would it help to add @Private to these private classes/objects?), then we have to add exclude rules one by one for each class in org.apache.spark.sql.jdbc.

Contributor Author:

Ignore classes annotated with org.apache.spark.annotation.Private? That should be feasible in code. What do you think about this, @dongjoon-hyun?

Contributor Author:

@cloud-fan Are all the XXDialect implementations extending JdbcDialect identified as @Private? Let me test it first.

Contributor Author:

So we want to skip all private objects?

Contributor:

Yeah, MiMa is for checking binary compatibility, so why do we care about private APIs?

Contributor Author:

Yes, that's reasonable. Let me check why GenerateMIMAIgnore didn't ignore them

Contributor Author:

@cloud-fan

ad543c0 adds a moduleSymbol.isPrivate condition to the directlyPrivateSpark check, so private objects are now ignored by default.

But we still need to explicitly add ProblemFilters.exclude[Problem]("org.apache.spark.sql.jdbc.JdbcDialect.*") because JdbcDialect is an abstract class now.
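
For illustration, a minimal sketch of the kind of reflection check involved (assumed names, not the exact GenerateMIMAIgnore code):

    import scala.reflect.runtime.{universe => unv}
    import scala.util.Try

    // Sketch only: approximates how a tool like GenerateMIMAIgnore can decide that a
    // class or its companion object is private and therefore safe to skip in the
    // binary-compatibility check.
    object PrivateCheckSketch {
      private val mirror = unv.runtimeMirror(getClass.getClassLoader)

      // Package-private symbols report a concrete privateWithin owner.
      private def isPackagePrivate(sym: unv.Symbol): Boolean =
        !sym.privateWithin.fullName.startsWith("<none>")

      def shouldIgnore(className: String): Boolean = {
        val classSymbol = mirror.classSymbol(Class.forName(className))
        // "private object Foo" is reported on the module symbol, hence the extra isPrivate check.
        val moduleIsPrivate =
          Try(mirror.staticModule(className.stripSuffix("$")).isPrivate).getOrElse(false)
        isPackagePrivate(classSymbol) || classSymbol.isPrivate || moduleIsPrivate
      }
    }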

Contributor Author:

And @DeveloperApi classes (for example JdbcDialect) cannot be ignored by default due to #11751.

@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules Sep 1, 2022
@github-actions github-actions bot added the INFRA label Sep 1, 2022
ProblemFilters.exclude[Problem]("org.apache.spark.sql.internal.*"),
ProblemFilters.exclude[Problem]("org.apache.spark.sql.errors.*"),
// SPARK-40283: add jdbc.JdbcDialect as default excludes
ProblemFilters.exclude[Problem]("org.apache.spark.sql.jdbc.JdbcDialect.*"),
Contributor:

JdbcDialect is a developer API and should be tracked. Can we track it explicitly instead of skipping the entire package?

Contributor Author:

34fb8c4 fixes this.

@cloud-fan cloud-fan left a comment

Looks pretty good now, thanks for working on it!

@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules [SPARK-40283][INFRA] Make MiMa check excludes private object by default and bump MiMa's previousSparkVersion to 3.3.0 Sep 2, 2022
@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Make MiMa check excludes private object by default and bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Make MiMa check exclude private object by default and bump MiMa's previousSparkVersion to 3.3.0 Sep 2, 2022
@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Make MiMa check exclude private object by default and bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Make MiMa check default exclude private object and bump MiMa's previousSparkVersion to 3.3.0 Sep 2, 2022
@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Make MiMa check default exclude private object and bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Make MiMa check default exclude private object and bump previousSparkVersion to 3.3.0 Sep 2, 2022
@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @LuciferYang and all!

@dongjoon-hyun

Merged to master for Apache Spark 3.4.0.
