
Conversation

@LuciferYang LuciferYang commented Aug 31, 2022

What changes were proposed in this pull request?

The main changes of this PR are as follows:

  1. Make the MiMa check exclude private objects by default
  2. Bump MiMa's previousSparkVersion to 3.3.0 (see the sketch below)
  3. Add the missing ProblemFilters rules to MimaExcludes
  4. Clean up expired rules and the related case matches
  5. Correct the rules added by SPARK-38679 and SPARK-39506 that were misplaced
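
The version bump itself is essentially a one-line change in project/MimaBuild.scala; a rough sketch of that change (surrounding settings omitted, exact context may differ):

    // project/MimaBuild.scala (sketch; surrounding settings omitted)
    val previousSparkVersion = "3.3.0"  // previously "3.2.0"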

Why are the changes needed?

To ensure that MiMa checks cover new APIs added in Spark 3.3.0.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Scala 2.12

dev/mima -Pscala-2.12

Scala 2.13

dev/change-scala-version.sh 2.13
dev/mima -Pscala-2.13

@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] MiMa updates after 3.3.0 release: bump previousSparkVersion to 3.3.0 [WIP][SPARK-40283][INFRA] MiMa updates after 3.3.0 release: bump previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang marked this pull request as draft August 31, 2022 10:50
@github-actions github-actions bot added the BUILD label Aug 31, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-40283][INFRA] MiMa updates after 3.3.0 release: bump previousSparkVersion to 3.3.0 [WIP][SPARK-40283][INFRA] Update MiMa's previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-40283][INFRA] Update MiMa's previousSparkVersion to 3.3.0 [WIP][SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 Aug 31, 2022
@LuciferYang LuciferYang marked this pull request as ready for review August 31, 2022 12:29

@dongjoon-hyun dongjoon-hyun left a comment


Maybe, do you think you can clean up more like SPARK-36004?


LuciferYang commented Sep 1, 2022

> Maybe, do you think you can clean up more like SPARK-36004?

like f2ed5d8?

@LuciferYang

> Maybe, do you think you can clean up more like SPARK-36004?
>
> like f2ed5d8?

Seems to have failed. Let me check again.

ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.SingleSpillShuffleMapOutputWriter.transferMapSpillFile"),
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.ShuffleMapOutputWriter.commitAllPartitions"),

// [SPARK-39506] In terms of 3 layer namespace effort, add currentCatalog, setCurrentCatalog and listCatalogs API to Catalog interface
Contributor Author:

wrong place, will correct it later

Contributor Author:
should be v34excludes

// [SPARK-36173][CORE] Support getting CPU number in TaskContext
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.cpus"),

// [SPARK-38679][CORE] Expose the number of partitions in a stage to TaskContext
Contributor Author:

Wrong place; it should be in v34excludes.

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.apply")
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.apply"),

// [SPARK-38679][CORE] Expose the number of partitions in a stage to TaskContext
@LuciferYang LuciferYang Sep 1, 2022


It was placed in v32excludes before, and it is moved to v34excludes in this PR.

cc @vkorukanti and @cloud-fan SPARK-38679:

// [SPARK-38679][CORE] Expose the number of partitions in a stage to TaskContext
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.numPartitions"),

// [SPARK-39506] In terms of 3 layer namespace effort, add currentCatalog, setCurrentCatalog and listCatalogs API to Catalog interface
@LuciferYang LuciferYang Sep 1, 2022

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.types.Decimal.fromStringANSI"),
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.types.Decimal.fromStringANSI$default$3"),

// [SPARK-39704][SQL] Implement createIndex & dropIndex & indexExists in JDBC (H2 dialect)

Contributor:

They should not be checked by MiMa... Does MiMa have a list to mark private APIs?

Contributor Author:

Sure, we can add org.apache.spark.sql.jdbc.* as defaultExcludes.

Contributor Author:

c2d9f37 adds org.apache.spark.sql.jdbc.* as a default exclude; is this ok?

ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.setCurrentCatalog"),
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.listCatalogs"),

// [SPARK-38929][SQL] Improve error messages for cast failures in ANSI
Contributor Author:

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.jdbc.TeradataDialect.dropIndex"),
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.jdbc.TeradataDialect.indexExists"),

// [SPARK-39759][SQL] Implement listIndexes in JDBC (H2 dialect)
Contributor Author:

ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.jdbc.TeradataDialect.listIndexes"),

// [SPARK-36511][MINOR][SQL] Remove ColumnIOUtil
ProblemFilters.exclude[MissingClassProblem]("org.apache.parquet.io.ColumnIOUtil")
Contributor Author:

// Exclude rules for 3.2.x from 3.1.1
lazy val v32excludes = Seq(
// Default exclude rules
lazy val defaultExcludes = Seq(
Contributor Author:

@dongjoon-hyun

I keep

    // Spark Internals
    ProblemFilters.exclude[Problem]("org.apache.spark.rpc.*"),
    ProblemFilters.exclude[Problem]("org.spark-project.jetty.*"),
    ProblemFilters.exclude[Problem]("org.spark_project.jetty.*"),
    ProblemFilters.exclude[Problem]("org.sparkproject.jetty.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.internal.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.unused.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.unsafe.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.memory.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.util.collection.unsafe.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.catalyst.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.execution.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.internal.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.errors.*"),
    // DSv2 catalog and expression APIs are unstable yet. We should enable this back.
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.connector.catalog.*"),
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.connector.expressions.*"),
    // Avro source implementation is internal.
    ProblemFilters.exclude[Problem]("org.apache.spark.sql.v2.avro.*"),

    (problem: Problem) => problem match {
      case MissingClassProblem(cls) => !cls.fullName.startsWith("org.sparkproject.jpmml") &&
          !cls.fullName.startsWith("org.sparkproject.dmg.pmml")
      case _ => true
    }

as defaultExcludes, is that right?

Member:

Looks good to me.

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for all clean-ups, @LuciferYang . It's really meaningful.


// Exclude rules for 3.3.x from 3.2.0
lazy val v33excludes = v32excludes ++ Seq(
lazy val v33excludes = defaultExcludes ++ Seq(
Member:

I thought we could remove v33excludes completely, except defaultExcludes, because it is a diff from 3.2.0. Do we still need it when we compare with v3.3.0?

@LuciferYang LuciferYang Sep 1, 2022

Do you mean val v34excludes = defaultExcludes ++ Seq(...)?

Contributor Author:

v33excludes should not be needed


def excludes(version: String) = version match {
case v if v.startsWith("3.4") => v34excludes
case v if v.startsWith("3.3") => v33excludes
Member:

Shall we remove this line too?

@LuciferYang LuciferYang Sep 1, 2022

org.apache.spark.sql.catalog.* was also added to the default excludes.

Contributor Author:

I've got the wrong idea

Contributor Author:

The case v if v.startsWith("3.3") => v33excludes line has been deleted.
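
For reference, a rough sketch of the simplified dispatch after this cleanup (assuming only v34excludes and the shared defaults remain; the exact code in project/MimaExcludes.scala may differ):

    def excludes(version: String) = version match {
      case v if v.startsWith("3.4") => v34excludes
      case _ => Seq()
    }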

ProblemFilters.exclude[Problem]("org.apache.spark.sql.internal.*"),
ProblemFilters.exclude[Problem]("org.apache.spark.sql.errors.*"),
// SPARK-40283: add jdbc as default excludes
ProblemFilters.exclude[Problem]("org.apache.spark.sql.jdbc.*"),
Contributor:

We do have developer APIs in this package (e.g. JdbcDialect), but dialect implementations are private (e.g. private object DB2Dialect extends JdbcDialect). I have no idea why mima still checks these dialect implementations...

Contributor Author:

I have added org.apache.spark.sql.jdbc.* to the default excludes list.

Contributor:

We have developer APIs there, so we can't simply exclude the entire package. If we can't figure out how to fix MiMa (would it help to add @Private to these private classes/objects?), then we have to add exclude rules one by one for each class in org.apache.spark.sql.jdbc.

Contributor Author:

Ignore classes annotated with org.apache.spark.annotation.Private? That should be feasible in code. What do you think about this, @dongjoon-hyun?

Contributor Author:

@cloud-fan Are all the XXDialect implementations extending JdbcDialect identified as @Private? Let me test it first.

Contributor Author:

So we want to skip all private objects?

Contributor:

Yeah, MiMa is for checking binary compatibility, so why do we care about private APIs?

Contributor Author:

Yes, that's reasonable. Let me check why GenerateMIMAIgnore didn't ignore them

Contributor Author:

@cloud-fan

ad543c0 adds a moduleSymbol.isPrivate condition to the directlyPrivateSpark check, so private objects are now ignored by default.

But we still need to explicitly add ProblemFilters.exclude[Problem]("org.apache.spark.sql.jdbc.JdbcDialect.*") because JdbcDialect is an abstract class now.
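
For illustration, a minimal sketch of the kind of reflection check involved (assumed names, not the exact GenerateMIMAIgnore code):

    import scala.reflect.runtime.{universe => unv}
    import scala.util.Try

    // Sketch only: approximates how a tool like GenerateMIMAIgnore can decide that a
    // class or its companion object is private and therefore safe to skip in the
    // binary-compatibility check.
    object PrivateCheckSketch {
      private val mirror = unv.runtimeMirror(getClass.getClassLoader)

      // Package-private symbols report a concrete privateWithin owner.
      private def isPackagePrivate(sym: unv.Symbol): Boolean =
        !sym.privateWithin.fullName.startsWith("<none>")

      def shouldIgnore(className: String): Boolean = {
        val classSymbol = mirror.classSymbol(Class.forName(className))
        // "private object Foo" is reported on the module symbol, hence the extra isPrivate check.
        val moduleIsPrivate =
          Try(mirror.staticModule(className.stripSuffix("$")).isPrivate).getOrElse(false)
        isPackagePrivate(classSymbol) || classSymbol.isPrivate || moduleIsPrivate
      }
    }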

Contributor Author:

And @DeveloperApi classes (for example JdbcDialect) cannot be ignored by default due to #11751.

@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules Sep 1, 2022
@github-actions github-actions bot added the INFRA label Sep 1, 2022
ProblemFilters.exclude[Problem]("org.apache.spark.sql.internal.*"),
ProblemFilters.exclude[Problem]("org.apache.spark.sql.errors.*"),
// SPARK-40283: add jdbc.JdbcDialect as default excludes
ProblemFilters.exclude[Problem]("org.apache.spark.sql.jdbc.JdbcDialect.*"),
Contributor:

JdbcDialect is a developer API and should be tracked. Can we track it explicitly instead of skipping the entire package?

Contributor Author:

34fb8c4 fixes this.

@cloud-fan cloud-fan left a comment

Looks pretty good now, thanks for working on it!

@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules [SPARK-40283][INFRA] Make MiMa check excludes private object by default and bump MiMa's previousSparkVersion to 3.3.0 Sep 2, 2022
@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Make MiMa check excludes private object by default and bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Make MiMa check exclude private object by default and bump MiMa's previousSparkVersion to 3.3.0 Sep 2, 2022
@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Make MiMa check exclude private object by default and bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Make MiMa check default exclude private object and bump MiMa's previousSparkVersion to 3.3.0 Sep 2, 2022
@LuciferYang LuciferYang changed the title [SPARK-40283][INFRA] Make MiMa check default exclude private object and bump MiMa's previousSparkVersion to 3.3.0 [SPARK-40283][INFRA] Make MiMa check default exclude private object and bump previousSparkVersion to 3.3.0 Sep 2, 2022
@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @LuciferYang and all!

@dongjoon-hyun

Merged to master for Apache Spark 3.4.0.
