Skip to content

Conversation

@srowen
Copy link
Member

@srowen srowen commented Nov 1, 2018

What changes were proposed in this pull request?

  • Remove some AccumulableInfo .apply() methods
  • Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
  • Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only deprecated)
  • Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only deprecated)
  • Remove unused Python StorageLevel constants
  • Remove Dataset unionAll in favor of union
  • Remove unused multiclass option in libsvm parsing
  • Remove references to deprecated spark configs like spark.yarn.am.port
  • Remove TaskContext.isRunningLocally
  • Remove ShuffleMetrics.shuffle* methods
  • Remove BaseReadWrite.context in favor of session
  • Remove Column.!== in favor of =!=
  • Remove Dataset.explode
  • Remove Dataset.registerTempTable
  • Remove SQLContext.getOrCreate, setActive, clearActive, constructors

Not touched yet

  • everything else in MLLib
  • HiveContext
  • Anything deprecated more recently than 2.0.0, generally

How was this patch tested?

Existing tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a bug fix I spotted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixcheung might want to check if I'm handling these R changes correctly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comment didn't get connected to this one -- @felixcheung what do you think about the argument that this almost surely was meant to be deprecated along with counterparts in Scala/Python? leaving them in would make this inconsistent. As the degrees, radians, and approxCountDistinct are reasonably niche and have a direct replacement that's compatible with older versions, I feel like this is OK for 3.0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's super light weight to have a approxCountDistinct that calls approx_count_distinct with deprecation?
I thought was that R API was not always sync or complete compare to python, and a breaking API change - ie. the job will fail - seems a bit drastic even in a major release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but then again that's exactly what was deprecated and removed in Python and Scala. Major versions can have breaking changes. Yes R isn't always in sync but that's a bug not a feature? Let me surface this to dev@ as I think it's going to come up a few more times.

@SparkQA
Copy link

SparkQA commented Nov 1, 2018

Test build #98353 has finished for PR 22921 at commit 259e7d1.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to remove these? they are warnings for users if they set the wrong config right

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can add them back. Wasn't sure whether they are still valuable or just old.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep these two lines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will add it back. Yeah will leave this open a short while to make sure there is time to comment.

@rxin
Copy link
Contributor

rxin commented Nov 1, 2018

seems good to me; might want to leave this open for a few days so more people can take a look

@SparkQA
Copy link

SparkQA commented Nov 1, 2018

Test build #98354 has finished for PR 22921 at commit 7a0ecd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 1, 2018

Test build #98363 has finished for PR 22921 at commit d50b5b5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 2, 2018

Test build #98373 has finished for PR 22921 at commit bd4f5ab.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rewrite this test case using select(explode()), like what we did in the following test cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll try to bring back to the test case.

@gatorsmile
Copy link
Member

LGTM except a minor comment about the test case. Also we need to fix the PySpark test failure

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this pre Spark 3 cleanup work :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the since annotation here, where was the degrees implementation in 2.1.0? When I look at https://spark.apache.org/docs/latest/api/R/index.html I don't see the degrees function just toDegrees>=?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

degrees was added in Scala/Python in 2.1.0, which is what I was thinking of, but yeah really this must be since 3.0.0 right? I'll fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.. (version here is R API specific)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment with degrees

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the removal of this is causing the test failure, maybe do a grep for approxCountDistinct in the tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in some cases the deprecated user-facing method was named the same way as some internal method and I changed the wrong one. I'll investigate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @MLnick I know this was a thing on your radar in some way for dataframe caching maybe? Do we actually want to remove this for 3+?

@SparkQA
Copy link

SparkQA commented Nov 3, 2018

Test build #98414 has finished for PR 22921 at commit 57ef4e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Looks okay to me too but I'd also leave this open for few more days.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

degrees and radians will need to be added to NAMESPACE file for export

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's actually new in R for 3.0.0 then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will fix that one too if I missed it, per #22921 (comment)

@srowen
Copy link
Member Author

srowen commented Nov 3, 2018

Yeah it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. given that it's a reasonably niche method, I thought it would be best to go ahead and be consistent here?

@SparkQA
Copy link

SparkQA commented Nov 4, 2018

Test build #98433 has finished for PR 22921 at commit df92f0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 5, 2018

Test build #98480 has finished for PR 22921 at commit a6891f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member Author

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixcheung ready for another look. I retained the existing methods but deprecated them, and directed them to the newer methods in the JVM, because the old ones are gone. Not sure I got it 100% right.

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98536 has finished for PR 22921 at commit 6bcbf79.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98537 has finished for PR 22921 at commit af748d5.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' head(select(df, approx_count_distinct(df$gear, 0.02)))
#' head(select(df, countDistinct(df$gear, df$cyl)))
#' head(select(df, n_distinct(df$gear)))
#' head(distinct(select(df, "gear")))}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only need one set - they both are @rdname column_aggregate_functions so will duplicate all other examples

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @HyukjinKwon fixed this. Pending tests, does the change look OK to you on the R side @felixcheung ?

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R LGTM except 1 comment

#' @aliases toRadians toRadians,Column-method
#' @note toRadians since 1.4.0
setMethod("toRadians",
signature(x = "Column"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indentation?

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98551 has finished for PR 22921 at commit 3070975.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98561 has finished for PR 22921 at commit 9f1ced3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 8, 2018

Test build #4419 has finished for PR 22921 at commit 9f1ced3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member Author

srowen commented Nov 8, 2018

Merged to master

@asfgit asfgit closed this in 0025a83 Nov 8, 2018
@srowen srowen deleted the SPARK-25908 branch November 14, 2018 20:54
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
- Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only deprecated)
- Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only deprecated)
- Remove unused Python StorageLevel constants
- Remove Dataset unionAll in favor of union
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session
- Remove Column.!== in favor of =!=
- Remove Dataset.explode
- Remove Dataset.registerTempTable
- Remove SQLContext.getOrCreate, setActive, clearActive, constructors

Not touched yet

- everything else in MLLib
- HiveContext
- Anything deprecated more recently than 2.0.0, generally

## How was this patch tested?

Existing tests

Closes apache#22921 from srowen/SPARK-25908.

Lead-authored-by: Sean Owen <[email protected]>
Co-authored-by: hyukjinkwon <[email protected]>
Co-authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
override def session(sparkSession: SparkSession): this.type = super.session(sparkSession)

// override for Java compatibility
override def context(sqlContext: SQLContext): this.type = super.session(sqlContext.sparkSession)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen This public method seems had not been deprecated before removal, and is avaiable in 2.4.5.

scala> import org.apache.spark.ml.util.GeneralMLWriter
import org.apache.spark.ml.util.GeneralMLWriter

scala> new GeneralMLWriter(null).context(spark.sqlContext)
res0: org.apache.spark.ml.util.GeneralMLWriter = org.apache.spark.ml.util.GeneralMLWriter@26b150cd

There is no deprecation warning above. Does it matter?

Copy link
Member

@HyukjinKwon HyukjinKwon Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems properly deprecated in MLWriter as its parent.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.util.GeneralMLWriter@context(sqlContext:org.apache.spark.sql.SQLContext):GeneralMLWriter.this.type
and the Scaladoc explicitly shows this at GeneralMLWriter too. Seems right to remove together if we should in MLWriter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it was deprecated in 2.0.0 and marked for removal in 3.0.0.

 /**	
   * Sets the Spark SQLContext to use for saving/loading.	
   *	
   * @deprecated Use session instead. This method will be removed in 3.0.0.	
   */	
  @Since("1.6.0")	
  @deprecated("Use session instead. This method will be removed in 3.0.0.", "2.0.0")

I think ideally the Java overload and subclass overrides would be marked deprecated too, but they implicitly are. If there were a case that this is actually used, we could revive it, but just wondering how often people would be using save + SQLContext?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never use this method, just check it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants