-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 #22921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just a bug fix I spotted
R/pkg/R/generics.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung might want to check if I'm handling these R changes correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my comment didn't get connected to this one -- @felixcheung what do you think about the argument that this almost surely was meant to be deprecated along with counterparts in Scala/Python? leaving them in would make this inconsistent. As the degrees, radians, and approxCountDistinct are reasonably niche and have a direct replacement that's compatible with older versions, I feel like this is OK for 3.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's super light weight to have a approxCountDistinct that calls approx_count_distinct with deprecation?
I thought was that R API was not always sync or complete compare to python, and a breaking API change - ie. the job will fail - seems a bit drastic even in a major release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, but then again that's exactly what was deprecated and removed in Python and Scala. Major versions can have breaking changes. Yes R isn't always in sync but that's a bug not a feature? Let me surface this to dev@ as I think it's going to come up a few more times.
|
Test build #98353 has finished for PR 22921 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to remove these? they are warnings for users if they set the wrong config right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can add them back. Wasn't sure whether they are still valuable or just old.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep these two lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will add it back. Yeah will leave this open a short while to make sure there is time to comment.
|
seems good to me; might want to leave this open for a few days so more people can take a look |
|
Test build #98354 has finished for PR 22921 at commit
|
|
Test build #98363 has finished for PR 22921 at commit
|
|
Test build #98373 has finished for PR 22921 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rewrite this test case using select(explode()), like what we did in the following test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll try to bring back to the test case.
|
LGTM except a minor comment about the test case. Also we need to fix the PySpark test failure |
holdenk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this pre Spark 3 cleanup work :)
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the since annotation here, where was the degrees implementation in 2.1.0? When I look at https://spark.apache.org/docs/latest/api/R/index.html I don't see the degrees function just toDegrees>=?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
degrees was added in Scala/Python in 2.1.0, which is what I was thinking of, but yeah really this must be since 3.0.0 right? I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.. (version here is R API specific)
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment with degrees
python/pyspark/sql/functions.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the removal of this is causing the test failure, maybe do a grep for approxCountDistinct in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in some cases the deprecated user-facing method was named the same way as some internal method and I changed the wrong one. I'll investigate.
python/pyspark/storagelevel.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @MLnick I know this was a thing on your radar in some way for dataframe caching maybe? Do we actually want to remove this for 3+?
|
Test build #98414 has finished for PR 22921 at commit
|
|
Looks okay to me too but I'd also leave this open for few more days. |
R/pkg/R/generics.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
degrees and radians will need to be added to NAMESPACE file for export
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's actually new in R for 3.0.0 then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, will fix that one too if I missed it, per #22921 (comment)
|
Yeah it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. given that it's a reasonably niche method, I thought it would be best to go ahead and be consistent here? |
|
Test build #98433 has finished for PR 22921 at commit
|
|
Test build #98480 has finished for PR 22921 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung ready for another look. I retained the existing methods but deprecated them, and directed them to the newer methods in the JVM, because the old ones are gone. Not sure I got it 100% right.
|
Test build #98536 has finished for PR 22921 at commit
|
|
Test build #98537 has finished for PR 22921 at commit
|
| #' head(select(df, approx_count_distinct(df$gear, 0.02))) | ||
| #' head(select(df, countDistinct(df$gear, df$cyl))) | ||
| #' head(select(df, n_distinct(df$gear))) | ||
| #' head(distinct(select(df, "gear")))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only need one set - they both are @rdname column_aggregate_functions so will duplicate all other examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @HyukjinKwon fixed this. Pending tests, does the change look OK to you on the R side @felixcheung ?
Fix CRAN check failure at 22921
felixcheung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R LGTM except 1 comment
R/pkg/R/functions.R
Outdated
| #' @aliases toRadians toRadians,Column-method | ||
| #' @note toRadians since 1.4.0 | ||
| setMethod("toRadians", | ||
| signature(x = "Column"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix indentation?
|
Test build #98551 has finished for PR 22921 at commit
|
|
Test build #98561 has finished for PR 22921 at commit
|
|
Test build #4419 has finished for PR 22921 at commit
|
|
Merged to master |
## What changes were proposed in this pull request? - Remove some AccumulableInfo .apply() methods - Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy - Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only deprecated) - Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only deprecated) - Remove unused Python StorageLevel constants - Remove Dataset unionAll in favor of union - Remove unused multiclass option in libsvm parsing - Remove references to deprecated spark configs like spark.yarn.am.port - Remove TaskContext.isRunningLocally - Remove ShuffleMetrics.shuffle* methods - Remove BaseReadWrite.context in favor of session - Remove Column.!== in favor of =!= - Remove Dataset.explode - Remove Dataset.registerTempTable - Remove SQLContext.getOrCreate, setActive, clearActive, constructors Not touched yet - everything else in MLLib - HiveContext - Anything deprecated more recently than 2.0.0, generally ## How was this patch tested? Existing tests Closes apache#22921 from srowen/SPARK-25908. Lead-authored-by: Sean Owen <[email protected]> Co-authored-by: hyukjinkwon <[email protected]> Co-authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]>
| override def session(sparkSession: SparkSession): this.type = super.session(sparkSession) | ||
|
|
||
| // override for Java compatibility | ||
| override def context(sqlContext: SQLContext): this.type = super.session(sqlContext.sparkSession) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen This public method seems had not been deprecated before removal, and is avaiable in 2.4.5.
scala> import org.apache.spark.ml.util.GeneralMLWriter
import org.apache.spark.ml.util.GeneralMLWriter
scala> new GeneralMLWriter(null).context(spark.sqlContext)
res0: org.apache.spark.ml.util.GeneralMLWriter = org.apache.spark.ml.util.GeneralMLWriter@26b150cd
There is no deprecation warning above. Does it matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems properly deprecated in MLWriter as its parent.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.util.GeneralMLWriter@context(sqlContext:org.apache.spark.sql.SQLContext):GeneralMLWriter.this.type
and the Scaladoc explicitly shows this at GeneralMLWriter too. Seems right to remove together if we should in MLWriter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it was deprecated in 2.0.0 and marked for removal in 3.0.0.
/**
* Sets the Spark SQLContext to use for saving/loading.
*
* @deprecated Use session instead. This method will be removed in 3.0.0.
*/
@Since("1.6.0")
@deprecated("Use session instead. This method will be removed in 3.0.0.", "2.0.0")
I think ideally the Java overload and subclass overrides would be marked deprecated too, but they implicitly are. If there were a case that this is actually used, we could revive it, but just wondering how often people would be using save + SQLContext?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never use this method, just check it. Thanks!
What changes were proposed in this pull request?
Not touched yet
How was this patch tested?
Existing tests