Skip to content

Conversation

@yanboliang
Copy link
Contributor

Change cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number at SparkR side.
There are two reasons that we should make this change:

  • We should follow the naming convention rule of R
  • Spark DataFrame has deprecated the old convention (such as cumeDist) and will remove it in Spark 2.0.

It's better to fix this issue before 1.6 release, otherwise we will make breaking API change.
cc @shivaram @sun-rui

@SparkQA
Copy link

SparkQA commented Nov 27, 2015

Test build #46815 has finished for PR 10016 at commit c56a7bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor

sun-rui commented Nov 27, 2015

code change looks good. New names have no conflict with R base package.

But I don't see they are deprecated now? Could you point to me the discussion or JIRA that they will be removed in Spark 2.0?
If they are deprecated, I am fine with the name change.
If they are not deprecated, could we add new names as aliases, while keeping the old names at the same time?

@yanboliang
Copy link
Contributor Author

@sun-rui Thanks for your comments. You can get the issue from the source code and this PR(#9930).

@sun-rui
Copy link
Contributor

sun-rui commented Nov 27, 2015

@yanboliang, thanks for your info. I didn't look at the latest code:)

LGTM

@shivaram
Copy link
Contributor

LGTM. Thanks @yanboliang -- BTW were these APIs were present in SparkR 1.5 ? If so we'll need to update the release docs about this change.

asfgit pushed a commit that referenced this pull request Nov 27, 2015
Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side.
There are two reasons that we should make this change:
* We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645)
* Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.

It's better to fix this issue before 1.6 release, otherwise we will make breaking API change.
cc shivaram sun-rui

Author: Yanbo Liang <[email protected]>

Closes #10016 from yanboliang/SPARK-12025.

(cherry picked from commit ba02f6c)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in ba02f6c Nov 27, 2015
@felixcheung
Copy link
Member

@shivaram they were added only for Spark 1.6, so no need to update release doc on breaking changes
dc3220c
40c77fb

But the new names seem to mask methods from dplyr.

@yanboliang yanboliang deleted the SPARK-12025 branch November 29, 2015 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants