Skip to content

Conversation

@felixcheung
Copy link
Member

Checked names, none of them should conflict with anything in base

@shivaram @davies @rxin

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45109 has finished for PR 9489 at commit 58d0fd9.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45148 has finished for PR 9489 at commit 7a2a904.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

shivaram commented Nov 9, 2015

cc @sun-rui

@davies
Copy link
Contributor

davies commented Nov 9, 2015

LGTM

@mengxr
Copy link
Contributor

mengxr commented Nov 9, 2015

Should we add aliases var and sd in SparkR, which are more familiar to R users?

@felixcheung
Copy link
Member Author

@mengxr possibly.. though there are usage difference with SparkR DataFrame/Column as compared to R data.frame (eg. agg vs aggregate, var(faithful$eruptions) would have been a Column with DataFrame)
It might be more confusing.

@shivaram
Copy link
Contributor

Looks like dply has aggregate functions for variance, standard deviations ? Can we just match the syntax there ? https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

@felixcheung
Copy link
Member Author

This does work

> head(agg(df, stddev(df$age)))    # same as head(summarize(df, stddev(df$age)))
  stddev_samp(age)
1         7.778175

Also these work

> gd <- groupBy(df, "name")
> head(agg(gd, stddev(df$age)))    # same as head(agg(gd, age = "stddev"))
     name stddev_samp(age)
1    Andy                0
2 Michael               NA
3  Justin                0

I think they match dplyr, so I could name them sd and var, but just as I mention, doesn't work as one would expect on data.frame column vector.

@sun-rui
Copy link
Contributor

sun-rui commented Nov 10, 2015

@shivaram, I looked at dplyr package, there is no "sd", "var" defined in the package, seems the page is referencing "sd" "var" in R base.

Yes, stddev/variance agg functions we are adding have similar semantic as "sd" and "var" in R, as you can think of a Column as a vector of same type. So we'd better add aliases for them.

Note that var() in R computes sample variance (I think "sd" also uses sample variance, but not verified) instead of population variance.

R/pkg/NAMESPACE Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"stddev_pop" before "stddev_samp",

@shivaram
Copy link
Contributor

Yeah in that case lets add aliases for sd and var as well and match the population/sample behavior.

@felixcheung I think similar functions like mean don't match base R data frame but they are useful when used inside an expression and so its fine to have aliases for this.

@SparkQA
Copy link

SparkQA commented Nov 11, 2015

Test build #45581 has finished for PR 9489 at commit 9243fa4.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2015

Test build #45585 has finished for PR 9489 at commit 7498e39.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor

sun-rui commented Nov 11, 2015

LGTM

@davies
Copy link
Contributor

davies commented Nov 11, 2015

Merging this into master and 1.6 branch, thanks!

asfgit pushed a commit that referenced this pull request Nov 11, 2015
Checked names, none of them should conflict with anything in base

shivaram davies rxin

Author: felixcheung <[email protected]>

Closes #9489 from felixcheung/rstddev.

(cherry picked from commit 1a8e046)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in 1a8e046 Nov 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants