@teunbrand
Collaborator

This PR aims to fix #5058.

Briefly, it adds an optional weight aesthetic to stat_ecdf(). If a weight is present, we calculate the ECDF in a different way: each observation contributes its own weight divided by the sum of all weights in the group.
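To make the weighting concrete, here is a minimal base R sketch. It is an illustration only, not the code from this PR: `weighted_ecdf_sketch()` is a hypothetical name, and it assumes non-negative weights with a positive sum.

```r
# Hypothetical helper (not the PR's implementation): returns a function that
# evaluates a weighted ECDF at arbitrary points.
weighted_ecdf_sketch <- function(x, weights) {
  stopifnot(
    length(x) == length(weights),
    all(weights >= 0),
    sum(weights) > 0
  )
  total <- sum(weights)
  function(q) {
    # For each query point, add up the weights of all observations <= q,
    # then divide by the total weight in the group.
    vapply(q, function(t) sum(weights[x <= t]), numeric(1)) / total
  }
}

# Three observations; the last one carries twice the weight of the others.
f <- weighted_ecdf_sketch(x = c(1, 2, 3), weights = c(1, 1, 2))
f(c(0, 1, 2, 3))
```

The per-query `sum()` favours readability over speed; a cumulative sum over the sorted values would do the same job in one pass.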

@thomasp85
Member

I have no experience with ecdfs so I can't really comment on the correctness of the weighted implementation. @clauswilke or @yutannihilation do you feel confident in reviewing this?

@yutannihilation
Member

I too am not familiar with eCDF, sorry...

@clauswilke
Member

clauswilke commented Mar 22, 2023

The calculation looks correct at first glance. I might want to read it through a little more carefully before signing off on it, but fundamentally it's very simple. An eCDF is simply a cumulative count of the ordered observations, divided by the total count. To make this weighted, you let each observation count as its weight instead of as 1 before you sum.
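Written out as a formula, with the sum taken over indicator terms for each observation (the symbols $x_i$, $w_i$, and $n$ are notation introduced here, not taken from the PR):

$$
\hat{F}_w(t) = \frac{\sum_{i=1}^{n} w_i \, \mathbf{1}[x_i \le t]}{\sum_{i=1}^{n} w_i}
$$

With $w_i = 1$ for all $i$ this reduces to the ordinary eCDF, $\hat{F}(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[x_i \le t]$. As a small worked case, for $x = (1, 2, 3)$ and $w = (1, 1, 2)$ we get $\hat{F}_w(2) = (1 + 1) / (1 + 1 + 2) = 0.5$.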

I have one concern though: I don't particularly like using a built-in function to calculate eCDF and a custom function to calculate weCDF. Why not use the same function for both and set the weights to 1 if not provided?

@teunbrand
Collaborator Author

> I have one concern though: I don't particularly like using a built-in function to calculate eCDF and a custom function to calculate weCDF. Why not use the same function for both and set the weights to 1 if not provided?

Fair point, I don't think there is a particular reason I did it this way. Setting the weights to 1 should indeed give identical output.
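A quick way to check that claim is to compare a weighted calculation with unit weights against `stats::ecdf()`. This is a sketch under assumptions: `weighted_ecdf_sketch()` is a hypothetical stand-in, not the `wecdf()` from this PR, and the data are made up.

```r
# Hypothetical stand-in for a weighted ECDF (not the PR's wecdf()).
weighted_ecdf_sketch <- function(x, weights) {
  total <- sum(weights)
  function(q) vapply(q, function(t) sum(weights[x <= t]), numeric(1)) / total
}

# Made-up data, including a tie at 1.5.
x <- c(3.1, 0.2, 1.5, 1.5, 2.7, 0.9)
q <- seq(0, 4, by = 0.25)

unit_weighted <- weighted_ecdf_sketch(x, rep(1, length(x)))(q)
built_in      <- stats::ecdf(x)(q)

# Both should agree at every query point.
stopifnot(isTRUE(all.equal(unit_weighted, built_in)))
```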

Merge branch 'main' into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
#	tests/testthat/test-stat-ecdf.R
Merge branch 'weighted_ecdf' of https://github.com/teunbrand/ggplot2 into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
@teunbrand added the "feature" (a feature request or enhancement) and "layers 📈" labels on Jul 9, 2023
Member

@thomasp85 thomasp85 left a comment

LGTM apart from the one comment

R/stat-ecdf.R Outdated
Comment on lines 124 to 128
if (is.null(data$weight)) {
  data_ecdf <- ecdf(data$x)(x)
} else {
  data_ecdf <- wecdf(data$x, data$weight)(x)
}
Member

We should just use wecdf() as per the discussion
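One way the single code path could look is sketched below. This is a guess at the shape of the change, not the merged code: `ecdf_values_sketch()` is a hypothetical function, and the weighted sum is written inline because the body of the PR's `wecdf()` does not appear in this thread.

```r
# Hypothetical sketch of a single code path: a missing weight column is
# treated as a vector of ones, so no if/else between two ECDF functions is
# needed at the call site.
ecdf_values_sketch <- function(data, at) {
  weight <- data$weight
  if (is.null(weight)) {
    weight <- rep(1, nrow(data))
  }
  # Weighted ECDF: share of total weight carried by observations <= t.
  vapply(at, function(t) sum(weight[data$x <= t]), numeric(1)) / sum(weight)
}

at <- c(1, 2, 3)
unweighted <- ecdf_values_sketch(data.frame(x = c(1, 2, 3)), at)
weighted   <- ecdf_values_sketch(data.frame(x = c(1, 2, 3), weight = c(1, 1, 2)), at)
```

Without a weight column the result matches `stats::ecdf(c(1, 2, 3))(at)`; with the weights above, the observation at 3 accounts for half of the total mass.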

teunbrand added 4 commits May 20, 2024 10:39
Merge branch 'main' into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
@teunbrand teunbrand merged commit e942833 into tidyverse:main May 20, 2024
@teunbrand teunbrand deleted the weighted_ecdf branch May 20, 2024 09:09
Development

Successfully merging this pull request may close these issues.

Add weights to stat_ecdf

4 participants