@Lewuathe
Contributor

This will be important to improve LinearRegressionSummary, which currently has a mix of weighted and unweighted metrics.

@SparkQA

SparkQA commented Nov 23, 2015

Test build #46525 has finished for PR 9907 at commit 1d4a5fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Lewuathe
Contributor Author

@mengxr @jkbradley Could you review this? Thanks.

Contributor

We should rename this summary variable; it is used three times for different objects.

Also, sample is a bit too generic here, and using the _xx accessor methods (_1, _2) is not very intuitive. I suggest you unpack the tuple fully: { case (currentSummary, (vec, weight)) => currentSummary.add(vec, weight) }
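
To illustrate the suggestion, here is a minimal sketch in plain Scala, without Spark. `WeightedSummary` is a hypothetical stand-in for the summarizer, and scalar values stand in for the `(vec, weight)` records; neither is taken from the patch. It only shows that the accessor style and the pattern-matching style compute the same thing, while the second names every component.

```scala
object TupleUnpackingSketch {
  // Hypothetical stand-in for a weighted summarizer; not Spark's actual class.
  final case class WeightedSummary(weightSum: Double = 0.0, weightedTotal: Double = 0.0) {
    // Fold one weighted observation into the running summary.
    def add(value: Double, weight: Double): WeightedSummary =
      WeightedSummary(weightSum + weight, weightedTotal + weight * value)

    // Weighted mean of everything added so far (0.0 when nothing was added).
    def mean: Double = if (weightSum == 0.0) 0.0 else weightedTotal / weightSum
  }

  // (value, weight) pairs standing in for the (vec, weight) records.
  val samples: Seq[(Double, Double)] = Seq((1.0, 1.0), (3.0, 3.0))

  // Accessor style: the reader has to decode what _1 and _2 mean.
  val viaAccessors: WeightedSummary =
    samples.foldLeft(WeightedSummary()) { (summary, sample) =>
      summary.add(sample._1, sample._2)
    }

  // Pattern-matching style: the accumulator and both tuple fields get names.
  val viaPattern: WeightedSummary =
    samples.foldLeft(WeightedSummary()) {
      case (currentSummary, (value, weight)) => currentSummary.add(value, weight)
    }
}
```

The same `case` pattern should work as the `seqOp` of an RDD aggregation, since it is an ordinary two-argument function literal.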

@thunterdb
Copy link
Contributor

@Lewuathe thanks for your patch. I think RegressionMetrics will need more work to fully implement weighted metrics. We need to make the following changes:

  • expose weightSum in MultivariateStatisticalSummary (as a developer API)
  • the computations of SSreg and SStot should take the weights into account
  • in RegressionMetrics, all references to summary.count should be replaced by summary.weightSum
    Given that the default weights are 1.0, weightSum equals count in the unweighted case, so it should give the same result.
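
The changes listed above can be sketched in plain Scala as follows. This is not the RegressionMetrics implementation: the `Record` type, the `Metrics` class, and the function names are hypothetical, and the formulas are the usual weighted definitions of the sums of squares, assumed here rather than taken from the patch. The point is that dividing by `weightSum` instead of a count reproduces the unweighted result when every weight is 1.0.

```scala
object WeightedRegressionSketch {
  // One record: (prediction, observation, weight). Illustrative type only.
  type Record = (Double, Double, Double)

  final case class Metrics(weightSum: Double, ssErr: Double, ssTot: Double, ssReg: Double) {
    // weightSum plays the role that the record count plays without weights.
    def meanSquaredError: Double = ssErr / weightSum
    def r2: Double = 1.0 - ssErr / ssTot
  }

  def compute(records: Seq[Record]): Metrics = {
    val weightSum = records.map { case (_, _, w) => w }.sum
    // Weighted mean of the observations.
    val yMean = records.map { case (_, y, w) => w * y }.sum / weightSum
    // Weighted residual, total, and regression sums of squares.
    val ssErr = records.map { case (p, y, w) => w * (y - p) * (y - p) }.sum
    val ssTot = records.map { case (_, y, w) => w * (y - yMean) * (y - yMean) }.sum
    val ssReg = records.map { case (p, _, w) => w * (p - yMean) * (p - yMean) }.sum
    Metrics(weightSum, ssErr, ssTot, ssReg)
  }

  // Unweighted case: every record gets weight 1.0, so weightSum equals the count.
  def unweighted(pairs: Seq[(Double, Double)]): Metrics =
    compute(pairs.map { case (p, y) => (p, y, 1.0) })
}
```

As a sanity check on the design, giving a record weight 2.0 should produce the same metrics as listing that record twice with weight 1.0.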
