Skip to content

Conversation

@yanboliang
Copy link
Contributor

withStd and withMean should be params of StandardScaler and StandardScalerModel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to check mean and std which are parts of the model, withStd and withStd are params.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withStd and withStd of StandardScalerModel must be inherited from StandardScaler, so we can not construct StandardScalerModel directly by specifying the two variables. Here we combine the original test cases into one with testEstimatorAndModelReadWrite which both test the estimator and model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not an ideal unit test for read/write because the model fitting part shouldn't be part of it, which is already covered by other tests. Constructing estimator and model directly can save some test time.

@SparkQA
Copy link

SparkQA commented Nov 19, 2015

Test build #46330 has finished for PR 9839 at commit 37fe45d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2015

Test build #46333 has finished for PR 9839 at commit 76ef338.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If withMean and withStd are parameters, we should save them in metadata/ but not both under data/ and medadata/. Can we change the constructor of ml.StandardScalerModel to take only std and mean but construct scaler only inside transform? So scaler is no longer a member variable. We can fix performance issues in 1.7.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Nov 20, 2015

Test build #46413 has finished for PR 9839 at commit c6b6d7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 20, 2015

Test build #46415 has finished for PR 9839 at commit c6b6d7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 20, 2015
```withStd``` and ```withMean``` should be params of ```StandardScaler``` and ```StandardScalerModel```.

Author: Yanbo Liang <[email protected]>

Closes #9839 from yanboliang/standardScaler-refactor.

(cherry picked from commit 9ace2e5)
Signed-off-by: Xiangrui Meng <[email protected]>
@mengxr
Copy link
Contributor

mengxr commented Nov 20, 2015

LGTM. Merged into master and branch-1.6. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants