@dbtsai
Member

@dbtsai dbtsai commented Aug 20, 2014

Documentation for newly added feature transformations:

  1. TF-IDF
  2. StandardScaler
  3. Normalizer
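
For readers new to these transformations, here is a minimal plain-Python sketch of what TF-IDF and a p-norm normalizer compute. It is an illustration only, not the MLlib API: the helper names `tf_idf` and `normalize` are hypothetical, the smoothed IDF form `log((m + 1) / (df + 1))` is an assumption, and terms are kept as dictionary keys rather than hashed to vector indices.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight raw term counts by a smoothed inverse document frequency."""
    m = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothed form; a term present in every document gets weight 0.
    idf = {t: math.log((m + 1) / (df[t] + 1)) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def normalize(vec, p=2.0):
    """Scale a vector to unit p-norm; an all-zero vector is returned unchanged."""
    norm = sum(abs(v) ** p for v in vec) ** (1.0 / p)
    return [v / norm for v in vec] if norm > 0 else list(vec)

docs = [["spark", "mllib", "spark"], ["mllib", "docs"]]
weights = tf_idf(docs)
unit = normalize([3.0, 4.0])
```

Here `weights[0]["mllib"]` is zero because "mllib" appears in both documents, and `unit` has Euclidean length 1.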

@SparkQA

SparkQA commented Aug 20, 2014

QA tests have started for PR 2068 at commit e339f64.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2068 at commit e339f64.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • shift # Ignore main class (org.apache.spark.deploy.SparkSubmit) and use our own

@mengxr
Contributor

mengxr commented Aug 20, 2014

cc @atalwalkar

Contributor

This is too strong of a statement. Why not just say "Normalizing features to have unit variance and/or zero mean is a very common preprocessing step"?

Member Author

How about I say:
"For example, the RBF kernel of Support Vector Machines
or the L1 and L2 regularized linear models typically work better when all features have unit variance
and/or zero mean."

I actually took this statement from the scikit-learn documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
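
To make the proposed wording concrete, here is a minimal plain-Python sketch of scaling features to zero mean and/or unit variance. It is an illustration under stated assumptions, not the MLlib implementation: the helper `standardize` and its `with_mean` / `with_std` flags are hypothetical names, the sample (n - 1) standard deviation is assumed, and constant columns are mapped to 0.0 by choice.

```python
import math

def standardize(rows, with_mean=True, with_std=True):
    """Scale each column to zero mean and/or unit variance (hypothetical helper)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    # Assumed: sample (n - 1) standard deviation per column.
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dims)
    ]
    scaled = []
    for r in rows:
        out = []
        for j in range(dims):
            v = r[j] - means[j] if with_mean else r[j]
            if with_std:
                # Map constant columns to 0.0 instead of dividing by zero.
                v = v / stds[j] if stds[j] > 0 else 0.0
            out.append(v)
        scaled.append(out)
    return scaled

# Two features on very different scales end up comparable after scaling.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize(data)
```

After scaling, both columns hold the same values even though the raw second feature is two orders of magnitude larger, which is the property that distance-based kernels and regularized linear models benefit from.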

Contributor

Your suggestion sounds good to me! Thanks.

@SparkQA

SparkQA commented Aug 21, 2014

QA tests have started for PR 2068 at commit 0a8fd34.

  • This patch does not merge cleanly!

@SparkQA

SparkQA commented Aug 21, 2014

QA tests have finished for PR 2068 at commit 0a8fd34.

  • This patch fails unit tests.
  • This patch does not merge cleanly!

@dbtsai
Member Author

dbtsai commented Aug 23, 2014

@atalwalkar and @mengxr I just addressed the merge conflict. I think it's ready to merge. Thanks.

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have started for PR 2068 at commit 109f324.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 23, 2014

Tests timed out after a configured wait of 120m.

asfgit pushed a commit that referenced this pull request Aug 25, 2014
Documentation for newly added feature transformations:
1. TF-IDF
2. StandardScaler
3. Normalizer

Author: DB Tsai

Closes #2068 from dbtsai/transformer-documentation and squashes the following commits:

109f324 [DB Tsai] address feedback

(cherry picked from commit 572952a)
Signed-off-by: Xiangrui Meng

@mengxr
Contributor

mengxr commented Aug 25, 2014

LGTM. Merged into master and branch-1.1! Thanks for helping with the documentation!!

@asfgit asfgit closed this in 572952a Aug 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@dbtsai dbtsai deleted the transformer-documentation branch October 28, 2014 19:18