@yinxusen
Contributor

See SPARK-6528. Add IDF transformer in ML package.

@SparkQA

SparkQA commented Mar 30, 2015

Test build #29400 has started for PR 5266 at commit 2aa4be0.

@brennonyork

To test #5269, I'm going to rerun these Jenkins tests, since this is a prime example of that bug.

@brennonyork

jenkins, retest this please

@SparkQA

SparkQA commented Mar 30, 2015

Test build #29421 has finished for PR 5266 at commit 2aa4be0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IDF extends Estimator[IDFModel] with IDFParams
  • This patch does not change any dependencies.

Contributor

val idf = udf { (v: Vector) => idfModel.transform(v) }
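A minimal sketch of the per-row function such a udf would wrap, assuming MLlib-style semantics where only the stored entries of a sparse vector are scaled by their IDF weights. `SparseVec`, `IdfTransform`, and the weights are hypothetical stand-ins so the example runs without Spark; this is not the code merged in this PR.

```scala
// Hypothetical stand-in for MLlib's SparseVector, so the sketch runs without Spark.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])

object IdfTransform {
  // Scale each stored (non-zero) entry by the IDF weight of its term index;
  // implicit zeros stay zero, so the sparsity pattern is unchanged.
  def transform(idf: Array[Double], v: SparseVec): SparseVec = {
    require(idf.length == v.size, s"IDF length ${idf.length} != vector size ${v.size}")
    val scaled = v.indices.zip(v.values).map { case (i, x) => x * idf(i) }
    SparseVec(v.size, v.indices, scaled)
  }
}

// Analogue of `udf { (v: Vector) => idfModel.transform(v) }`: a closure that
// captures the fitted (here: made-up) weights and is applied row by row.
val weights = Array(0.5, 0.0, 2.0, 1.0)
val idfFn: SparseVec => SparseVec = v => IdfTransform.transform(weights, v)

val out = idfFn(SparseVec(4, Array(0, 2), Array(2.0, 3.0)))
println(out.values.mkString(", ")) // 1.0, 6.0
```

Capturing the weights in a closure keeps the per-row logic independent of the DataFrame plumbing, which is what makes wrapping it in a udf straightforward.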

Contributor Author

@mengxr Here is my concern: even if we put transformSchema in the base class, there is still too much boilerplate. StandardScaler has the same code and the same problem. I think Vector-to-Vector transforms will be very common, so how about pulling them into one place?

Contributor

Not necessary in this PR. We should add utility functions that make the type checks easier.

Contributor Author

Got it.
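One possible shape for the utility discussed in this thread, as a hypothetical sketch: a single helper validates the input column type and appends the output column, so each Vector-to-Vector transformer (IDF, StandardScaler, ...) would only supply its per-row function. `Field`, `Schema`, and `VectorTransformerUtils` are illustrative stand-ins, not Spark's `StructType` or any actual Spark utility.

```scala
// Illustrative stand-ins for a DataFrame schema (not Spark's StructType).
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field]) {
  def apply(name: String): Field =
    fields.find(_.name == name).getOrElse(throw new IllegalArgumentException(s"No column $name"))
}

// Hypothetical shared helper: the transformSchema logic that each
// Vector-to-Vector transformer would otherwise duplicate.
object VectorTransformerUtils {
  // Fail fast if the column is missing or does not have the expected type.
  def checkColumnType(schema: Schema, col: String, expected: String): Unit = {
    val actual = schema(col).dataType
    require(actual == expected, s"Column $col must be of type $expected but was $actual.")
  }

  // Check the input column, reject a clashing output column, append the output.
  def transformSchema(schema: Schema, inputCol: String, outputCol: String): Schema = {
    checkColumnType(schema, inputCol, "vector")
    require(!schema.fields.exists(_.name == outputCol), s"Output column $outputCol already exists.")
    Schema(schema.fields :+ Field(outputCol, "vector"))
  }
}

val input = Schema(Seq(Field("id", "long"), Field("tf", "vector")))
val out = VectorTransformerUtils.transformSchema(input, "tf", "tfidf")
println(out.fields.map(_.name).mkString(", ")) // id, tf, tfidf
```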

@SparkQA

SparkQA commented Apr 2, 2015

Test build #29581 has finished for PR 5266 at commit 03fbecb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IDF extends Estimator[IDFModel] with IDFParams
  • This patch does not change any dependencies.

Contributor

The test looks really complicated to me and hard to read. Could we pre-compute the expected output and validate the results directly?
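A sketch of the suggested approach, assuming the MLlib formula idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents, df(t) counts documents with a positive frequency for term t, and terms with df(t) < minDocFreq get weight 0. The hard-coded array is what a test would compare the transformer's output against; the toy data and the `ExpectedIdf` helper are hypothetical.

```scala
// Hypothetical helper for deriving expected IDF values by hand (dense rows only).
object ExpectedIdf {
  // Number of documents in which each term index has a positive frequency.
  def docFreq(docs: Seq[Array[Double]]): Array[Int] = {
    val n = docs.head.length
    Array.tabulate(n)(j => docs.count(doc => doc(j) > 0.0))
  }

  // idf(t) = log((m + 1) / (df(t) + 1)); terms below minDocFreq are zeroed.
  def idfWeights(docs: Seq[Array[Double]], minDocFreq: Int = 0): Array[Double] = {
    val m = docs.size.toDouble
    docFreq(docs).map { df =>
      if (df >= minDocFreq) math.log((m + 1.0) / (df + 1.0)) else 0.0
    }
  }

  // Expected transformed row: element-wise product of term frequencies and weights.
  def expected(doc: Array[Double], idf: Array[Double]): Array[Double] =
    doc.zip(idf).map { case (tf, w) => tf * w }

  def approxEqual(a: Array[Double], b: Array[Double], eps: Double = 1e-9): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => math.abs(x - y) <= eps }
}

val docs = Seq(
  Array(0.0, 1.0, 0.0, 2.0),
  Array(3.0, 0.0, 4.0, 5.0),
  Array(6.0, 7.0, 0.0, 8.0)
)
val weights = ExpectedIdf.idfWeights(docs)
// Pre-computed by hand: m = 3, df = (2, 2, 1, 3).
val hardCoded = Array(math.log(4.0 / 3.0), math.log(4.0 / 3.0), math.log(2.0), 0.0)
println(ExpectedIdf.approxEqual(weights, hardCoded)) // true
```

With the expectations written out as literals, a test only needs to compare each output row against `hardCoded`-style values instead of re-deriving them inside the assertion.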

@SparkQA

SparkQA commented Apr 9, 2015

Test build #29931 has finished for PR 5266 at commit aef2cdf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IDF extends Estimator[IDFModel] with IDFParams
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 9, 2015

Test build #29930 has finished for PR 5266 at commit 5760b49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IDF extends Estimator[IDFModel] with IDFParams
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30730 has finished for PR 5266 at commit 4338a37.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IDF extends Estimator[IDFModel] with IDFParams
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30742 has finished for PR 5266 at commit c9c3759.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IDF extends Estimator[IDFModel] with IDFBase
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30745 has finished for PR 5266 at commit d169967.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class IDF extends Estimator[IDFModel] with IDFBase
  • This patch does not change any dependencies.

Contributor Author

@mengxr Should I use map(minDocFreq) instead of getMinDocFreq? I think getMinDocFreq cannot fetch the new param value from the paramMap.
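A hypothetical sketch of the precedence issue raised here: values passed at fit time should override those set on the estimator, so reading from the merged map sees them while an instance-only getter does not. `ToyIdf` and its methods are illustrative and are not Spark's Params API; the change actually made in this PR is commit 741db31 ("get param from new paramMap").

```scala
// Illustrative estimator with a single Int param (not Spark's Params API).
final class ToyIdf {
  private val defaults = Map("minDocFreq" -> 0)
  private var instanceParams = Map.empty[String, Int]

  def setMinDocFreq(v: Int): this.type = {
    instanceParams = instanceParams.updated("minDocFreq", v)
    this
  }

  // Instance-only getter: it cannot see overrides passed at call time.
  def getMinDocFreq: Int = (defaults ++ instanceParams)("minDocFreq")

  // Merged lookup: in `++` later maps win, so call-site values take precedence.
  def fitMinDocFreq(callSite: Map[String, Int]): Int = {
    val map = defaults ++ instanceParams ++ callSite
    map("minDocFreq")
  }
}

val est = new ToyIdf().setMinDocFreq(2)
println(est.getMinDocFreq)                         // 2
println(est.fitMinDocFreq(Map("minDocFreq" -> 5))) // 5
```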

@SparkQA

SparkQA commented Apr 24, 2015

Test build #30927 has finished for PR 5266 at commit 741db31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class IDF extends Estimator[IDFModel] with IDFBase
  • This patch does not change any dependencies.

@yinxusen
Contributor Author

@mengxr Ready for review.

@mengxr
Contributor

mengxr commented Apr 24, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 6e57d57 Apr 24, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 14, 2015
See [SPARK-6528](https://issues.apache.org/jira/browse/SPARK-6528). Add IDF transformer in ML package.

Author: Xusen Yin <[email protected]>

Closes apache#5266 from yinxusen/SPARK-6528 and squashes the following commits:

741db31 [Xusen Yin] get param from new paramMap
d169967 [Xusen Yin] add final to param and IDF class
c9c3759 [Xusen Yin] simplify test suite
5867c09 [Xusen Yin] refine IDF transformer with new interfaces
7727cae [Xusen Yin] Merge branch 'master' into SPARK-6528
4338a37 [Xusen Yin] Merge branch 'master' into SPARK-6528
aef2cdf [Xusen Yin] add doc and group for param
5760b49 [Xusen Yin] fix code style
2add691 [Xusen Yin] fix code style and test
03fbecb [Xusen Yin] remove duplicated code
2aa4be0 [Xusen Yin] clean test suite
4802c67 [Xusen Yin] add IDF transformer and test suite
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015