[SPARK-8467][MLlib][PySpark] Add LDAModel.describeTopics() in Python #8643
Conversation
Test build #42096 has finished for PR 8643 at commit

Test build #42099 has finished for PR 8643 at commit
@yu-iskw Rather than using Java Any types and the old serialization patterns, would it be easier to convert to a local DataFrame? We should be able to take advantage of DataFrame serialization.

@jkbradley Thank you for the comment. Just to be sure, @davies, what do you think?

@yu-iskw I think it's OK to use DataFrame internally in spark.mllib. It already has the dependency, and it would be a private API.

@jkbradley Sorry for the delay in my update. I tried to use DataFrame serialization at yu-iskw@2f70193. Could you review it?

Test build #44182 has finished for PR 8643 at commit

Test build #44799 has finished for PR 8643 at commit
Serializing a DataFrame will trigger a Spark job; we could still use Pickle to serialize them without a DataFrame, via `PythonMLLibAPI.dumps()`.
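The point above can be illustrated with a minimal pure-Python sketch: the topic description is essentially a list of (term indices, term weights) pairs, which the standard `pickle` module can round-trip on the driver without triggering a Spark job. (The actual Spark code path goes through its own SerDe helpers; the shapes here are illustrative only.)

```python
import pickle

# A topic description shaped like describeTopics() output:
# one (term indices, term weights) pair per topic.
topics = [
    ([0, 2, 5], [0.45, 0.30, 0.25]),
    ([1, 3, 4], [0.50, 0.28, 0.22]),
]

# Pickling this nested structure runs entirely on the driver,
# so no Spark job is triggered, unlike collecting a DataFrame.
payload = pickle.dumps(topics)
assert pickle.loads(payload) == topics
```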
@davies Thanks for the comment. Should we rather use `PythonMLlibAPI.dumps()` than Java Any types like below?
yu-iskw@e1c66d0#diff-71f42172be0b5fc14827b7bb31f4e80bR34
Test build #44881 has finished for PR 8643 at commit
I reverted from DataFrame serialization to Java Any types.
nit: this line could be `Array[Any](terms, termWeights)`
Test build #45250 has finished for PR 8643 at commit
Jenkins, test this please

jenkins, test this please

Test build #45258 has finished for PR 8643 at commit
@jkbradley @davies could you review it? I modified the type conversion using
python/pyspark/mllib/clustering.py
no space around `=`
python/pyspark/mllib/clustering.py
we could still call it `model`
LGTM, but a few minor comments.

Jenkins, test this please

Test build #2002 has finished for PR 8643 at commit
@davies Thanks for the review. I fixed them.

LGTM, merging this into master and 1.6 branch, thanks!
Could @jkbradley and @davies review it?

- Create a wrapper class `LDAModelWrapper` for `LDAModel`, because we can't deal with the return value of `describeTopics` in Scala from PySpark directly: `Array[(Array[Int], Array[Double])]` is too complicated to convert.
- Add `loadLDAModel` in `PythonMLlibAPI`, since `LDAModel` in Scala is an abstract class and we need to call `load` of `DistributedLDAModel`.

[[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8467)

Author: Yu ISHIKAWA <[email protected]>

Closes #8643 from yu-iskw/SPARK-8467-2.
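The wrapper idea can be sketched in plain Python. This is a hypothetical stand-in only: the real `LDAModelWrapper` lives on the Scala side and is reached through Py4J; the names, shapes, and sample weights below are illustrative, not the actual Spark implementation.

```python
class LDAModelWrapper:
    """Hypothetical Python stand-in for the Scala-side wrapper.

    It holds the topic description as plain lists, so PySpark never has
    to convert Scala's Array[(Array[Int], Array[Double])] directly.
    """

    def __init__(self, topics):
        # topics: list of (term indices, term weights) pairs, one per topic
        self._topics = topics

    def describeTopics(self, maxTermsPerTopic=None):
        if maxTermsPerTopic is None:
            return list(self._topics)
        # Assumes each topic's terms are already sorted by descending weight.
        return [(indices[:maxTermsPerTopic], weights[:maxTermsPerTopic])
                for indices, weights in self._topics]


wrapper = LDAModelWrapper([
    ([0, 2, 5], [0.45, 0.30, 0.25]),
    ([1, 3, 4], [0.50, 0.28, 0.22]),
])
print(wrapper.describeTopics(maxTermsPerTopic=2))
# → [([0, 2], [0.45, 0.3]), ([1, 3], [0.5, 0.28])]
```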
Thank you for merging it and your great support!

@yu-iskw Thanks for this! Quick request: could you please send a little follow-up PR to document (in the Python doc) what is being returned?

@jkbradley Sure!

@jkbradley I sent the PR at #9577.