Skip to content

Conversation

@yu-iskw
Copy link
Contributor

@yu-iskw yu-iskw commented Sep 7, 2015

Could @jkbradley and @davies review it?

  • Create a wrapper class: LDAModelWrapper for LDAModel. Because we can't deal with the return value ofdescribeTopics in Scala from pyspark directly. Array[(Array[Int], Array[Double])] is too complicated to convert it.
  • Add loadLDAModel in PythonMLlibAPI. Since LDAModel in Scala is an abstract class and we need to call load of DistributedLDAModel.

[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA

@SparkQA
Copy link

SparkQA commented Sep 7, 2015

Test build #42096 has finished for PR 8643 at commit f300798.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LDAModel(JavaModelWrapper, JavaSaveable, Loader):

@SparkQA
Copy link

SparkQA commented Sep 7, 2015

Test build #42099 has finished for PR 8643 at commit 97e78b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LDAModel(JavaModelWrapper, JavaSaveable, Loader):

@jkbradley
Copy link
Member

@yu-iskw Rather than using Java Any types and the old serialization patterns, would it be easier to convert to a local DataFrame? We should be able to take advantage of DataFrame serialization.

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Sep 17, 2015

@jkbradley thank you for the comment. Just to be sure, LDAModelWrapper.describeTopics() should return a DataFrame and then extract the return value from the DataFrame in pyspark, right?
If so, I'm not sure about that. It looks a little strange for me to use APIs related to DataFrame under spark.mllib.

@davies what do you think?

@jkbradley
Copy link
Member

@yu-iskw I think it's OK to use DataFrame internally in spark.mllib. It already has the dependency, and it would be a private API.

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Oct 22, 2015

@jkbradley sorry for the delay of my update. I tried to use DataFrame serialization at yu-iskw@2f70193. Could you review it?

@SparkQA
Copy link

SparkQA commented Oct 23, 2015

Test build #44182 has finished for PR 8643 at commit 2f70193.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class LDAModel(JavaModelWrapper, JavaSaveable, Loader):\n

@SparkQA
Copy link

SparkQA commented Nov 2, 2015

Test build #44799 has finished for PR 8643 at commit 353a6b0.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):\n * class LDAModel(JavaModelWrapper, JavaSaveable, Loader):\n

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serialize a DataFrame will trigger a Spark job, we could still use Pickle to serialize them without DataFrame, via PythomMLLibAPI.dumps()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davies thanks for the comment. Should we rather PythonMLlibAPI.dmups() than Java Any types like below?
yu-iskw@e1c66d0#diff-71f42172be0b5fc14827b7bb31f4e80bR34

@SparkQA
Copy link

SparkQA commented Nov 3, 2015

Test build #44881 has finished for PR 8643 at commit 0bc114e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class LDAModel(JavaModelWrapper, JavaSaveable, Loader):\n

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 3, 2015

I reverted DataFrame serialization to Java Any types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this line could be Array[Any](terms, termWeights)

@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #45250 has finished for PR 8643 at commit 56b69f9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class LDAModel(JavaModelWrapper, JavaSaveable, Loader):\n

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 6, 2015

Jenkins, test this please

@shaneknapp
Copy link
Contributor

jenkins, test this please

@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #45258 has finished for PR 8643 at commit 81ee096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class LDAModel(JavaModelWrapper, JavaSaveable, Loader):\n

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 7, 2015

@jkbradley @davies could you review it? I modified the type conversion using SerDe.dumps.
We still need to benchmark the difference for larger k.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no space around =

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could still call it model

@davies
Copy link
Contributor

davies commented Nov 7, 2015

LGTM, but a few minor comments.

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 7, 2015

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Nov 7, 2015

Test build #2002 has finished for PR 8643 at commit e91c65a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class LDAModel(JavaModelWrapper, JavaSaveable, Loader):\n

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 7, 2015

@davies thanks for the review. I fixed them.

@davies
Copy link
Contributor

davies commented Nov 7, 2015

LGTM, merging this into master and 1.6 branch, thanks!

@asfgit asfgit closed this in 2ff0e79 Nov 7, 2015
asfgit pushed a commit that referenced this pull request Nov 7, 2015
Could jkbradley and davies review it?

- Create a wrapper class: `LDAModelWrapper` for `LDAModel`. Because we can't deal with the return value of`describeTopics` in Scala from pyspark directly. `Array[(Array[Int], Array[Double])]` is too complicated to convert it.
- Add `loadLDAModel` in `PythonMLlibAPI`. Since `LDAModel` in Scala is an abstract class and we need to call `load` of `DistributedLDAModel`.

[[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8467)

Author: Yu ISHIKAWA <[email protected]>

Closes #8643 from yu-iskw/SPARK-8467-2.
@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 7, 2015

Thank you for merging it and your great support!

@jkbradley
Copy link
Member

@yu-iskw Thanks for this! Quick request: Could you please send a little follow-up PR to document (in the Python doc) what is being returned?

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 9, 2015

@jkbradley sure!

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Nov 9, 2015

@jkbradley I send the PR at #9577.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants