Skip to content

Conversation

@feynmanliang
Copy link
Contributor

  1. Add “asymmetricDocConcentration” and revert docConcentration changes. If the (internal) doc concentration vector is a single value, “getDocConcentration" returns it. If it is a constant vector, getDocConcentration returns the first item, and fails otherwise.
  2. Give LDAModel.gammaShape a default value in LDAModel concrete class constructors.

@jkbradley

@jkbradley
Copy link
Member

reviewing now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy doc here:

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@jkbradley
Copy link
Member

Note (not requiring changes): I had meant for the default gammaShape to be defined in the abstract LDAModel, in order to improve (but not fix) binary incompatibility issues for anyone who extended LDAModel. However, I just noticed that LDAModel has a private constructor, so we don't need to worry about binary incompatibility. So feel free to arrange gammaShape however you like. I think the current setup is fine.

@SparkQA
Copy link

SparkQA commented Aug 10, 2015

Test build #1422 has finished for PR 8077 at commit 9d6a71e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

This will need an update to PySpark as well.

@jkbradley
Copy link
Member

LGTM pending tests

@jkbradley
Copy link
Member

(Are you updating PySpark in a separate PR?)

@feynmanliang
Copy link
Contributor Author

@jkbradley This PR doesn't need a PySpark update. Eventually, PythonMLLibAPI.scala will need to be updated to support asymmetric priors, hyperparam optimizations, and the predict methods. However, I think it might be better to hold off until the EM optimizer and Online optimizer have a consistent API.

@jkbradley
Copy link
Member

True, it's outside this PR. OK we'll see if tests will ever pass

@jkbradley
Copy link
Member

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40494 has finished for PR 8077 at commit 6b07bc8.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@feynmanliang
Copy link
Contributor Author

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40496 has finished for PR 8077 at commit 6b07bc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Merging with master and branch-1.5

asfgit pushed a commit that referenced this pull request Aug 11, 2015
1. Add “asymmetricDocConcentration” and revert docConcentration changes. If the (internal) doc concentration vector is a single value, “getDocConcentration" returns it. If it is a constant vector, getDocConcentration returns the first item, and fails otherwise.
2. Give `LDAModel.gammaShape` a default value in `LDAModel` concrete class constructors.

jkbradley

Author: Feynman Liang <[email protected]>

Closes #8077 from feynmanliang/SPARK-9788 and squashes the following commits:

6b07bc8 [Feynman Liang] Code review changes
9d6a71e [Feynman Liang] Add asymmetricAlpha alias
bf4e685 [Feynman Liang] Asymmetric docConcentration
4cab972 [Feynman Liang] Default gammaShape

(cherry picked from commit be3e271)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in be3e271 Aug 11, 2015
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
1. Add “asymmetricDocConcentration” and revert docConcentration changes. If the (internal) doc concentration vector is a single value, “getDocConcentration" returns it. If it is a constant vector, getDocConcentration returns the first item, and fails otherwise.
2. Give `LDAModel.gammaShape` a default value in `LDAModel` concrete class constructors.

jkbradley

Author: Feynman Liang <[email protected]>

Closes apache#8077 from feynmanliang/SPARK-9788 and squashes the following commits:

6b07bc8 [Feynman Liang] Code review changes
9d6a71e [Feynman Liang] Add asymmetricAlpha alias
bf4e685 [Feynman Liang] Asymmetric docConcentration
4cab972 [Feynman Liang] Default gammaShape
@feynmanliang feynmanliang deleted the SPARK-9788 branch August 17, 2015 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants