[ML][MLLIB] SPARK-2426: Integrate Breeze QuadraticMinimizer with ALS #3221
Conversation
…nds, Qp with smoothness, Qp with L1
…ing for distributed runs; rho=50 for equality constraint, default rho=1.0, alpha = 1.0 (no over-relaxation) for convergence study
…zer;Elastic net formulation in ALS, elastic net parameter not exposed to users
…pute map measure along with rmse
…tric for movielens dataset
… BoundedPriorityQueue similar to RDD.top
I looked more into it, and I will open up an API in Breeze QuadraticMinimizer where an upper-triangular gram can be sent in place of the dense gram matrix. The inner workspace still has to be n x n, though, because for Cholesky we need to compute LL' and for the quasi-definite system we have to compute LDL' / LU, and both of them need n x n space. So I won't be able to decrease the QuadraticMinimizer workspace size. For dposv, the LL' storage is handled inside LAPACK and is not visible to the user.
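Not part of the PR, just a small Breeze sketch of the point above, assuming a dense symmetric positive definite gram matrix; the names (`gram`, `q`) are illustrative. Even when only the upper triangle is meaningful, the Cholesky factor is materialized as a full n x n DenseMatrix, which is why the workspace cannot shrink below n x n.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, cholesky}

val n = 4
val a = DenseMatrix.rand(n, n)
val gram = a.t * a + DenseMatrix.eye[Double](n) * 1e-3 // SPD gram matrix H (illustrative)
val q = DenseVector.rand(n)                            // linear term c (illustrative)

// The factor is a full n x n DenseMatrix even though it is lower triangular.
val l = cholesky(gram)        // gram = l * l.t
val x = gram \ (-q)           // stationary point of 0.5 * x'Hx + c'x, i.e. solve H x = -c
```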
Test build #29023 has finished for PR 3221 at commit
@mengxr I added the optimization for the lower-triangular matrix and now they are very close. Let me know what you think and if there are any other tricks you would like me to try. Note that with these optimizations, QuadraticMinimizer with the POSITIVE constraint will also run much faster.

Breeze QuadraticMinimizer (default): unset solver;
./bin/spark-submit --master spark://tusca09lmlvt00c.uswin.ad.vzwcorp.com:7077 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --total-executor-cores 1 ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50 --numIterations 2 ~/datasets/ml-1m/ratings.dat
Got 1000209 ratings from 6040 users on 3706 movies.
15/03/23 12:26:55 INFO ALS: solveTime 205.379 ms

ML CholeskySolver: export solver=mllib;
./bin/spark-submit --master spark://tusca09lmlvt00c.uswin.ad.vzwcorp.com:7077 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --total-executor-cores 1 ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50 --numIterations 2 ~/datasets/ml-1m/ratings.dat
Got 1000209 ratings from 6040 users on 3706 movies.
TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime ./work/app-20150323122612-0002/0/stderr

I am running only 2 iterations, but you can see in the tail that the solvers run at par.
All the runtime enhancements are being added to Breeze in this PR: scalanlp/breeze#386
@mengxr I discussed with David, and the only reason I can think of is that inside the solvers I am using DenseMatrix and DenseVector in place of primitive arrays for workspace creation. That might be causing the first-iteration runtime difference, due to loading up the interface classes and other features that come with DenseMatrix and DenseVector. I can move to primitive arrays for the workspace, but then the code will look ugly. Let me know if I should. I am surprised that this issue does not show up after the first call!
@mengxr any updates on it? Breeze 0.11.2 is now integrated with Spark, so I can clean up the PR for reviews.
…arse and simplex constraints
I integrated with Breeze 0.11.2. The only visible difference is the first iteration.

Breeze QuadraticMinimizer:
TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime ./work/app-20150327221722-0000/0/stderr

mllib CholeskySolver:
TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime ./work/app-20150327221/0/stderr

The visible difference is in the first 2 iterations, as shown in the previous experiments as well. I fixed the random seed test now, and so different runs will not produce the same result.

I need this structure to build ALM, since ALM extends mllib.ALS and adds LossType in the constructor along with userConstraint and itemConstraint. Right now I am experimenting with LeastSquare (for validation against ALS), and then I will start experimenting with the LogLikelihood loss. I am keen to run very large document, word, count datasets through ALM once it is ready.

For this PR I have updated MovieLensALS with userConstraint and itemConstraint, and I am considering whether we should add a sparse coding formulation in examples now or bring that in a separate PR. I have not cleaned up CholeskySolver from ALS yet and am waiting for feedback, but I have added test cases in ml.ALSSuite for all the constraints. At the ALS flow level I need to construct more test cases, and I can bring them in a separate PR as well.
Test build #29340 has finished for PR 3221 at commit
Test build #29353 has finished for PR 3221 at commit
What are MiMa tests? I am a bit confused about them. From the logs it looks like the ALS class now takes userConstraint and productConstraint, which are Enumerations, and that caused the failure. Is Enumeration.Value not allowed in class parameters?
@mengxr @jkbradley In my internal testing, I am finding the sparse formulations useful for extracting genre/topic information out of the Netflix/MovieLens datasets. I did not get any improvement on MAP / RMSE (which was expected from other papers). The sparse formulations are:
The reference: I am considering whether it makes sense to add a 20 newsgroups flow in examples (the one shown in the paper) to demonstrate the value of the sparsity support implemented in ALS. Also, do we have perplexity implemented so that we can start comparing topic models? The ALS runtime with the sparse formulations is also pretty good.
@debasish83 Including sparsity for both recommendation and for topic modeling will be useful, to be sure. We don't have perplexity (or even prediction) implemented yet, but that definitely needs to be done for LDA. New JIRA: https://issues.apache.org/jira/browse/SPARK-6793
Can we use RankingMetrics to see the prediction quality on a document dataset? LDA vs. sparse coding in this PR, for example. Actually we can do that on 20NG or some other known dataset; let me know which one I should try. Perplexity is a bit of a different measure than MAP, for example.
There isn't a clear answer about which metric is best, and none correspond that well with human perception. Most of the literature seems to think perplexity is better than log likelihood. If you can get labeled data, then using predictions would be reasonable, but even then, it's hard to say what the best metric is. For 20 newsgroups, maybe we could compare intra- and inter-newsgroup similarity based on topic distributions?
In this paper https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf Xi Chen compared classifier accuracy on the features extracted from sparse coding and LDA; we can also do that. Ranking might be a good idea as well. Log likelihood is tricky, since with only a positivity constraint and no sparsity constraint you will for sure get a higher log likelihood than someone who is adding sparsity!
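Not from the PR; just a rough sketch of that evaluation idea with the 1.x MLlib API: take the per-document factors produced by sparse coding (or LDA) as features, train a multiclass classifier on them, and compare accuracy. The input RDD and the helper name are hypothetical.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical input: one (newsgroupLabel, documentFactor) pair per document,
// where the factor comes from either sparse coding (this PR) or LDA.
def classifierAccuracy(docs: RDD[(Double, Array[Double])]): Double = {
  val data = docs.map { case (label, features) =>
    LabeledPoint(label, Vectors.dense(features))
  }
  val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

  val model = new LogisticRegressionWithLBFGS()
    .setNumClasses(20)           // 20 newsgroups
    .run(train.cache())

  // Fraction of held-out documents whose newsgroup is predicted correctly.
  val correct = test.map(p => (model.predict(p.features), p.label))
    .filter { case (pred, label) => pred == label }
    .count()
  correct.toDouble / test.count()
}
```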
Good point about log likelihood not being very meaningful. Perhaps the same will be true about perplexity? Using the topics as features sounds like a good idea. If we want to argue that topics are meaningful, comparing within and between newsgroups might be best. That covers the 2 main use cases for LDA I've heard of.
Cool...let me do that and add a 20NG flow.
Just out of curiosity, what's the second use case? Topics as features is a good idea, but most datasets where someone will run this are unlabeled, so we have to just pick one of ranking/perplexity. I am comparing whether log likelihood with positivity vs. log likelihood with sparsity improves the ranking metric in 0-1 recommendation as compared to ALS implicit feedback, but I am not sure it is possible to use a ranking metric on an arbitrary unlabeled dataset. Then again, ranking does not care about the underlying topic quality, so maybe we need both the feature-extractor flow and perplexity.
By "2nd use case," I meant trying to recover topics which correspond to human perceptions. Since the 20 newsgroup dataset is nicely divided into 20 human-perceived groups, it seems reasonable to expect that:
It's just a heuristic, but at least it's based on human-defined groupings.
@jkbradley We still could not access the Wikipedia dataset on EC2. Will it be possible for you to upload the 1-billion-token dataset to EC2? I wanted to do a sparse coding scalability run on the large dataset as well.
@jkbradley Let me know if you need vzcloud access and I can create a few nodes for you. EC2 might be easier for others to access as well.
@debasish83 Just to make sure, you're specifying it as a requester-pays bucket, right? http://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html
Ohh sorry, I don't know about requester pays...let me look into it.
@mengxr @jkbradley is this still relevant given the recent changes in ALS?
I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!
ALS is a generic algorithm for matrix factorization that is equally applicable to both feature space and similarity space. Currently, ALS supports L2 regularization and a positivity constraint. This PR introduces userConstraint and productConstraint to ALS and lets the user select different constraints for the user and product solves. The supported constraints are the following (formalized in the sketch after this list):
SMOOTH : default ALS with L2 regularization
ELASTIC NET: ALS with Elastic Net regularization
POSITIVE: ALS with positive factors
BOUNDS: ALS with factors bounded between a lower and an upper bound (default between 0 and 1)
EQUALITY: ALS with an equality constraint (default: the factors sum to 1 and are positive)
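One way to write the per-factor subproblem these options induce, as a sketch rather than the PR's exact notation: the quadratic part x^T H x + c^T x is defined in the formulation below, the indicator I{...} encodes a hard constraint, and lambda, alpha, l, u, s are illustrative parameter names (the SPARSE option used in the example run is included for completeness).

```latex
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\, x^\top H x + c^\top x + g(x), \qquad
g(x) =
\begin{cases}
  \tfrac{\lambda}{2}\,\lVert x \rVert_2^2
    & \text{SMOOTH} \\[2pt]
  \lambda\left(\alpha \lVert x \rVert_1 + \tfrac{1-\alpha}{2}\,\lVert x \rVert_2^2\right)
    & \text{ELASTIC NET} \\[2pt]
  \mathbb{I}\{x \ge 0\}
    & \text{POSITIVE} \\[2pt]
  \mathbb{I}\{l \le x \le u\} \quad (\text{default } 0 \le x \le 1)
    & \text{BOUNDS} \\[2pt]
  \mathbb{I}\{\mathbf{1}^\top x = 1,\; x \ge 0\}
    & \text{EQUALITY} \\[2pt]
  \mathbb{I}\{\lVert x \rVert_1 \le s\}
    & \text{SPARSE}
\end{cases}
```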
First, let's focus on the problem formulation. Both the implicit and the explicit feedback ALS formulations can be written as quadratic minimization problems, with a quadratic objective of the form x^T H x + c^T x. Each of the constraints then takes the following form (shown here for the sparsity constraint):

minimize x^T H x + c^T x
s.t. ||x||_1 <= s (SPARSE constraint)

We rewrite the objective as f(x) = x^T H x + c^T x and the constraint as an indicator function g(x). Minimization of f(x) + g(x) can then be carried out using various forward-backward splitting algorithms; we choose ADMM for this PR (a minimal sketch is given below).
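To make the splitting concrete, here is a minimal, self-contained Breeze sketch of the ADMM iteration for the POSITIVE case, where g is the indicator of the nonnegative orthant. This is not the PR's QuadraticMinimizer: the objective is taken as 0.5 * x'Hx + c'x, the defaults (rho = 1.0, no over-relaxation) mirror the convergence-study settings mentioned in the commits, the function name is made up, and a real implementation would factor H + rho*I once (Cholesky) instead of re-solving every iteration. For SPARSE / ELASTIC NET, the z-update becomes soft-thresholding or a projection onto the L1 ball instead of the max with 0.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, norm}

// Sketch of ADMM for: minimize 0.5 * x'Hx + c'x  subject to x >= 0.
// Split as f(x) + g(z) with x = z, where g is the indicator of {z >= 0}.
def admmPositive(h: DenseMatrix[Double],
                 c: DenseVector[Double],
                 rho: Double = 1.0,
                 maxIters: Int = 200,
                 tol: Double = 1e-6): DenseVector[Double] = {
  val n = c.length
  val a = h + DenseMatrix.eye[Double](n) * rho   // a real solver would factor this once
  var x = DenseVector.zeros[Double](n)
  var z = DenseVector.zeros[Double](n)
  var u = DenseVector.zeros[Double](n)           // scaled dual variable
  var iter = 0
  var done = false
  while (iter < maxIters && !done) {
    // x-update: solve (H + rho * I) x = rho * (z - u) - c
    x = a \ ((z - u) * rho - c)
    // z-update: proximal step for g, i.e. projection onto the nonnegative orthant
    val zOld = z
    z = (x + u).map(v => math.max(v, 0.0))
    // u-update: scaled dual ascent
    u = u + x - z
    // crude stopping rule based on primal and dual residuals
    done = norm(x - z) < tol && norm((z - zOld) * rho) < tol
    iter += 1
  }
  z
}
```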
For use cases, the PR is focused on the following:
Example run:
MASTER=spark://localhost:7077 ./bin/run-example mllib.MovieLensALS --rank 20 --numIterations 10 --userConstraint SMOOTH --lambdaUser 0.065 --productConstraint SPARSE --lambdaProduct 0.1 --kryo hdfs://localhost:8020/sandbox/movielens/
References:
2007 Sparse coding: papers.nips.cc/paper/2979-efficient-sparse-coding-algorithms.pdf
2011 Sparse Latent Semantic Analysis (SLSA; some of it is implemented in GraphLab):
https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
2012 Sparse Coding + MR/MPI Microsoft: http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf
Implementing the 20NG flow to validate the sparse coding result improvement over LDA-based topic modeling.
Reference:
Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization
The EQUALITY formulation with a quadratic loss is an approximation to the KL-divergence loss used in PLSA (see the sketch below). We are interested in seeing whether it improves the results further compared to the sparse coding formulation.
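Roughly, writing that out (a sketch only; W is the word-topic factor, h_j the document factor being solved for, r_j the document's normalized counts, and Delta the probability simplex — all names illustrative):

```latex
\text{PLSA-style loss:}\quad
\min_{h_j \in \Delta} \; \mathrm{KL}\!\left(r_j \,\middle\|\, W h_j\right)
\qquad\leadsto\qquad
\text{EQUALITY:}\quad
\min_{h_j} \; \tfrac{1}{2}\,\lVert r_j - W h_j \rVert_2^2
\;\; \text{s.t. } \mathbf{1}^\top h_j = 1,\; h_j \ge 0 .
```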
Next steps:
Detailed experiments are on the JIRA https://issues.apache.org/jira/browse/SPARK-2426