SPARK-1668: Add implicit preference as an option to examples/MovieLensALS #597
…sALS Add --implicitPrefs as a command-line option to the example app MovieLensALS under examples/
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
@techaddict Thanks for working on this JIRA. You also need to change the evaluation code. Implicit ALS predicts 0/1 instead of the original rating. So you need some mapping before computing RMSE. |
|
Mapping rating in case of ImplicitPref to |
|
It is true that implicit prefs predict 0/1 (i.e. a "preference" matrix rather than a "rating" matrix), but the ratings are taken as confidence levels indicating preference (or, in the case of negative ratings, lack of preference). So there is already an implicit mapping of 1 if r > 0 and 0 if r == 0, with the actual rating acting as a confidence value when r > 0. Keeping the ratings input as is, then, is a reasonable approach. Even better would be to map low ratings to zero or perhaps even negative scores, as a low rating certainly indicates a lack of preference. |
|
On this note, recall there was a change a while back to handle the case of negative confidence levels. 0 still means "don't know" and positive values mean "confident that the prediction should be 1". Negative values mean "confident that the prediction should be 0". In this case I have used a kind of weighted RMSE. The weight is the absolute value of the confidence. The error is the difference between the prediction and either 1 or 0, depending on whether r is positive or negative. |
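A minimal sketch of the weighted RMSE described above, using plain Scala collections rather than RDDs so it runs standalone. The object and function names are hypothetical, and dividing by the total weight is an assumption; the comment does not say how the weighted errors are normalized.

```scala
object WeightedRmseSketch {
  // Each pair is (confidence r, predicted preference). This layout is
  // hypothetical and chosen only for illustration.
  def weightedRmse(pairs: Seq[(Double, Double)]): Double = {
    // r == 0 means "don't know": it carries zero weight, so drop it.
    val known = pairs.filter { case (r, _) => r != 0.0 }
    val totalWeight = known.map { case (r, _) => math.abs(r) }.sum
    require(totalWeight > 0.0, "need at least one nonzero confidence")
    val weightedSquaredError = known.map { case (r, pred) =>
      // Positive confidence targets 1, negative confidence targets 0.
      val target = if (r > 0.0) 1.0 else 0.0
      val err = pred - target
      math.abs(r) * err * err
    }.sum
    // Assumption: normalize by the sum of weights.
    math.sqrt(weightedSquaredError / totalWeight)
  }
}
```

With this normalization, a strongly positive entry that is predicted as 0 contributes more to the error than a weakly negative entry that is predicted as 1.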
|
MovieLens ratings are on a scale of 1-5: So we should not recommend a movie if the predicted rating is less than For evaluation, the mapping should be |
|
Can I make a tiny suggestion to map from ratings to weights with something like "rating - 2.5" instead of "rating - 3"? So that 3 becomes a small positive value like 0.5? There is an argument that even neutral ratings are weak positive interactions; to have even consumed the item to be able to rate it means you had an interest. But more than that, the semantics of 0 in this expanded world of non-positive weights are "the same as never having interacted at all" -- which doesn't quite fit. I don't know if the intermediate sparse representations do this internally, at the moment, but it's possible that 0 values are ignored when constructing the sparse representation, because the 0s are implicit. This would be a problem, at least, a theoretical one. |
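To make the difference between the two offsets concrete, here is a small sketch. `toWeight` is a hypothetical helper, not code from the PR; it only shows that an offset of 2.5 never maps a 1-5 rating to exactly 0, while an offset of 3 does.

```scala
object RatingToWeightSketch {
  // Hypothetical helper: shift a 1-5 MovieLens rating into a signed
  // confidence weight. The default offset is the 2.5 suggested above.
  def toWeight(rating: Double, offset: Double = 2.5): Double = rating - offset

  def main(args: Array[String]): Unit = {
    // Print both mappings side by side for ratings 1 through 5.
    (1 to 5).foreach { r =>
      println(s"rating $r: offset 2.5 gives ${toWeight(r)}, offset 3 gives ${toWeight(r, 3.0)}")
    }
  }
}
```

With offset 3 a neutral rating becomes 0, which is indistinguishable from "never interacted" and could be dropped by a sparse representation; with 2.5 it stays a weak positive.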
|
+1 on @srowen 's suggestion. |
|
@techaddict For training, we should keep the |
|
@mengxr I'm a bit confused.
def computeRmse(model: MatrixFactorizationModel, data: RDD[Rating], n: Long) = {
  val predictions: RDD[Rating] = model.predict(data.map(x => (x.user, x.product)))
  val predictionsAndRatings = predictions.map(x => ((x.user, x.product), (x.rating + 2.5) / 5.0))
    .join(data.map(x => ((x.user, x.product), x.rating)))
    .values
  math.sqrt(predictionsAndRatings.map(x => (x._1 - x._2) * (x._1 - x._2)).mean())
} |
|
Can we change it to the following:
if (implicitPrefs) math.max(math.min(r, 1.0), 0.0) else r
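A standalone sketch of how the clamp above could sit inside an RMSE computation. It uses plain Scala collections instead of RDDs so that it runs without Spark; `ImplicitRmseSketch` and its `rmse` helper are illustrative names, and the targets are assumed to have already been mapped to 0/1 when `implicitPrefs` is set.

```scala
object ImplicitRmseSketch {
  // Clamp a raw prediction into [0, 1] when training with implicit
  // preferences; leave it untouched for explicit ratings.
  def mapPredictedRating(r: Double, implicitPrefs: Boolean): Double =
    if (implicitPrefs) math.max(math.min(r, 1.0), 0.0) else r

  // RMSE over (prediction, target) pairs. In a Spark job the same map and
  // mean would run on an RDD instead of a Seq.
  def rmse(pairs: Seq[(Double, Double)], implicitPrefs: Boolean): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    val squared = pairs.map { case (pred, target) =>
      val d = mapPredictedRating(pred, implicitPrefs) - target
      d * d
    }
    math.sqrt(squared.sum / squared.size)
  }
}
```

The clamp means a prediction of 1.3 for a preferred item, or -0.2 for a non-preferred one, is not penalized for overshooting the 0/1 target.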
|
@mengxr done |
|
LGTM. Thanks! |
|
Merged. Thanks! |
…sALS Add --implicitPrefs as an command-line option to the example app MovieLensALS under examples/ Author: Sandeep <[email protected]> Closes #597 from techaddict/SPARK-1668 and squashes the following commits: 8b371dc [Sandeep] Second Pass on reviews by mengxr eca9d37 [Sandeep] based on mengxr's suggestions 937e54c [Sandeep] Changes 5149d40 [Sandeep] Changes based on review 1dd7657 [Sandeep] use mean() 42444d7 [Sandeep] Based on Suggestions by mengxr e3082fa [Sandeep] SPARK-1668: Add implicit preference as an option to examples/MovieLensALS Add --implicitPrefs as an command-line option to the example app MovieLensALS under examples/ (cherry picked from commit 108c4c1) Signed-off-by: Reynold Xin <[email protected]>
|
Just a question on the result. Here, 0.57 is the error we will make when we predict 0/1, but is that too much? The paper on which implicit ALS is based uses expected percentile rank. Thank you. =) |
|
Simple RMSE is not a great metric for this model, because it treats all errors equally when the model itself does not at all. 1s are much more important than 0s. The predictions are not rating-like. See my comment above. I usually try to look at metrics that measure how good the top of the ranking is, since this is far more like what the user experiences. MAP or something like area under the curve are about as good as you can hope for, but still somewhat flawed. It's hard to eval recommenders since you have such incomplete information on what the "right" or "relevant" items are. |
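As an illustration of a metric that looks at the top of the ranking, here is a minimal sketch of mean average precision over plain Scala collections. The names are hypothetical, `ranked` is assumed to contain no duplicate items, and dividing by the number of relevant items is one common convention among several.

```scala
object AveragePrecisionSketch {
  // ranked: recommended item ids, best first (assumed duplicate-free).
  // relevant: held-out items the user actually interacted with.
  def averagePrecision(ranked: Seq[Int], relevant: Set[Int]): Double = {
    if (relevant.isEmpty) 0.0
    else {
      var hits = 0
      var sumPrecision = 0.0
      ranked.zipWithIndex.foreach { case (item, idx) =>
        if (relevant.contains(item)) {
          hits += 1
          // Precision at this cut-off: hits so far over positions seen.
          sumPrecision += hits.toDouble / (idx + 1)
        }
      }
      // Convention assumed here: normalize by the number of relevant items.
      sumPrecision / relevant.size
    }
  }

  // Mean of the per-user average precision values.
  def meanAveragePrecision(perUser: Seq[(Seq[Int], Set[Int])]): Double =
    if (perUser.isEmpty) 0.0
    else perUser.map { case (ranked, relevant) =>
      averagePrecision(ranked, relevant)
    }.sum / perUser.size
}
```

Unlike RMSE, this score only changes when relevant items move up or down the list, which is closer to what a user of the recommender sees.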
|
I recently tested the expected percentile rank (EPR) evaluation method proposed in the paper on the MovieLens data set and on a real-world data set. However, I got an expected rank of about 50% on both sets, which according to the paper means that implicit ALS does not actually predict anything. I am not sure whether any evaluation like this has been done. How can we make sure that implicit ALS is implemented correctly in MLlib without checking the code? |
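For reference, a minimal sketch of expected percentile rank as it is usually defined for implicit-feedback ALS: the weighted average, over held-out (user, item) pairs, of the item's percentile position in that user's ranked list, where 0% is the top. Lower is better, and a random ranking gives about 50%. The data layout and names are assumptions made for illustration, and every observed pair is assumed to have a score.

```scala
object ExpectedPercentileRankSketch {
  // scores: user -> (item -> predicted preference) for every candidate item.
  // observed: (user, item, weight) triples from the test set, weight >= 0.
  def expectedPercentileRank(
      scores: Map[Int, Map[Int, Double]],
      observed: Seq[(Int, Int, Double)]): Double = {
    // Percentile rank per user: 0.0 for the top-ranked item, 1.0 for the last.
    val percentile: Map[Int, Map[Int, Double]] = scores.map { case (user, itemScores) =>
      val ranked = itemScores.toSeq.sortBy { case (_, s) => -s }.map(_._1)
      val denom = math.max(ranked.size - 1, 1).toDouble
      user -> ranked.zipWithIndex.map { case (item, idx) => item -> idx / denom }.toMap
    }
    val totalWeight = observed.map(_._3).sum
    require(totalWeight > 0.0, "need positive total weight")
    // Weighted average of the percentile ranks of the observed pairs.
    observed.map { case (u, i, w) => w * percentile(u)(i) }.sum / totalWeight
  }
}
```

Note that the percentile lookup depends on each score being attached to the correct item id, so any misalignment between ids and factor rows pushes the result toward 50%.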
|
The results depend a whole lot on the choice of parameters. Did you try some degree of search for the best lambda / # features? It's quite possible to make a model that can't predict anything. I have generally found ALS works fine on the MovieLens data set. |
|
I have tried different lambda and # features, but nothing has changed. To be clear, the MovieLens dataset is initially divided into a training set (80%) and a test set (20%). The ratings are re-interpreted as |
|
You mentioned trying lots of values but what did you try? What about other test metrics -- to rule out some problem in the evaluation? Maybe you can share some of how you ran the test in a gist. |
|
Here are the values I have tried. The seed is set to 42; "in" and "out" mean in-sample (training set) and out-of-sample (test set). #factor = 12, lambda = 1, alpha = 1; #factor = 50, alpha = 1, iter = 30. I have not tried other metrics because, as said before, RMSE is not that good. I listed some code snippets here. There are 2 evaluation methods and the main |
|
OK, I have found the error in my metric. This line creates an item-factor matrix; the problem is that the item factors are not ordered by item id when they are collected, which leads to a wrong matrix. That is why the result was nonsense. Adding a sortBy(_._1) gives an EPR of about 9% (in sample) and 10% (out of sample). Implicit ALS works. Thanks. |
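A small sketch of the fix described above, assuming the item factors arrive as (item id, factor vector) pairs in arbitrary order, as they would after collecting an RDD such as `productFeatures`. `toItemMatrix` is a hypothetical helper; the point is only that sorting by id before building the matrix keeps row positions aligned with item ids.

```scala
object OrderedFactorsSketch {
  // Sort (item id, factors) pairs by id, then split them into an id index
  // and a row-per-item matrix. Row k belongs to ids(k).
  def toItemMatrix(
      itemFactors: Seq[(Int, Array[Double])]): (Array[Int], Array[Array[Double]]) = {
    val sorted = itemFactors.sortBy(_._1)
    (sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }
}
```

Returning the sorted ids alongside the matrix makes the row-to-item mapping explicit, which avoids relying on ids being contiguous.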