Commit 2bd02e2
[SPARK-28866][ML] Persist item factors RDD when checkpointing in ALS
### What changes were proposed in this pull request?
In the ALS ML implementation, for the non-implicit case, we checkpoint the RDD of item factors at each checkpoint interval. Before this change, the RDD was not persisted before being checkpointed (`.checkpoint()`) and materialized (`.count()`), so it was recomputed. In an experiment, persisting the RDD before checkpointing was measurably faster than not persisting it.
The performance difference is not big, but neither is this change. The actual difference varies with the checkpoint interval, the training dataset, etc.
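The recomputation described above can be observed on a toy RDD. The following is a minimal sketch, not the code of this patch: the object name `PersistBeforeCheckpoint`, the helper `evaluations`, the accumulator, and the stand-in RDD are all hypothetical, and it assumes a local Spark session is available. It counts how often the upstream `map` function runs when an RDD is checkpointed with and without a preceding `persist`.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistBeforeCheckpoint {

  /** Returns how many times the map function ran while checkpointing a toy RDD. */
  def evaluations(spark: SparkSession, persist: Boolean): Long = {
    val sc = spark.sparkContext
    val evals = sc.longAccumulator("evaluations")

    // Hypothetical stand-in for the item factors RDD; each element bumps the counter.
    val factors = sc.parallelize(1 to 1000, 4).map { i => evals.add(1L); i * 2 }

    if (persist) factors.persist(StorageLevel.MEMORY_AND_DISK)
    factors.checkpoint() // only marks the RDD; nothing is written yet
    factors.count()      // runs the job, then a second job writes the checkpoint
    if (persist) factors.unpersist()

    evals.value
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("persist-before-checkpoint")
      .getOrCreate()
    // Checkpointing requires a checkpoint directory on the SparkContext.
    spark.sparkContext.setCheckpointDir(Files.createTempDirectory("ckpt").toString)

    println(s"without persist: ${evaluations(spark, persist = false)} evaluations")
    println(s"with persist:    ${evaluations(spark, persist = true)} evaluations")

    spark.stop()
  }
}
```

Without `persist`, the checkpoint-writing job is expected to recompute the RDD from its lineage, so the counter should come out at roughly twice the element count. With `persist`, the checkpoint job can read the cached partitions instead.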
### Why are the changes needed?
Persisting the RDD before checkpointing the RDD of item factors can avoid recomputation.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually checked whether the RDD is recomputed.
Took 30% of the MovieLens 20M Dataset as the training dataset, set a checkpoint dir for the SparkContext, and fitted an ALS model like:
```scala
import org.apache.spark.ml.recommendation.ALS

// `training` is the 30% MovieLens sample described above.
val als = new ALS()
  .setMaxIter(100)
  .setCheckpointInterval(5)
  .setRegParam(0.01)
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")

// Time a single fit.
val t0 = System.currentTimeMillis()
val model = als.fit(training)
val t1 = System.currentTimeMillis()
```
Before this patch: 65.386 s
After this patch: 61.022 s
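To put the two timings above in proportion, here is a small sketch that computes the absolute and relative saving. The object and method names are hypothetical; only the two numbers come from the measurements above, and they reflect single runs.

```scala
object AlsTimingDelta {
  // Wall-clock fit times reported above, in seconds.
  val beforeSeconds = 65.386
  val afterSeconds = 61.022

  /** Fraction of the original run time saved by the change. */
  def relativeSaving(before: Double, after: Double): Double =
    (before - after) / before

  def main(args: Array[String]): Unit = {
    val saved = beforeSeconds - afterSeconds
    val pct = relativeSaving(beforeSeconds, afterSeconds) * 100
    println(f"saved $saved%.3f s ($pct%.1f%% of the original run time)")
  }
}
```

This works out to a saving of a little under 7%, consistent with the remark that the difference is not big.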
Closes apache#25576 from viirya/persist-item-factors.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

1 parent 8279693 commit 2bd02e2
1 file changed under mllib/src/main/scala/org/apache/spark/ml/recommendation: 6 additions & 1 deletion
Diff (line content not preserved): lines added at new line numbers 993, 998, 1004, 1005, 1007, and 1038; line removed at original line number 1032.