
Conversation

@hhbyyh (Contributor) commented Jul 25, 2017

What changes were proposed in this pull request?

CrossValidator and TrainValidationSplit both use
models = est.fit(trainingDataset, epm)
to fit the models, where epm is Array[ParamMap].
Even though the training process is sequential, the current implementation consumes extra driver memory to hold all of the trained models at once, which is not necessary and often leads to memory exceptions for both CrossValidator and TrainValidationSplit. My proposal is to optimize the training implementation so that a used model can be collected by GC, avoiding these unnecessary OOM exceptions.

E.g., when the grid search space has 12 candidate ParamMaps, the old implementation needs to hold all 12 trained models in driver memory at the same time, while the new implementation only needs to hold one trained model at a time; the previous model can then be reclaimed by GC.
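
To illustrate the idea, here is a rough sketch of the proposed per-ParamMap loop (not the exact patch; est, eval, epm, trainingDataset, and validationDataset stand for the estimator, evaluator, param grid, and per-fold splits that CrossValidator already has in scope):

    // Old approach (sketch): fitting with the whole param grid returns every
    // fitted model, so the driver holds epm.length models at the same time.
    // val models = est.fit(trainingDataset, epm)

    // Proposed approach (sketch): fit and evaluate one ParamMap at a time, so
    // each fitted model becomes unreachable (and GC-eligible) before the next
    // fit starts.
    val metrics = new Array[Double](epm.length)
    var i = 0
    while (i < epm.length) {
      val model = est.fit(trainingDataset, epm(i)).asInstanceOf[Model[_]]
      metrics(i) = eval.evaluate(model.transform(validationDataset, epm(i)))
      i += 1
    }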

How was this patch tested?

Existing unit tests, since there is no change to the logic.

I've also manually verified that the new implementation allows CrossValidator and TrainValidationSplit to train much larger models with the same max heap memory.

@SparkQA commented Jul 25, 2017

Test build #79944 has finished for PR 18733 at commit a7667e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123 (Contributor) left a comment

We currently have PR #16774 open. Maybe this should wait until that one is merged first, because after parallelism support is applied the code here will be different.

@hhbyyh (Contributor, Author) commented Aug 2, 2017

Nothing in this change depends on #16774.

The basic idea is that we should release driver memory as soon as a trained model has been evaluated. I don't see any conflict.

@hhbyyh (Contributor, Author) commented Aug 2, 2017

Features should be merged when they are reasonable and ready, rather than waiting on uncertain changes, especially when there are no conflicts.

metrics(i) += metric
i += 1
}
trainingDataset.unpersist()
A contributor commented on the code above:

One consideration here is that we're unpersisting the training data only after all models (for a fold) are evaluated. This means the full dataset (train and validation) is in cluster memory throughout, whereas previously only one of the two datasets would be in cluster memory at a time. It's possible that the impact of this on cluster resources is greater than the saving on the driver from temporarily storing one model instead of numModels models per fold.

It obviously depends on many factors (dataset size, cluster resources, driver memory, model size, etc.).
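
To make the trade-off concrete, a rough sketch of how the caching order changes (illustrative only, not the exact diff):

    // Before (sketch): all fits run first, and because caching is lazy the
    // validation split is not materialized until evaluation starts; the
    // training split is unpersisted before that, so roughly one split occupies
    // cluster memory at a time.
    trainingDataset.cache()
    validationDataset.cache()
    val models = est.fit(trainingDataset, epm)  // all fits for this fold
    trainingDataset.unpersist()
    // ... evaluate each model against validationDataset ...
    validationDataset.unpersist()

    // After (sketch): fit and evaluate are interleaved per ParamMap, so the
    // training split cannot be unpersisted until the whole grid is finished,
    // and both splits stay in cluster memory throughout the fold.
    trainingDataset.cache()
    validationDataset.cache()
    // ... per-ParamMap fit + evaluate loop (as sketched above) ...
    trainingDataset.unpersist()
    validationDataset.unpersist()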

@hhbyyh (Contributor, Author) commented Aug 3, 2017

Ah, you're right. I was under the wrong impression that validationDataset is always in memory.

Even though validationDataset is only 1/kfold the size of trainingDataset, and it is only used in the transform step rather than in fit, I still cannot prove that the new implementation is better in all circumstances.

I'll close the PR unless there's a better way to resolve the concern. Thanks.

@hhbyyh closed this Aug 7, 2017