[SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce test time of LogisticRegressionSuite #22659

shahidki31 · 2018-10-06T17:20:22Z

...with intercept with L1 regularization

What changes were proposed in this pull request?

The test "multinomial logistic regression with intercept with L1 regularization" in "LogisticRegressionSuite" takes more than a minute because it trains 2 logistic regression models.
However, after analysing the training cost over iterations, we can reduce the computation time by 50%.
Training cost vs iteration for model 1

So, model1 converges after iteration 150.

Training cost vs iteration for model 2

After around 100 iterations, model2 converges.
So, if we set the maximum iterations for model1 and model2 to 175 and 125 respectively, we can reduce the computation time by half.
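One way to read a convergence point off a training-cost curve like the ones described above is to find the last iteration at which the cost still changes noticeably, then add some headroom. The sketch below is illustrative only: the function names, the tolerance, the headroom value, and the toy cost list are assumptions, not taken from the patch.

```python
from typing import Sequence


def convergence_iteration(costs: Sequence[float], rel_tol: float = 1e-6) -> int:
    """Return the index of the last step whose relative cost change is
    at least rel_tol; every later step changes the cost by less."""
    last_big_step = 0
    for i in range(1, len(costs)):
        prev, cur = costs[i - 1], costs[i]
        # Guard against division by zero when the previous cost is 0.
        rel_change = abs(cur - prev) / max(abs(prev), 1e-12)
        if rel_change >= rel_tol:
            last_big_step = i
    return last_big_step


def suggest_max_iter(costs: Sequence[float], rel_tol: float = 1e-6,
                     headroom: int = 25) -> int:
    """Convergence point plus a safety margin (hypothetical heuristic)."""
    return convergence_iteration(costs, rel_tol) + headroom


# Toy cost history (made up for illustration): flat from index 3 onward.
toy_costs = [10.0, 5.0, 2.0, 1.0, 1.0, 1.0]
print(convergence_iteration(toy_costs))  # -> 3
print(suggest_max_iter(toy_costs))       # -> 28
```

The headroom matters because the convergence point is read from one run; a margin keeps the test from becoming sensitive to small changes in the optimizer.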

How was this patch tested?

Computation time in local setup:
Before change: ~53 sec
After change: ~26 sec

Please review http://spark.apache.org/contributing.html before opening a pull request.

HyukjinKwon · 2018-10-07T16:16:02Z

ok to test

SparkQA · 2018-10-07T17:35:24Z

Test build #97087 has finished for PR 22659 at commit 2040ada.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

shahidki31 · 2018-10-08T02:42:36Z

The test "multinomial logistic regression with intercept with elasticnet regularization" in "LogisticRegressionSuite" takes around 1 minute to train 2 logistic regression models.
However, after analyzing the training cost over iterations, we can reduce the computation time by 50%.
Training cost vs iteration for model 1

So, model1 converges after iteration 200.

Training cost vs iteration for model 2:

After around 50 iterations, model2 converges.
So, if we set the maximum iterations for model1 and model2 to 220 and 90 respectively, we can reduce the computation time by half.

Computation time in local setup:
Before change: ~54 sec
After change: ~35 sec

… with intercept with L1 regularization 1 min 10 sec

shahidki31 · 2018-10-08T03:06:54Z

The test "binary logistic regression with intercept with ElasticNet regularization" takes around 30 sec to run, but we can reduce the time to 15 sec by reducing the number of iterations.

model1 converges after 100 iterations,

model2 converges after 20 iterations.
So, if we set maxIter of model1 and model2 to 120 and 30 respectively, we can reduce the time to ~15 sec.

The test "multinomial logistic regression without intercept with elasticnet regularization" takes around 30 sec to run. This can also be reduced to 15 sec by reducing the number of iterations.

model1 converges after 50 iterations.

model2 converges after 30 iterations.
So, if we set maxIter of model1 and model2 to 75 and 50 respectively, we can reduce the computation time to less than 15 sec.
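To make the margins explicit, the sketch below tabulates the convergence iterations and maxIter values reported in this thread and computes the headroom left for each model. The numbers are copied from the comments above; the dictionary layout and the `headroom` helper are assumptions made for illustration.

```python
# test name -> list of (model, reported convergence iteration, chosen maxIter),
# all values as stated in the comments of this PR.
choices = {
    "multinomial, with intercept, L1": [("model1", 150, 175), ("model2", 100, 125)],
    "multinomial, with intercept, elasticnet": [("model1", 200, 220), ("model2", 50, 90)],
    "binary, with intercept, ElasticNet": [("model1", 100, 120), ("model2", 20, 30)],
    "multinomial, without intercept, elasticnet": [("model1", 50, 75), ("model2", 30, 50)],
}


def headroom(converged_at: int, max_iter: int) -> int:
    """Iterations left between the reported convergence point and maxIter."""
    return max_iter - converged_at


margins = []
for test_name, models in choices.items():
    for model, converged_at, max_iter in models:
        margin = headroom(converged_at, max_iter)
        margins.append(margin)
        print(f"{test_name} / {model}: converges ~{converged_at}, "
              f"maxIter {max_iter}, headroom {margin}")

print("smallest headroom:", min(margins))  # -> 10
print("largest headroom:", max(margins))   # -> 40
```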

shahidki31 · 2018-10-08T03:20:47Z

Before the changes:
Running time of logistic regression suite: 4min 35 sec
After the changes:
Running time of logistic regression suite: 3min 22 sec

cc @srowen @HyukjinKwon. Kindly review.

SparkQA · 2018-10-08T03:39:23Z

Test build #97093 has finished for PR 22659 at commit 3d9673e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-08T04:09:29Z

Test build #97094 has finished for PR 22659 at commit c28fd05.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

shahidki31 · 2018-10-08T04:15:27Z

In Jenkins CI, the testing time of LogisticRegressionSuite is 5 min 10 sec without the PR and 4 min 21 sec with the PR.
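For comparison, the timings quoted in this thread can be turned into percentage reductions. The sketch below uses only the numbers reported above; the helper names and the labels are assumptions for illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min sec' duration to seconds."""
    return 60 * minutes + seconds


def percent_reduction(before_s: float, after_s: float) -> float:
    """Relative time saved, in percent of the original duration."""
    return 100.0 * (before_s - after_s) / before_s


# label -> (before, after) in seconds, as reported in the comments.
timings = {
    "L1 test, local": (53, 26),
    "elasticnet test (with intercept), local": (54, 35),
    "whole suite, local": (to_seconds(4, 35), to_seconds(3, 22)),
    "whole suite, Jenkins CI": (to_seconds(5, 10), to_seconds(4, 21)),
}

for label, (before_s, after_s) in timings.items():
    saved = percent_reduction(before_s, after_s)
    print(f"{label}: {before_s}s -> {after_s}s ({saved:.1f}% less)")
```

Suite-level savings are smaller than per-test savings because only some of the tests in the suite were changed.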

srowen · 2018-10-09T00:10:00Z

Merged to master

shahidki31 · 2018-10-09T02:16:35Z

Thank you @srowen for merging.

…isticRegressionSuite

...with intercept with L1 regularization

## What changes were proposed in this pull request?

In the test, "multinomial logistic regression with intercept with L1 regularization" in the "LogisticRegressionSuite", taking more than a minute due to training of 2 logistic regression model.
However after analysing the training cost over iteration, we can reduce the computation time by 50%.

Training cost vs iteration for model 1
![image](https://user-images.githubusercontent.com/23054875/46573805-ddab7680-c9b7-11e8-9ee9-63a99d498475.png)

So, model1 is converging after iteration 150.

Training cost vs iteration for model 2
![image](https://user-images.githubusercontent.com/23054875/46573790-b3f24f80-c9b7-11e8-89c0-81045ad647cb.png)

After around 100 iteration, model2 is converging.
So, if we give maximum iteration for model1 and model2 as 175 and 125 respectively, we can reduce the computation time by half.

## How was this patch tested?

Computation time in local setup :
Before change: ~53 sec
After change: ~26 sec

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#22659 from shahidki31/SPARK-25623.

Authored-by: Shahid <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

shahidki31 mentioned this pull request Oct 7, 2018

[SPARK-25624][TEST] Reduce test time of LogisticRegressionSuite.multinomial logistic regression… #22660

Closed

shahidki31 force-pushed the SPARK-25623 branch from 2040ada to 3d9673e Compare October 8, 2018 02:23

[SPARK-25623]LogisticRegressionSuite: multinomial logistic regressioN…

c28fd05

… with intercept with L1 regularization 1 min 10 sec

shahidki31 force-pushed the SPARK-25623 branch from 3d9673e to c28fd05 Compare October 8, 2018 02:52

shahidki31 changed the title ~~[SPARK-25623][TEST] Reduce test time of LogisticRegressionSuite: multinomial logistic regression....~~ [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce test time of LogisticRegressionSuite Oct 8, 2018

srowen approved these changes Oct 8, 2018

asfgit closed this in a4b14a9 Oct 9, 2018

shahidki31 deleted the SPARK-25623 branch October 9, 2018 02:16
