
@shahidki31
Contributor

@shahidki31 shahidki31 commented Oct 6, 2018

...with intercept with L1 regularization

What changes were proposed in this pull request?

In LogisticRegressionSuite, the test "multinomial logistic regression with intercept with L1 regularization" takes more than a minute because it trains two logistic regression models.
However, the training cost over the iterations shows that we can reduce the computation time by about 50%.
[Figure: training cost vs. iteration for model 1]

So, model 1 converges after iteration 150.

[Figure: training cost vs. iteration for model 2]

Model 2 converges after around 100 iterations.
So, if we set the maximum number of iterations (maxIter) for model 1 and model 2 to 175 and 125 respectively, we can reduce the computation time by half.
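The convergence argument above (read off where the training-cost curve flattens, then cap maxIter a little beyond that point) can be sketched as follows. This is a hypothetical helper, not code from the PR: the names `convergence_iteration` and `suggest_max_iter`, the relative-change threshold, and the 25-iteration margin (chosen to match the 150 to 175 and 100 to 125 choices above) are all assumptions. In Spark ML, the per-iteration costs could come from a trained model's training summary (`objectiveHistory`).

```python
from typing import Sequence


def convergence_iteration(costs: Sequence[float], rel_tol: float = 1e-4, patience: int = 5) -> int:
    """Index from which the training cost stays flat.

    `costs[i]` is the training cost after iteration i. The curve counts as flat once
    `patience` consecutive relative changes are all below `rel_tol`. If that never
    happens, the last index is returned.
    """
    streak = 0
    for i in range(1, len(costs)):
        prev, cur = costs[i - 1], costs[i]
        rel_change = abs(prev - cur) / max(abs(prev), 1e-12)  # guard against division by zero
        streak = streak + 1 if rel_change < rel_tol else 0
        if streak == patience:
            # The first of the small changes is between index i - patience and the next one.
            return i - patience
    return len(costs) - 1


def suggest_max_iter(costs: Sequence[float], margin: int = 25, **kwargs) -> int:
    """Hypothetical rule: cap iterations a fixed margin past the convergence point."""
    return convergence_iteration(costs, **kwargs) + margin


# Synthetic cost curve: drops quickly, then stays flat from index 4 onward.
history = [10.0, 5.0, 3.0, 2.0, 1.5] + [1.5] * 10
print(convergence_iteration(history), suggest_max_iter(history))  # prints "4 29"
```

The margin guards against stopping while the cost is still creeping down; the PR's own choices vary from test to test, so a fixed margin is only one possible rule.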

How was this patch tested?

Computation time in local setup:
Before change: ~53 sec
After change: ~26 sec
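As a quick check that these local timings match the "by half" estimate, the relative saving can be computed as below (a trivial sketch; `percent_reduction` is a hypothetical helper name).

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Relative time saved, as a percentage of the original duration."""
    return 100.0 * (before_s - after_s) / before_s


# Local timings reported for the L1-regularized multinomial test.
print(f"{percent_reduction(53, 26):.1f}%")  # prints "50.9%"
```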

Please review http://spark.apache.org/contributing.html before opening a pull request.

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Oct 7, 2018

Test build #97087 has finished for PR 22659 at commit 2040ada.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shahidki31
Contributor Author

In LogisticRegressionSuite, the test "multinomial logistic regression with intercept with elasticnet regularization" takes around 1 minute because it trains two logistic regression models.
However, the training cost over the iterations shows that we can substantially reduce the computation time.
[Figure: training cost vs. iteration for model 1]

So, model 1 converges after iteration 200.

[Figures (2): training cost vs. iteration for model 2]

Model 2 converges after around 50 iterations.
So, if we set maxIter for model 1 and model 2 to 220 and 90 respectively, we can substantially reduce the computation time.

Computation time in local setup:
Before change: ~54 sec
After change: ~35 sec
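Worked out from these local timings, the measured saving for this test is roughly 35%:

```latex
\frac{54\,\text{s} - 35\,\text{s}}{54\,\text{s}} = \frac{19}{54} \approx 0.35
```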

@shahidki31
Contributor Author

The test "binary logistic regression with intercept with ElasticNet regularization" takes around 30 sec to run, but we can reduce this to about 15 sec by reducing the number of iterations.

[Figure: training cost vs. iteration for model 1]
Model 1 converges after 100 iterations.
[Figure: training cost vs. iteration for model 2]
Model 2 converges after 20 iterations.
So, if we set maxIter for model 1 and model 2 to 120 and 30 respectively, we can reduce the time to ~15 sec.

The test "multinomial logistic regression without intercept with elasticnet regularization" takes around 30 sec to run. This can also be reduced to about 15 sec by reducing the number of iterations.
[Figure: training cost vs. iteration for model 1]
Model 1 converges after 50 iterations.
[Figure: training cost vs. iteration for model 2]
Model 2 converges after 30 iterations.
So, if we set maxIter for model 1 and model 2 to 75 and 50 respectively, we can reduce the computation time to under 15 sec.

@shahidki31 shahidki31 changed the title [SPARK-25623][TEST] Reduce test time of LogisticRegressionSuite: multinomial logistic regression.... [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce test time of LogisticRegressionSuite Oct 8, 2018
@shahidki31
Contributor Author

Running time of LogisticRegressionSuite:
Before the changes: 4 min 35 sec
After the changes: 3 min 22 sec
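To put the suite-level numbers in relative terms, the sketch below converts the reported durations to seconds and computes the saving. `to_seconds` is a hypothetical helper that only understands the "N min M sec" style used in this thread.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert strings such as '4min 35 sec' or '3 min 22 sec' to whole seconds."""
    minutes = re.search(r"(\d+)\s*min", duration)
    seconds = re.search(r"(\d+)\s*sec", duration)
    total = int(minutes.group(1)) * 60 if minutes else 0
    return total + (int(seconds.group(1)) if seconds else 0)


before, after = to_seconds("4 min 35 sec"), to_seconds("3 min 22 sec")
saved = before - after
print(before, after, saved, f"{100 * saved / before:.1f}%")  # prints "275 202 73 26.5%"
```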

cc @srowen @HyukjinKwon. Kindly review.

@SparkQA

SparkQA commented Oct 8, 2018

Test build #97093 has finished for PR 22659 at commit 3d9673e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 8, 2018

Test build #97094 has finished for PR 22659 at commit c28fd05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shahidki31
Contributor Author

In Jenkins CI, LogisticRegressionSuite takes 5 min 10 sec without this PR and 4 min 21 sec with it.
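Converting the Jenkins timings to seconds (5 min 10 sec = 310 s, 4 min 21 sec = 261 s), the saving on CI works out to about 16%:

```latex
\frac{310\,\text{s} - 261\,\text{s}}{310\,\text{s}} = \frac{49}{310} \approx 0.16
```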

@srowen
Member

srowen commented Oct 9, 2018

Merged to master

@asfgit asfgit closed this in a4b14a9 Oct 9, 2018
@shahidki31
Contributor Author

Thank you @srowen for merging.

@shahidki31 shahidki31 deleted the SPARK-25623 branch October 9, 2018 02:16
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…isticRegressionSuite

...with intercept with L1 regularization

## What changes were proposed in this pull request?

In LogisticRegressionSuite, the test "multinomial logistic regression with intercept with L1 regularization" takes more than a minute because it trains two logistic regression models.
However, the training cost over the iterations shows that we can reduce the computation time by about 50%.
Training cost vs iteration for model 1
![image](https://user-images.githubusercontent.com/23054875/46573805-ddab7680-c9b7-11e8-9ee9-63a99d498475.png)

So, model 1 converges after iteration 150.

Training cost vs iteration for model 2

![image](https://user-images.githubusercontent.com/23054875/46573790-b3f24f80-c9b7-11e8-89c0-81045ad647cb.png)

Model 2 converges after around 100 iterations.
So, if we set maxIter for model 1 and model 2 to 175 and 125 respectively, we can reduce the computation time by half.

## How was this patch tested?
Computation time in local setup:
Before change: ~53 sec
After change: ~26 sec

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#22659 from shahidki31/SPARK-25623.

Authored-by: Shahid <[email protected]>
Signed-off-by: Sean Owen <[email protected]>