@yanboliang
Contributor

@yanboliang yanboliang commented Aug 6, 2016

What changes were proposed in this pull request?

Similar to LeastSquaresAggregator in #14109, AFTAggregator used for AFTSurvivalRegression ends up serializing the parameters and featuresStd, which is not necessary and can cause performance issues for high dimensional data. This patch removes this serialization. This PR is highly inspired by #14109.
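To illustrate the kind of reduction described above, here is a minimal sketch that uses plain Java serialization instead of Spark. `FakeBroadcastStore`, `FakeBroadcast`, `HeavyAggregator`, `LightAggregator`, and `SerializedSize` are hypothetical names made up for this example; they are not Spark classes and not the code in this patch. The assumption is that a broadcast handle serializes only a small identifier, while the array it points to is looked up on the receiving side.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a broadcast registry (not a Spark API).
object FakeBroadcastStore {
  private val store = TrieMap.empty[Long, Array[Double]]
  def put(id: Long, value: Array[Double]): Unit = store.update(id, value)
  def get(id: Long): Array[Double] = store(id)
}

// Hypothetical handle: only the id travels with the serialized object.
class FakeBroadcast(val id: Long) extends Serializable {
  def value: Array[Double] = FakeBroadcastStore.get(id)
}

// "Before" shape: the array is an ordinary field, so it is serialized
// together with the aggregator.
class HeavyAggregator(val parameters: Array[Double]) extends Serializable

// "After" shape: the array is reachable only through a transient lazy val,
// so serialization writes the small handle and skips the array.
class LightAggregator(bcParameters: FakeBroadcast) extends Serializable {
  @transient private lazy val parameters: Array[Double] = bcParameters.value
  def dim: Int = parameters.length
}

object SerializedSize {
  // Number of bytes Java serialization produces for obj.
  def of(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }
}
```

Comparing `SerializedSize.of(new HeavyAggregator(params))` with `SerializedSize.of(new LightAggregator(handle))` for a large `params` array should show the first growing with the array length while the second stays small.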

How was this patch tested?

I tested this locally and verified the serialization reduction.

Before patch
image

After patch
image

@SparkQA

SparkQA commented Aug 6, 2016

Test build #63314 has finished for PR 14519 at commit d152a3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor Author

cc @sethah @dbtsai

@srowen
Member

srowen commented Aug 6, 2016

Let's put this into #14109

  // sigma is the scale parameter of the AFT model
- private val sigma = math.exp(parameters(0))
+ @transient private lazy val sigma = math.exp(parameters(0))

Member

@dbtsai dbtsai Aug 8, 2016

In line 506,

  private val gradientSumArray = Array.ofDim[Double](parameters.length)

the code will evaluate the lazy parameters in the driver.
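A small sketch of why an eager field that reads `parameters.length` forces the lazy value at construction time. `ForcingAgg`, `DeferredAgg`, and the `load` function are hypothetical names for illustration, and passing the dimension in separately is just one possible way to avoid the early evaluation, not necessarily what this PR does.

```scala
// The eager val below touches the lazy val inside the constructor, so the
// loader runs as soon as the object is created (on the driver, in Spark terms).
class ForcingAgg(load: () => Array[Double]) {
  lazy val parameters: Array[Double] = load()
  val gradientSumArray: Array[Double] = Array.ofDim[Double](parameters.length)
}

// Here the size is supplied separately, so constructing the object does not
// touch the lazy val; the loader runs only on first access to parameters.
class DeferredAgg(load: () => Array[Double], dim: Int) {
  lazy val parameters: Array[Double] = load()
  val gradientSumArray: Array[Double] = Array.ofDim[Double](dim)
}
```

Counting how often `load` is called right after construction distinguishes the two: once for `ForcingAgg`, zero times for `DeferredAgg`.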

Member

BTW, after thinking a bit, some of the lazy modifiers are not needed. lazy is for avoiding computation in the driver; however, @transient private val parameters = bcParameters.value should work without lazy. Also, sigma and intercept may not need lazy. Thanks.

Contributor Author

@dbtsai I addressed the parameters.length issue. But I cannot remove lazy from @transient private lazy val parameters = bcParameters.value or from intercept/sigma; otherwise, it throws a NullPointerException. If I remove both @transient and lazy, it works well, but that does not meet our requirements. It's a little weird and I'm still working to figure out the root cause. Can you give me some suggestions? Thanks.

Member

You are right. In Scala, when we use @transient private val without lazy, the value is evaluated only once, at construction, and not again after a serialization/deserialization cycle. As a result, after the AFTAggregator is serialized and sent to the executors, the variable will not be evaluated again and will default to null.
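The difference can be reproduced outside Spark with a plain Java serialization round trip. This is a minimal sketch; `ParamSource`, `EagerAgg`, `LazyAgg`, and `TransientDemo` are hypothetical names, and `ParamSource` only stands in for the broadcast handle (unlike a real broadcast, it carries the array with it).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the broadcast handle (not a Spark class).
class ParamSource(val value: Array[Double]) extends Serializable

class EagerAgg(source: ParamSource) extends Serializable {
  // Assigned once in the constructor. Java serialization skips transient
  // fields and does not re-run the constructor, so this is null afterwards.
  @transient val parameters: Array[Double] = source.value
}

class LazyAgg(source: ParamSource) extends Serializable {
  // Also skipped by serialization, but the initializer runs again on the
  // first access after deserialization.
  @transient lazy val parameters: Array[Double] = source.value
}

object TransientDemo {
  // Serialize obj to bytes and read it back, as shipping it to another JVM would.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After `TransientDemo.roundTrip`, `parameters` on an `EagerAgg` is `null`, which is where a NullPointerException would come from, while `parameters` on a `LazyAgg` is recomputed from `source`.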

@SparkQA

SparkQA commented Aug 8, 2016

Test build #63362 has finished for PR 14519 at commit 287f153.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Contributor

sethah commented Aug 8, 2016

@yanboliang Can you do a quick test to make sure the shuffle write size is the expected size? For example, in logistic regression only the gradient should be serialized which is an array of numFeatures doubles. The expected shuffle write size is then roughly numFeatures * 8 bytes for each task. It would be nice to check before/after.
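The rough estimate above can be written out as a one-line calculation. `ShuffleWriteEstimate` is a hypothetical helper, and the figure ignores serialization framing and any other fields that are written along with the gradient.

```scala
object ShuffleWriteEstimate {
  // Approximate per-task payload: one 8-byte double per feature.
  def gradientBytes(numFeatures: Long): Long =
    numFeatures * java.lang.Double.BYTES
}
```

For example, `ShuffleWriteEstimate.gradientBytes(1000000)` gives 8,000,000 bytes, so a model with one million features would be expected to write roughly 8 MB per task.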

@dbtsai
Member

dbtsai commented Aug 8, 2016

LGTM. Will be nice to see the comparison of shuffle write size, and then will be ready to merge. Thanks.

@yanboliang
Contributor Author

I tested this locally and verified the serialization reduction. I posted the shuffle size comparison diagram in the PR description. I will merge this into master. Thanks for your review! @dbtsai @sethah

@asfgit asfgit closed this in 182e119 Aug 9, 2016
@yanboliang yanboliang deleted the spark-16933 branch August 9, 2016 10:44