Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

After #17596 , we do not send internal accumulator name to executor side anymore, and always look up the accumulator name in AccumulatorContext.

This cause a regression if the accumulator is already garbage collected, this PR fixes this by still sending accumulator name for SQLMetrics.

How was this patch tested?

N/A

@cloud-fan
Copy link
Contributor Author

cc @vanzin

@rxin
Copy link
Contributor

rxin commented May 10, 2017

What's the issue with SQL metrics?

@SparkQA
Copy link

SparkQA commented May 10, 2017

Test build #76726 has started for PR 17931 at commit 5ae4f6e.

@cloud-fan
Copy link
Contributor Author

we will keep SQLMetrics that sent from executors as UI data, while the actual accumulator registered for SQLMetrics may be garbage collected as the SparkPlan linked with it is garbage collected.

task context accumulators don't have this problem as we always keep the registered accumulators in DAGScheduler

@cloud-fan cloud-fan changed the title [SPARK-12837][CORE][FOLLOWUP] getting name should not fail if accumulator is garbage collected [SPARK-12837][SPARK-20666][CORE][FOLLOWUP] getting name should not fail if accumulator is garbage collected May 10, 2017
@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 10, 2017

Test build #76741 has finished for PR 17931 at commit 5ae4f6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented May 11, 2017

by still sending accumulator name for SQLMetrics

Do you have an idea of how much this undoes the benefits of SPARK-12837? You're still avoiding sending the names of internal metrics, but I don't have a feel for how many accumulators a large sql query might generate.

Also, is that really necessary? The errors seen in the bug (and that I noticed in my testing) were always on the driver side.

@cloud-fan
Copy link
Contributor Author

at the time we received SQLMetrics in SQLListener with task end event, the registered accumulator may already be GCed, then there is no way to retrieve the accumulator names, except we sending accumulator names to executor side, so that when executor can send back accumulators to driver side with names.

@vanzin
Copy link
Contributor

vanzin commented May 12, 2017

I see. What about the gains from SPARK-12837? Are they still enough that the change is justified, or should we just revert it instead?

@vanzin
Copy link
Contributor

vanzin commented May 12, 2017

(BTW, if keeping the code, a slightly more verbose comment in the code explaining why non-internal accumulators still need to send their names would be good.)

@cloud-fan
Copy link
Contributor Author

cloud-fan commented May 15, 2017

I checked the SPARK-12837 again and we only lose a little from this PR. I'll add more comments.

@SparkQA
Copy link

SparkQA commented May 15, 2017

Test build #76937 has finished for PR 17931 at commit 2a3f773.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented May 15, 2017

LGTM.

@vanzin
Copy link
Contributor

vanzin commented May 15, 2017

Merging to master / 2.2.

asfgit pushed a commit that referenced this pull request May 15, 2017
…il if accumulator is garbage collected

## What changes were proposed in this pull request?

After #17596 , we do not send internal accumulator name to executor side anymore, and always look up the accumulator name in `AccumulatorContext`.

This cause a regression if the accumulator is already garbage collected, this PR fixes this by still sending accumulator name for `SQLMetrics`.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #17931 from cloud-fan/bug.

(cherry picked from commit e1aaab1)
Signed-off-by: Marcelo Vanzin <[email protected]>
@asfgit asfgit closed this in e1aaab1 May 15, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
…il if accumulator is garbage collected

## What changes were proposed in this pull request?

After apache#17596 , we do not send internal accumulator name to executor side anymore, and always look up the accumulator name in `AccumulatorContext`.

This cause a regression if the accumulator is already garbage collected, this PR fixes this by still sending accumulator name for `SQLMetrics`.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#17931 from cloud-fan/bug.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
…il if accumulator is garbage collected

## What changes were proposed in this pull request?

After apache#17596 , we do not send internal accumulator name to executor side anymore, and always look up the accumulator name in `AccumulatorContext`.

This cause a regression if the accumulator is already garbage collected, this PR fixes this by still sending accumulator name for `SQLMetrics`.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#17931 from cloud-fan/bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants