Skip to content

Conversation

@gaborgsomogyi
Copy link
Contributor

What changes were proposed in this pull request?

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

How was this patch tested?

Additional unit test + manually.

@gaborgsomogyi
Copy link
Contributor Author

@jose-torres @tdas @zsxwing could you take a look at this please?

@vanzin
Copy link
Contributor

vanzin commented Feb 15, 2018

ok to test

@SparkQA
Copy link

SparkQA commented Feb 15, 2018

Test build #87489 has finished for PR 20620 at commit 152fec4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 15, 2018

Test build #87494 has finished for PR 20620 at commit bd46d1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

s"${allocatedBlocks.streamIdToAllocatedBlocks}")
streamIdToUnallocatedBlockQueues.values.foreach { _.clear() }
allocatedBlocks.streamIdToAllocatedBlocks.foreach {
case (streamId, allocatedBlocks) =>
Copy link
Member

@viirya viirya Feb 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we use another name (maybe allocatedBlocksInStream?) other than allocatedBlocks to avoid confusion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed.

@SparkQA
Copy link

SparkQA commented Feb 17, 2018

Test build #87519 has finished for PR 20620 at commit 23e0204.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor Author

Seems like unrelated issue.

@viirya
Copy link
Member

viirya commented Feb 17, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Feb 17, 2018

Test build #87522 has finished for PR 20620 at commit 23e0204.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Feb 20, 2018

LGTM.

@vanzin
Copy link
Contributor

vanzin commented Feb 26, 2018

Merging to master, will try back to 2.0.

@asfgit asfgit closed this in b308182 Feb 26, 2018
@vanzin
Copy link
Contributor

vanzin commented Feb 26, 2018

(Argh, the wifi here is horrible, I'll need to manually merge things, so hang on a sec...)

asfgit pushed a commit that referenced this pull request Feb 26, 2018
…rashes

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

Additional unit test + manually.

Author: Gabor Somogyi <[email protected]>

Closes #20620 from gaborgsomogyi/SPARK-23438.

(cherry picked from commit b308182)
Signed-off-by: Marcelo Vanzin <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 26, 2018
…rashes

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

Additional unit test + manually.

Author: Gabor Somogyi <[email protected]>

Closes #20620 from gaborgsomogyi/SPARK-23438.

(cherry picked from commit b308182)
Signed-off-by: Marcelo Vanzin <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 26, 2018
…rashes

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

Additional unit test + manually.

Author: Gabor Somogyi <[email protected]>

Closes #20620 from gaborgsomogyi/SPARK-23438.

(cherry picked from commit b308182)
Signed-off-by: Marcelo Vanzin <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 26, 2018
…rashes

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

Additional unit test + manually.

Author: Gabor Somogyi <[email protected]>

Closes #20620 from gaborgsomogyi/SPARK-23438.

(cherry picked from commit b308182)
Signed-off-by: Marcelo Vanzin <[email protected]>
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…rashes

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

Additional unit test + manually.

Author: Gabor Somogyi <[email protected]>

Closes apache#20620 from gaborgsomogyi/SPARK-23438.

(cherry picked from commit b308182)
Signed-off-by: Marcelo Vanzin <[email protected]>
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…rashes

There is a race condition introduced in SPARK-11141 which could cause data loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the batch and clears the queue.

In this PR only the allocated blocks will be removed from the queue which will prevent data loss.

Additional unit test + manually.

Author: Gabor Somogyi <[email protected]>

Closes apache#20620 from gaborgsomogyi/SPARK-23438.

(cherry picked from commit b308182)
Signed-off-by: Marcelo Vanzin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants