@zsxwing
Member

@zsxwing zsxwing commented Sep 25, 2015

Fix the following issues in StandaloneDynamicAllocationSuite:

  1. It should not assume master and workers start in order
  2. It should not assume master and workers get ready at once
  3. It should not assume the application is already registered with master after creating SparkContext
  4. It should not access Master.app and idToApp which are not thread safe

The changes include:

  • Use eventually to wait until master and workers are ready to fix 1 and 2
  • Use eventually to wait until the application is registered with master to fix 3
  • Use askWithRetry[MasterStateResponse](RequestMasterState) to get the application info to fix 4
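The waiting pattern in the fixes above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from this PR: the `eventually` helper here is a hand-rolled stand-in for the ScalaTest utility of the same name, and `StateSnapshot`, `EventuallySketch`, and the simulated startup thread are hypothetical names made up for the example. The idea is that the test polls an immutable snapshot of the master's state (which is what a `RequestMasterState` reply would provide) instead of reading the master's mutable fields from another thread.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.duration._

// Hypothetical stand-in for the master's state reply; NOT Spark's MasterStateResponse.
final case class StateSnapshot(aliveWorkers: Int, registeredApps: Seq[String])

object EventuallySketch {
  // Hand-rolled helper (an assumption for this sketch, not ScalaTest's implementation):
  // re-run `check` until it stops throwing AssertionError or `timeout` elapses.
  def eventually[T](timeout: FiniteDuration, interval: FiniteDuration)(check: => T): T = {
    val deadline = timeout.fromNow
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(check)
      } catch {
        // Swallow the failure only while time remains; after the deadline it propagates.
        case _: AssertionError if deadline.hasTimeLeft() =>
          Thread.sleep(interval.toMillis)
      }
    }
    result.get
  }

  def main(args: Array[String]): Unit = {
    // Immutable snapshots published through an AtomicReference are safe to read
    // from the test thread, unlike mutable fields owned by another thread.
    val state = new AtomicReference(StateSnapshot(0, Seq.empty))

    // Simulated cluster: workers and the application show up later, on another thread.
    val startup = new Thread(() => {
      Thread.sleep(200)
      state.set(StateSnapshot(2, Seq("test-app")))
    })
    startup.start()

    val ready = eventually(5.seconds, 10.millis) {
      val snapshot = state.get() // plays the role of asking the master for its state
      assert(snapshot.aliveWorkers == 2, "workers not ready yet")
      assert(snapshot.registeredApps.contains("test-app"), "app not registered yet")
      snapshot
    }
    println(s"cluster ready: $ready")
    startup.join()
  }
}
```

The test no longer depends on startup order or timing: each assertion is simply retried until the cluster reaches the expected state, and a real failure still surfaces once the timeout expires.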

@zsxwing
Member Author

zsxwing commented Sep 25, 2015

I will send a new PR to include #6457 and #8905 once this one gets merged.

@zsxwing
Member Author

zsxwing commented Sep 25, 2015

/cc @andrewor14 since you wrote the original code.

@SparkQA

SparkQA commented Sep 25, 2015

Test build #43005 has finished for PR 8914 at commit e6657e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 25, 2015

Test build #43006 has finished for PR 8914 at commit 89e72fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 25, 2015

Test build #43026 has finished for PR 8914 at commit d06d476.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Sep 25, 2015

LGTM but Andrew should take a look. I looked at this code recently and really found it odd that it expected things to just be updated "atomically", so good thing I was not crazy.

asfgit pushed a commit that referenced this pull request Sep 29, 2015
…AllocationSuite

Fix the following issues in StandaloneDynamicAllocationSuite:

1. It should not assume master and workers start in order
2. It should not assume master and workers get ready at once
3. It should not assume the application is already registered with master after creating SparkContext
4. It should not access Master.app and idToApp which are not thread safe

The changes include:
* Use `eventually` to wait until master and workers are ready to fix 1 and 2
* Use `eventually` to wait until the application is registered with master to fix 3
* Use `askWithRetry[MasterStateResponse](RequestMasterState)` to get the application info to fix 4

Author: zsxwing <[email protected]>

Closes #8914 from zsxwing/fix-StandaloneDynamicAllocationSuite.

(cherry picked from commit dba95ea)
Signed-off-by: Andrew Or <[email protected]>
@asfgit asfgit closed this in dba95ea Sep 29, 2015
@andrewor14
Contributor

Thanks for fixing this! I'm surprised that it hasn't been flaky until recently. It must be due to the increase in Jenkins load. Merged into master and 1.5.

@zsxwing zsxwing deleted the fix-StandaloneDynamicAllocationSuite branch September 30, 2015 00:05