-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[EC2] [SPARK-6188] Instance types can be mislabeled when re-starting cluster with default arguments #4916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EC2] [SPARK-6188] Instance types can be mislabeled when re-starting cluster with default arguments #4916
Changes from all commits
3ebd52a
7b32429
a3d29fe
6705b98
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1259,6 +1259,17 @@ def real_main(): | |
| cluster_instances=(master_nodes + slave_nodes), | ||
| cluster_state='ssh-ready' | ||
| ) | ||
|
|
||
| # Determine types of running instances | ||
| existing_master_type = master_nodes[0].instance_type | ||
| existing_slave_type = slave_nodes[0].instance_type | ||
| # Setting opts.master_instance_type to the empty string indicates we | ||
| # have the same instance type for the master and the slaves | ||
| if existing_master_type == existing_slave_type: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The setting of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @shivaram I think you are correct, the opts.master_instance_type is not accessed at any point in setup_cluster, so this is redundant. But I think this could lead to another bug. Since the master gets set up using the slaves' instance type (line 860), it might be mis-configured in the same way. But perhaps this is not a problem, as the master should not be an executor as well. In that sense deploy_files is a bit problematic currently, as it will deploy files on the master that will be rsynced to the slaves, so in one sense it does the right thing by deploying the files configured as they should be on the slaves, but at the same time, the configuration of the master itself is done with the instance type of the slaves, which could be different from the masters'. Example:
Do you think this can be a problem? Other wise I can just remove the if clause as the setting is unnecessary with the current code.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point -- but the problem of Anyways I'm fine with setting it to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, added a comment indicating why we do this, even though the master instance type it's not used in this PR. |
||
| existing_master_type = "" | ||
| opts.master_instance_type = existing_master_type | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this not force the instance types to match the existing ones? that's not inherently required, right? I might misunderstand.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are correct. Perhaps giving the user the option to change the instance type is a good middle ground. But mis-representing the instance type by having the default value can cause problems as we saw, without the user's knowledge.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It makes some sense that the default is to 'same instance type', yes, but not forcing it to be the same one. It could emit an info message if they differ.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK so the use case would be:
Is 3.ii a viable use-case? Is it possible to change the type of the instance being re-launched, for example in the case where the cluster also had persistent storage .i.e. extra EBS volumes that are still attached to the stopped instances?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Paging @nchammas and or @shivaram for review as I am really not the best person to ask, but, I had expected that you could change the instance type on restart. It may not work for all possible combinations, but it seemed like something this script doesn't have to police -- but also should not prohibit. |
||
| opts.instance_type = existing_slave_type | ||
|
|
||
| setup_cluster(conn, master_nodes, slave_nodes, opts, False) | ||
|
|
||
| else: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question here too, does this assume there exists a running master node? it's referencing the first running master node, but if there aren't any? or is that handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is handled in get_existing_cluster, which is called in line 1247
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my comments earlier, this change does not remove existing
functionality. It is currently not possible to change instance types when
starting a stopped cluster.
2015년 3월 7일 (토) 오후 12:08, Theodore Vasiloudis [email protected]님이
작성: