11 changes: 11 additions & 0 deletions ec2/spark_ec2.py
@@ -1259,6 +1259,17 @@ def real_main():
cluster_instances=(master_nodes + slave_nodes),
cluster_state='ssh-ready'
)

# Determine types of running instances
existing_master_type = master_nodes[0].instance_type
Member:
Dumb question here too: does this assume there is a running master node? It references the first running master node, but what if there aren't any? Or is that handled?

Contributor Author:
Yup, this is handled in get_existing_cluster, which is called on line 1247.
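
For readers skimming the thread, the guard being referred to looks roughly like the hypothetical helper below (a simplified sketch of what get_existing_cluster does when no master is found; not the exact upstream code):

import sys

def require_master(master_nodes, cluster_name):
    # Hypothetical helper sketching the guard in get_existing_cluster:
    # bail out before anything dereferences master_nodes[0] if the
    # cluster has no running master instance.
    if not master_nodes:
        sys.stderr.write("ERROR: Could not find a master for cluster %s\n" % cluster_name)
        sys.exit(1)
    return master_nodes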

Contributor:
Per my comments earlier, this change does not remove existing functionality. It is currently not possible to change instance types when starting a stopped cluster.

On Sat, Mar 7, 2015 at 12:08 PM, Theodore Vasiloudis [email protected] wrote:

> In ec2/spark_ec2.py #4916 (comment):
>
> @@ -1259,6 +1259,15 @@ def real_main():
> cluster_instances=(master_nodes + slave_nodes),
> cluster_state='ssh-ready'
> )
> +
> + # Determine types of running instances
> + existing_master_type = master_nodes[0].instance_type
>
> Yup, this is handled in get_existing_cluster
> https://github.com/thvasilo/spark/blob/SPARK-6188%5D-Instance-types-can-be-mislabeled-when-re-starting-cluster-with-default-arguments/ec2/spark_ec2.py#L617

existing_slave_type = slave_nodes[0].instance_type
# Setting opts.master_instance_type to the empty string indicates we
# have the same instance type for the master and the slaves
if existing_master_type == existing_slave_type:
Contributor:
The setting of master_instance_type = "" seems unnecessary even if it's the same as slave_instance_type?

Contributor Author:
@shivaram I think you are correct: opts.master_instance_type is not accessed at any point in setup_cluster, so this is redundant.

But I think this could lead to another bug. Since the master gets set up using the slaves' instance type (line 860), it might be misconfigured in the same way. But perhaps this is not a problem, as the master should not also be an executor.

In that sense deploy_files is currently a bit problematic: it deploys files on the master that are then rsynced to the slaves. In one sense it does the right thing, since the deployed files are configured as they should be for the slaves, but at the same time the master itself gets configured with the slaves' instance type, which could differ from the master's.

Example:

  1. User launches a cluster where the slaves are m3.xlarge and the master is m3.large.
  2. User stops and restarts the cluster.
  3. Files deployed on the master are configured for m3.xlarge, and these get rsynced to the slaves through setup.sh.
  4. Now the slaves have the correct configuration files, but the master has been configured as if it were an m3.xlarge instance.

Do you think this can be a problem? Otherwise I can just remove the if clause, as the setting is unnecessary with the current code.
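
To make the scenario concrete, here is a purely hypothetical sketch (the lookup table, values, and function name are invented for illustration; this is not spark_ec2.py code) of how templating configuration from a single instance type mislabels a master whose type differs from the slaves':

# Hypothetical illustration only -- the memory table and values are invented.
WORKER_MEM_BY_TYPE = {
    "m3.large": "6g",
    "m3.xlarge": "13g",
}

def render_worker_mem(instance_type):
    # The whole cluster's config is templated from a single instance type.
    return WORKER_MEM_BY_TYPE.get(instance_type, "512m")

# Slaves are m3.xlarge, master is m3.large. Templating from the slaves'
# type yields files that are right for the slaves but wrong for the master.
print(render_worker_mem("m3.xlarge"))  # "13g": correct for the slaves
print(render_worker_mem("m3.large"))   # "6g": what the master actually needs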

Contributor:
Good point -- but the problem of master_instance_type differing from the slaves and handling the configs cleanly is outside the scope of this PR.

Anyway, I'm fine with setting it to "" or not -- one thing you could do is add a comment saying why we do it, since it seems unintuitive.

Contributor Author:
OK, added a comment indicating why we do this, even though the master instance type is not used in this PR.

existing_master_type = ""
opts.master_instance_type = existing_master_type
Member:
Does this not force the instance types to match the existing ones? That's not inherently required, right? I might misunderstand.

Contributor Author:
You are correct. Perhaps giving the user the option to change the instance type is a good middle ground. But misrepresenting the instance type by falling back to the default value can cause problems without the user's knowledge, as we saw.

Member:
It makes some sense that the default is 'same instance type as before', yes, but without forcing it to be the same. It could emit an info message if they differ.
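
Something along these lines, perhaps (a sketch of the suggestion only; none of this is in the PR, and the message wording is made up):

def resolve_instance_type(requested_type, existing_type):
    # Sketch of the suggested behavior: default to the type of the existing
    # (stopped) instances, but if the user explicitly asked for something
    # else, keep their choice and emit an info message instead of silently
    # overriding it.
    if requested_type and requested_type != existing_type:
        print("Requested instance type %s differs from existing instances (%s); "
              "using %s." % (requested_type, existing_type, requested_type))
        return requested_type
    return existing_type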

Contributor Author:
OK, so the use case would be:

  1. The user starts a cluster with some instance type, and later stops it.
  2. The user now wants to restart the cluster.
  3. We can either:
    i. Just restart the same instances.
    ii. Or allow the user to change the instance types by redefining them in the options passed with the start command.

Is 3.ii a viable use case? Is it possible to change the type of an instance that is being re-launched, for example when the cluster also had persistent storage, i.e. extra EBS volumes that are still attached to the stopped instances?

Member:
Paging @nchammas and/or @shivaram for review, as I am really not the best person to ask, but I had expected that you could change the instance type on restart. It may not work for all possible combinations, but it seems like something this script doesn't have to police -- though it also should not prohibit it.

opts.instance_type = existing_slave_type

setup_cluster(conn, master_nodes, slave_nodes, opts, False)

else:
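
For reference, putting the added hunk back together reads roughly as follows (a readability reconstruction of the lines quoted above, shown flush-left rather than at its nesting depth inside real_main(); not the authoritative merged diff):

# Reconstruction of the block added in real_main(), as discussed above.
# Determine types of running instances
existing_master_type = master_nodes[0].instance_type
existing_slave_type = slave_nodes[0].instance_type
# Setting opts.master_instance_type to the empty string indicates we
# have the same instance type for the master and the slaves
if existing_master_type == existing_slave_type:
    existing_master_type = ""
opts.master_instance_type = existing_master_type
opts.instance_type = existing_slave_type

setup_cluster(conn, master_nodes, slave_nodes, opts, False)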