[SPARK-3391][EC2] Support attaching up to 8 EBS volumes. #2260
This needs to be used together with mesos/spark-ec2#65 and mesos/spark-ec2#66.
Tested by launching 8 EBS volumes on r3.8xlarge instances.
ec2/spark_ec2.py
What is gp2? Is it applicable to all instance types?
gp2 is the new General Purpose (SSD) EBS volume type, which is now the recommended one.
http://aws.amazon.com/ebs/details/ says that gp2 means attaching SSD-based EBS volumes, which sounds good. But isn't this 2x more expensive than standard?
@pdeyhim any comment here?
We could just make this configurable. Some people might prefer the spinning disks.
OK, I made the EBS volume type configurable.
LGTM.
For io1, specifying the number of IOPS is required. So we either have to limit this to gp2 and standard, or fully support io1 by allowing users to specify the number of IOPS.
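To make the trade-off concrete, here is a minimal sketch of what option validation could look like if io1 were fully supported. It is not the code in this PR: `validate_ebs_options`, its parameter names, and the returned dict are hypothetical, and the limit of 8 volumes and the three type names are taken only from this thread.

```python
# Hypothetical helper, NOT the code in this PR.
# Assumptions from the thread: at most 8 EBS volumes per instance,
# and the volume types under discussion are standard, gp2 and io1.
MAX_EBS_VOLUMES = 8
VOLUME_TYPES = ("standard", "gp2", "io1")


def validate_ebs_options(vol_num, vol_type, iops=None):
    """Check the requested EBS settings and return them as a dict."""
    if not 0 <= vol_num <= MAX_EBS_VOLUMES:
        raise ValueError("vol_num must be between 0 and %d" % MAX_EBS_VOLUMES)
    if vol_type not in VOLUME_TYPES:
        raise ValueError("vol_type must be one of: " + ", ".join(VOLUME_TYPES))
    # io1 is provisioned-IOPS storage, so an IOPS value has to be supplied.
    if vol_type == "io1" and iops is None:
        raise ValueError("io1 volumes require an explicit iops value")
    # For the other types an IOPS value would be silently meaningless.
    if vol_type != "io1" and iops is not None:
        raise ValueError("iops is only valid for io1 volumes")
    return {"vol_num": vol_num, "vol_type": vol_type, "iops": iops}
```

Restricting the choice to gp2 and standard would amount to dropping `"io1"` from `VOLUME_TYPES` and removing the `iops` parameter.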
OK, merging this (and removed io1 for now).
And what happens when the additional EBS volumes get added? We probably want to configure spark-env.sh and spark_local_dir with the new volumes, correct? The place where this happens is here: https://github.com/rxin/spark/blob/ec2-ebs-vol/ec2/spark_ec2.py#L674-L678, but that snippet only configures local disks in spark-env.sh and not the new EBS volumes.
EBS volumes are not great for shuffle (poor small-write performance). Let's hold off on that for now.
@rxin OK, that's correct for smaller instance types. But FYI, EBS on larger instances (and EBS-optimized instances) should perform well on shuffle read/write.
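For reference, a rough sketch of how the local-dir list could be built so that EBS volumes are opt-in for shuffle. This is an illustration only, not what spark_ec2.py does: `spark_local_dirs`, the `include_ebs` flag, the `/spark` subdirectory, and the mount points in the usage note are all assumptions.

```python
# Hypothetical sketch, NOT the snippet linked above.
# Builds a comma-separated list of scratch directories, using the
# instance-store mounts by default and adding EBS mounts only on request.
def spark_local_dirs(local_mounts, ebs_mounts, include_ebs=False):
    mounts = list(local_mounts)
    if include_ebs:
        # Could make sense on larger or EBS-optimized instances.
        mounts += list(ebs_mounts)
    # The "/spark" subdirectory name is an assumption for illustration.
    return ",".join(m.rstrip("/") + "/spark" for m in mounts)
```

With made-up mount points, `spark_local_dirs(["/mnt", "/mnt2"], ["/vol0"])` uses only the local disks, while passing `include_ebs=True` appends `/vol0/spark` to the list.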
Please merge this at the same time as mesos/spark-ec2#66