[SPARK-4449][Core] Specify port range in spark #3314
Test build #23477 has finished for PR 3314 at commit
0 is a special port; why change it?
It's not changed; here we generate a random port in the range (startPort, endPort).
@scwf I think you need to pass through 0 here. It has special meaning. Refer to the scaladoc. Aren't you just changing the 'else' branch here to start and end within a certain range, instead of 1024-65536 always?
@srowen, I am wondering: if we pass 0, will it give us a port outside the range?
Here I want to limit all the ports used in Spark to the range startPort ~ endPort.
And in the 'else' branch, yes, I should use the specified range instead.
0 is passed through to the start function unchanged.
This looks better, but I think we have to be careful with the semantics. "spark.port.max" sounds like a maximum allowable port, inclusive, but the way it's used here it's exclusive. That is, if I want to use ports 9000-9005, I would have to set min=9000 and max=9006, which isn't intuitive. You may need endPort+1 and document this.
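A minimal Scala sketch of the inclusive reading, assuming the two settings have already been read into plain `Int`s called `minPort` and `maxPort` (hypothetical names; `PortBounds` is illustrative and not part of this PR or of Spark):

```scala
import scala.util.Random

object PortBounds {
  // Number of ports in [minPort, maxPort] when BOTH bounds are inclusive:
  // min=9000, max=9005 gives 6 ports (9000..9005).
  def rangeSize(minPort: Int, maxPort: Int): Int = {
    require(minPort <= maxPort, s"min port $minPort exceeds max port $maxPort")
    maxPort - minPort + 1 // the +1 is what makes maxPort itself selectable
  }

  // Picks a port uniformly at random from [minPort, maxPort], inclusive.
  def randomPortInRange(minPort: Int, maxPort: Int, rnd: Random = new Random()): Int =
    minPort + rnd.nextInt(rangeSize(minPort, maxPort))
}
```

With these semantics, min=9000 and max=9005 yields exactly the six ports 9000 to 9005; using `rnd.nextInt(maxPort - minPort)` instead would silently exclude 9005.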
This doesn't work if port isn't in the range, right? That seems like a problem.
"This doesn't work if port isn't in the range, right? That seems like a problem." Hi @srowen, do you mean the else branch here? If port is not in the range, then ((port + offset - startPort) % (endPort - startPort)) + startPort will give us a port in this range.
Agreed on the inclusive/exclusive issue. But I am still not clear about port 0. In my understanding, if we pass 0, it will use a random available port, right?
Try port = 8949, offset = 0, startPort = 9000, endPort = 9050 for example. You'll get port 8999.
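The 8999 comes from the sign of the remainder: on the JVM, `%` takes the sign of the dividend, so (8949 - 9000) % 50 is -1. A hedged sketch comparing the formula from this thread with `java.lang.Math.floorMod` (Java 8+), whose result is non-negative for a positive divisor; `WrapPort` and its method names are illustrative:

```scala
object WrapPort {
  // The formula under discussion: % keeps the dividend's sign, so a port
  // below startPort gives a negative remainder and lands below the range.
  def wrapTruncating(port: Int, offset: Int, startPort: Int, endPort: Int): Int =
    ((port + offset - startPort) % (endPort - startPort)) + startPort

  // Same wrap-around using floorMod, which returns a value in
  // [0, endPort - startPort), so the port stays in [startPort, endPort).
  def wrapFloorMod(port: Int, offset: Int, startPort: Int, endPort: Int): Int =
    Math.floorMod(port + offset - startPort, endPort - startPort) + startPort
}
```

This keeps the half-open range [startPort, endPort) that the original formula uses; under the inclusive semantics discussed above, the divisor would become endPort - startPort + 1.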
Hm, I think 0 is supposed to mean "startFunction decides". In practice it means "choose randomly" in most (all?) usages. This change would force it to be random. (shrug) Maybe that's a good thing.
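On the port 0 question: binding a JVM `ServerSocket` to port 0 lets the OS pick any free ephemeral port, so the result is not tied to any range Spark might configure. A small sketch (`EphemeralPort` is an illustrative name):

```scala
import java.net.ServerSocket

object EphemeralPort {
  // Binding to port 0 asks the OS for any free ephemeral port; the JVM
  // reports which one was assigned via getLocalPort.
  def bindAnyPort(): Int = {
    val socket = new ServerSocket(0)
    try socket.getLocalPort
    finally socket.close()
  }
}
```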
OK, got it.
Test build #23537 has finished for PR 3314 at commit
Test build #23744 has finished for PR 3314 at commit
@scwf what happens if I explicitly set my
If you need to control which ports are chosen, wouldn't you rather set them directly?
Not necessarily (setting them directly). We have a use case where certain hosts are restricted to a subset of ports for communication purposes. On YARN you might end up with multiple executors on a single host, but they can't share a port, so having a range in which they automatically find a free port is best. I was actually planning to implement this, as it's needed by some of our customers.
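A rough sketch of the "find a free port in a range" behavior described here, assuming an inclusive [minPort, maxPort] range; `RangeBinder.bindInRange` is a hypothetical helper, not Spark's `startServiceOnPort`. It relies on the fact that a port already bound by another process (for example, another executor on the same host) fails to bind again:

```scala
import java.io.IOException
import java.net.{InetSocketAddress, ServerSocket}

object RangeBinder {
  // Tries each port in [minPort, maxPort] (inclusive) in ascending order and
  // returns the first socket that binds successfully, or None if all are taken.
  def bindInRange(minPort: Int, maxPort: Int): Option[ServerSocket] =
    (minPort to maxPort).iterator
      .map(tryBind)
      .collectFirst { case Some(socket) => socket }

  // Attempts to bind a fresh socket to `port` on all interfaces; a port that is
  // already in use throws an IOException and is skipped.
  private def tryBind(port: Int): Option[ServerSocket] = {
    val socket = new ServerSocket()
    try {
      socket.bind(new InetSocketAddress(port))
      Some(socket)
    } catch {
      case _: IOException =>
        socket.close()
        None
    }
  }
}
```

The caller owns the returned socket and is responsible for closing it.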
Also, maybe we should limit how small the difference between min and max can be. For instance, if only 10 ports are available, it won't work, because Spark uses more ports than that by default.
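One way to express the minimum-size check, as a sketch; `requiredPorts` is an assumed input (the thread does not give a number), and `RangeValidation` is a hypothetical name:

```scala
object RangeValidation {
  // Rejects an inclusive [minPort, maxPort] range that is malformed or has
  // fewer ports than the caller says it needs; returns the range size otherwise.
  def validate(minPort: Int, maxPort: Int, requiredPorts: Int): Either[String, Int] =
    if (minPort < 1 || maxPort > 65535 || minPort > maxPort)
      Left(s"invalid port range $minPort-$maxPort")
    else {
      val size = maxPort - minPort + 1
      if (size < requiredPorts)
        Left(s"port range $minPort-$maxPort has $size ports, but $requiredPorts are needed")
      else Right(size)
    }
}
```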
The real spark.ui.port will be 1024, and I think there is a problem here. My initial idea was to make random ports fall within the specified range, but we should respect a port the user defines explicitly (here, 5566). So how about changing it like this:
I haven't officially reviewed this. @andrewor14 @srowen, just wondering if there are open issues on this?
@scwf, assuming there are no open issues, can you upmerge this? Many of the changes actually go away now that startServiceOnPort takes the conf.
Perhaps I'm missing something, but why aren't we applying the restriction when using port 0 (an ephemeral port)? Most services default to 0, which means "pick a port", and we want those to end up in this range.
If a range is specified, it seems clearer to just ignore the port passed in and iterate over that range.
Sorry, I think your last comment/question hits on this issue, and that approach seems better. As long as all the services default to port 0 (other than the web UI), this seems fine. That way, if the user does specify a port explicitly, it will still be used.
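The behavior agreed on here can be sketched as a pure function that returns the ports to try, assuming an inclusive [minPort, maxPort] range; `PortCandidates` is illustrative, and the caller would attempt to bind each candidate in order:

```scala
import scala.util.Random

object PortCandidates {
  // An explicit (non-zero) port is always honored as-is; port 0 means
  // "pick from the configured range", so every port in [minPort, maxPort]
  // is returned in random order for the caller to try one by one.
  def candidates(requested: Int, minPort: Int, maxPort: Int,
                 rnd: Random = new Random()): Seq[Int] =
    if (requested != 0) Seq(requested)
    else rnd.shuffle((minPort to maxPort).toVector)
}
```

With spark.ui.port explicitly set to 5566, `candidates(5566, ...)` yields only 5566, while services left at 0 draw from the range.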
Test build #28856 has finished for PR 3314 at commit
Test build #28866 has finished for PR 3314 at commit
Hm, I suppose I still feel a little funny about one global 'minimum port' and 'maximum port' when most port settings are unrelated and won't have the same use case that drives the need for this with executors. Let me throw out a different idea: an alternate syntax for ports that lets you specify a range like
Actually, we do have this use case: specifying a range for the random ports started in Spark.
I mean, make all existing port properties accept not just a number x, but also a string like x:y, which means "choose a port between x and y, inclusive, at random". This would let you use a large range, a small range, or no range at all for different ports.
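A sketch of how the proposed x:y syntax could be parsed, assuming a single port is written as "x" and a range as "x:y" with both ends inclusive; `PortSpec` and its methods are hypothetical, and validation against the 0 to 65535 port space is left out for brevity:

```scala
import scala.util.{Random, Try}

object PortSpec {
  // "x" parses to the one-port range x..x; "x:y" parses to x..y (inclusive).
  // Malformed input or a reversed range (x > y) yields None.
  def parse(spec: String): Option[Range.Inclusive] =
    spec.trim.split(":", -1) match {
      case Array(single) =>
        Try(single.trim.toInt).toOption.map(p => p to p)
      case Array(lo, hi) =>
        for {
          x <- Try(lo.trim.toInt).toOption
          y <- Try(hi.trim.toInt).toOption
          if x <= y
        } yield x to y
      case _ => None
    }

  // Chooses a port uniformly at random from the parsed (inclusive) range.
  def choose(spec: String, rnd: Random = new Random()): Option[Int] =
    parse(spec).map(r => r.start + rnd.nextInt(r.length))
}
```

For example, `PortSpec.choose("9000:9005")` returns some port from 9000 to 9005, while `PortSpec.choose("4040")` always returns `Some(4040)`, so a plain number keeps today's meaning.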
I like that idea. It gives flexibility to control each port and uses the same configs, so users aren't confused by global settings overriding them.
Does this configuration include the Akka port as well?
Yes, it does.
My biggest concern with introducing a general port range is that people might assume all ports Spark will use will fall under that range. If we do want to enforce that, then we need to make sure all of these entities (block manager, connection manager, ...) go through … Maybe we should first survey all the ports used by Spark to see how invasive a change this will be, but at this point I am hesitant to go forward with this issue unless we have an explicit idea of what a general port range across all of Spark means.
@andrewor14 I'm not sure I follow your comment about "all ports Spark will use will fall under that range"?
Ah, I see. I misunderstood what was being proposed. The proposal is not to have a global range, but a local range for each of the existing port configs, correct?
Correct, that was the last proposal anyway. @scwf, were you going to make those changes, or did you have concerns or other ideas?
OK, I will make these changes later this week.
I am closing this in favor of #5722.
In some cases we need to specify the port range used by Spark, for example when running behind a firewall.