Skip to content

Conversation

@scwf
Copy link
Contributor

@scwf scwf commented Nov 17, 2014

In some case, we need specify port range used in spark, such as firewall.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23477 has finished for PR 3314 at commit 1773b12.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is a special port, why change it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changed, here generate a random port in range (startPort, endPort)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scwf I think you need to pass through 0 here. It has special meaning. Refer to the scaladoc. Aren't you just changing the 'else' branch here to start and end within a certain range, instead of 1024-65536 always?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen, I am wondering if pass 0, will it give us a port out of the range?
Here I want to limit all the ports used in spark to range startPort ~ endPort.
And in 'else' branch, yes, here i should use the certain range instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is passed through to the start function unchanged.

This looks better, but I think we have to be careful with the semantics. "spark.port.max" sounds like a maximum allowable port, inclusive, but the way it's used here it's exclusive. That is, if I want to use ports 9000-9005, I would have to set min=9000 and max=9006, which isn't intuitive. You may need endPort+1 and document this.

This doesn't work if port isn't in the range, right? that seems like a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This doesn't work if port isn't in the range, right? that seems like a problem" , Hi, @srowen, you mean in else branch here? if port is not in the range, then ((port + offset - startPort) % (endPort - startPort)) + startPort will give us a port in this range

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with inclusive-exclusive issue. But i am still not clear for port 0, In my understanding, if we pass 0, it will use a random port available, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try port = 8949, offset = 0, startPort = 9000, endPort = 9050 for example. You'll get port 8999.

Hm, I think 0 is supposed to mean "startFunction decides". In practice it means choose randomly in most (all?) usages. This change would force it to be random. shrug maybe that's a good thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, get it.

@SparkQA
Copy link

SparkQA commented Nov 18, 2014

Test build #23537 has finished for PR 3314 at commit cd1a88d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 22, 2014

Test build #23744 has finished for PR 3314 at commit da05d64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

@scwf what happens if I explicitly set my spark.ui.port to 5566 but my spark.port.max is 1024? There also seems to be a ton of other random ports that are affected. Do we need some kind of check to avoid conflicting port settings?

@srowen
Copy link
Member

srowen commented Feb 19, 2015

If you need to control what ports are chosen, don't you probably want to set them directly?

@tgravescs
Copy link
Contributor

Not necessarily (set them directly). We have a use case where certain hosts are restricted to use a subset of ports for communication purposes. On yarn you might end up with multiple executors on a single host but they can't share a port so having a range that they automatically find a free one on is best.

I was actually wanting to implement this as its needed for some of our customers.

@andrewor14
Copy link
Contributor

Also, maybe we should limit how small the difference between min and max is. For instance if there are only 10 available ports then it won't work because Spark uses more ports than that by default.

@scwf
Copy link
Contributor Author

scwf commented Feb 27, 2015

what happens if I explicitly set my spark.ui.port to 5566 but my spark.port.max is 1024? There also seems to be a ton of other random ports that are affected. Do we need some kind of check to avoid conflicting port settings?

the real spark.ui.port will be 1024, i think here is some problem, my initial idea is to make random port follow the specified range, we should respect user defined port explicitly(here is 5566). so how about change like this:

val serviceString = if (serviceName.isEmpty) "" else s" '$serviceName'"
    val startPort = conf.getInt("spark.random.port.min", 1024)
    val endPort = conf.getInt("spark.random.port.max", 65536)
    val maxRetries = portMaxRetries(conf)
    for (offset <- 0 to maxRetries) {
      // specific a random port between 'spark.random.port.min' and 'spark.random.port.max'
      // if port is 0
      val tryPort = if (port == 0) {
        (startPort + Math.random() * (endPort - startPort + 1)).toInt
      } else {
        // If the new port wraps around, do not try a privilege port
        ((port + offset - 1024) % (65536 - 1024)) + 1024
      }

@tgravescs
Copy link
Contributor

I haven't officially reviewed this, @andrewor14 @srowen, just wondering if there was open issues on this?

@tgravescs
Copy link
Contributor

@scwf assuming no open issues can you upmerge this. Many of the changes actually go away as the startServiceOnPort is now taking the conf in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm missing something, why aren't we applying the restriction when using port 0 (ephemeral port), most of the things default to 0 which is pick a port, we want those to end up in this range.

It seems like this would be more clear if the range is specified, just to ignore port passed in and iterate over that range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I think your last comment/question hits on this issue. that seems better. As long as all the services default to port 0 (other then web ui) this seems fine. That way if user does specify a port explicitly it will still use that.

@SparkQA
Copy link

SparkQA commented Mar 19, 2015

Test build #28856 has finished for PR 3314 at commit 6e609da.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CheckpointWriteHandler(

@SparkQA
Copy link

SparkQA commented Mar 19, 2015

Test build #28866 has finished for PR 3314 at commit fa5bcbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Mar 19, 2015

Hm, I suppose I still feel a little funny about one global 'minimum port' and 'maximum port' when most port settings are not related and not going to have the same use case that drives the need for this with executors.

Let me throw out a different idea: alternate syntax for ports that lets you specify a range like 8080:8085 instead of just 8080? OK, that touches more code, but gives more fine-grained control?

@scwf
Copy link
Contributor Author

scwf commented Mar 19, 2015

Actually we have this use case to specify range for random port started in spark.

specify a range like 8080:8085 ? maybe not get you, how can this way give more fine-grained control?

@srowen
Copy link
Member

srowen commented Mar 19, 2015

I mean, make all existing port properties accept not just a number x, but a string like x:y, which means, "choose a port between x and y inclusive at random". This would you let you use a large range, small range, or no range at all for different ports.

@tgravescs
Copy link
Contributor

I like that idea. Gives flexibility to control each one, uses same configs so users aren't confused by the globals overriding.

@felixb
Copy link

felixb commented Apr 2, 2015

Does this configuration include the akka port as well?

@scwf
Copy link
Contributor Author

scwf commented Apr 2, 2015

yes, it is

@andrewor14
Copy link
Contributor

My biggest concern with introducing a general port range is that people might assume all ports Spark will use will fall under that range. If we do want to enforce that then we need to make sure all of these entities (block manager, connection manager...) go through startServiceOnPort, which may be a bigger change and may add maintenance burden on us in the future.

Maybe we should first do a survey of all the ports used by Spark to see how invasive of a change this will be, but at this point I am hesitant to go forward with this issue unless we have an explicit idea of what a general port range across all of Spark means.

@tgravescs
Copy link
Contributor

@andrewor14 I'm not sure I follow your comment about "all ports Spark will use fill all under that range"?
If you keep the configs separate like what was mentioned in the last couple comments and just allow the range specifier for each then I don't think your statement applies unless I'm misunderstanding.

@andrewor14
Copy link
Contributor

Ah I see. I misunderstood what was being proposed. The proposal is not to have a global range, but a local range for each of the existing port configs, correct?

@tgravescs
Copy link
Contributor

correct, that was the last proposal anyway. @scwf were you going to make those changes or did you have concerns or other ideas?

@scwf
Copy link
Contributor Author

scwf commented Apr 15, 2015

ok, I will make these changes later this week.

@scwf
Copy link
Contributor Author

scwf commented Apr 27, 2015

i am closing this in favor of #5722

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants