Conversation

@WangTaoTheTonic
Contributor

https://issues.apache.org/jira/browse/SPARK-7086

I have only fixed this on the master side; maybe there are more places to fix?

//cc @andrewor14

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30828 has finished for PR 5657 at commit a9dbda8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@WangTaoTheTonic
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30835 has finished for PR 5657 at commit a9dbda8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class NumericType extends NativeType
  • This patch does not change any dependencies.

@srowen
Member

srowen commented Apr 25, 2015

This is related to #5575. I am not sure this is something we must force on users. There could be decent reasons to retry binding, and it doesn't 'hurt' except to waste a few tries. This is also a little arbitrary to change the behavior on just a few services. I don't think we should do this.

@WangTaoTheTonic
Contributor Author

Consider one scenario: users submit apps to the master with a configured port, say spark://somehost:7077, and workers connect to the master the same way. Once the master opens another port, for instance 7078, it becomes unavailable for submitting apps or accepting workers' registration.

I know the retry policy decreases the probability of failure when launching the master, but at the same time it increases the chance that others fail to connect to it.

Besides, I took a look at start-all.sh, and it passes one specific port to the slave nodes when launching workers. Obviously a worker cannot talk to the master when the master takes a port different from the one that was passed.

@srowen I think retrying on a "public" port is bound to cause trouble unless we find another way to solve it.

@srowen
Member

srowen commented Apr 27, 2015

This change merely causes it to never retry. It doesn't cause the master to use another port, right? That would be bad for the reason you give, but this is changing the retry property.

@WangTaoTheTonic
Contributor Author

If it retries, the master will use another port. We can see this in Utils.scala:

for (offset <- 0 to maxRetries) {
...
((startPort + offset - 1024) % (65536 - 1024)) + 1024
...
logWarning(s"Service$serviceString could not bind on port $tryPort. " +
s"Attempting port ${tryPort + 1}.")
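
To make the effect concrete, here is a minimal sketch that lists the ports such a loop would try, reusing only the wrap-around formula quoted above. The object name `PortRetrySketch` and the helper `candidatePorts` are hypothetical and are not part of Spark; the real code also attempts the bind and handles exceptions, which is left out here.

```scala
object PortRetrySketch {
  // Hypothetical helper (not Spark's API): returns the ports that would be
  // tried, in order, for a given start port and retry count. It applies the
  // same wrap-around formula as the snippet quoted from Utils.scala, which
  // keeps candidates inside the non-privileged range 1024..65535.
  def candidatePorts(startPort: Int, maxRetries: Int): Seq[Int] =
    (0 to maxRetries).map { offset =>
      ((startPort + offset - 1024) % (65536 - 1024)) + 1024
    }

  def main(args: Array[String]): Unit = {
    // With retries enabled, a master configured for 7077 may end up on a later port.
    println(candidatePorts(7077, 3).mkString(", "))
    // With zero retries, only the configured port is ever tried.
    println(candidatePorts(7077, 0).mkString(", "))
  }
}
```

So with any retries at all, a master that finds 7077 busy would move on to 7078, while workers and clients keep using the 7077 they were given.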

@srowen
Member

srowen commented Apr 27, 2015

Ah, we don't have this change committed yet: #3314 (Or, a variant on this.) The right-er way to fix this is to be able to express a range of ports, which might only include 1 port, in which case there would be no more retries anyway. I suggest focusing on resolving SPARK-4449 as a way to fix this.
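
A rough sketch of the port-range idea, under the assumption that a range is just an inclusive start and end. The names `PortRange` and `firstBindable` are hypothetical illustrations; the representation actually proposed in #3314 may differ.

```scala
object PortRangeSketch {
  // Hypothetical representation of an inclusive port range.
  // A single fixed port is simply a range whose start equals its end.
  final case class PortRange(start: Int, end: Int) {
    require(start <= end, "start must not exceed end")
    def ports: Seq[Int] = start to end
  }

  // Tries each port in the range in order and returns the first one for
  // which the supplied `bind` function reports success, or None if every
  // port in the range is unavailable. `bind` stands in for the real
  // service start-up, which this sketch does not model.
  def firstBindable(range: PortRange)(bind: Int => Boolean): Option[Int] =
    range.ports.find(bind)
}
```

With `PortRange(7077, 7077)` there is exactly one candidate, so a busy port yields `None` rather than a silent move to 7078, while a service that tolerates any port in a window could use something like `PortRange(7077, 7080)`.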

@WangTaoTheTonic
Contributor Author

After taking a look at #3314 and discussing it with @scwf offline, we both think the "specify a port range for each" idea is a better fix for both SPARK-7086 and SPARK-4449.

So I will close this and follow the work in #3314.

@srowen Thanks for your comments and nice idea. 😃
