-
Notifications
You must be signed in to change notification settings - Fork 383
Scheduler fixes and improvements #828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
90a04da
cf39d45
222c897
2a4ed10
4004cf7
8ac3d1e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -55,27 +55,15 @@ abstract class ZippedPartitionsBaseRDD[V: ClassManifest]( | |
| } | ||
|
|
||
| override def getPreferredLocations(s: Partition): Seq[String] = { | ||
| // Note that as number of rdd's increase and/or number of slaves in cluster increase, the computed preferredLocations below | ||
| // become diminishingly small : so we might need to look at alternate strategies to alleviate this. | ||
| // If there are no (or very small number of preferred locations), we will end up transferred the blocks to 'any' node in the | ||
| // cluster - paying with n/w and cache cost. | ||
| // Maybe pick a node which figures max amount of time ? | ||
| // Choose node which is hosting 'larger' of some subset of blocks ? | ||
| // Look at rack locality to ensure chosen host is atleast rack local to both hosting node ?, etc (would be good to defer this if possible) | ||
| val splits = s.asInstanceOf[ZippedPartitionsPartition].partitions | ||
| val rddSplitZip = rdds.zip(splits) | ||
|
|
||
| // exact match. | ||
| val exactMatchPreferredLocations = rddSplitZip.map(x => x._1.preferredLocations(x._2)) | ||
| val exactMatchLocations = exactMatchPreferredLocations.reduce((x, y) => x.intersect(y)) | ||
|
|
||
| // Remove exact match and then do host local match. | ||
| val exactMatchHosts = exactMatchLocations.map(Utils.parseHostPort(_)._1) | ||
| val matchPreferredHosts = exactMatchPreferredLocations.map(locs => locs.map(Utils.parseHostPort(_)._1)) | ||
| .reduce((x, y) => x.intersect(y)) | ||
| val otherNodeLocalLocations = matchPreferredHosts.filter { s => !exactMatchHosts.contains(s) } | ||
|
|
||
| otherNodeLocalLocations ++ exactMatchLocations | ||
| val parts = s.asInstanceOf[ZippedPartitionsPartition].partitions | ||
| val prefs = rdds.zip(parts).map { case (rdd, p) => rdd.preferredLocations(p) } | ||
| // Check whether there are any hosts that match all RDDs; otherwise return the union | ||
| val exactMatchLocations = prefs.reduce((x, y) => x.intersect(y)) | ||
| if (!exactMatchLocations.isEmpty) { | ||
| exactMatchLocations | ||
| } else { | ||
| prefs.flatten.distinct | ||
| } | ||
| } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there are both "host:port" and "host" preferred locations, this will result in loosing the "host" preferred locations - right ? Same applies to other similar changes ...
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I've decided not to allow host:port in preferred locations. Instead, Tasks that have a preferred executor can pass an executorID as part of their TaskLocation object.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, that is an excellent change ! had not noticed that |
||
|
|
||
| override def clearDependencies() { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very sure about this - iirc this needs to be hostport - since we can have multiple executors in the same node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used to set the host below in this file (no other references), so it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove references to host also ?
iirc we can infer the host from the executorId right ? This will remove all references to host when we can avoid it ...