Support HDFS rack locality #350
Conversation
FYI, the unit test failures seem genuine. I'm looking at it.
Thanks @kimoonkim! This looks awesome. Looking into this in detail shortly.
ash211 left a comment
Nice work @kimoonkim!
haha yes I've seen this too -- I think Hadoop 2.8.0 lowered this log level. What version are you testing against?
I was using 2.7. Good to know 2.8 fixed this (YARN-3350).
Can you put `sc` as a parameter into the `RackResolverUtil` constructor? I'm hoping to get rid of the `rackResolverUtil.init` method, since that class currently requires two-step initialization (instantiate the class, then call `.init`).
Fixed. Thanks for the suggestion. I wanted to do this earlier but didn't try hard enough :-)
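For illustration, a minimal sketch of the constructor-injection shape being discussed, assuming the utility only needs the Hadoop configuration (class and field names here are hypothetical, not the PR's actual code):

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical: take what the utility needs through the constructor,
// so there is no separate two-step init() call.
class RackResolverUtil(hadoopConf: Configuration) {

  // One-time setup happens at construction. These are the standard Hadoop
  // topology keys; the real PR may check different conditions.
  val isConfigured: Boolean =
    hadoopConf.get("net.topology.node.switch.mapping.impl") != null ||
      hadoopConf.get("net.topology.script.file.name") != null
}

// Built once by the scheduler, e.g.:
//   val rackResolverUtil = new RackResolverUtil(sc.hadoopConfiguration)
```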
Could we do this cast once in the constructor (outside this method)? If the cast is cheap after JVM JIT, then maybe there's no need to avoid frequent casts.
Fixed. FYI it appears this has to be done in the separate init method.
This if/else could be turned into a `match`, which might be cleaner in Scala.
Fixed.
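For illustration only (not the PR's code), the kind of rewrite being suggested, assuming a helper that falls back to a default rack name:

```scala
// Hypothetical if/else form:
def rackOrDefault(resolved: Option[String]): String =
  if (resolved.isDefined) resolved.get else "/default-rack"

// The same logic as a match expression, which reads more idiomatically in Scala:
def rackOrDefaultMatch(resolved: Option[String]): String =
  resolved match {
    case Some(rack) => rack
    case None => "/default-rack"
  }
```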
nice work on keeping this speedy for non-HDFS users
Why is this change needed? Are we parallelizing tests and running multiple drivers in the same JVM at once?
In this project, we have KubernetesTaskSetManagerSuite and KubernetesTaskSchedulerImplSuite. The two suites run inside a single JVM. We still run one suite at a time sequentially, but we create two SparkContext instances. Without this option, the second suite fails because only one SparkContext instance is allowed per JVM. I was hoping to prevent that.
But I think there is a better option, which is simply to call SparkContext.clearActiveContext(). Switched to that.
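For reference, a rough sketch of the two approaches mentioned (the configuration key and calls are standard Spark 2.x, but the surrounding names are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Approach 1: allow more than one SparkContext per JVM via configuration.
val permissiveConf = new SparkConf()
  .setMaster("local")
  .setAppName("suite-sketch")
  .set("spark.driver.allowMultipleContexts", "true")

// Approach 2 (chosen here): clear any lingering active context first.
// Note: clearActiveContext() is private[spark], so this only compiles for
// test code living under the org.apache.spark package, as these suites do.
SparkContext.clearActiveContext()
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("suite-sketch"))
```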
LGTM! Thanks @kimoonkim, that looks good, and appears to handle both HDFS nodes and executor pods well. Question - for the executor pods, how do you see the script to resolve rack info working? Could it have access to more than just the pod IP to find what rack it belongs to?
@foxish Good question. My understanding is that executors do not call the topology plugin. Only the driver consults the topology plugin to decide which executor, hopefully a rack-local one, should receive a new task. When an executor reads an HDFS block for the new task, it simply uses Hadoop library code that sends an RPC request to the namenode. The namenode considers the list of datanodes that have copies of the block and asks the topology plugin which datanodes are better. When the namenode returns the list of datanodes, it sorts them in locality order (node-local, then rack-local, then remote).

FYI, the HDFS-on-Kubernetes setup we have does not yet support configuring the topology plugin in the namenode helm chart, but I intend to do that soon. The pod IP question on the namenode side is not as important as we originally thought, because most k8s network plugins do NAT and the namenode sees the k8s cluster node IPs. (The pod IP issue on the namenode side only manifests with kubenet on GKE.) For details, please see kubernetes-HDFS/topology/README.md
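As a rough sketch of the driver-side lookup described above (not the PR's exact code), the driver can hand a host name to Hadoop's RackResolver, which runs whatever topology plugin or script the Hadoop configuration points at:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.util.RackResolver

// Hypothetical driver-side helper: map a host name to a rack path such as
// "/rack0" using the configured topology plugin. Executors never make this
// call themselves; they rely on the namenode's locality-sorted block lists.
def rackOf(hadoopConf: Configuration, host: String): Option[String] =
  Option(RackResolver.resolve(hadoopConf, host)).map(_.getNetworkLocation)
```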
mccheah left a comment
Sorry this took a while to review. I have some suggestions.
Use Option.map.getOrElse instead of matching on isEmpty.
Fixed. Thanks for the suggestion. Found and fixed a minor bug thanks to that!
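For illustration, the pattern being suggested, with made-up values rather than the PR's code:

```scala
// Hypothetical: derive a rack label from an optionally resolved host name.
val resolvedHost: Option[String] = Some("datanode-3")

// Style being replaced: test isEmpty, then branch.
val label1 =
  if (resolvedHost.isEmpty) "/default-rack"
  else s"/rack-of-${resolvedHost.get}"

// Suggested style: map over the Option and supply a default.
val label2 = resolvedHost.map(host => s"/rack-of-$host").getOrElse("/default-rack")
```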
Should this be done in a before block?
It could be put in a before block. But then I would also need to put val sc = new SparkContext(...) inside the block, because SparkContext.clearActiveContext() should be called before a new SparkContext is created. Please let me know what you think.
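A sketch of what that before-block arrangement could look like with ScalaTest's BeforeAndAfter, assuming both the clearing and the context creation move inside the block (the suite name is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfter, FunSuite}

// Hypothetical suite: reset and recreate the SparkContext around each test.
// SparkContext.clearActiveContext() is private[spark], so this only works for
// suites living under the org.apache.spark package, as these do.
class RackLocalitySketchSuite extends FunSuite with BeforeAndAfter {
  private var sc: SparkContext = _

  before {
    SparkContext.clearActiveContext()
    sc = new SparkContext(new SparkConf().setMaster("local").setAppName("sketch"))
  }

  after {
    sc.stop()
  }

  test("placeholder") {
    assert(sc.isLocal)
  }
}
```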
Can we just use a mock SparkContext here?
Yeah, I thought about the possibility and tried a bit. But then I realized SparkContext is not friendly to mock. It just has too many methods returning too many objects. I'd have to mock many of those that the test subject class happens to interact with, and if any of those interactions change, the test will break.
I think the test maintenance cost becomes too high. I found other Spark core unit tests avoid using mocks, probably for the same reason.
It seems unusual to me to be mocking the RackResolverUtil. It seems to be part of the core KubernetesTaskSchedulerImpl because the RackResolverUtil is a nested private class.
If we indeed want to test these separately, then the architecture should reflect that:
- Extract `RackResolverUtil` to another class,
- Put a `trait` on top of the `RackResolverUtil`,
- Inject an instance into the `KubernetesTaskSchedulerImpl` when we create it,
- Write a separate test for `RackResolverUtil`.
If we don't want to test these separately, then we should create a real `RackResolverUtil` and test the top-level methods accordingly.
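A minimal sketch of the refactor outlined above; all names are placeholders rather than the eventual PR code:

```scala
import org.apache.hadoop.conf.Configuration

// Steps 1 and 2: extract the utility and put a trait over it, so tests can
// supply a stub without touching Hadoop's RackResolver.
trait RackResolverUtil {
  def resolveRack(hadoopConf: Configuration, host: String): Option[String]
}

class DefaultRackResolverUtil extends RackResolverUtil {
  override def resolveRack(hadoopConf: Configuration, host: String): Option[String] =
    Option(org.apache.hadoop.yarn.util.RackResolver.resolve(hadoopConf, host))
      .map(_.getNetworkLocation)
}

// Step 3: inject the resolver into the scheduler when it is created.
class KubernetesTaskSchedulerSketch(
    hadoopConf: Configuration,
    rackResolver: RackResolverUtil) {
  def getRackForHost(host: String): Option[String] =
    rackResolver.resolveRack(hadoopConf, host)
}

// Step 4: RackResolverUtil then gets its own focused test suite.
```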
Good point. It started as a pure wrapper of RackResolver, then I ended up adding a bit of business logic. I like the suggestion, but I'll have to think about this a little bit.
Fixed. New code looks better. Thanks.
Any reason this can't be a real InetAddressUtil?
If it can't be, then move InetAddressUtil to its own file and place a trait over it. The real InetAddressUtil could probably be an object that extends the trait in this case.
I wanted to mock it so unit tests don't call real DNS and potentially get influenced by the responses. The trait approach sounds good; I'll probably try it in the next patch.
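One way the trait-plus-object arrangement could look, assuming the utility only wraps a reverse DNS lookup (the names here are illustrative, not necessarily the PR's):

```scala
import java.net.InetAddress

// Trait in its own file so tests can substitute a fake that never hits DNS.
trait InetAddressUtil {
  def getFullHostName(ipAddress: String): String
}

// Real implementation as an object extending the trait.
object DefaultInetAddressUtil extends InetAddressUtil {
  override def getFullHostName(ipAddress: String): String =
    InetAddress.getByName(ipAddress).getCanonicalHostName
}

// A unit test can then inject a canned answer instead of a real lookup:
//   val fakeDns = new InetAddressUtil {
//     override def getFullHostName(ipAddress: String): String = "datanode-1.example.com"
//   }
```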
Fixed.
Again - should this be done in a before block? Also, can we use a mock SparkContext?
Please see the answer above.
@mccheah Thanks for the review. Addressed comments. PTAL.
I was trying to retarget this to branch-2.2-kubernetes using the GitHub UI, but then I realized there would be too many diffs in this PR. Maybe I should retarget to branch-2.2-kubernetes and merge changes from branch-2.2-kubernetes? @foxish Any better suggestion?
Force-pushed from d9de98e to 2e49e48.
Ok. Retargeted to branch-2.2-kubernetes after rebasing my branch.
Merged with
@mccheah Can you please take a look at this PR? Perhaps this is ready to merge after another look.
* Support HDFS rack locality
* Fix unit tests
* Address review comments
* Address some review comments
* Use traits for InetAddress and RackResolver util classes
* Disables expensive DNS lookup by default
Closes #349 and #206.
@ash211 @foxish
Supports HDFS rack locality by implementing `getRackForHost` in `KubernetesTaskSchedulerImpl`. Added unit tests.
Also did manual testing using a dummy topology script that always returns a dummy rack name, "/rack0".
The driver log shows a small number of `RACK_LOCAL` tasks, which used to be `ANY` tasks. (The majority of tasks are still `NODE_LOCAL` tasks.) The job was `HdfsTest`. `spark-defaults.conf` specified the dummy topology script, `print_rack.sh`, which I added to the driver docker image manually.
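The conf and script snippets themselves are not reproduced in this thread. As an assumption about how such a setup is typically wired, Spark can forward the standard Hadoop topology-script property through its spark.hadoop.* prefix; the script path below is hypothetical:

```scala
import org.apache.spark.SparkConf

// Hypothetical: point Hadoop's script-based topology resolver at a script
// that prints a rack path (e.g. "/rack0") for each host passed to it.
val conf = new SparkConf()
  .set("spark.hadoop.net.topology.script.file.name", "/opt/topology/print_rack.sh")
```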