Use master environment variable from KubernetesClusterSchedulerBackend #117
Conversation
@foxish this fixes the issue on my machine.
```diff
 var clientConfigBuilder = new ConfigBuilder()
   .withApiVersion("v1")
-  .withMasterUrl(kubernetesMaster)
+  .withMasterUrl(s"$urlScheme://$kubernetesHost:$kubernetesPort")
```
This should always be https. Even if the user uses the insecure endpoint to access the apiserver from outside the cluster, KUBERNETES_SERVICE_PORT should point to the secure endpoint.
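For concreteness, a minimal sketch (not the PR's actual code) of building the master URL from the standard in-cluster environment variables with the scheme pinned to https; the fallback values here are assumptions:

```scala
// Sketch only, assuming the env vars Kubernetes injects into every pod.
val host = sys.env.getOrElse("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc") // fallback is an assumption
val port = sys.env.getOrElse("KUBERNETES_SERVICE_PORT", "443")                    // points at the secure port
val masterUrl = s"https://$host:$port" // always https, even for in-cluster access
```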
I'm a bit wary about this change because the most robust approach here is to use the DNS name and let it be resolved; see the comments in https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L412-L416. Have you tried resolving the DNS name in a loop within the driver pod? Does it always fail in your environment?
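For reference, "resolving in a loop" could look roughly like this hypothetical helper (`kubernetes.default.svc` is the usual in-cluster apiserver name; the attempt count and delay are arbitrary):

```scala
import java.net.{InetAddress, UnknownHostException}

// Hypothetical retry helper: attempt to resolve the apiserver's service DNS
// name several times, sleeping between attempts, before giving up.
def resolveWithRetries(name: String, attempts: Int = 10, delayMs: Long = 1000L): Option[InetAddress] = {
  (1 to attempts).foreach { i =>
    try return Some(InetAddress.getByName(name))
    catch {
      case _: UnknownHostException =>
        if (i < attempts) Thread.sleep(delayMs)
    }
  }
  None
}

println(resolveWithRetries("kubernetes.default.svc"))
```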
The issue has reproduced every time I've run the integration tests. I haven't tried resolving in a loop yet.
If it is specific to Minikube 0.16.0, I think we should report it and switch to the newer version when it's fixed.
I also saw it in Minikube 0.15.0. I also just tried resolving in a loop, basically retrying the failing call to …
I wasn't testing cross-namespace. The issue was that it is no longer automatically looking up … I should've caught this earlier. Apologies.
Closing in favor of #118, which I've confirmed fixes the issue.
### What changes were proposed in this pull request?

Updated the Kubernetes client.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-27812
https://issues.apache.org/jira/browse/SPARK-27927

We need the fix fabric8io/kubernetes-client#1768, which was released in version 4.6 of the client. The root cause of the problem is better explained in apache#25785.

### Does this PR introduce any user-facing change?

No, it should be transparent to users.

### How was this patch tested?

This patch was tested manually using a simple PySpark job:

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder.getOrCreate()
```

The expected behaviour of this "job" is that both the Python and JVM processes exit automatically after the main runs. This is the case for Spark versions <= 2.4. On version 2.4.3, the JVM process hangs because there are non-daemon threads running:

```
"OkHttp WebSocket https://10.96.0.1/..." apache-spark-on-k8s#121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000]
"OkHttp WebSocket https://10.96.0.1/..." apache-spark-on-k8s#117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000]
```

This is caused by a bug in the `kubernetes-client` library, which is fixed in the version we are upgrading to. When the mentioned job is run with this patch applied, the behaviour from Spark < 2.4.3 is restored and both processes terminate successfully.

Closes apache#26093 from igorcalabria/k8s-client-update.

Authored-by: igor.calabria <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
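As a side note on the thread dump quoted above, one way to confirm which non-daemon threads are keeping a JVM alive is to enumerate them from the driver. This is a generic JVM sketch, not part of the patch:

```scala
import scala.collection.JavaConverters._

// List every live non-daemon thread; any entry besides "main" can prevent
// the JVM from exiting, as the OkHttp WebSocket threads did here.
Thread.getAllStackTraces.keySet.asScala
  .filter(t => t.isAlive && !t.isDaemon)
  .foreach(t => println(s"${t.getName} state=${t.getState}"))
```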
Closes #112.