@andygrove
Member

@andygrove andygrove commented Aug 31, 2019

What changes were proposed in this pull request?

Upgrade kubernetes client from 4.1.2 to 4.4.2

Why are the changes needed?

To fix a compatibility issue with EKS after Amazon rolled out security patches over the past week (Kubernetes 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual testing

@andygrove andygrove changed the base branch from master to branch-2.4 August 31, 2019 14:34
@SparkQA

SparkQA commented Aug 31, 2019

Test build #4849 has finished for PR 25641 at commit 30940bc.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andygrove
Member Author

When I run ./dev/test-dependencies.sh --replace-manifest, I get:

[ERROR] Failed to execute goal on project spark-yarn_2.11: Could not resolve dependencies for project org.apache.spark:spark-yarn_2.11:jar:spark-786633: Could not find artifact jdk.tools:jdk.tools:jar:1.6 at specified path /usr/lib/jvm/java-11-openjdk-amd64/../lib/tools.jar -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-yarn_2.11

I could use some guidance on how to apply this patch to the 2.4 branch.

@dongjoon-hyun
Member

dongjoon-hyun commented Aug 31, 2019

Judging from your error message, are you running this on JDK 11? This needs to be done with JDK 8.

Could not find artifact jdk.tools:jdk.tools:jar:1.6 at specified path /usr/lib/jvm/java-11-openjdk-amd64/../lib/tools.jar

Apache Spark 2.x doesn't support JDK11 yet.
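Since the branch-2.4 dependency manifests have to be regenerated under JDK 8, a small guard can keep `./dev/test-dependencies.sh --replace-manifest` from running under the wrong JDK. This is a minimal bash sketch, assuming `JAVA_HOME` and the `java` on the `PATH` point to the same JDK (Maven prefers `JAVA_HOME` when it is set); `java_major_version` and `require_jdk8` are hypothetical helpers, not part of Spark's `dev/` scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Spark's dev/ scripts).

# Print the major version for a Java version string, e.g.
# "1.8.0_222" (legacy 1.x scheme) -> 8, "11.0.4" -> 11.
java_major_version() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# Succeed only if the `java` on the PATH reports major version 8.
require_jdk8() {
  local raw
  # `java -version` writes e.g.: openjdk version "1.8.0_222" to stderr.
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if [ "$(java_major_version "$raw")" != "8" ]; then
    echo "Expected JDK 8 but found '${raw:-none}'; point JAVA_HOME/PATH at a JDK 8 install." >&2
    return 1
  fi
}
```

From the Spark source root, something like `require_jdk8 && ./dev/test-dependencies.sh --replace-manifest` would then fail fast on JDK 11 instead of surfacing the `jdk.tools` resolution error shown earlier.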

@dongjoon-hyun dongjoon-hyun changed the title SPARK-28921: Update kubernetes client to 4.4.2 for Spark 2.4 branch [SPARK-28921][BUILD][2.4] Update kubernetes client to 4.4.2 Aug 31, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28921][BUILD][2.4] Update kubernetes client to 4.4.2 [SPARK-28921][BUILD][K8S][2.4] Update kubernetes client to 4.4.2 Aug 31, 2019
@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Sep 1, 2019

@SparkQA

SparkQA commented Sep 2, 2019

Test build #109997 has finished for PR 25641 at commit 8364838.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 3, 2019

This PR has been tested with EKS. Thank you, @andygrove.
Merged to branch-2.4.

$ kubectl version --short
Client Version: v1.15.3
Server Version: v1.13.10-eks-5ac0f1
$ aws ecr list-images --repository-name spark
{
    "imageIds": [
        {
            "imageDigest": "sha256:c92d634507aa8336c79cb094ba69083d9cb50f4a6f09259e3b0cb4b6bf1c5214",
            "imageTag": "PR-25640"
        },
        {
            "imageDigest": "sha256:0a57b8479a54b371621fee90a84126aa259a13edfc73a28c139046475ab604d1",
            "imageTag": "PR-25641"
        },
        {
            "imageDigest": "sha256:5318c3b9a2f1c85bae5d913c799d35a732b0b658f7500dbf733a94b5d8981552",
            "imageTag": "2.4.5-SNAPSHOT"
        },
        {
            "imageDigest": "sha256:a2a48304453c147ec2f049ea0b6c4dbadb625a0c8d76d4c1eb4f7cb3f134890c",
            "imageTag": "latest"
        }
    ]
}
$ echo $K8S_MASTER
https://9310EC45A37C51BCCF6BC12CDBFCBB61.sk1.us-west-2.eks.amazonaws.com

$ echo $IMAGE
095589911305.dkr.ecr.us-west-2.amazonaws.com/spark:PR-25641

$ bin/spark-submit \
  --master k8s://$K8S_MASTER \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=$IMAGE \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5-SNAPSHOT.jar
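The server version reported above (`v1.13.10-eks-5ac0f1`) is one of the patched releases listed in the PR description. As a rough sketch, a bash check along these lines could flag clusters at or past those patch releases, which, per this PR, need a Spark build with the upgraded client. It assumes later patch releases in the same minor line carry the same changes; `is_patched_k8s` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a Kubernetes server version is at or past the
# patch release listed in this PR for its minor line, 1 if it is older,
# and 2 if its minor line is not in the list.
# Assumption: later patch releases in the same minor line carry the same changes.
is_patched_k8s() {
  local v="${1#v}"   # "v1.13.10-eks-5ac0f1" -> "1.13.10-eks-5ac0f1"
  v="${v%%-*}"       # drop vendor suffix    -> "1.13.10"
  local major minor patch min_patch
  IFS=. read -r major minor patch <<< "$v"
  case "$major.$minor" in
    1.15) min_patch=3 ;;
    1.14) min_patch=6 ;;
    1.13|1.12|1.11) min_patch=10 ;;
    *) return 2 ;;
  esac
  [ "${patch:-0}" -ge "$min_patch" ]
}
```

For example, with a kubectl of that era: `is_patched_k8s "$(kubectl version --short | awk '/Server Version/ {print $3}')"`. A return code of 2 means the minor line is not in the PR's list, so the result says nothing either way.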

dongjoon-hyun pushed a commit that referenced this pull request Sep 3, 2019
### What changes were proposed in this pull request?

Upgrade kubernetes client from 4.1.2 to 4.4.2

### Why are the changes needed?

To fix a compatibility issue with EKS after Amazon rolled out security patches over the past week (Kubernetes 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10).

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manual testing

Closes #25641 from andygrove/SPARK-28921-2.4.

Authored-by: Andy Grove <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@andygrove andygrove deleted the SPARK-28921-2.4 branch September 3, 2019 14:09
skonto pushed a commit to lightbend/spark that referenced this pull request Sep 4, 2019
(cherry picked from commit 446ffb1)
@Jeffwan
Contributor

Jeffwan commented Sep 6, 2019

Spark on all platform versions is affected. Is there a released Spark version that handles this issue, or do we have to build our own Spark?

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 6, 2019

@Jeffwan, I'm wondering how you are using EKS now.
Apache Spark doesn't provide pre-built Docker images; the binary release only includes a reference Dockerfile as an example. You always need to build your own Docker images.

Since you work for AWS, I think you know this better than I do. 😄

@Jeffwan
Contributor

Jeffwan commented Sep 8, 2019

@dongjoon-hyun Thanks for getting back to me.
Yes. If we check out branch-2.4 of github.com/apache/spark and build the image, it's definitely OK, since that branch has the patch and the other changes from 2.4.4 to 2.4.5-SNAPSHOT.

I think most users download Spark from the official website (https://spark.apache.org/downloads.html) and build the image from the pre-built binaries. In that case, it doesn't work. I'm curious about the patch release cycle: how does the community decide when a new patch version needs to be released?
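The build-it-yourself route mentioned above can be sketched roughly as follows. This is an untested outline, assuming bash, git, Maven, JDK 8, Docker, and registry credentials already configured; `build_spark_k8s_images` and `spark_image_ref` are hypothetical helpers. It uses Spark's `dev/make-distribution.sh` with the `kubernetes` Maven profile and the `bin/docker-image-tool.sh` script copied into the distribution. Which images the tool builds, and any extra flags they need, can vary by branch, so check the script's usage text first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; paths and flags assume a Spark 2.4 source tree.

# Name docker-image-tool.sh gives the base JVM image: <repo>/spark:<tag>
spark_image_ref() {
  echo "$1/spark:$2"
}

# Build a Spark distribution from branch-2.4, then build and push its images.
# The body runs in a subshell, so `set -e` and the `cd` calls stay local.
build_spark_k8s_images() (
  set -e
  repo="$1"
  tag="$2"
  git clone --branch branch-2.4 --depth 1 https://github.com/apache/spark.git spark-2.4
  cd spark-2.4
  # Distribution with Kubernetes support; the result lands in ./dist
  ./dev/make-distribution.sh --name k8s -Pkubernetes
  cd dist
  ./bin/docker-image-tool.sh -r "$repo" -t "$tag" build
  ./bin/docker-image-tool.sh -r "$repo" -t "$tag" push
  echo "Pushed $(spark_image_ref "$repo" "$tag")"
)
```

Called with your own registry and tag, this should yield an image reference of the same `<registry>/spark:<tag>` form as the `$IMAGE` used in the EKS test earlier in this thread.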

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 9, 2019

@Jeffwan, first of all, could you tell me whether EKS provides a fallback option (to launch the previously working EKS 1.13.x versions) for that outage? If not, I'm very sorry about that, because I'm one of the EKS customers.

Regarding the outage you mentioned, I believe you know that other K8s users with older K8s versions (including AWS customers who run their own K8s clusters) are not suffering from this. Many users run in various environments, unlike managed EKS. Given that, this production outage in the EKS environment may be considered a hidden pitfall of managed service providers, caused by a lack of downstream testing (or a lack of consideration for customer use cases).

More importantly, you should not assume that customers always run the latest Spark release. Customer production environments use many versions simultaneously: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4 (the last EOL release), 2.4.0, 2.4.1, 2.4.2, 2.4.3, and 2.4.4. So a new release cannot fix every production-down issue, because users of old versions have their reasons to stay on them. This is why I think EKS should provide a fallback option; that would be a better way to support all customers. You may recommend upgrading Apache Spark, but what happens if this kind of outage occurs again and again? We have many downstreams like EKS. As you know, the Apache Spark community made no decision to break customer production environments like this.

Lastly, the Apache Spark community is an independent community with its own release cycle: 2.4.4 was announced this month, and 2.4.3 on May 9th. We will release the next version when we have enough reasons to do so.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 9, 2019

As @andygrove mentioned in the original PR on the master branch, this is a regression in EKS, which decided to block all existing Apache Spark users. And, as you can see, we are doing our best here.

The regression is with EKS, not with Spark ...

smrtl pushed a commit to nagra-insight/spark that referenced this pull request Sep 9, 2019
@andygrove andygrove restored the SPARK-28921-2.4 branch September 10, 2019 15:00
@Jeffwan
Contributor

Jeffwan commented Sep 10, 2019

@dongjoon-hyun Yes. This is the first time we have done a fallback for some customers: if there's a big impact on a customer's business, we help fall back specific clusters. We know users have complained that Jenkins and Spark, which depend on the fabric8 Kubernetes SDK, stopped working after the CVE patch.

Thanks a lot for the details. I totally understand that it only affects some managed services like EKS. I'm trying to understand more about the release process so that we can take quick action with reasonable mitigation options in the future.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 10, 2019

Thank you for your understanding, @Jeffwan. I know you have your own Spark versions, but we want to collaborate with you more before customers' production environments go down. The current situation is not good for any of us. If you are interested:

  1. It would be great if you could participate in the Apache Spark RC votes at an early stage (RC1).
  2. It would be great if you could file a Spark JIRA issue before rolling out new EKS versions. If there were even one integration test with Apache Spark in your environment, this kind of issue would be easily discovered.

@Jeffwan
Contributor

Jeffwan commented Sep 10, 2019

@dongjoon-hyun

  1. It would be great if you could participate in the Apache Spark RC votes at an early stage (RC1).

I'm still new to the community. Could you tell me where I can participate in the RC votes? I'd love to attend community meetings.

  2. It would be great if you could file a Spark JIRA issue before rolling out new EKS versions. If there were even one integration test with Apache Spark in your environment, this kind of issue would be easily discovered.

This would be great! I will try to set it up so we can detect problems as early as possible. Thanks for the suggestion; I'll check the docs to see how to do it.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 11, 2019

Thank you so much, @Jeffwan .

In Apache project communities, everything should happen on the mailing lists, which are also the only official channel for votes. So you need to subscribe to the dev@spark mailing list; then you will receive that information. For example, there were three RCs for 2.4.4. Each RC vote is open for 3 days to collect verification from the community.

You can browse the old emails at https://lists.apache.org/[email protected] . Please see the emails whose titles start with [VOTE], [VOTE][RESULT], and [ANNOUNCE].

For the general contribution guide, please see https://spark.apache.org/contributing.html .

@Jeffwan
Contributor

Jeffwan commented Sep 11, 2019

@dongjoon-hyun This is great! Thank you so much. We will get involved in the community and make more contributions then.

rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
rodrigovedovato pushed a commit to elo7/spark that referenced this pull request Sep 20, 2019
(cherry picked from commit 446ffb1)
nikunjb pushed a commit to nikunjb/spark that referenced this pull request Oct 1, 2019
yidetu pushed a commit to yidetu/spark that referenced this pull request Oct 10, 2019
@HarryWeppner

@dongjoon-hyun do you know yet when 2.4.5 will be released?

@dongjoon-hyun
Member

Hi, @HarryWeppner.
Please see the release dates of previous versions on our website; we have a release cadence.

@HarryWeppner

@dongjoon-hyun I had seen https://spark.apache.org/versioning-policy.html, which states that

Maintenance releases happen as needed in between feature releases

This is really critical as cloud providers upgraded their K8s versions and there is no released Spark version that works!

@dongjoon-hyun
Copy link
Member

@HarryWeppner, can you promise that the provider won't break anything again next month? Are you going to ask for an Apache Spark release whenever a cloud provider doesn't care about its customers? Sadly, I'm also one of that vendor's customers, so I understand why you are frustrated.

However, as you know, Apache Spark didn't break anything there. You should file an issue with that company if it breaks something without providing a fallback.

This is really critical as cloud providers upgraded their K8s versions and there is no released Spark version that works!

In addition, please see the discussion above. There is an effort to collaborate to reduce the chance of this kind of surprise, which no one wants. That's the best the Apache Spark community can do. We work voluntarily and willingly for the community, not for that company.

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 23, 2019

One more thing: this PR was already superseded 5 days ago by #26152 (Bump K8S client version to 4.6.1). I guess you want that PR instead of this one :)

@HarryWeppner

@dongjoon-hyun point well taken - it would still be very valuable to know about an approximate timeline for a 2.4.5 maintenance release. Thanks!

@dongjoon-hyun
Member

As of now, there is no plan. However, based on https://spark.apache.org/news, we can guess the following.

  • 2.4.3: May 8, 2019
  • 2.4.4: September 1, 2019
  • 2.4.5: Jan 2020? (This is reasonable. At least, I can volunteer to be the release manager in January.)

3.0.0 RC1 also has a similar ETA (https://spark.apache.org/versioning-policy.html). I believe you will have both 3.0.0 and 2.4.5 by early 2020.

BTW, the K8s development cycle is fast. Apache Spark 2.4.5 will get the latest K8s client version available at that time instead of 4.6.1, but that is just a best effort, as with 2.4.4. Anything missed will be caught up later in Apache Spark 2.4.6.

@HarryWeppner

Fyi, there is a relatively simple workaround a colleague found, which is to explicitly add port :443 (when using https as a scheme). That's enough to make the K8s client library work again.
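One way to apply this workaround, assuming the port goes into the API server URL passed to `--master k8s://...` (the comment does not say exactly where), is a minimal bash sketch that appends `:443` to an `https` URL with no explicit port. `with_https_port` is a hypothetical helper, and IPv6 literal hosts are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add ":443" to an https URL that has no explicit port.
# Other schemes, and URLs that already carry a port, are returned unchanged.
# Note: IPv6 literals such as https://[::1] are not handled.
with_https_port() {
  local url="$1"
  case "$url" in
    https://*) ;;
    *) echo "$url"; return ;;
  esac
  local rest="${url#https://}"   # host[:port][/path]
  local host="${rest%%/*}"       # host[:port]
  local path="${rest#"$host"}"   # "" or "/path"
  case "$host" in
    *:*) echo "$url" ;;                      # port already present
    *)   echo "https://${host}:443${path}" ;;
  esac
}
```

With the earlier EKS test, the submit command would then use `--master "k8s://$(with_https_port "$K8S_MASTER")"`, which expands to `k8s://https://<endpoint>:443`.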

@dongjoon-hyun
Member

Thank you for sharing the workaround, @HarryWeppner !

@dongjoon-hyun
Member

Hi, all.
Could you participate in the Apache Spark 3.0.0-preview RC2 vote after testing your K8s environments?

@rvarlikli

Fyi, there is a relatively simple workaround a colleague found, which is to explicitly add port :443 (when using https as a scheme). That's enough to make the K8s client library work again.

Can you give us more specifics about this solution? Where exactly do we need to make the change?
