Skip to content
This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@ash211
Copy link

@ash211 ash211 commented Mar 8, 2017

Re-upped version of #89

mccheah and others added 30 commits March 8, 2017 10:18
- Don't hold the raw secret bytes
- Add CPU limits and requests
The build process fails ScalaStyle checks otherwise.
* Use tar and gzip to archive shipped jars.

* Address comments

* Move files to resolve merge
* Use alpine and java 8 for docker images.

* Remove installation of vim and redundant comment
* Error messages when the driver container fails to start.

* Fix messages a bit

* Use timeout constant

* Delete the pod if it fails for any reason (not just timeout)

* Actually set submit succeeded

* Fix typo
* Documentation for the current state of the world.

* Adding navigation links from other pages

* Address comments, add TODO for things that should be fixed

* Address comments, mostly making images section clearer

* Virtual runtime -> container runtime
#20)

* Development workflow documentation for the current state of the world.

* Address comments.

* Clarified code change and added ticket link
* Added service name as prefix to executor pods to be able to tell them apart from kubectl output

* Addressed comments
* Add kubernetes profile to travis yml file

* Fix long lines in CompressionUtils.scala
* Improved the example commands in running-on-k8s document.

* Fixed more example commands.

* Fixed typo.
* Support custom labels on the driver pod.

* Add integration test and fix logic.

* Fix tests

* Fix minor formatting mistake

* Reduce unnecessary diff
* A number of small tweaks to the MVP.

- Master protocol defaults to https if not specified
- Removed upload driver extra classpath functionality
- Added ability to specify main app resource with container:// URI
- Updated docs to reflect all of the above
- Add examples to Docker images, mostly for integration testing but
could be useful for easily getting started without shipping anything

* Add example to documentation.
* Support setting the driver pod launching timeout.

And increase the default value from 30s to 60s. The current value of
30s is kind of short for pulling the image from public docker registry
plus the container/JVM start time.

* Use a better name for the default timeout.
* Use "extraTestArgLine" to pass extra options to scalatest.

Because the "argLine" option of scalatest is set in pom.xml and we can't
overwrite it from the command line.

Ref #37

* Added a default value for extraTestArgLine

* Use a better name.

* Added a tip for this in the dev docs.
@ash211
Copy link
Author

ash211 commented Mar 8, 2017

First release branch reorganization

This PR creates the new structure for release branches. The problem we have with the current branch is that it's based off a random commit of Apache master, rather than a specific release. This PR creates a new branch structure off the latest Apache release (2.1.0) for improved stability.

I propose we merge this branch, create another branch on the resulting commit named branch-2.1-kubernetes-develop, then redirect all open PRs from going into the k8s-support-alternate-incremental branch to going into branch-2.1-kubernetes-develop

This way we have two branches, the release branch for stabilization, and the dev branch for ongoing work.

  • branch-2.1-kubernetes what we'll tag the first release off of, build docker images of, etc
  • branch-2.1-kubernetes-develop what we'll continue merging ongoing work into

Then as we merge new PRs into branch-2.1-kubernetes-develop, if it's a stabilization/docs PR that we want in the release, we additionally cherry pick the commit into a new PR against branch-2.1-kubernetes and merge after build passes (since CR already happened on the other branch). This follows the pattern that Apache Spark does with master vs branch-2.1 vs branch-2.0 etc.

Additionally I'd suggest that when this PR merges we use that to cut the first alpha for socialization at GCP Next.

Does this make sense @foxish @mccheah @ssuchter @lins05 @iyanuobidele @erikerlandson ?

@ash211 ash211 mentioned this pull request Mar 8, 2017
11 tasks
@ssuchter
Copy link
Member

ssuchter commented Mar 8, 2017 via email

@ash211
Copy link
Author

ash211 commented Mar 8, 2017

@kimoonkim / @cvpatel does anything jump out to you about those test failures? Is it problematic to run them on this different branch?

@kimoonkim
Copy link
Member

@ash211

The Travis builds are complaining about resource-managers/kubernetes/core/pom.xml:

[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR] The project org.apache.spark:spark-kubernetes_2.11:2.2.0-SNAPSHOT (/home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/pom.xml) has 5 errors
[ERROR] 'dependencies.dependency.version' for com.netflix.feign:feign-core:jar is missing. @ org.apache.spark:spark-kubernetes_2.11:[unknown-version], /home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/pom.xml, line 55, column 17
[ERROR] 'dependencies.dependency.version' for com.netflix.feign:feign-okhttp:jar is missing. @ org.apache.spark:spark-kubernetes_2.11:[unknown-version], /home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/pom.xml, line 59, column 17
[ERROR] 'dependencies.dependency.version' for com.netflix.feign:feign-jackson:jar is missing. @ org.apache.spark:spark-kubernetes_2.11:[unknown-version], /home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/pom.xml, line 63, column 17
[ERROR] 'dependencies.dependency.version' for com.netflix.feign:feign-jaxrs:jar is missing. @ org.apache.spark:spark-kubernetes_2.11:[unknown-version], /home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/pom.xml, line 67, column 17
[ERROR] 'dependencies.dependency.version' for com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar is missing. @ org.apache.spark:spark-kubernetes_2.11:[unknown-version], /home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/pom.xml, line 77, column 17

I remember we removed some dependencies from the top project pom.xml recently. Maybe that's the root cause?

@foxish
Copy link
Member

foxish commented Mar 8, 2017

+1 for the strategy. It seems like the top-level POM in this PR has the versions of those dependencies that the CI reports as missing. I'm not sure why it is reporting errors.

@foxish
Copy link
Member

foxish commented Mar 8, 2017

Can someone restart the integration test? Looks like minikube crashed.

+ ./resource-managers/kubernetes/integration-tests/target/minikube-bin/linux-amd64/minikube status
minikubeVM: Does Not Exist
localkube: N/A
[PR-spark-k8s-integration-test] $ /bin/sh -xe /tmp/hudson4547463082719487775.sh
+ ./resource-managers/kubernetes/integration-tests/target/minikube-bin/linux-amd64/minikube stop
Stopping local Kubernetes cluster...
Error stopping machine:  Error loading host: minikube: Host does not exist: "minikube"
+ true

@cvpatel
Copy link
Member

cvpatel commented Mar 8, 2017

rerun integration test please

@cvpatel
Copy link
Member

cvpatel commented Mar 8, 2017

@foxish Anyone should be able to rerun the test by adding a comment "rerun integration test please"

The minikube commands are just a red-herring, because it tries to run stop and delete whenever there is a failure. But the main issue here is the pom.xml issue.

[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-kubernetes_2.11:[unknown-version]: Could not find artifact org.apache.spark:spark-parent_2.11:pom:2.2.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 20, column 11
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-docker-minimal-bundle_2.11:[unknown-version]: Could not find artifact org.apache.spark:spark-parent_2.11:pom:2.2.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 21, column 11
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-kubernetes-integration-tests_2.11:[unknown-version]: Could not find artifact org.apache.spark:spark-parent_2.11:pom:2.2.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 20, column 11
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-kubernetes-integration-tests-spark-jobs_2.11:[unknown-version]: Could not find artifact org.apache.spark:spark-parent_2.11:pom:2.2.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 20, column 11
[FATAL] Non-resolvable parent POM for org.apache.spark:spark-kubernetes-integration-tests-spark-jobs-helpers_2.11:[unknown-version]: Could not find artifact org.apache.spark:spark-parent_2.11:pom:2.2.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 20, column 11
@

@foxish
Copy link
Member

foxish commented Mar 8, 2017 via email

@foxish
Copy link
Member

foxish commented Mar 8, 2017

Expecting #178 to fix it.

foxish and others added 2 commits March 13, 2017 17:46
* Fix pom versioning

* fix k8s versions in pom

* Change pom string to 2.1.0-k8s-0.1.0-SNAPSHOT
@foxish
Copy link
Member

foxish commented Mar 15, 2017

Any commits this is missing w.r.t. k8s-support-alternate-incremental? Maybe we should do a fast-fwd if nothing that was post-alpha has merged since.

* [MINOR][BUILD] Fix lint-check failures and javadoc8 break

## What changes were proposed in this pull request?

This PR proposes to fix lint-check failures and javadoc8 break.

Few errors were introduced as below:

**lint-check failures**

```
[ERROR] src/test/java/org/apache/spark/network/TransportClientFactorySuite.java:[45,1] (imports) RedundantImport: Duplicate import to line 43 - org.apache.spark.network.util.MapConfigProvider.
[ERROR] src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java:[255,10] (modifier) RedundantModifier: Redundant 'final' modifier.
```

**javadoc8**

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:19: error: bad use of '>'
[error]  *                   "max" -> "2016-12-05T20:54:20.827Z"  // maximum event time seen in this trigger
[error]                             ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:20: error: bad use of '>'
[error]  *                   "min" -> "2016-12-05T20:54:20.827Z"  // minimum event time seen in this trigger
[error]                             ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:21: error: bad use of '>'
[error]  *                   "avg" -> "2016-12-05T20:54:20.827Z"  // average event time seen in this trigger
[error]                             ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamingQueryProgress.java:22: error: bad use of '>'
[error]  *                   "watermark" -> "2016-12-05T20:54:20.827Z"  // watermark used in this trigger
[error]
```

## How was this patch tested?

Manually checked as below:

**lint-check failures**

```
./dev/lint-java
Checkstyle checks passed.
```

**javadoc8**

This seems hidden in the API doc but I manually checked after removing access modifier as below:

It looks not rendering properly (scaladoc).

![2016-12-16 3 40 34](https://cloud.githubusercontent.com/assets/6477701/21255175/8df1fe6e-c3ad-11e6-8cda-ce7f76c6677a.png)

After this PR, it renders as below:

- scaladoc
  ![2016-12-16 3 40 23](https://cloud.githubusercontent.com/assets/6477701/21255135/4a11dab6-c3ad-11e6-8ab2-b091c4f45029.png)

- javadoc
  ![2016-12-16 3 41 10](https://cloud.githubusercontent.com/assets/6477701/21255137/4bba1d9c-c3ad-11e6-9b88-62f1f697b56a.png)

Author: hyukjinkwon <[email protected]>

Closes apache#16307 from HyukjinKwon/lint-javadoc8.

(cherry picked from commit ed84cd0)

* Bring back needed import
@ash211
Copy link
Author

ash211 commented Mar 16, 2017

@foxish it has to be cherry picks not a fast forward, since a merge would pull in lots of things from apache master (the reason for the rebase).

I definitely plan on pulling in any activity merged into the k8s-support-alternate-incremental branch since this one got started

@cvpatel
Copy link
Member

cvpatel commented Mar 16, 2017

rerun integration test please

@cvpatel
Copy link
Member

cvpatel commented Mar 16, 2017

There have been intermittent issues with integration after running the full build and unit-tests on the same system... I have fixed some those issues by moving the integration tests builds to have a dedicated local maven repo... I'm thinking that the mvn install step was messing with the integration test builds...

In the last failure here it seems that the kubectl binary got uninstalled somehow... hmm not exactly sure how that happened, have reinstalled it and seems to pass after that.

foxish and others added 4 commits March 15, 2017 23:56
* Adding official alpha docker image to docs

* Reorder sections and create a specific one for "advanced"

* Provide limitations and instructions about running on GKE

* Fix title of advanced section: submission

* Improved section on running in the cloud

* Update versioning

* Address comments

* Address comments

(cherry picked from commit e5da90d)
* Add Apache license to a few files

* Ignore license check on META-INF service

(cherry picked from commit 2a61438)
* Allow providing an OAuth token for authenticating against k8s

* Organize imports

* Fix style

* Remove extra newline

* Use OAuth token data instead of a file.

(cherry picked from commit 1aba361)
@ash211
Copy link
Author

ash211 commented Mar 16, 2017

Just pushed the latest commits from https://github.com/apache-spark-on-k8s/spark/commits/k8s-support-alternate-incremental

If these 3 Jenkins builds pass, then I'd suggest we finally merge (NOT squash and merge) this PR

@ash211
Copy link
Author

ash211 commented Mar 16, 2017

@foxish comfortable with the merge? NOT squash and merge?

@foxish
Copy link
Member

foxish commented Mar 16, 2017

Yes, that would be best, given that we'll have the dev branch off of this one, and want history as it is.

@ash211 ash211 merged commit f9f5af4 into branch-2.1-kubernetes Mar 16, 2017
@ash211 ash211 deleted the prep-for-alpha-release branch March 16, 2017 08:29
@ash211
Copy link
Author

ash211 commented Mar 16, 2017

Great, thanks for everyone's contributions helping diagnose test failures!

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants