This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@foxish
Member

@foxish foxish commented Mar 8, 2017

No description provided.

@foxish foxish mentioned this pull request Mar 8, 2017

@ash211 ash211 left a comment


+1 to merging when this passes

@mccheah take a look?

@foxish
Member Author

foxish commented Mar 8, 2017

@cvpatel, any way to cancel the old integration test build and run this one?

@cvpatel
Member

cvpatel commented Mar 8, 2017

@foxish Unfortunately, there is no public way to cancel the build after it starts. If there are multiple commits before the build starts, it should run only one build, with the latest commit. For now, I manually cancelled it.

@ash211

ash211 commented Mar 8, 2017

Looks like the latest build ran out of memory in the DistributedSuite?

https://travis-ci.org/apache-spark-on-k8s/spark/jobs/209099308

DistributedSuite:
- task throws not serializable exception
- local-cluster format
- simple groupByKey
- groupByKey where map output sizes exceed maxMbInFlight
- accumulators
- broadcast variables
- repeatedly failing task
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000000075e500000, 426246144, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 426246144 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/travis/build/apache-spark-on-k8s/spark/core/hs_err_pid4903.log

@foxish
Member Author

foxish commented Mar 8, 2017

This is odd. Maybe a Travis issue? I also see:

$ dev/lint-java
Using `mvn` from path: /home/ramanathana/Install/apache-maven-3.3.9/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java:[255,10] (modifier) RedundantModifier: Redundant 'final' modifier.

I don't think we actually touch that file.

@foxish
Member Author

foxish commented Mar 8, 2017

I also see that same linter error when running on branch-2.1-kubernetes.


Shouldn't these versions also reflect the kubernetes branch?

Member Author


I am not sure. The underlying version of spark is still 2.1.0, which is why I thought that was appropriate.

Member Author


Would the same version string as the image make sense?
2.1.0-k8s-support-0.1.0-alpha.1?


I think so. This is particularly because we likely want to publish these libraries to a Maven repository as well, something we still need to discuss the specifics of. One example use case is developing custom implementations of DriverServiceManager, which would require projects to take a dependency on spark-kubernetes. But if we publish these, we need to choose a version string in the pom files that differs from what Spark is already publishing to Maven Central.
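If these modules do get published, the distinguishing version would live in each module's pom. A hypothetical sketch (the coordinates are illustrative, using the alpha string floated above, not a final decision):

```xml
<!-- Sketch only: a version distinct from upstream Spark's 2.1.0, so
     published k8s artifacts don't collide with Maven Central releases -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-kubernetes_2.11</artifactId>
<version>2.1.0-k8s-support-0.1.0-alpha.1</version>
```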

Member Author


That's a good point. Do we switch over all the POMs to publish our version? Including, say, sql, ml, and other parts we did not touch?


I think we want everything to be synchronized, yes.

@mccheah

mccheah commented Mar 8, 2017

+1 when the build succeeds.

@foxish
Member Author

foxish commented Mar 8, 2017

@mccheah The linter error still seems to exist in a file that we didn't touch. I think that will cause the Travis build to fail.

@mccheah

mccheah commented Mar 8, 2017

Let's get the build to that point and make sure it's not failing because of something directly related to this change. A build that fails with a complaint about a bad version string would be worth catching, for example.

@kimoonkim
Member

I see the new Travis unit test build failing for a similar reason:

ExternalShuffleServiceSuite:
- groupByKey without compression
- shuffle non-zero block size
- shuffle serializer
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000742e00000, 17301504, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 17301504 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/travis/build/apache-spark-on-k8s/spark/core/hs_err_pid4921.log

I don't have much of a clue why.

@cvpatel
Member

cvpatel commented Mar 8, 2017

The integration test is failing due to filenames > 100 chars:

http://spark-k8s-jenkins.pepperdata.org:8080/job/PR-spark-k8s-integration-test/56/consoleFull#124214759204b09c7b-0d94-4ce5-8a08-8f343248b3d8

java.lang.RuntimeException: file name 'spark-kubernetes-integration-tests-spark-jobs-helpers_2.11-2.1.0-k8s-support-0.1.0-alpha.1-SNAPSHOT.jar' is too long ( > 100 bytes)
  at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.handleLongName(TarArchiveOutputStream.java:674)
  at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.putArchiveEntry(TarArchiveOutputStream.java:275)
  at org.apache.spark.deploy.rest.kubernetes.CompressionUtils$$anonfun$2$$anonfun$apply$2$$anonfun$apply$4$$anonfun$apply$5.apply(CompressionUtils.scala:77)
  at org.apache.spark.deploy.rest.kubernetes.CompressionUtils$$anonfun$2$$anonfun$apply$2$$anonfun$apply$4$$anonfun$apply$5.apply(CompressionUtils.scala:58)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.deploy.rest.kubernetes.CompressionUtils$$anonfun$2$$anonfun$apply$2$$anonfun$apply$4.apply(CompressionUtils.scala:58)
  at org.apache.spark.deploy.rest.kubernetes.CompressionUtils$$anonfun$2$$anonfun$apply$2$$anonfun$apply$4.apply(CompressionUtils.scala:56)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2494)
  at org.apache.spark.deploy.rest.kubernetes.CompressionUtils$$anonfun$2$$anonfun$apply$2.apply(CompressionUtils.scala:56)
  ...
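For context, this failure comes from commons-compress enforcing the classic ustar tar header, which reserves only 100 bytes for the entry name (its default long-file mode rejects anything longer). A minimal sketch of the same limit using Python's standard-library tarfile, purely for illustration; the failing code here is Spark's Scala CompressionUtils:

```python
import io
import tarfile

# The classic ustar header stores the entry name in a 100-byte field;
# a longer basename with no '/' to split on cannot be represented.
long_name = "x" * 101

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    try:
        tar.addfile(tarfile.TarInfo(name=long_name))
        rejected = False
    except ValueError:  # tarfile raises "name is too long" for ustar
        rejected = True
```

GNU and PAX tar variants sidestep the limit with extension records, which is why the limit only bites when the writer is configured for the strict classic format.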

@mccheah

mccheah commented Mar 8, 2017

Looks like we'll need a different version string for the poms. Not sure what to use here - maybe "2.1.0-k8s-0.1.0-SNAPSHOT"?

@ash211

ash211 commented Mar 9, 2017

Does that meet the 100 byte name limit?
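A quick sanity check (the jar name pattern is taken from the integration-test failure above; the shortened suffix is the proposed string):

```python
# Prefix from the failing integration-test jar
base = "spark-kubernetes-integration-tests-spark-jobs-helpers_2.11-"

old = base + "2.1.0-k8s-support-0.1.0-alpha.1-SNAPSHOT.jar"  # rejected by tar
new = base + "2.1.0-k8s-0.1.0-SNAPSHOT.jar"                  # proposed shorter string

print(len(old), len(new))  # the old name exceeds 100 bytes; the new one fits
```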

@foxish
Member Author

foxish commented Mar 13, 2017

Changed the POM string. Integration tests are fine now, but Travis is still seeing memory errors. Any ideas, @cvpatel @ssuchter?

@mccheah

mccheah commented Mar 13, 2017

Just retriggered the build; hopefully it's healthy now.

@cvpatel
Member

cvpatel commented Mar 13, 2017

I had seen the resource-related failures earlier, especially around changing the included profiles and goals...

Let's see if the retrigger fixes the issue; if it continues to have this type of issue, we could investigate having these run via Jenkins.

@mccheah

mccheah commented Mar 13, 2017

@cvpatel can we increase the memory available to Travis?

@kimoonkim
Member

The latest Travis build did not see an OOM (which is a good thing). It just saw ExternalShuffleServiceSuite hanging. Maybe it's flaky and we can just blacklist it:

ExternalShuffleServiceSuite:
- groupByKey without compression
- shuffle non-zero block size
- shuffle serializer
- zero sized blocks
- zero sized blocks without kryo
- shuffle on mutable pairs
- sorting on mutable pairs
- cogroup using mutable pairs
- subtract mutable pairs
- sort with Java non serializable class - Kryo
- sort with Java non serializable class - Java
- shuffle with different compression settings (SPARK-3426)
- [SPARK-4085] rerun map stage if reduce stage cannot find its local shuffle file
- metrics for shuffle without aggregation
- metrics for shuffle with aggregation
- multiple simultaneous attempts for one task (SPARK-8029)
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

@cvpatel
Member

cvpatel commented Mar 13, 2017

@mccheah Unfortunately no, we are already running the largest possible container (7.5 GB). more details

@foxish The second test seems to pass the build but fails the linter for both Java and Scala.

@foxish
Member Author

foxish commented Mar 13, 2017

@kimoonkim That's a good idea.
I think this branch, which is forked off the 2.1 release, has lost some of the flake fixes that came in after 2.1, which is why we might need to blacklist some more tests.

As for the linter, it seems to be failing on branch-2.1-kubernetes which is up-to-date with the upstream 2.1 release. Any ideas why this might be happening?

@foxish
Member Author

foxish commented Mar 14, 2017

Since the integration test passes, I think we should merge this first and then fix the subsequent Travis issues, such as #185.
Thoughts, @mccheah @cvpatel @kimoonkim?

@mccheah

mccheah commented Mar 14, 2017

I'm OK with this.

@foxish foxish merged commit 3636939 into prep-for-alpha-release Mar 14, 2017
@foxish foxish deleted the fix-alpha branch March 14, 2017 00:46
@kimoonkim
Member

SGTM.

@cvpatel
Member

cvpatel commented Mar 14, 2017

Ditto. But it seems like #185 is failing as well because of memory issues... going to start porting the unit tests to Jenkins and see how they behave there.

foxish added a commit that referenced this pull request Jul 24, 2017
* Fix pom versioning

* fix k8s versions in pom

* Change pom string to 2.1.0-k8s-0.1.0-SNAPSHOT
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
* Fix pom versioning

* fix k8s versions in pom

* Change pom string to 2.1.0-k8s-0.1.0-SNAPSHOT
