
Conversation

@foxish (Member) commented Dec 20, 2017

Removed the MINIKUBE_TEST_BACKEND requirement for the SparkPi tests and some deprecated info.

Verified running on cloud with:

mvn clean -Ddownload.plugin.skip=true integration-test  \
-Dspark-distro-tgz=/home/ramanathana/go-workspace/src/apache-spark-on-k8s/release/spark/dist/spark.tar.gz  \
-Dspark-dockerfiles-dir=/home/ramanathana/go-workspace/src/apache-spark-on-k8s/release/spark/dist/kubernetes/dockerfiles \
-DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://... -Dspark.docker.test.driverImage=spark-driver -Dspark.docker.test.executorImage=spark-executor"

cc/ @kimoonkim @mccheah @liyinan926

Minor changes to old pom - creating a directory that needs to exist
@kimoonkim (Member) left a comment


In your example command line, I see you used custom docker images like gcr.io/my-image/driver:latest. Are you assuming those docker images are pre-built? Is that a requirement to use the cloud?

I understand we won't be able to use dockerd inside minikube. But can we use the dockerd that comes with the cloud, so we can still build docker images off the distro tarball we got?

<arguments>
  <argument>-c</argument>
- <argument>rm -rf spark-distro; mkdir spark-distro-tmp; cd spark-distro-tmp; tar xfz ${spark-distro-tgz}; mv * ../spark-distro; cd ..; rm -rf spark-distro-tmp</argument>
+ <argument>rm -rf spark-distro; mkdir spark-distro; mkdir spark-distro-tmp; cd spark-distro-tmp; tar xfz ${spark-distro-tgz}; mv * ../spark-distro; cd ..; rm -rf spark-distro-tmp</argument>
@kimoonkim (Member) commented:

Hmm. I see this adding mkdir spark-distro. But later we go inside a tmp dir, untar the distro and do mv * ../spark-distro.

The unpacked tarball has a top level dir, like spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9. Wouldn't the mv command create a subdir hierarchy we don't want, like spark-distro/spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9?
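(For illustration, a minimal shell sketch of the three mv behaviors in play in this thread; the directory names are hypothetical:)

mkdir demo && cd demo

# 1. Single source, target absent: mv renames the unpacked dir.
mkdir unpacked-dir
mv unpacked-dir spark-distro        # spark-distro IS the unpacked dir now
rm -rf spark-distro

# 2. Single source, target already a dir: mv nests inside it,
#    producing the unwanted spark-distro/unpacked-dir hierarchy.
mkdir unpacked-dir spark-distro
mv unpacked-dir spark-distro
rm -rf spark-distro

# 3. Multiple sources, target absent: mv fails outright with
#    "mv: target 'spark-distro' is not a directory" -- the error
#    reported further down this PR.
touch a b
mv a b spark-distro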

@foxish (Member Author) commented:

In my usage, I was actually seeing a failure without this create step. My understanding was that mv * ../spark-distro expects the target directory to already exist.

@kimoonkim (Member) commented:

The intent was that the mv statement would rename the unpacked top-level dir to be the ../spark-distro dir. I guess your tarball is not like my tarball :-) Can you check if your tarball creates a top-level dir?

Here's mine:

$ tar tvf spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9.tgz | head
drwxr-xr-x kimoonkim/staff   0 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/
drwxr-xr-x kimoonkim/staff   0 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/
-rwxr-xr-x kimoonkim/staff 1089 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/beeline
-rw-r--r-- kimoonkim/staff 1064 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/beeline.cmd
-rwxr-xr-x kimoonkim/staff 1933 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/find-spark-home
-rw-r--r-- kimoonkim/staff 2681 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/find-spark-home.cmd
-rw-r--r-- kimoonkim/staff 1892 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/load-spark-env.cmd
-rw-r--r-- kimoonkim/staff 2025 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/load-spark-env.sh
-rwxr-xr-x kimoonkim/staff 2989 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/pyspark
-rw-r--r-- kimoonkim/staff 1170 2017-12-18 13:09 spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9/bin/pyspark.cmd

@kimoonkim (Member) commented:

Ah, forgot to mention that the unpacked top-level dir is the only top-level entry. That's the precondition for mv * ../spark-distro to work:

~/Tmp/spark-distro-tmp$ tar xfz ../spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9.tgz
~/Tmp/spark-distro-tmp$ ls
spark-2.3.0-SNAPSHOT-bin-20171218-772e4648d9
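(As an aside, assuming GNU tar, the single-top-level-entry precondition could be sidestepped by stripping that dir during extraction; a sketch, with the tarball path as a placeholder:)

# Extract directly into spark-distro, dropping the tarball's top-level dir.
# Assumes GNU tar (--strip-components) and exactly one top-level directory.
rm -rf spark-distro && mkdir spark-distro
tar xzf spark.tgz --strip-components=1 -C spark-distro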


<profiles>
<profile>
<id>v2</id>
@kimoonkim (Member) commented:

Can you explain why we need this v2 profile with duplicate plugin config?

@foxish (Member Author) commented:

Cloud environments will diverge from the minikube flow here and won't need the pre-integration-test phase to run at all. It's not strictly needed, and I'm happy to change it if we have a way to invoke the integration-test step without running the pre-integration-test step.

cc @echarles
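(For context, a profile like this would be selected by id on the command line; a hypothetical invocation:)

# Activate the v2 profile so the build uses its plugin bindings instead
# of the default minikube-oriented pre-integration-test flow.
mvn clean integration-test -Pv2 \
  -Dspark-distro-tgz=/path/to/spark.tar.gz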

@foxish (Member Author) commented Dec 20, 2017

In your example command line, I see you used custom docker images like gcr.io/my-image/driver:latest. Are you assuming those docker images are pre-built? Is that a requirement to use the cloud?
I understand we won't be able to use dockerd inside mini-kube. But can we use dockerd that comes with the cloud? So we can still build docker images off a distro tarball we got?

I think we can use dockerd in our test infrastructure. My intent there was to have the flexibility to use a different step for building those docker images. If we're specifying the repo and the docker image anyway, it could, but doesn't necessarily have to, be built by Maven, correct?

@kimoonkim (Member) commented:

I think we can use dockerd in our test infrastructure. My intent there was to have the flexibility to use a different step for building those docker images. If we're specifying the repo and the docker image anyway, it could but doesn't necessarily have to be built by maven correct?

If we want to skip image building, we can set -Dspark.docker.test.skipBuildImages=true, which is already supported. That and -Dspark.docker.test.*Image will allow people to use pre-built images.

But this should not be a requirement for using the cloud, IMO. It's still nice to be able to build Docker images as part of the integration test automation, especially in CI, to erase doubts like "Was I using the right images for this test?" when a test fails.
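(Putting the two together, a cloud run against pre-built images might look like the sketch below; the master URL and image names are placeholders:)

# Skip the minikube download and the in-test image build, and point the
# suite at pre-built driver/executor images.
mvn clean integration-test \
  -Ddownload.plugin.skip=true \
  -Dspark-distro-tgz=/path/to/spark.tar.gz \
  -DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://... \
    -Dspark.docker.test.skipBuildImages=true \
    -Dspark.docker.test.driverImage=spark-driver \
    -Dspark.docker.test.executorImage=spark-executor"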

@foxish (Member Author) commented Dec 20, 2017 via email

@kimoonkim (Member) commented:

Yes, we want to avoid the minikube download step for the cloud. The download plugin seems to support a skip option. Can you try using that instead of the profile?

mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:download-maven-plugin -Ddetail | grep -A 5 skip | head -5
    skip (Default: false)
      User property: download.plugin.skip
      Whether to skip execution of Mojo

@foxish (Member Author) commented Dec 21, 2017

Works as expected. I'm going to try again without the changes to the pom, with skip enabled.

@foxish (Member Author) commented Dec 21, 2017

@kimoonkim, I'm getting the following when trying to unpack the tar.gz.

[INFO] --- exec-maven-plugin:1.4.0:exec (unpack-spark-distro) @ spark-kubernetes-integration-tests_2.11 ---
mv: target ‘../spark-distro’ is not a directory

I'm running:

mvn clean -Ddownload.plugin.skip=true integration-test  \
-Dspark-distro-tgz=/home/ramanathana/go-workspace/src/apache-spark-on-k8s/release/spark/dist/spark.tar.gz  \
-Dspark-dockerfiles-dir=/home/ramanathana/go-workspace/src/apache-spark-on-k8s/release/spark/dist/kubernetes/dockerfiles \
-DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://... -Dspark.docker.test.driverImage=spark-driver -Dspark.docker.test.executorImage=spark-executor"

@foxish merged commit f8a9dec into apache-spark-on-k8s:master on Dec 22, 2017
@foxish deleted the minor-fixes-and-new-phase branch on December 22, 2017 at 00:21
@foxish (Member Author) commented Dec 22, 2017

In #6 (comment), I had an error in the way I built the tar.gz distro. It works as expected now.

@echarles (Member) commented:

Side question: Is this repo aimed to replace the current integration-tests?

From what I understand, the answer is yes. I find the creation or download of the tgz distribution a step that slows developer productivity (think about making a code change, building the dist, building the docker image, and running the tests... fine if the result is green, but if a test fails, it becomes IMHO counter-productive).

If not, we could think about a way to have faster iteration with the integration tests in the spark repo, and to have the full download... in this repo with code reuse (think about a spark-integration-test.jar)?

