Use tar and gzip to compress+archive shipped jars #2
Conversation
val usedFileNames = mutable.HashSet.empty[String]
for (path <- paths) {
  val file = new File(path)
  if (!file.isFile) {
Does leveraging the tar API make it easy to provide a directory, and have it send all the files underneath that directory? I can imagine that might be a nice feature, although it increases the likelihood of accidentally grabbing a lot of data that wasn't intended.
Related question: is it useful to allow a configured hard limit on tarball size?
We could do this, but we'd need to add the logic to recursively add the files in directories. I think what we have here covers enough use cases; I recall that the jars API in general doesn't allow for adding directories.
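(If directory inputs were supported later, the recursive expansion could look something like this hypothetical helper; this is a sketch, not code from the PR:

```scala
import java.io.File

// Hypothetical helper: expands any directories among the inputs into the
// regular files beneath them, depth-first.
def collectFiles(roots: Iterable[String]): Seq[File] = {
  def walk(f: File): Seq[File] =
    if (f.isDirectory) Option(f.listFiles()).toSeq.flatten.flatMap(walk)
    else Seq(f)
  roots.map(new File(_)).flatMap(walk).toSeq
}
```
)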
I'm not sure about the usefulness of a configured limit, mostly because the size that matters is the size after compression, which could be difficult for application submitters to predict.
def unpackAndWriteCompressedFiles(
    compressedData: TarGzippedData,
    rootOutputDir: File): Seq[String] = {
Is the root output directory a configuration parameter?
They're dynamically created in temp space.
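(Presumably that means something along the lines of the JDK's temp-directory API; an illustrative sketch, with a made-up prefix string:

```scala
import java.nio.file.Files

// Creates a uniquely named directory under java.io.tmpdir; the
// "spark-uploaded-jars-" prefix here is illustrative only.
val rootOutputDir = Files.createTempDirectory("spark-uploaded-jars-").toFile
```
)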
We can wait to see if anybody else has questions, but LGTM.
Force-pushed from aeb4351 to ee07588
ash211 left a comment:
I like it! Some requests for better comments and a warning log line, but nothing significant.
var deduplicationCounter = 1
while (usedFileNames.contains(resolvedFileName)) {
  resolvedFileName = s"$nameWithoutExtension-$deduplicationCounter.$extension"
  deduplicationCounter += 1
I'd be a fan of logging a warning here -- having multiple jars on a classpath with the same name is bad practice anyway, especially if their contents are different...
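(A minimal sketch of what that warning could look like, assuming the loop from the diff runs inside a class that mixes in Spark's Logging trait; the message wording is illustrative:

```scala
var deduplicationCounter = 1
while (usedFileNames.contains(resolvedFileName)) {
  // Assumes org.apache.spark.internal.Logging is mixed in, which provides
  // logWarning; duplicate names on a classpath usually signal a mistake.
  logWarning(s"File name $resolvedFileName is already in use;" +
    " renaming to avoid a clash.")
  resolvedFileName = s"$nameWithoutExtension-$deduplicationCounter.$extension"
  deduplicationCounter += 1
}
```
)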
private[spark] object CompressionUtils {
  private val BLOCK_SIZE = 10240
  private val RECORD_SIZE = 512
add comment that these are the defaults from TarArchiveOutputStream
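(The requested comment might read something like this; 10240-byte blocks and 512-byte records are indeed the defaults of Apache Commons Compress's TarArchiveOutputStream:

```scala
private[spark] object CompressionUtils {
  // 10240 and 512 match the default block and record sizes of Apache
  // Commons Compress's TarArchiveOutputStream; passing them explicitly
  // keeps the choice visible and stable across library upgrades.
  private val BLOCK_SIZE = 10240
  private val RECORD_SIZE = 512
}
```
)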
val paths = mutable.Buffer.empty[String]
val compressedBytes = Base64.decodeBase64(compressedData.dataBase64)
if (!rootOutputDir.exists) {
  rootOutputDir.mkdir
Does this need to be a mkdir -p, i.e. create intermediate directories?
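(In java.io.File terms that would be mkdirs() rather than mkdir(); a sketch of the suggested change, reusing rootOutputDir from the diff above:

```scala
// mkdir() fails when a parent directory is missing; mkdirs() creates any
// missing intermediate directories, i.e. the mkdir -p behavior.
if (!rootOutputDir.exists) {
  if (!rootOutputDir.mkdirs()) {
    throw new IllegalStateException(
      s"Failed to create output directory at ${rootOutputDir.getAbsolutePath}")
  }
}
```
)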
private val RECORD_SIZE = 512
private val ENCODING = CharsetNames.UTF_8

def createTarGzip(paths: Iterable[String]): TarGzippedData = {
add comment to this method that any folder hierarchy on the input paths is flattened and duplicate filenames have a _N suffix added before the extension
Folder hierarchies actually aren't allowed; an exception is thrown instead. Still not sure if that's the right call, however.
Ah, never mind -- we want to note that only the file names are extracted from the full folder paths.
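(Putting those points together, the scaladoc might read roughly as follows; the wording is illustrative, not the committed text:

```scala
/**
 * Compresses the given files into a gzipped tar archive.
 *
 * Directory structure is not preserved: only the file name is taken from
 * each input path, so entries from different directories sit side by side
 * in the archive. Duplicate file names are disambiguated by a numeric
 * suffix inserted before the extension (e.g. app.jar, app-1.jar).
 *
 * @param paths paths of the files to archive; each must point to a
 *              regular file, not a directory
 * @return the gzipped tar archive, base64-encoded
 */
def createTarGzip(paths: Iterable[String]): TarGzippedData = {
  ??? // body as in the diff above
}
```
)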
)
}

def unpackAndWriteCompressedFiles(
scaladoc that the return value is a seq of absolute file paths in their written location
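(And similarly for the unpack side, again as an illustrative sketch:

```scala
/**
 * Decompresses the given base64-encoded gzipped tar archive and writes
 * each contained file into the given directory.
 *
 * @return the absolute paths of the files at their written locations
 */
def unpackAndWriteCompressedFiles(
    compressedData: TarGzippedData,
    rootOutputDir: File): Seq[String] = {
  ??? // body as in the diff above
}
```
)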
 */
package org.apache.spark.deploy.kubernetes.integrationtest;

public class PiHelper {
Is this class primarily pulled out from SparkPiWithInfiniteWait.scala so that there are multiple jars? Would be worth saying that in a comment.
Merge conflicts now after the folder rename.
@mccheah, sanity-check: the merge from …
Generally, it's cleaner to rebase to keep a branch updated than to merge.
Merge branch '…cremental' into compress-jars
Still LGTM.
ash211 left a comment:
LGTM!
* Use tar and gzip to archive shipped jars.
* Address comments
* Move files to resolve merge
Augments #1 by compressing the user's jar uploads before they are sent over the wire. Note that while gzip does the actual compression, there are multiple possible choices besides tar for the format that marks the boundaries between items in the stream. Encoding the file names by hand is tricky, however, so getting this for free from tar is helpful.
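For readers unfamiliar with the combination, here is a minimal self-contained sketch of the tar-then-gzip approach using Apache Commons Compress and Commons Codec. The TarGzippedData name mirrors the PR; everything else is illustrative rather than the PR's actual implementation:

```scala
import java.io.{ByteArrayOutputStream, File, FileInputStream}

import org.apache.commons.codec.binary.Base64
import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveOutputStream}
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream
import org.apache.commons.compress.utils.IOUtils

// Illustrative stand-in for the PR's TarGzippedData container.
case class TarGzippedData(dataBase64: String)

def createTarGzip(paths: Iterable[String]): TarGzippedData = {
  val raw = new ByteArrayOutputStream()
  val tar = new TarArchiveOutputStream(new GzipCompressorOutputStream(raw))
  try {
    for (path <- paths) {
      val file = new File(path)
      // Each tar entry records the file name, which is exactly the
      // "boundaries between items" bookkeeping described above.
      tar.putArchiveEntry(new TarArchiveEntry(file, file.getName))
      val in = new FileInputStream(file)
      try {
        IOUtils.copy(in, tar)
      } finally {
        in.close()
      }
      tar.closeArchiveEntry()
    }
  } finally {
    tar.close() // also finishes and closes the underlying gzip stream
  }
  TarGzippedData(Base64.encodeBase64String(raw.toByteArray))
}
```

Gzip alone compresses a single byte stream with no notion of multiple named members, which is why a container format like tar is layered underneath it here.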