This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@mccheah mccheah commented Dec 9, 2016

Augments #1 by compressing the user's jar uploads before they are sent over the wire. Note that while gzip does the actual compression, there are multiple possible alternatives to tar as the format for marking the boundaries between items in the stream. Encoding the file names is tricky to do by hand, however, so getting that for free from tar is helpful.
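
As a rough illustration of the approach (not the exact code in this PR), the archiving can be done with Apache Commons Compress and Commons Codec; the helper name tarGzipAsBase64 below is made up, and the real method returns a TarGzippedData wrapper rather than a raw string:

import java.io.{ByteArrayOutputStream, File, FileInputStream}
import org.apache.commons.codec.binary.Base64
import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveOutputStream}
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream
import org.apache.commons.compress.utils.IOUtils

// Sketch only: tar the given files (keeping just their file names), gzip the tar stream,
// and base64-encode the result so it can be embedded in a submission request.
def tarGzipAsBase64(paths: Iterable[String]): String = {
  val rawBytes = new ByteArrayOutputStream()
  val tarOut = new TarArchiveOutputStream(new GzipCompressorOutputStream(rawBytes))
  try {
    for (path <- paths) {
      val file = new File(path)
      // Using only file.getName flattens any folder hierarchy in the archive.
      tarOut.putArchiveEntry(new TarArchiveEntry(file, file.getName))
      val fileIn = new FileInputStream(file)
      try {
        IOUtils.copy(fileIn, tarOut)
      } finally {
        fileIn.close()
      }
      tarOut.closeArchiveEntry()
    }
    tarOut.finish()
  } finally {
    tarOut.close()
  }
  Base64.encodeBase64String(rawBytes.toByteArray)
}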

val usedFileNames = mutable.HashSet.empty[String]
for (path <- paths) {
  val file = new File(path)
  if (!file.isFile) {
Member

Does leveraging the tar API make it easy to provide a directory, and have it send all the files underneath that directory? I can imagine that might be a nice feature, although it increases the likelihood of accidentally grabbing a lot of data that wasn't intended.

Related question: is it useful to allow a configured hard limit on tarball size?

Author
@mccheah mccheah Dec 12, 2016

We could do this, but we would need to add logic to recursively add the files in directories. I think what we have here covers enough use cases; I recall that the jars API in general doesn't allow adding directories.

Author

I'm not sure about the usefulness of a configured limit, mostly because the size that matters is the size after compression, which could be difficult for application submitters to predict.


def unpackAndWriteCompressedFiles(
    compressedData: TarGzippedData,
    rootOutputDir: File): Seq[String] = {
Member

Is the root output directory a configuration parameter?

Author

They're dynamically created in temp space.
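
For illustration, and assuming nothing beyond the standard library, such a temp directory could be created like this (the prefix string is made up):

import java.nio.file.Files

// Creates a fresh directory under the JVM's temp dir; cleanup is the caller's responsibility.
val rootOutputDir = Files.createTempDirectory("unpacked-jars").toFile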

@erikerlandson
Member

We can wait to see if anybody else has questions, but LGTM

@mccheah mccheah force-pushed the k8s-support-alternate-incremental--compress-jars branch from aeb4351 to ee07588 on January 7, 2017 at 00:17
@ash211 ash211 left a comment

I like it! Some requests for better comments and a warning log line, but nothing significant.

var deduplicationCounter = 1
while (usedFileNames.contains(resolvedFileName)) {
  resolvedFileName = s"$nameWithoutExtension-$deduplicationCounter.$extension"
  deduplicationCounter += 1

I'd be a fan of logging a warning here -- having multiple jars on a classpath with the same name is bad practice anyway, especially if their contents are different...
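
A sketch of how that warning could be wired into the deduplication loop quoted above, assuming the surrounding names (fileName, nameWithoutExtension, extension, usedFileNames) and that the enclosing object mixes in Spark's Logging trait:

if (usedFileNames.contains(fileName)) {
  // Duplicate jar names usually indicate a submission mistake, so call it out loudly.
  logWarning(s"File name $fileName is used by multiple submitted files; a numeric" +
    s" suffix will be appended to disambiguate, but the duplication may be unintentional.")
}
var resolvedFileName = fileName
var deduplicationCounter = 1
while (usedFileNames.contains(resolvedFileName)) {
  resolvedFileName = s"$nameWithoutExtension-$deduplicationCounter.$extension"
  deduplicationCounter += 1
}
usedFileNames += resolvedFileName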


private[spark] object CompressionUtils {
  private val BLOCK_SIZE = 10240
  private val RECORD_SIZE = 512

add comment that these are the defaults from TarArchiveOutputStream

val paths = mutable.Buffer.empty[String]
val compressedBytes = Base64.decodeBase64(compressedData.dataBase64)
if (!rootOutputDir.exists) {
  rootOutputDir.mkdir

does this need to be a mkdir -p -- create intermediate directories?
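
If intermediate directories could ever be missing, java.io.File.mkdirs (the mkdir -p equivalent) is the safer call; a sketch, with the return value checked since mkdir and mkdirs otherwise fail silently:

if (!rootOutputDir.exists) {
  // mkdirs also creates any missing parent directories and returns false on failure.
  if (!rootOutputDir.mkdirs()) {
    throw new IllegalStateException(s"Failed to create output directory $rootOutputDir")
  }
}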

private val RECORD_SIZE = 512
private val ENCODING = CharsetNames.UTF_8

def createTarGzip(paths: Iterable[String]): TarGzippedData = {

add comment to this method that any folder hierarchy on the input paths is flattened and duplicate filenames have a _N suffix added before the extension

Author

Folder hierarchies actually aren't allowed; an exception is thrown instead. Still not sure if that's the right call, however.

Author

Ah never mind, we want to note that only the file names are extracted from the full folder paths.
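
One possible wording for the requested scaladoc, offered as a sketch rather than the text that actually landed:

/**
 * Compresses the given files into a gzipped tar archive and returns it as base64-encoded data.
 *
 * Only the file names are kept from the provided paths, so any folder hierarchy is flattened
 * in the archive. When two inputs share a file name, a numeric suffix is appended before the
 * extension to keep the entries distinct.
 */
def createTarGzip(paths: Iterable[String]): TarGzippedData = {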

)
}

def unpackAndWriteCompressedFiles(

scaladoc that the return value is a seq of absolute file paths in their written location
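
A sketch of both the requested scaladoc and the unpacking flow, again using Commons Compress; the method name and the plain base64 parameter are illustrative, whereas the real method takes the TarGzippedData wrapper:

import java.io.{ByteArrayInputStream, File, FileOutputStream}
import org.apache.commons.codec.binary.Base64
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream
import org.apache.commons.compress.utils.IOUtils

/**
 * Decompresses the given base64-encoded tar.gz payload into rootOutputDir.
 *
 * @return the absolute paths of the files in their written locations on local disk.
 */
def unpackToDirectory(dataBase64: String, rootOutputDir: File): Seq[String] = {
  val compressedBytes = Base64.decodeBase64(dataBase64)
  val tarIn = new TarArchiveInputStream(
    new GzipCompressorInputStream(new ByteArrayInputStream(compressedBytes)))
  try {
    Iterator.continually(tarIn.getNextTarEntry)
      .takeWhile(_ != null)
      .map { entry =>
        val outputFile = new File(rootOutputDir, entry.getName)
        val fileOut = new FileOutputStream(outputFile)
        try {
          // TarArchiveInputStream stops each read at the current entry's boundary.
          IOUtils.copy(tarIn, fileOut)
        } finally {
          fileOut.close()
        }
        outputFile.getAbsolutePath
      }
      .toList
  } finally {
    tarIn.close()
  }
}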

*/
package org.apache.spark.deploy.kubernetes.integrationtest;

public class PiHelper {

is this class primarily pulled out from SparkPiWithInfiniteWait.scala so there are multiple jars? Would be worth saying that in comments


ash211 commented Jan 10, 2017

Merge conflicts now after the folder rename

@erikerlandson
Member

@mccheah, sanity check: the merge from k8s-support-alternate-incremental was to resolve the change to resource-managers/kubernetes?

@erikerlandson
Member

Generally, it is cleaner to rebase to stay up to date than to merge.

@erikerlandson
Member

Still LGTM
@ash211 did your requests get resolved?


@ash211 ash211 left a comment

LGTM!

@ash211 ash211 merged commit 697a35a into k8s-support-alternate-incremental Jan 11, 2017
@ash211 ash211 deleted the k8s-support-alternate-incremental--compress-jars branch January 11, 2017 22:36
ash211 pushed a commit that referenced this pull request Feb 8, 2017
* Use tar and gzip to archive shipped jars.

* Address comments

* Move files to resolve merge
ash211 pushed a commit that referenced this pull request Mar 8, 2017
foxish pushed a commit that referenced this pull request Jul 24, 2017
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019