[WIP] [SPARK-11327] [MESOS] Dispatcher does not respect all args from the Submit request #9752
Conversation
Can one of the admins verify this patch?
@jayv Thank you very much for this; this issue has held us back from using Spark cluster mode with Mesos. I look forward to using this fix.
This problem is discussed here: https://issues.apache.org/jira/browse/SPARK-11327
Please add [MESOS] and the JIRA ticket [SPARK-11327] to the title so it gets picked up by the Spark PR tool.
I'm wondering if we should try to propagate anything that's spark.* instead of hand-picking these. What's the reasoning behind these selected ones?
My app jar was duplicated in the classpath because of the spark.jars property, which made me wonder which settings you care about when "customizing" a job versus infrastructure-implied settings from the config files. Stripping spark.jars would probably be fine, but I'm not familiar enough with the framework to know of any other potential conflicts.
I see. The idea is that we pass all of the configuration needed to run the Spark job, as if you were running it locally, to the scheduler to forward, so that when the job runs somewhere in the cluster it has the same configuration.
I think the right fix is to include everything and allow overrides like the ones you mentioned. I need to look into what other flags we should consider. @andrewor14, do you know of any other flags we need to capture?
I agree with @tnachen that it's probably better to include all of them, and blacklist the ones that don't make sense. Either way it's a bit of a maintenance burden whenever a new property gets added, but it's more likely that we'd need it to be passed down than not.
Yeah, why not just ship everything? If we add a config in the future but forget to add it here, this will fail in mysterious ways.
Agreed on the maintenance burden, and +1 for including all args minus spark.jars, which causes classpath issues.
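Roughly what the whitelist-to-blacklist switch discussed above could look like, sketched as plain Scala over a key/value map of submission properties (the blacklist contents and property names here are illustrative, not the actual dispatcher code):

```scala
// Hypothetical sketch: forward every spark.* property from the submit request,
// except a small blacklist of keys (such as spark.jars) that would otherwise
// duplicate entries on the driver's classpath.
val blacklist = Set("spark.jars")

def forwardedProps(submitProps: Map[String, String]): Map[String, String] =
  submitProps.filter { case (key, _) =>
    key.startsWith("spark.") && !blacklist.contains(key)
  }

// spark.executor.memory and spark.ui.port pass through; spark.jars is dropped.
forwardedProps(Map(
  "spark.executor.memory" -> "4g",
  "spark.ui.port"         -> "4090",
  "spark.jars"            -> "hdfs:///jobs/my-app.jar"))
```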
I noticed this targets branch-1.5. Usually fixes go to master first and are then backported if needed. I don't think there are plans for another 1.5 release at this point either, @andrewor14?
How did you test this? I'm trying to run this using a
Yes, @jayv, would you mind opening this against the master branch? Then committers will decide which branches to put it in when we merge.
I needed a patch for our version, so I branched off of that. I'll make a new PR. @dragos no Docker; Puppet installs our Spark build on all our Mesos slaves in
@jayv I see. I wonder if forwarding all
I would assume so. When I used
@jayv will you have time to update this PR?
I will get to it on Monday.
…ned by long column Check for partition column null-ability while building the partition spec. Author: Dilip Biswal <[email protected]> Closes apache#10001 from dilipbiswal/spark-11997.
Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side. There are two reasons that we should make this change: * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645) * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0. It's better to fix this issue before 1.6 release, otherwise we will make breaking API change. cc shivaram sun-rui Author: Yanbo Liang <[email protected]> Closes apache#10016 from yanboliang/SPARK-12025.
…ingListenerSuite In StreamingListenerSuite."don't call ssc.stop in listener", after the main thread calls `ssc.stop()`, `StreamingContextStoppingCollector` may call `ssc.stop()` in the listener bus thread, which is a dead-lock. This PR updated `StreamingContextStoppingCollector` to only call `ssc.stop()` in the first batch to avoid the dead-lock. Author: Shixiong Zhu <[email protected]> Closes apache#10011 from zsxwing/fix-test-deadlock.
… 2.0 test https://issues.apache.org/jira/browse/SPARK-12020 Author: Yin Huai <[email protected]> Closes apache#10010 from yhuai/SPARK-12020.
…the value is null literals
When calling `get_json_object` for the following two cases, both results are `"null"`:
```scala
val tuple: Seq[(String, String)] = ("5", """{"f1": null}""") :: Nil
val df: DataFrame = tuple.toDF("key", "jstring")
val res = df.select(functions.get_json_object($"jstring", "$.f1")).collect()
```
```scala
val tuple2: Seq[(String, String)] = ("5", """{"f1": "null"}""") :: Nil
val df2: DataFrame = tuple2.toDF("key", "jstring")
val res3 = df2.select(functions.get_json_object($"jstring", "$.f1")).collect()
```
Fixed the problem and also added a test case.
Author: gatorsmile <[email protected]>
Closes apache#10018 from gatorsmile/get_json_object.
…, tests, fix doc and add examples shivaram sun-rui Author: felixcheung <[email protected]> Closes apache#10019 from felixcheung/rfunctionsdoc.
Add support for colnames, colnames<-, coltypes<-. Also added tests for names, names<-, which had no tests previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, screwed up the PR. Recreated it here. Was apache#9218 shivaram sun-rui Author: felixcheung <[email protected]> Closes apache#9654 from felixcheung/colnamescoltypes.
Author: Sun Rui <[email protected]> Closes apache#9769 from sun-rui/SPARK-11781.
In apache#9409 we enabled multi-column counting. The approach taken in that PR introduces a bit of overhead by first creating a row only to check if all of the columns are non-null. This PR fixes that technical debt. Count now takes multiple columns as its input. In order to make this work I have also added support for multiple columns in the single distinct code path. cc yhuai Author: Herman van Hovell <[email protected]> Closes apache#10015 from hvanhovell/SPARK-12024.
… Parquet relation with decimal column". https://issues.apache.org/jira/browse/SPARK-12039 Since it is pretty flaky in hadoop 1 tests, we can disable it while we are investigating the cause. Author: Yin Huai <[email protected]> Closes apache#10035 from yhuai/SPARK-12039-ignore.
…form zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwan <[email protected]> Closes apache#9886 from toddwan/S11859.
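A toy illustration of the stricter check described above, not the actual SparkContext validation:

```scala
// Sketch only: a Mesos master URL must carry the mesos:// prefix,
// even when it points at a ZooKeeper ensemble for multi-master setups.
def looksLikeMesosMaster(master: String): Boolean = master.startsWith("mesos://")

looksLikeMesosMaster("mesos://host:5050")       // true  (single master)
looksLikeMesosMaster("mesos://zk://host:2181")  // true  (multi-master via ZooKeeper)
looksLikeMesosMaster("zk://host:2181")          // false (rejected after this change)
```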
Based on the suggestions from marmbrus cloud-fan in apache#10165 , this PR is to print the decoded values(user objects) in `Dataset.show` ```scala implicit val kryoEncoder = Encoders.kryo[KryoClassData] val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS() ds.show(20, false); ``` The current output is like ``` +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |value | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]| |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]| |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]| +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` After the fix, it will be like the below if and only if the users override the `toString` function in the class `KryoClassData` ```scala override def toString: String = s"KryoClassData($a, $b)" ``` ``` +-------------------+ |value | +-------------------+ |KryoClassData(a, 1)| |KryoClassData(b, 2)| |KryoClassData(c, 3)| +-------------------+ ``` If users do not override the `toString` function, the results will be like ``` +---------------------------------------+ |value | +---------------------------------------+ |org.apache.spark.sql.KryoClassData68ef| |org.apache.spark.sql.KryoClassData6915| |org.apache.spark.sql.KryoClassData693b| +---------------------------------------+ ``` Question: Should we add another optional parameter in the function `show`? It will decide if the function `show` will display the hex values or the object values? Author: gatorsmile <[email protected]> Closes apache#10215 from gatorsmile/showDecodedValue.
…not pushed down. Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or disabled. In this PR, I add some logics for this. Since ORC does not filter record by record fully, this checks the count of the result and if it contains the expected values. Author: hyukjinkwon <[email protected]> Closes apache#9687 from HyukjinKwon/SPARK-11677.
Extend CrossValidator with HasSeed in PySpark. This PR replaces [apache#7997] CC: yanboliang thunterdb mmenestret Would one of you mind taking a look? Thanks! Author: Joseph K. Bradley <[email protected]> Author: Martin MENESTRET <[email protected]> Closes apache#10268 from jkbradley/pyspark-cv-seed.
MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext. Author: Davies Liu <[email protected]> Closes apache#10338 from davies/create_context.
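For context, a minimal sketch of the pattern the change above moves MLlib to (the wrapper function is hypothetical):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Reuse (or lazily create) the singleton SQLContext tied to this SparkContext
// instead of constructing a new one each time.
def sqlContextFor(sc: SparkContext): SQLContext = SQLContext.getOrCreate(sc)
```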
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
Author: Andrew Or <[email protected]>
Closes apache#10334 from andrewor14/rpc-typo.
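Judging from the literal `${timeout.duration}` in the message above, the typo appears to be a missing `s` string interpolator; a minimal sketch of that failure mode with made-up names, not the actual RpcTimeout code:

```scala
object InterpolationDemo extends App {
  case class Timeout(duration: String)
  val timeout = Timeout("120 seconds")

  // Without the leading `s` the placeholder is printed verbatim:
  println("Cannot receive any reply in ${timeout.duration}")
  // With the interpolator the configured duration is substituted:
  println(s"Cannot receive any reply in ${timeout.duration}")
}
```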
This commit exists to close the following pull requests on Github: Closes apache#1217 (requested by ankurdave, srowen) Closes apache#4650 (requested by andrewor14) Closes apache#5307 (requested by vanzin) Closes apache#5664 (requested by andrewor14) Closes apache#5713 (requested by marmbrus) Closes apache#5722 (requested by andrewor14) Closes apache#6685 (requested by srowen) Closes apache#7074 (requested by srowen) Closes apache#7119 (requested by andrewor14) Closes apache#7997 (requested by jkbradley) Closes apache#8292 (requested by srowen) Closes apache#8975 (requested by andrewor14, vanzin) Closes apache#8980 (requested by andrewor14, davies)
I wasn't able to make time for this, but I should have time tomorrow.
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception. This was suggested by mateiz on apache#7699. It may have already turned up an issue in "zero split job". Author: Imran Rashid <[email protected]> Closes apache#8466 from squito/SPARK-10248.
…addShutdownHook() is called SPARK-9886 fixed ExternalBlockStore.scala This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook() Author: tedyu <[email protected]> Closes apache#10325 from ted-yu/master.
…ry string when redirecting. Author: Rohit Agarwal <[email protected]> Closes apache#10180 from mindprince/SPARK-12186.
Author: Marcelo Vanzin <[email protected]> Closes apache#10339 from vanzin/SPARK-12386.
No change in functionality is intended. This only changes internal API. Author: Andrew Or <[email protected]> Closes apache#10343 from andrewor14/clean-bm-serializer.
…nting when invFunc is None
when invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to
`reduceByKey(func).window(winsize, slidesize).reduceByKey(func)`
and no checkpoint is necessary. The corresponding Scala code does exactly that, but Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this.
I do not know how to unit-test this.
Author: David Tolpin <[email protected]>
Closes apache#9888 from dtolpin/master.
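For reference, the equivalence described above expressed against the Scala DStream API; a sketch assuming a `DStream[(String, Int)]` and arbitrary window and slide durations:

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// With no inverse reduce function, the windowed reduce...
def windowedCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

// ...is equivalent to reducing, windowing, and reducing again,
// which needs no checkpointing because there is no inverse function.
def windowedCountsExpanded(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKey(_ + _).window(Seconds(30), Seconds(10)).reduceByKey(_ + _)
```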
This PR makes JSON parser and schema inference handle more cases where we have unparsed records. It is based on apache#10043. The last commit fixes the failed test and updates the logic of schema inference. Regarding the schema inference change, if we have something like ``` {"f1":1} [1,2,3] ``` originally, we will get a DF without any column. After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`. When merge this PR, please make sure that the author is simplyianm. JIRA: https://issues.apache.org/jira/browse/SPARK-12057 Closes apache#10043 Author: Ian Macalinao <[email protected]> Author: Yin Huai <[email protected]> Closes apache#10288 from yhuai/handleCorruptJson.
This commit is to resolve SPARK-12396. Author: echo2mei <[email protected]> Closes apache#10354 from echoTomei/master.
…er." This reverts commit 5a514b6.
For API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (will be null). The order of columns had been changed to match that with MySQL and PostgreSQL [1]. This PR also fix the nullability of output for outer join. [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html Author: Davies Liu <[email protected]> Closes apache#10353 from davies/fix_join.
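A minimal illustration of the API in question, assuming a spark-shell style `sqlContext`; the data and column names are made up:

```scala
// A right outer join on a shared "id" column; before this fix the join column
// in the result could come back null for rows that exist only on the right side.
val left  = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "l")
val right = sqlContext.createDataFrame(Seq((2, "x"), (3, "y"))).toDF("id", "r")

left.join(right, Seq("id"), "right_outer").show()
```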
Since we rename the column name from ```text``` to ```value``` for DataFrame load by ```SQLContext.read.text```, we need to update doc. Author: Yanbo Liang <[email protected]> Closes apache#10349 from yanboliang/text-value.
…pecial characters This PR encodes and decodes the file name to fix the issue. Author: Shixiong Zhu <[email protected]> Closes apache#10208 from zsxwing/uri.
… server Fix problem with apache#10332, this one should fix Cluster mode on Mesos Author: Iulian Dragos <[email protected]> Closes apache#10359 from dragos/issue/fix-spark-12345-one-more-time.
…split String.split accepts a regular expression, so we should escape "." and "|". Author: Shixiong Zhu <[email protected]> Closes apache#10361 from zsxwing/reg-bug.
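A quick illustration of why the escaping matters (Scala REPL style, results shown in comments):

```scala
// String.split takes a regular expression, so an unescaped "." matches every
// character and an unescaped "|" matches the empty string between characters.
"a.b.c".split(".")    // Array()        (every character consumed as a delimiter)
"a.b.c".split("\\.")  // Array(a, b, c)
"a|b|c".split("\\|")  // Array(a, b, c)
```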
…are not found Point users to spark-packages.org to find them. Author: Reynold Xin <[email protected]> Closes apache#10351 from rxin/SPARK-12397.
…erInvariantEquals method org.apache.spark.streaming.Java8APISuite.java is failing due to trying to sort immutable list in assertOrderInvariantEquals method. Author: Evan Chen <[email protected]> Closes apache#10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
This PR removes Hive windows functions from Spark and replaces them with (native) Spark ones. The PR is on par with Hive in terms of features. This has the following advantages: * Better memory management. * The ability to use spark UDAFs in Window functions. cc rxin / yhuai Author: Herman van Hovell <[email protected]> Closes apache#9819 from hvanhovell/SPARK-8641-2.
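For context, the kind of native window function usage this enables; a sketch assuming a DataFrame `df` with `dept` and `salary` columns:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Rank rows within each department by salary using Spark's own window functions,
// with no dependency on Hive's implementation.
val byDept = Window.partitionBy("dept").orderBy(desc("salary"))

val ranked = df
  .withColumn("rank", rank().over(byDept))
  .withColumn("pct", percent_rank().over(byDept))
```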
Force-pushed from b2025dd to cdde93d.
New PR against master: #10370
…bmit request Supersedes #9752 Author: Jo Voordeckers <[email protected]> Author: Iulian Dragos <[email protected]> Closes #10370 from jayv/mesos_cluster_params.
I've noticed some args don't get passed onto the driver spark-submit call from the Mesos Dispatcher.
Does this make sense or am I using it wrong? Especially JVM args and the Spark UI port are important to me.
I can make a JIRA ticket and add tests if I'm on the right track.
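The kind of per-job settings referred to above (JVM args, the UI port), using standard Spark property names with made-up values:

```scala
import org.apache.spark.SparkConf

// Per-job settings that should survive the hop through the dispatcher and
// reach the driver's spark-submit invocation unchanged.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:+UseG1GC -Dmy.flag=true")
  .set("spark.ui.port", "4090")
```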