[WIP] [SPARK-11327] [MESOS] Dispatcher does not respect all args from the Submit request #9752
Conversation
Can one of the admins verify this patch?
@jayv Thank you very much for this; this issue has held us back from using Spark cluster mode with Mesos. I look forward to using this fix.
This problem is discussed here: https://issues.apache.org/jira/browse/SPARK-11327
Please add [MESOS] and the JIRA ticket [SPARK-11327] to the title so it gets picked up by the Spark PR tool.
I'm wondering if we should try to propagate anything that's spark.* instead of hand-picking these. What's the reasoning behind these selected ones?
My app jar was duplicated in the classpath because of the spark.jars property, which made me wonder which settings you care about when "customizing" a job versus infrastructure-implied settings from the config files. Stripping spark.jars would probably be fine, but I'm not familiar enough with the framework to know of any other potential conflicts.
I see. The idea is that we pass all of the configuration needed to run the Spark job, as if you were running it locally, to the scheduler to forward, so that when the job runs somewhere in the cluster it has the same configuration.
I think the right fix is to include everything and allow overrides like the ones you mentioned. I need to look into what other flags we should consider. @andrewor14, do you know of any other flags we need to capture?
I agree with @tnachen that it's probably better to include all of them, and blacklist the ones that don't make sense. Either way it's a bit of a maintenance burden whenever a new property gets added, but it's more likely that we'd need it to be passed down than not.
Yeah, why not just ship everything? If we add a config in the future but forget to add it here, this will fail in mysterious ways.
Agreed on the maintenance burden, and +1 for including all args minus spark.jars, which causes classpath issues.
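Roughly what the whitelist-to-blacklist switch discussed above could look like, sketched as plain Scala over a key/value map of submission properties (the blacklist contents and property names here are illustrative, not the actual dispatcher code):

```scala
// Hypothetical sketch: forward every spark.* property from the submit request,
// except a small blacklist of keys (such as spark.jars) that would otherwise
// duplicate entries on the driver's classpath.
val blacklist = Set("spark.jars")

def forwardedProps(submitProps: Map[String, String]): Map[String, String] =
  submitProps.filter { case (key, _) =>
    key.startsWith("spark.") && !blacklist.contains(key)
  }

// spark.executor.memory and spark.ui.port pass through; spark.jars is dropped.
forwardedProps(Map(
  "spark.executor.memory" -> "4g",
  "spark.ui.port"         -> "4090",
  "spark.jars"            -> "hdfs:///jobs/my-app.jar"))
```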
I noticed this targets branch-1.5. Usually fixes go to master first and are then backported if needed. I don't think there are plans for another 1.5 release at this point either, @andrewor14?
How did you test this? I'm trying to run this using a
Yes, @jayv, would you mind opening this against the master branch? Then committers will decide which branches to put it in when we merge.
I needed a patch for our version, so I branched off of that. I'll make a new PR. @dragos no Docker; Puppet installs our Spark build on all our Mesos slaves in
@jayv I see. I wonder if forwarding all
I would assume so. When I used
@jayv will you have time to update this PR?
I will get to it on Monday.
…ned by long column Check for partition column null-ability while building the partition spec. Author: Dilip Biswal <[email protected]> Closes apache#10001 from dilipbiswal/spark-11997.
Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side. There are two reasons that we should make this change: * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645) * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0. It's better to fix this issue before 1.6 release, otherwise we will make breaking API change. cc shivaram sun-rui Author: Yanbo Liang <[email protected]> Closes apache#10016 from yanboliang/SPARK-12025.
…ingListenerSuite In StreamingListenerSuite."don't call ssc.stop in listener", after the main thread calls `ssc.stop()`, `StreamingContextStoppingCollector` may call `ssc.stop()` in the listener bus thread, which is a dead-lock. This PR updated `StreamingContextStoppingCollector` to only call `ssc.stop()` in the first batch to avoid the dead-lock. Author: Shixiong Zhu <[email protected]> Closes apache#10011 from zsxwing/fix-test-deadlock.
… 2.0 test https://issues.apache.org/jira/browse/SPARK-12020 Author: Yin Huai <[email protected]> Closes apache#10010 from yhuai/SPARK-12020.
…the value is null literals
When calling `get_json_object` for the following two cases, both results are `"null"`:
```scala
val tuple: Seq[(String, String)] = ("5", """{"f1": null}""") :: Nil
val df: DataFrame = tuple.toDF("key", "jstring")
val res = df.select(functions.get_json_object($"jstring", "$.f1")).collect()
```
```scala
val tuple2: Seq[(String, String)] = ("5", """{"f1": "null"}""") :: Nil
val df2: DataFrame = tuple2.toDF("key", "jstring")
val res3 = df2.select(functions.get_json_object($"jstring", "$.f1")).collect()
```
Fixed the problem and also added a test case.
Author: gatorsmile <[email protected]>
Closes apache#10018 from gatorsmile/get_json_object.
…, tests, fix doc and add examples shivaram sun-rui Author: felixcheung <[email protected]> Closes apache#10019 from felixcheung/rfunctionsdoc.
Add support for colnames, colnames<-, coltypes<-. Also added tests for names, names<-, which had no tests previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, screwed up the PR. Recreated it here. Was apache#9218 shivaram sun-rui Author: felixcheung <[email protected]> Closes apache#9654 from felixcheung/colnamescoltypes.
Author: Sun Rui <[email protected]> Closes apache#9769 from sun-rui/SPARK-11781.
In apache#9409 we enabled multi-column counting. The approach taken in that PR introduces a bit of overhead by first creating a row only to check if all of the columns are non-null. This PR fixes that technical debt. Count now takes multiple columns as its input. In order to make this work I have also added support for multiple columns in the single distinct code path. cc yhuai Author: Herman van Hovell <[email protected]> Closes apache#10015 from hvanhovell/SPARK-12024.
… Parquet relation with decimal column". https://issues.apache.org/jira/browse/SPARK-12039 Since it is pretty flaky in hadoop 1 tests, we can disable it while we are investigating the cause. Author: Yin Huai <[email protected]> Closes apache#10035 from yhuai/SPARK-12039-ignore.
…form zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwan <[email protected]> Closes apache#9886 from toddwan/S11859.
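A toy illustration of the stricter check described above, not the actual SparkContext validation:

```scala
// Sketch only: a Mesos master URL must carry the mesos:// prefix,
// even when it points at a ZooKeeper ensemble for multi-master setups.
def looksLikeMesosMaster(master: String): Boolean = master.startsWith("mesos://")

looksLikeMesosMaster("mesos://host:5050")       // true  (single master)
looksLikeMesosMaster("mesos://zk://host:2181")  // true  (multi-master via ZooKeeper)
looksLikeMesosMaster("zk://host:2181")          // false (rejected after this change)
```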
Based on the suggestions from marmbrus cloud-fan in apache#10165 , this PR is to print the decoded values(user objects) in `Dataset.show` ```scala implicit val kryoEncoder = Encoders.kryo[KryoClassData] val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS() ds.show(20, false); ``` The current output is like ``` +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |value | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]| |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]| |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]| +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` After the fix, it will be like the below if and only if the users override the `toString` function in the class `KryoClassData` ```scala override def toString: String = s"KryoClassData($a, $b)" ``` ``` +-------------------+ |value | +-------------------+ |KryoClassData(a, 1)| |KryoClassData(b, 2)| |KryoClassData(c, 3)| +-------------------+ ``` If users do not override the `toString` function, the results will be like ``` +---------------------------------------+ |value | +---------------------------------------+ |org.apache.spark.sql.KryoClassData68ef| |org.apache.spark.sql.KryoClassData6915| |org.apache.spark.sql.KryoClassData693b| +---------------------------------------+ ``` Question: Should we add another optional parameter in the function `show`? It will decide if the function `show` will display the hex values or the object values? Author: gatorsmile <[email protected]> Closes apache#10215 from gatorsmile/showDecodedValue.
…not pushed down. Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or disabled. In this PR, I add some logics for this. Since ORC does not filter record by record fully, this checks the count of the result and if it contains the expected values. Author: hyukjinkwon <[email protected]> Closes apache#9687 from HyukjinKwon/SPARK-11677.
Extend CrossValidator with HasSeed in PySpark. This PR replaces [apache#7997] CC: yanboliang thunterdb mmenestret Would one of you mind taking a look? Thanks! Author: Joseph K. Bradley <[email protected]> Author: Martin MENESTRET <[email protected]> Closes apache#10268 from jkbradley/pyspark-cv-seed.
MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext. Author: Davies Liu <[email protected]> Closes apache#10338 from davies/create_context.
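For context, a minimal sketch of the pattern the change above moves MLlib to (the wrapper function is hypothetical):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Reuse (or lazily create) the singleton SQLContext tied to this SparkContext
// instead of constructing a new one each time.
def sqlContextFor(sc: SparkContext): SQLContext = SQLContext.getOrCreate(sc)
```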
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
Author: Andrew Or <[email protected]>
Closes apache#10334 from andrewor14/rpc-typo.
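Judging from the literal `${timeout.duration}` in the message above, the typo appears to be a missing `s` string interpolator; a minimal sketch of that failure mode with made-up names, not the actual RpcTimeout code:

```scala
object InterpolationDemo extends App {
  case class Timeout(duration: String)
  val timeout = Timeout("120 seconds")

  // Without the leading `s` the placeholder is printed verbatim:
  println("Cannot receive any reply in ${timeout.duration}")
  // With the interpolator the configured duration is substituted:
  println(s"Cannot receive any reply in ${timeout.duration}")
}
```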
This commit exists to close the following pull requests on Github: Closes apache#1217 (requested by ankurdave, srowen) Closes apache#4650 (requested by andrewor14) Closes apache#5307 (requested by vanzin) Closes apache#5664 (requested by andrewor14) Closes apache#5713 (requested by marmbrus) Closes apache#5722 (requested by andrewor14) Closes apache#6685 (requested by srowen) Closes apache#7074 (requested by srowen) Closes apache#7119 (requested by andrewor14) Closes apache#7997 (requested by jkbradley) Closes apache#8292 (requested by srowen) Closes apache#8975 (requested by andrewor14, vanzin) Closes apache#8980 (requested by andrewor14, davies)
I wasn't able to make time for this, but I should have time tomorrow.
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception. This was suggested by mateiz on apache#7699. It may have already turned up an issue in "zero split job". Author: Imran Rashid <[email protected]> Closes apache#8466 from squito/SPARK-10248.
…addShutdownHook() is called SPARK-9886 fixed ExternalBlockStore.scala This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook() Author: tedyu <[email protected]> Closes apache#10325 from ted-yu/master.
…ry string when redirecting. Author: Rohit Agarwal <[email protected]> Closes apache#10180 from mindprince/SPARK-12186.
Author: Marcelo Vanzin <[email protected]> Closes apache#10339 from vanzin/SPARK-12386.
No change in functionality is intended. This only changes internal API. Author: Andrew Or <[email protected]> Closes apache#10343 from andrewor14/clean-bm-serializer.
…nting when invFunc is None
when invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to
`reduceByKey(func).window(winsize, slidesize).reduceByKey(func)`
and no checkpoint is necessary. The corresponding Scala code does exactly that, but Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this.
I do not know how to unit-test this.
Author: David Tolpin <[email protected]>
Closes apache#9888 from dtolpin/master.
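For reference, the equivalence described above expressed against the Scala DStream API; a sketch assuming a `DStream[(String, Int)]` and arbitrary window and slide durations:

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// With no inverse reduce function, the windowed reduce...
def windowedCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

// ...is equivalent to reducing, windowing, and reducing again,
// which needs no checkpointing because there is no inverse function.
def windowedCountsExpanded(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKey(_ + _).window(Seconds(30), Seconds(10)).reduceByKey(_ + _)
```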
This PR makes JSON parser and schema inference handle more cases where we have unparsed records. It is based on apache#10043. The last commit fixes the failed test and updates the logic of schema inference. Regarding the schema inference change, if we have something like ``` {"f1":1} [1,2,3] ``` originally, we will get a DF without any column. After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`. When merge this PR, please make sure that the author is simplyianm. JIRA: https://issues.apache.org/jira/browse/SPARK-12057 Closes apache#10043 Author: Ian Macalinao <[email protected]> Author: Yin Huai <[email protected]> Closes apache#10288 from yhuai/handleCorruptJson.
This commit is to resolve SPARK-12396. Author: echo2mei <[email protected]> Closes apache#10354 from echoTomei/master.
…er." This reverts commit 5a514b6.
For API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (will be null). The order of columns had been changed to match that with MySQL and PostgreSQL [1]. This PR also fix the nullability of output for outer join. [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html Author: Davies Liu <[email protected]> Closes apache#10353 from davies/fix_join.
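A minimal illustration of the API in question, assuming a spark-shell style `sqlContext`; the data and column names are made up:

```scala
// A right outer join on a shared "id" column; before this fix the join column
// in the result could come back null for rows that exist only on the right side.
val left  = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "l")
val right = sqlContext.createDataFrame(Seq((2, "x"), (3, "y"))).toDF("id", "r")

left.join(right, Seq("id"), "right_outer").show()
```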
Since we rename the column name from ```text``` to ```value``` for DataFrame load by ```SQLContext.read.text```, we need to update doc. Author: Yanbo Liang <[email protected]> Closes apache#10349 from yanboliang/text-value.
…pecial characters This PR encodes and decodes the file name to fix the issue. Author: Shixiong Zhu <[email protected]> Closes apache#10208 from zsxwing/uri.
… server Fix problem with apache#10332, this one should fix Cluster mode on Mesos Author: Iulian Dragos <[email protected]> Closes apache#10359 from dragos/issue/fix-spark-12345-one-more-time.
…split String.split accepts a regular expression, so we should escape "." and "|". Author: Shixiong Zhu <[email protected]> Closes apache#10361 from zsxwing/reg-bug.
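A quick illustration of why the escaping matters (Scala REPL style, results shown in comments):

```scala
// String.split takes a regular expression, so an unescaped "." matches every
// character and an unescaped "|" matches the empty string between characters.
"a.b.c".split(".")    // Array()        (every character consumed as a delimiter)
"a.b.c".split("\\.")  // Array(a, b, c)
"a|b|c".split("\\|")  // Array(a, b, c)
```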
…are not found Point users to spark-packages.org to find them. Author: Reynold Xin <[email protected]> Closes apache#10351 from rxin/SPARK-12397.
…erInvariantEquals method org.apache.spark.streaming.Java8APISuite.java is failing due to trying to sort immutable list in assertOrderInvariantEquals method. Author: Evan Chen <[email protected]> Closes apache#10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
This PR removes Hive windows functions from Spark and replaces them with (native) Spark ones. The PR is on par with Hive in terms of features. This has the following advantages: * Better memory management. * The ability to use spark UDAFs in Window functions. cc rxin / yhuai Author: Herman van Hovell <[email protected]> Closes apache#9819 from hvanhovell/SPARK-8641-2.
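For context, the kind of native window function usage this enables; a sketch assuming a DataFrame `df` with `dept` and `salary` columns:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Rank rows within each department by salary using Spark's own window functions,
// with no dependency on Hive's implementation.
val byDept = Window.partitionBy("dept").orderBy(desc("salary"))

val ranked = df
  .withColumn("rank", rank().over(byDept))
  .withColumn("pct", percent_rank().over(byDept))
```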
Force-pushed from b2025dd to cdde93d.
New PR against master: #10370
…bmit request Supersedes #9752 Author: Jo Voordeckers <[email protected]> Author: Iulian Dragos <[email protected]> Closes #10370 from jayv/mesos_cluster_params.
I've noticed some args don't get passed onto the driver spark-submit call from the Mesos Dispatcher.
Does this make sense or am I using it wrong? Especially JVM args and the Spark UI port are important to me.
I can make a JIRA ticket and add tests if I'm on the right track.
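The kind of per-job settings referred to above (JVM args, the UI port), using standard Spark property names with made-up values:

```scala
import org.apache.spark.SparkConf

// Per-job settings that should survive the hop through the dispatcher and
// reach the driver's spark-submit invocation unchanged.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:+UseG1GC -Dmy.flag=true")
  .set("spark.ui.port", "4090")
```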