
Conversation

Contributor

@jayv jayv commented Nov 17, 2015

I've noticed that some args don't get passed on to the driver's spark-submit call from the Mesos Dispatcher.
Does this make sense, or am I using it wrong? The JVM args and the Spark UI port are especially important to me.
I can make a JIRA ticket and add tests if I'm on the right track.

@jayv jayv changed the title from "[WIPMessos scheduler does not respect all args from the Submit request" to "[WIP] Messos scheduler does not respect all args from the Submit request" Nov 17, 2015
@AmplabJenkins

Can one of the admins verify this patch?

@jayv jayv changed the title from "[WIP] Messos scheduler does not respect all args from the Submit request" to "[WIP] Mesos scheduler does not respect all args from the Submit request" Nov 17, 2015
@jayv jayv changed the title from "[WIP] Mesos scheduler does not respect all args from the Submit request" to "[WIP] Mesos Dispatcher does not respect all args from the Submit request" Nov 17, 2015

vonnagy commented Nov 17, 2015

@jayv Thank you very much for this; this issue has held us back from using Spark cluster mode with Mesos. I look forward to using this fix.

@jayv jayv changed the title from "[WIP] Mesos Dispatcher does not respect all args from the Submit request" to "[WIP] [SPARK-11327] Mesos Dispatcher does not respect all args from the Submit request" Nov 17, 2015
Contributor Author

jayv commented Nov 17, 2015

This problem is discussed here: https://issues.apache.org/jira/browse/SPARK-11327

Contributor

tnachen commented Nov 17, 2015

Please add [MESOS] and the JIRA ticket [SPARK-11327] to the title so it gets picked up by the Spark PR tool.

@jayv jayv changed the title from "[WIP] [SPARK-11327] Mesos Dispatcher does not respect all args from the Submit request" to "[WIP] [SPARK-11327] [MESOS] Dispatcher does not respect all args from the Submit request" Nov 17, 2015
Contributor

I'm wondering if we should try to propagate anything that's spark.* instead of hand-picking these. What's the reasoning behind the ones you selected?

Contributor Author

My app jar was duplicated on the classpath because of the spark.jars property, which made me wonder which settings you care about when "customizing" a job versus infrastructure settings implied by the config files. Stripping spark.jars would probably be fine, but I'm not familiar enough with the framework to know of any other potential conflicts.

Contributor

I see. The idea is that we pass all of the configuration needed to run the Spark job to the scheduler to forward, as if you were running it locally, so that when it runs somewhere in the cluster it has the same configuration.

I think the right fix is to include everything and allow overrides like the ones you mentioned. I need to look more into what other flags we should consider. @andrewor14, do you know of any other flags we need to capture?

Contributor

I agree with @tnachen that it's probably better to include all of them, and blacklist the ones that don't make sense. Either way it's a bit of a maintenance burden whenever a new property gets added, but it's more likely that we'd need it to be passed down than not.

Contributor

Yeah, why not just ship everything? If we add a config in the future but forget to add it here, this will fail in mysterious ways.

Contributor Author

Agreed on the maintenance burden, and +1 for including all args minus spark.jars, which causes classpath issues.
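
For context, here is a minimal sketch of the approach being converged on: forward every spark.* property from the submit request and filter out a small blacklist such as spark.jars. This is not the actual MesosClusterScheduler code; the names forwardableProperties and toConfArgs, and the blacklist contents, are illustrative assumptions.

```scala
object ConfForwardingSketch {
  // Sketch only: keys the dispatcher should not forward (contents illustrative).
  val blacklist = Set("spark.jars")

  // Keep every spark.* property from the submission except the blacklisted ones.
  def forwardableProperties(submissionProps: Map[String, String]): Map[String, String] =
    submissionProps.filter { case (key, _) =>
      key.startsWith("spark.") && !blacklist.contains(key)
    }

  // Each surviving property becomes a --conf argument on the driver's
  // spark-submit command line.
  def toConfArgs(props: Map[String, String]): Seq[String] =
    props.toSeq.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  // e.g. toConfArgs(forwardableProperties(Map(
  //   "spark.ui.port" -> "4042", "spark.jars" -> "app.jar")))
  // returns Seq("--conf", "spark.ui.port=4042")
}
```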

Contributor

dragos commented Nov 19, 2015

I noticed this targets branch-1.5. Usually fixes go to master first and then they are backported if needed. I don't think there are plans for another 1.5 release at this point either, @andrewor14?

Contributor

dragos commented Nov 19, 2015

How did you test this? I'm trying to run this using a spark.executor.uri pointing to an FTP server, but tasks fail (they don't retrieve the URI). The driver launches fine though. Are you using Docker images?

@andrewor14
Contributor

Yes, @jayv, would you mind opening this against the master branch? Then committers will decide which branches to put it in when we merge.

Contributor Author

jayv commented Nov 19, 2015

I needed a patch for our version, so I branched off of that. I'll make a new PR.

@dragos no Docker; Puppet installs our Spark build on all of our Mesos slaves in /opt/spark.

Contributor

dragos commented Nov 19, 2015

@jayv I see. I wonder if forwarding all spark.* properties will fix it for me.

Contributor Author

jayv commented Nov 19, 2015

I would assume so. When I used spark.*.extraJavaOptions to specify -Dfoo=bar, it got applied to both my driver and my tasks, which it didn't before my patch.
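
For illustration only (the paths, class name, and jar URL below are made up), this is roughly the kind of driver command the dispatcher can build once spark.* properties are forwarded; the executor-side option then travels with the job's configuration to the tasks:

```scala
// Illustrative only: a driver spark-submit invocation with forwarded
// extraJavaOptions; all values here are example placeholders.
val driverCmd = Seq(
  "/opt/spark/bin/spark-submit",
  "--conf", "spark.driver.extraJavaOptions=-Dfoo=bar",
  "--conf", "spark.executor.extraJavaOptions=-Dfoo=bar",
  "--conf", "spark.ui.port=4042",
  "--class", "com.example.MyApp",
  "http://example.com/my-app.jar"
)
```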

Contributor

dragos commented Nov 21, 2015

@jayv will you have time to update this PR?

Contributor Author

jayv commented Nov 22, 2015

I will get to it on Monday.

rxin and others added 12 commits November 26, 2015 19:36
…ned by long column

Check for partition column null-ability while building the partition spec.

Author: Dilip Biswal <[email protected]>

Closes apache#10001 from dilipbiswal/spark-11997.
Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side.
There are two reasons that we should make this change:
* We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645)
* Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.

It's better to fix this issue before 1.6 release, otherwise we will make breaking API change.
cc shivaram sun-rui

Author: Yanbo Liang <[email protected]>

Closes apache#10016 from yanboliang/SPARK-12025.
…ingListenerSuite

In StreamingListenerSuite."don't call ssc.stop in listener", after the main thread calls `ssc.stop()`,  `StreamingContextStoppingCollector` may call  `ssc.stop()` in the listener bus thread, which is a dead-lock. This PR updated `StreamingContextStoppingCollector` to only call `ssc.stop()` in the first batch to avoid the dead-lock.

Author: Shixiong Zhu <[email protected]>

Closes apache#10011 from zsxwing/fix-test-deadlock.
…the value is null literals

When calling `get_json_object` for the following two cases, both results are `"null"`:

```scala
    val tuple: Seq[(String, String)] = ("5", """{"f1": null}""") :: Nil
    val df: DataFrame = tuple.toDF("key", "jstring")
    val res = df.select(functions.get_json_object($"jstring", "$.f1")).collect()
```
```scala
    val tuple2: Seq[(String, String)] = ("5", """{"f1": "null"}""") :: Nil
    val df2: DataFrame = tuple2.toDF("key", "jstring")
    val res3 = df2.select(functions.get_json_object($"jstring", "$.f1")).collect()
```

Fixed the problem and also added a test case.

Author: gatorsmile <[email protected]>

Closes apache#10018 from gatorsmile/get_json_object.
…, tests, fix doc and add examples

shivaram sun-rui

Author: felixcheung <[email protected]>

Closes apache#10019 from felixcheung/rfunctionsdoc.
Add support for colnames, colnames<-, coltypes<-
Also added tests for names, names<- which have no test previously.

I merged with PR 8984 (coltypes). Clicked the wrong thing, screwed up the PR. Recreated it here. Was apache#9218

shivaram sun-rui

Author: felixcheung <[email protected]>

Closes apache#9654 from felixcheung/colnamescoltypes.
In apache#9409 we enabled multi-column counting. The approach taken in that PR introduces a bit of overhead by first creating a row only to check if all of the columns are non-null.

This PR fixes that technical debt. Count now takes multiple columns as its input. In order to make this work I have also added support for multiple columns in the single distinct code path.

cc yhuai

Author: Herman van Hovell <[email protected]>

Closes apache#10015 from hvanhovell/SPARK-12024.
… Parquet relation with decimal column".

https://issues.apache.org/jira/browse/SPARK-12039

Since it is pretty flaky in hadoop 1 tests, we can disable it while we are investigating the cause.

Author: Yin Huai <[email protected]>

Closes apache#10035 from yhuai/SPARK-12039-ignore.
…form zk://host:port for a multi-master Mesos cluster using ZooKeeper

* According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://`

http://spark.apache.org/docs/latest/running-on-mesos.html
`The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.`

* However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port`

* For master URLs in the form `zk://host:port`, the valid form should be `mesos://zk://host:port`

* This PR restricts the validation in `SparkContext.scala`, so that now only Mesos master URLs prefixed with `mesos://` are accepted.

* This PR also updated corresponding unit test.

Author: toddwan <[email protected]>

Closes apache#9886 from toddwan/S11859.
gatorsmile and others added 6 commits December 16, 2015 13:22
Based on the suggestions from marmbrus cloud-fan in apache#10165 , this PR is to print the decoded values(user objects) in `Dataset.show`
```scala
    implicit val kryoEncoder = Encoders.kryo[KryoClassData]
    val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
    ds.show(20, false);
```
The current output is like
```
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
After the fix, it will be like the below if and only if the users override the `toString` function in the class `KryoClassData`
```scala
override def toString: String = s"KryoClassData($a, $b)"
```
```
+-------------------+
|value              |
+-------------------+
|KryoClassData(a, 1)|
|KryoClassData(b, 2)|
|KryoClassData(c, 3)|
+-------------------+
```

If users do not override the `toString` function, the results will be like
```
+---------------------------------------+
|value                                  |
+---------------------------------------+
|org.apache.spark.sql.KryoClassData68ef|
|org.apache.spark.sql.KryoClassData6915|
|org.apache.spark.sql.KryoClassData693b|
+---------------------------------------+
```

Question: Should we add another optional parameter in the function `show`? It will decide if the function `show` will display the hex values or the object values?

Author: gatorsmile <[email protected]>

Closes apache#10215 from gatorsmile/showDecodedValue.
…not pushed down.

Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or disabled. In this PR, I add some logics for this.
Since ORC does not filter record by record fully, this checks the count of the result and if it contains the expected values.

Author: hyukjinkwon <[email protected]>

Closes apache#9687 from HyukjinKwon/SPARK-11677.
Extend CrossValidator with HasSeed in PySpark.

This PR replaces [apache#7997]

CC: yanboliang thunterdb mmenestret  Would one of you mind taking a look?  Thanks!

Author: Joseph K. Bradley <[email protected]>
Author: Martin MENESTRET <[email protected]>

Closes apache#10268 from jkbradley/pyspark-cv-seed.
MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext.

Author: Davies Liu <[email protected]>

Closes apache#10338 from davies/create_context.
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```

Author: Andrew Or <[email protected]>

Closes apache#10334 from andrewor14/rpc-typo.
This commit exists to close the following pull requests on Github:

Closes apache#1217 (requested by ankurdave, srowen)
Closes apache#4650 (requested by andrewor14)
Closes apache#5307 (requested by vanzin)
Closes apache#5664 (requested by andrewor14)
Closes apache#5713 (requested by marmbrus)
Closes apache#5722 (requested by andrewor14)
Closes apache#6685 (requested by srowen)
Closes apache#7074 (requested by srowen)
Closes apache#7119 (requested by andrewor14)
Closes apache#7997 (requested by jkbradley)
Closes apache#8292 (requested by srowen)
Closes apache#8975 (requested by andrewor14, vanzin)
Closes apache#8980 (requested by andrewor14, davies)
Contributor Author

jayv commented Dec 17, 2015

I wasn't able to make time for this, but I should have time tomorrow.
Sorry for the delay.

squito and others added 18 commits December 16, 2015 19:01
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs).  However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.

This was suggested by mateiz on apache#7699.  It may have already turned up an issue in "zero split job".

Author: Imran Rashid <[email protected]>

Closes apache#8466 from squito/SPARK-10248.
…addShutdownHook() is called

SPARK-9886 fixed ExternalBlockStore.scala

This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook()

Author: tedyu <[email protected]>

Closes apache#10325 from ted-yu/master.
…ry string when redirecting.

Author: Rohit Agarwal <[email protected]>

Closes apache#10180 from mindprince/SPARK-12186.
Author: Marcelo Vanzin <[email protected]>

Closes apache#10339 from vanzin/SPARK-12386.
No change in functionality is intended. This only changes internal API.

Author: Andrew Or <[email protected]>

Closes apache#10343 from andrewor14/clean-bm-serializer.
…nting when invFunc is None

when invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to

     reduceByKey(func).window(winsize, slidesize).reduceByKey(winsize, slidesize)

and no checkpoint is necessary. The corresponding Scala code does exactly that, but Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this.

I do not know how to unit-test this.

Author: David Tolpin <[email protected]>

Closes apache#9888 from dtolpin/master.
This PR makes JSON parser and schema inference handle more cases where we have unparsed records. It is based on apache#10043. The last commit fixes the failed test and updates the logic of schema inference.

Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
originally, we will get a DF without any column.
After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`.

When merging this PR, please make sure that the author is simplyianm.

JIRA: https://issues.apache.org/jira/browse/SPARK-12057

Closes apache#10043

Author: Ian Macalinao <[email protected]>
Author: Yin Huai <[email protected]>

Closes apache#10288 from yhuai/handleCorruptJson.
This commit is to resolve SPARK-12396.

Author: echo2mei <[email protected]>

Closes apache#10354 from echoTomei/master.
For API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (will be null).

The order of columns had been changed to match that with MySQL and PostgreSQL [1].

This PR also fix the nullability of output for outer join.

[1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html

Author: Davies Liu <[email protected]>

Closes apache#10353 from davies/fix_join.
Since we rename the column name from ```text``` to ```value``` for DataFrame load by ```SQLContext.read.text```, we need to update doc.

Author: Yanbo Liang <[email protected]>

Closes apache#10349 from yanboliang/text-value.
…pecial characters

This PR encodes and decodes the file name to fix the issue.

Author: Shixiong Zhu <[email protected]>

Closes apache#10208 from zsxwing/uri.
… server

Fix problem with apache#10332, this one should fix Cluster mode on Mesos

Author: Iulian Dragos <[email protected]>

Closes apache#10359 from dragos/issue/fix-spark-12345-one-more-time.
…split

String.split accepts a regular expression, so we should escape "." and "|".

Author: Shixiong Zhu <[email protected]>

Closes apache#10361 from zsxwing/reg-bug.
…are not found

Point users to spark-packages.org to find them.

Author: Reynold Xin <[email protected]>

Closes apache#10351 from rxin/SPARK-12397.
…erInvariantEquals method

org.apache.spark.streaming.Java8APISuite.java is failing due to trying to sort immutable list in assertOrderInvariantEquals method.

Author: Evan Chen <[email protected]>

Closes apache#10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
This PR removes Hive windows functions from Spark and replaces them with (native) Spark ones. The PR is on par with Hive in terms of features.

This has the following advantages:
* Better memory management.
* The ability to use spark UDAFs in Window functions.

cc rxin / yhuai

Author: Herman van Hovell <[email protected]>

Closes apache#9819 from hvanhovell/SPARK-8641-2.
Contributor Author

jayv commented Dec 18, 2015

New PR against master: #10370

@jayv jayv closed this Dec 18, 2015
asfgit pushed a commit that referenced this pull request Mar 31, 2016
…bmit request

Supersedes #9752

Author: Jo Voordeckers <[email protected]>
Author: Iulian Dragos <[email protected]>

Closes #10370 from jayv/mesos_cluster_params.