
Conversation

@dianacarroll

See comments on Pull Request #38.
(I couldn't figure out how to modify an existing pull request, so I'm hoping I can withdraw that one and replace it with this one.)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13436/

@kayousterhout
Contributor

You can modify your old pull request by pushing new code to the branch you made that pull request from (dianacarroll:master), and GitHub will automatically add the new commits to the pull request.

@dianacarroll
Author

Thanks. However, the problem is that the pull request in question was based on my master branch instead of a separate branch. Any time I push something to my master branch it gets added to that pull request, so I can't push anything to master anymore without pulling all of those merges into this pull request. I figured the only hope was to abandon that pull request and create a new one based on a specific branch instead of master. I think I may just have to wait for all my current pull requests to get merged in, then delete my whole fork and start over.


@kayousterhout
Contributor

Gotcha -- yeah, as you've said, having a separate branch for each pull request is usually the way to go. You shouldn't need to delete your whole fork, though -- now that you've closed the pull request that depends on your master branch, you should be good to go!
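
For readers following the thread, here is a minimal sketch of the branch-per-pull-request workflow being described; the remote and branch names are only illustrative.

```
# Keep master tracking upstream; do the work for each pull request on its own branch.
# "upstream" (apache/spark) and "origin" (your fork) are example remote names.
git checkout master
git pull upstream master

git checkout -b spark-1134-fix        # one topic branch per pull request (name is illustrative)
# ...edit, commit...
git push origin spark-1134-fix        # open the pull request from this branch

# Pushing further commits to the same branch updates the open pull request automatically.
git commit -am "address review comments"
git push origin spark-1134-fix
```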

@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Apr 2, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Apr 2, 2014

Actually I guess Jenkins already tested it. I'll merge it. Thanks for the patch!

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@mateiz
Contributor

mateiz commented Apr 2, 2014

Actually, sorry -- I didn't look at this closely enough. I don't think removing IPYTHON_OPTS is right here: what Josh wanted was to pass the command-line options ($@) on to IPython instead of leaving them out and passing only $IPYTHON_OPTS. We do occasionally need to pass options to IPython, e.g. to launch the IPython Notebook (which happens when you run ipython notebook).

I'll make a pull request that does that based on your branch. I've reverted the current one because I didn't want to disable IPython Notebook and other options at this moment.
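
To illustrate the trade-off being discussed, here is a simplified sketch of the launcher decision. This is only a sketch of the shape of the logic, not the actual contents of bin/pyspark.

```
# Simplified sketch only -- not the real bin/pyspark script.
if [[ "$IPYTHON" == "1" ]]; then
  # Forward the caller's arguments ("$@") to IPython, so e.g.
  #   IPYTHON=1 bin/pyspark notebook
  # can launch the IPython Notebook; $IPYTHON_OPTS is kept for backward compatibility.
  exec ipython $IPYTHON_OPTS "$@"
else
  exec python "$@"
fi
```

With this shape, once ipython/ipython#5226 is fixed, `IPYTHON=1 bin/pyspark myscript.py` would also work for standalone scripts, as the commit message below notes.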

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13659/

asfgit pushed a commit that referenced this pull request Apr 3, 2014
This is based on @dianacarroll's previous pull request #227, and @JoshRosen's comments on #38. Since we do want to allow passing arguments to IPython, this does the following:
* It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will deal with PYTHONSTARTUP properly and enable this, see ipython/ipython#5226, but no released version has that fix.)
* If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark notebook`.
* The old `IPYTHON_OPTS` remains, but I've removed it from the documentation. This is in case people read an old tutorial that uses it.

This is not a perfect solution and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only doing the doc change. With this change though, when IPython fixes ipython/ipython#5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.

@JoshRosen you should probably take the final call on this.

Author: Diana Carroll <[email protected]>

Closes #294 from mateiz/spark-1134 and squashes the following commits:

747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied

(cherry picked from commit a599e43)
Signed-off-by: Matei Zaharia <[email protected]>
jhartlaub referenced this pull request in jhartlaub/spark May 27, 2014
Fix small bug in web UI and minor clean-up.

There was a bug where sorting order didn't work correctly for write time metrics.

I also cleaned up some earlier code that fixed the same issue for read and
write bytes.
(cherry picked from commit 182f9ba)

Signed-off-by: Patrick Wendell <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…HONOPTS from call

see comments on Pull Request apache#38
(i couldn't figure out how to modify an existing pull request, so I'm hoping I can withdraw that one and replace it with this one.)

Author: Diana Carroll <[email protected]>

Closes apache#227 from dianacarroll/spark-1134 and squashes the following commits:

ffe47f2 [Diana Carroll] [spark-1134] remove ipythonopts from ipython command
b673bf7 [Diana Carroll] Merge branch 'master' of github.com:apache/spark
0309cf9 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This is based on @dianacarroll's previous pull request apache#227, and @JoshRosen's comments on apache#38. Since we do want to allow passing arguments to IPython, this does the following:
* It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will deal with PYTHONSTARTUP properly and enable this, see ipython/ipython#5226, but no released version has that fix.)
* If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark notebook`.
* The old `IPYTHON_OPTS` remains, but I've removed it from the documentation. This is in case people read an old tutorial that uses it.

This is not a perfect solution and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only doing the doc change. With this change though, when IPython fixes ipython/ipython#5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.

@JoshRosen you should probably take the final call on this.

Author: Diana Carroll <[email protected]>

Closes apache#294 from mateiz/spark-1134 and squashes the following commits:

747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied
liancheng pushed a commit to liancheng/spark that referenced this pull request Mar 17, 2017
## What changes were proposed in this pull request?

Main changes:

- Move FilterPushdown.scala under the pushdown package and make it reuse some of the helper functions there (e.g. wrap, block)
- Add support for more expressions: StartsWith, EndsWith, Contains, AND, OR, NOT, IN
- Add parentheses around all basic predicates and reapprove affected tests.

## How was this patch tested?

Ran all unit tests and `RedshiftReadIntegrationSuite.scala`

Author: Adrian Ionescu <[email protected]>

Closes apache#227 from adrian-ionescu/redshift-basic-pushdown.
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 12, 2017
jamesrgrinter pushed a commit to jamesrgrinter/spark that referenced this pull request Apr 22, 2018
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* Add Octavia devstack configuration

There is an lbaas devstack configuration that can enable Octavia
through neutron-lbaas. However, neutron-lbaas is deprecated, so we
need a new task that enables Octavia as a standalone service.

Related-Bug: theopenlab/openlab-zuul-jobs#143
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…or on driver (apache#227)

* [HADP-43018] Disable rack resolve when registering executor on driver (apache#388) (apache#74)

Make `YarnClusterScheduler` extend `TaskSchedulerImpl` rather than `YarnScheduler` so that rack resolution is disabled.

We've seen the driver get stuck in the following thread when a large number of executors register at once. Since we don't need rack info for locality, add a config to disable rack resolution by default, which could eliminate this bottleneck in the driver.

```
"dispatcher-event-loop-15" apache#50 daemon prio=5 os_prio=0 tid=0x00007f751a394000 nid=0x11953 runnable [0x00007f74c6290000]
   java.lang.Thread.State: RUNNABLE
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at java.net.InetAddress.getByName(InetAddress.java:1077)
	at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
	at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:580)
	at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
	at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
	at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
	at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:37)
	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:329)
	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:318)
```

No

Add UT.

I've run a test https://bdp.vip.ebay.com/job/detail/?cluster=apollorno&jobType=SPARK&jobId=application_1635906065713_321559&tab=0 on apollorno. The test succeeded with 16612 executors, although many executors failed to register. This patch could improve driver performance, but it will still run into a bottleneck when too many executors register at the same time.

```
21/11/08 07:40:19 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
21/11/08 07:42:19 ERROR TransportChannelHandler: Connection to hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
21/11/08 07:42:19 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 is closed
21/11/08 07:42:19 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 closed
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 disassociated! Shutting down.
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Cannot register with driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 in 120 seconds. This timeout is controlled by spark.network.timeout
```

Co-authored-by: tianlzhang <[email protected]>

Co-authored-by: yujli <[email protected]>
Co-authored-by: tianlzhang <[email protected]>