
Conversation

@dianacarroll

See comments on Pull Request #38.
(I couldn't figure out how to modify an existing pull request, so I'm hoping I can withdraw that one and replace it with this one.)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13436/

@kayousterhout
Contributor

You can modify your old pull request by pushing new code to the branch you made that pull request from (dianacarroll:master), and GitHub will automatically add the new commits to the pull request.

@dianacarroll
Author

Thanks. However, the problem is that the pull request in question was based on my master branch instead of a separate branch. Any time I push something to my master branch it gets added to that pull request, so I can't push anything to master anymore without pulling all of those merges into this pull request. I figured the only hope was to abandon that pull request and create a new one based on a specific branch instead of master. I think I may just have to wait for all my current pull requests to get merged in, then delete my whole fork and start over.


@kayousterhout
Contributor

Gotcha -- yeah, as you've said, having a separate branch for each pull request is usually the way to go. You shouldn't need to delete your whole fork, though -- now that you've closed the pull request that depends on your master branch, you should be good to go!
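
For readers following the thread, here is a minimal sketch of the branch-per-pull-request workflow being described; the remote and branch names are only illustrative.

```
# Keep master tracking upstream; do the work for each pull request on its own branch.
# "upstream" (apache/spark) and "origin" (your fork) are example remote names.
git checkout master
git pull upstream master

git checkout -b spark-1134-fix        # one topic branch per pull request (name is illustrative)
# ...edit, commit...
git push origin spark-1134-fix        # open the pull request from this branch

# Pushing further commits to the same branch updates the open pull request automatically.
git commit -am "address review comments"
git push origin spark-1134-fix
```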

@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Apr 2, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Apr 2, 2014

Actually I guess Jenkins already tested it. I'll merge it. Thanks for the patch!

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@mateiz
Contributor

mateiz commented Apr 2, 2014

Actually, sorry -- I didn't look at this closely enough. I don't think removing IPYTHON_OPTS is right here: what Josh wanted was to pass the command-line options ($@) on to IPython instead of leaving them out and passing only $IPYTHON_OPTS. We do occasionally need to pass options to IPython, e.g. to launch the IPython Notebook (which happens when you run ipython notebook).

I'll make a pull request that does that based on your branch. I've reverted the current one because I didn't want to disable IPython Notebook and other options at this moment.
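
To illustrate the trade-off being discussed, here is a simplified sketch of the launcher decision. This is only a sketch of the shape of the logic, not the actual contents of bin/pyspark.

```
# Simplified sketch only -- not the real bin/pyspark script.
if [[ "$IPYTHON" == "1" ]]; then
  # Forward the caller's arguments ("$@") to IPython, so e.g.
  #   IPYTHON=1 bin/pyspark notebook
  # can launch the IPython Notebook; $IPYTHON_OPTS is kept for backward compatibility.
  exec ipython $IPYTHON_OPTS "$@"
else
  exec python "$@"
fi
```

With this shape, once ipython/ipython#5226 is fixed, `IPYTHON=1 bin/pyspark myscript.py` would also work for standalone scripts, as the commit message below notes.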

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13659/

asfgit pushed a commit that referenced this pull request Apr 3, 2014
This is based on @dianacarroll's previous pull request #227, and @JoshRosen's comments on #38. Since we do want to allow passing arguments to IPython, this does the following:
* It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will deal with PYTHONSTARTUP properly and enable this, see ipython/ipython#5226, but no released version has that fix.)
* If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark notebook`.
* The old `IPYTHON_OPTS` remains, but I've removed it from the documentation. This is in case people read an old tutorial that uses it.

This is not a perfect solution and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only doing the doc change. With this change though, when IPython fixes ipython/ipython#5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.

@JoshRosen you should probably take the final call on this.

Author: Diana Carroll <[email protected]>

Closes #294 from mateiz/spark-1134 and squashes the following commits:

747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied

(cherry picked from commit a599e43)
Signed-off-by: Matei Zaharia <[email protected]>
jhartlaub referenced this pull request in jhartlaub/spark May 27, 2014
Fix small bug in web UI and minor clean-up.

There was a bug where sorting order didn't work correctly for write time metrics.

I also cleaned up some earlier code that fixed the same issue for read and
write bytes.
(cherry picked from commit 182f9ba)

Signed-off-by: Patrick Wendell <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…HONOPTS from call

see comments on Pull Request apache#38
(i couldn't figure out how to modify an existing pull request, so I'm hoping I can withdraw that one and replace it with this one.)

Author: Diana Carroll <[email protected]>

Closes apache#227 from dianacarroll/spark-1134 and squashes the following commits:

ffe47f2 [Diana Carroll] [spark-1134] remove ipythonopts from ipython command
b673bf7 [Diana Carroll] Merge branch 'master' of github.com:apache/spark
0309cf9 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This is based on @dianacarroll's previous pull request apache#227, and @JoshRosen's comments on apache#38. Since we do want to allow passing arguments to IPython, this does the following:
* It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will deal with PYTHONSTARTUP properly and enable this, see ipython/ipython#5226, but no released version has that fix.)
* If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark notebook`.
* The old `IPYTHON_OPTS` remains, but I've removed it from the documentation. This is in case people read an old tutorial that uses it.

This is not a perfect solution and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only doing the doc change. With this change though, when IPython fixes ipython/ipython#5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.

@JoshRosen you should probably take the final call on this.

Author: Diana Carroll <[email protected]>

Closes apache#294 from mateiz/spark-1134 and squashes the following commits:

747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied
liancheng pushed a commit to liancheng/spark that referenced this pull request Mar 17, 2017
## What changes were proposed in this pull request?

Main changes:

- Move FilterPushdown.scala under the pushdown package and make it reuse some of the helper functions there (e.g. wrap, block)
- Add support for more expressions: StartsWith, EndsWith, Contains, AND, OR, NOT, IN
- Add parentheses around all basic predicates and reapprove affected tests.

## How was this patch tested?

Ran all unit tests and `RedshiftReadIntegrationSuite.scala`

Author: Adrian Ionescu <[email protected]>

Closes apache#227 from adrian-ionescu/redshift-basic-pushdown.
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 12, 2017
jamesrgrinter pushed a commit to jamesrgrinter/spark that referenced this pull request Apr 22, 2018
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* Add Octavia devstack configuration

There is an lbaas devstack configuration that can enable Octavia
through neutron-lbaas. However, neutron-lbaas is deprecated, so we
need a new task that enables Octavia as a standalone service.

Related-Bug: theopenlab/openlab-zuul-jobs#143
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…or on driver (apache#227)

* [HADP-43018] Disable rack resolve when registering executor on driver (apache#388) (apache#74)

Make `YarnClusterScheduler` extend `TaskSchedulerImpl` rather than `YarnScheduler` so that rack resolution is disabled.

We've seen the driver get stuck in the following thread when a large number of executors register at once. Since we don't need rack info for locality, add a config to disable rack resolution by default, which could eliminate this bottleneck in the driver.

```
"dispatcher-event-loop-15" apache#50 daemon prio=5 os_prio=0 tid=0x00007f751a394000 nid=0x11953 runnable [0x00007f74c6290000]
   java.lang.Thread.State: RUNNABLE
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at java.net.InetAddress.getByName(InetAddress.java:1077)
	at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
	at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:580)
	at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
	at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
	at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
	at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:37)
	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:329)
	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:318)
```

No

Add UT.

I've run a test https://bdp.vip.ebay.com/job/detail/?cluster=apollorno&jobType=SPARK&jobId=application_1635906065713_321559&tab=0 on apollorno. The test succeeded with 16612 executors, although many executors failed to register. This patch could improve driver performance, but it will still run into a bottleneck when too many executors register at the same time.

```
21/11/08 07:40:19 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
21/11/08 07:42:19 ERROR TransportChannelHandler: Connection to hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
21/11/08 07:42:19 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 is closed
21/11/08 07:42:19 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 closed
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 disassociated! Shutting down.
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Cannot register with driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 in 120 seconds. This timeout is controlled by spark.network.timeout
```

Co-authored-by: tianlzhang <[email protected]>

Co-authored-by: yujli <[email protected]>
Co-authored-by: tianlzhang <[email protected]>