Skip to content

Conversation

@techaddict
Copy link
Contributor

... nicer error messages

There are two improvements to Scheduler Mode:

  1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
  2. If an invalid mode is given we should print a better error message.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@techaddict
Copy link
Contributor Author

@pwendell can you review this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start, but if you look at the JIRA, this exception won't actually echo back to the user the name they provided, which is bad form. I think you should capture the argument the user provided first then echo it back to them:

private val schedulingModeConf = conf.get("spark.scheduler.mode", "FIFO")
val schedulingMode: SchedulingMode = try {
    SchedulingMode.withName(schedulingModeConf).toUpperCase)
  } catch {
   case e: java.util.NoSuchElementException =>
     throw new SparkException(s"unrecognized spark.scheduler.mode: $schedulingModeConf")
  }

Don't even bother re-sending the NoSuchElementException... it doesn't convey anything useful to the user.

…ave nicer error messages

There are  two improvements to Scheduler Mode:
1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
2. If an invalid mode is given we should print a better error message.
@pwendell
Copy link
Contributor

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14177/

@pwendell
Copy link
Contributor

Cool - thanks for this!

@pwendell
Copy link
Contributor

I've merged this.

@asfgit asfgit closed this in e269c24 Apr 16, 2014
asfgit pushed a commit that referenced this pull request Apr 16, 2014
…ave...

... nicer error messages

There are  two improvements to Scheduler Mode:
1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
2. If an invalid mode is given we should print a better error message.

Author: Sandeep <[email protected]>

Closes #388 from techaddict/1469 and squashes the following commits:

a31bbd5 [Sandeep] SPARK-1469: Scheduler mode should accept lower-case definitions and have nicer error messages There are  two improvements to Scheduler Mode: 1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO). 2. If an invalid mode is given we should print a better error message.
(cherry picked from commit e269c24)

Signed-off-by: Patrick Wendell <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…ave...

... nicer error messages

There are  two improvements to Scheduler Mode:
1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
2. If an invalid mode is given we should print a better error message.

Author: Sandeep <[email protected]>

Closes apache#388 from techaddict/1469 and squashes the following commits:

a31bbd5 [Sandeep] SPARK-1469: Scheduler mode should accept lower-case definitions and have nicer error messages There are  two improvements to Scheduler Mode: 1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO). 2. If an invalid mode is given we should print a better error message.
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Change flavor to boot server in FusionCloud job
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Mar 18, 2022
* KE-34191 replace partition Table path

* change pom version
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…or on driver (apache#227)

* [HADP-43018] Disable rack resolve when registering executor on driver (apache#388) (apache#74)

Make `YarnClusterScheduler` to extend `TaskSchedulerImpl` rather than `YarnScheduler` such that rack resolve is disabled.

We've seen driver stuck in following thread with larger number of executors registering. Since we don't need rack info for locality, add a config to disable rack resolve by default, which could possibly eliminate the bottleneck in driver.

```
"dispatcher-event-loop-15" apache#50 daemon prio=5 os_prio=0 tid=0x00007f751a394000 nid=0x11953 runnable [0x00007f74c6290000]
   java.lang.Thread.State: RUNNABLE
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at java.net.InetAddress.getByName(InetAddress.java:1077)
	at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
	at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:580)
	at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
	at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
	at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
	at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:37)
	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:329)
	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:318)
```

No

Add UT.

I've run a test https://bdp.vip.ebay.com/job/detail/?cluster=apollorno&jobType=SPARK&jobId=application_1635906065713_321559&tab=0 on apollorno. The test succeeded with 16612 executors and many executor failed to register. This patch could improve driver performance but it will still run into bottleneck when there are too many executors registering at the same time.

```
21/11/08 07:40:19 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
21/11/08 07:42:19 ERROR TransportChannelHandler: Connection to hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
21/11/08 07:42:19 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 is closed
21/11/08 07:42:19 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 closed
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 disassociated! Shutting down.
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Cannot register with driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 in 120 seconds. This timeout is controlled by spark.network.timeout
```

Co-authored-by: tianlzhang <[email protected]>

Co-authored-by: yujli <[email protected]>
Co-authored-by: tianlzhang <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…en syntax error for boolean type inputs (apache#388)

### What changes were proposed in this pull request?

This PR fixes an issue where `BitwiseCount` / `bit_count` of boolean inputs would cause codegen to generate syntactically invalid Java code that does not compile, triggering errors like

```
 java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Unexpected token "if" in primary
```

Even though this code has test cases in `bitwise.sql` via the query test framework, those existing test cases were insufficient to find this problem: I believe that is because the example queries were constant-folded using the interpreted path, leaving the codegen path without test coverage.

This PR fixes the codegen issue and adds explicit expression tests to ensure that the same tests run on both the codegen and interpreted paths.

### Why are the changes needed?

Fix a rare codegen to interpreted fallback issue, which may harm query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added new test cases to BitwiseExpressionsSuite.scala, copied from the existing `bitwise.sql` query test case file.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46382 from JoshRosen/SPARK-48128-bit_count_codegen.

Authored-by: Josh Rosen <[email protected]>

(cherry picked from commit 96f65c9)

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Josh Rosen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants