SPARK-1469: Scheduler mode should accept lower-case definitions and have... #388
Can one of the admins verify this patch?
@pwendell can you review this?
This is a good start, but if you look at the JIRA, this exception won't actually echo back the name the user provided, which is bad form. I think you should capture the argument the user provided first, then echo it back to them:
private val schedulingModeConf = conf.get("spark.scheduler.mode", "FIFO")
val schedulingMode: SchedulingMode = try {
  // Normalize case before the lookup so that "fair" and "FAIR" both resolve.
  SchedulingMode.withName(schedulingModeConf.toUpperCase)
} catch {
  case e: java.util.NoSuchElementException =>
    // Echo back the raw value the user configured.
    throw new SparkException(s"unrecognized spark.scheduler.mode: $schedulingModeConf")
}
Don't even bother rethrowing the NoSuchElementException; it doesn't convey anything useful to the user.
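A self-contained sketch of the reviewer's suggestion that runs outside Spark. Assumptions: `SchedulingMode` here is a simplified stand-in enumeration with only `FAIR` and `FIFO`, `ConfigException` is a hypothetical substitute for `SparkException`, and `SchedulingModeParser.parse` is an illustrative name, not Spark's API.

```scala
import java.util.Locale

// Simplified stand-in for Spark's SchedulingMode enumeration (assumption: only FAIR and FIFO).
object SchedulingMode extends Enumeration {
  val FAIR, FIFO = Value
}

// Hypothetical substitute for org.apache.spark.SparkException, so the sketch compiles on its own.
class ConfigException(message: String) extends Exception(message)

object SchedulingModeParser {
  // Upper-cases the input before the lookup, so "fair", "Fair" and "FAIR" all resolve,
  // and reports the value exactly as the user wrote it when the lookup fails.
  def parse(raw: String): SchedulingMode.Value =
    try {
      SchedulingMode.withName(raw.toUpperCase(Locale.ROOT))
    } catch {
      case _: NoSuchElementException =>
        // Enumeration.withName throws NoSuchElementException for unknown names;
        // it is dropped in favor of a message that names the offending value.
        throw new ConfigException(s"Unrecognized spark.scheduler.mode: $raw")
    }
}
```

Passing `Locale.ROOT` keeps the upper-casing independent of the JVM's default locale (under a Turkish locale, plain `"fifo".toUpperCase` yields a dotted capital I and would not match `FIFO`); the reviewer's snippet uses the no-argument `toUpperCase`.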
…ave nicer error messages

There are two improvements to Scheduler Mode:
1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
2. If an invalid mode is given we should print a better error message.
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
Cool - thanks for this!
I've merged this.
…ave... ... nicer error messages

There are two improvements to Scheduler Mode:
1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
2. If an invalid mode is given we should print a better error message.

Author: Sandeep <[email protected]>

Closes #388 from techaddict/1469 and squashes the following commits:

a31bbd5 [Sandeep] SPARK-1469: Scheduler mode should accept lower-case definitions and have nicer error messages There are two improvements to Scheduler Mode: 1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO). 2. If an invalid mode is given we should print a better error message.

(cherry picked from commit e269c24)
Signed-off-by: Patrick Wendell <[email protected]>
…ave... ... nicer error messages

There are two improvements to Scheduler Mode:
1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO).
2. If an invalid mode is given we should print a better error message.

Author: Sandeep <[email protected]>

Closes apache#388 from techaddict/1469 and squashes the following commits:

a31bbd5 [Sandeep] SPARK-1469: Scheduler mode should accept lower-case definitions and have nicer error messages There are two improvements to Scheduler Mode: 1. Made the built in ones case insensitive (fair/FAIR, fifo/FIFO). 2. If an invalid mode is given we should print a better error message.
…or on driver (apache#227)

* [HADP-43018] Disable rack resolve when registering executor on driver (apache#388) (apache#74)

Make `YarnClusterScheduler` extend `TaskSchedulerImpl` rather than `YarnScheduler` so that rack resolution is disabled. We've seen the driver get stuck in the following thread when a large number of executors register. Since we don't need rack info for locality, add a config to disable rack resolution by default, which may eliminate this bottleneck in the driver.

```
"dispatcher-event-loop-15" #50 daemon prio=5 os_prio=0 tid=0x00007f751a394000 nid=0x11953 runnable [0x00007f74c6290000]
   java.lang.Thread.State: RUNNABLE
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
    at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:580)
    at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
    at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
    at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
    at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:37)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:329)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:318)
```

No Add UT. I've run a test https://bdp.vip.ebay.com/job/detail/?cluster=apollorno&jobType=SPARK&jobId=application_1635906065713_321559&tab=0 on apollorno.

The test succeeded with 16612 executors, and many executors failed to register. This patch could improve driver performance, but the driver will still hit a bottleneck when too many executors register at the same time.

```
21/11/08 07:40:19 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
21/11/08 07:42:19 ERROR TransportChannelHandler: Connection to hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
21/11/08 07:42:19 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 is closed
21/11/08 07:42:19 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connection from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com/10.78.173.174:30201 closed
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 disassociated! Shutting down.
21/11/08 07:42:19 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Cannot register with driver: spark://CoarseGrainedScheduler@hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from hdc42-mcc10-01-0910-2704-050-tess0028.stratus.rno.ebay.com:30201 in 120 seconds. This timeout is controlled by spark.network.timeout
```

Co-authored-by: tianlzhang <[email protected]>
Co-authored-by: yujli <[email protected]>
Co-authored-by: tianlzhang <[email protected]>
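A minimal sketch of the idea behind this change, in a simplified model rather than Spark's scheduler classes: rack lookup sits behind an interface, and a config flag decides whether a host lookup triggers a (potentially blocking, DNS-backed) resolution. All names here (`RackLookup`, `NoRackLookup`, `ResolvingRackLookup`, `RackLookupFactory`, `rackResolveEnabled`) are hypothetical; the commit message does not name the new config.

```scala
// Hypothetical, simplified model of the change; these are not Spark's classes.
trait RackLookup {
  def rackForHost(host: String): Option[String]
}

// Default after the change: no rack information, so no lookup is performed at all.
object NoRackLookup extends RackLookup {
  def rackForHost(host: String): Option[String] = None
}

// Opt-in path: delegates to a resolver that may block on DNS for each host.
final class ResolvingRackLookup(resolve: String => String) extends RackLookup {
  def rackForHost(host: String): Option[String] = Some(resolve(host))
}

object RackLookupFactory {
  // `rackResolveEnabled` stands in for the new config flag, whose real name is not given here.
  def fromConfig(rackResolveEnabled: Boolean, resolve: String => String): RackLookup =
    if (rackResolveEnabled) new ResolvingRackLookup(resolve) else NoRackLookup
}
```

With the flag off, each host costs a constant `None` instead of a resolver call, which is what keeps `RackResolver.resolve` off the `resourceOffers` path shown in the thread dump above. The trade-off is that rack-level locality information is unavailable, which the commit message says is acceptable because rack info is not needed for locality.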
…en syntax error for boolean type inputs (apache#388)

### What changes were proposed in this pull request?

This PR fixes an issue where `BitwiseCount` / `bit_count` of boolean inputs would cause codegen to generate syntactically invalid Java code that does not compile, triggering errors like

```
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Unexpected token "if" in primary
```

Even though this code has test cases in `bitwise.sql` via the query test framework, those existing test cases were insufficient to find this problem: I believe that is because the example queries were constant-folded using the interpreted path, leaving the codegen path without test coverage.

This PR fixes the codegen issue and adds explicit expression tests to ensure that the same tests run on both the codegen and interpreted paths.

### Why are the changes needed?

Fix a rare codegen to interpreted fallback issue, which may harm query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added new test cases to `BitwiseExpressionsSuite.scala`, copied from the existing `bitwise.sql` query test case file.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46382 from JoshRosen/SPARK-48128-bit_count_codegen.

Authored-by: Josh Rosen <[email protected]>
(cherry picked from commit 96f65c9)
Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Josh Rosen <[email protected]>
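The commit message does not include the generated code, so the shape of the fix below is an assumption. The error `Unexpected token "if" in primary` is consistent with an `if` appearing where Java expects an expression, since `if` is a statement in Java. This simplified Scala model (hypothetical names `BitCountSketch`, `eval`, `genJava`; only boolean and long inputs) pairs an interpreted evaluator with a generator that must emit a Java expression, and uses the conditional operator for the boolean case.

```scala
// Hypothetical, simplified model of the two evaluation paths; not Spark's BitwiseCount code.
object BitCountSketch {
  // Interpreted path: compute the result directly from a JVM value.
  def eval(value: Any): Int = value match {
    case b: Boolean => if (b) 1 else 0 // Scala's `if` is an expression, so this is fine here
    case l: Long    => java.lang.Long.bitCount(l)
    case other      => throw new IllegalArgumentException(s"unsupported input: $other")
  }

  // Codegen path: return Java source for an *expression* computing the same result.
  // Java's `if` is a statement and cannot appear where an expression is expected,
  // so the boolean case uses the conditional operator `?:` instead.
  def genJava(dataType: String, childExpr: String): String = dataType match {
    case "boolean" => s"(($childExpr) ? 1 : 0)"
    case "long"    => s"java.lang.Long.bitCount($childExpr)"
    case other     => throw new IllegalArgumentException(s"unsupported type: $other")
  }
}
```

A test in the spirit of the PR would run the same inputs through both paths and compare results; compiling the generated Java string is outside the scope of this sketch.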
... nicer error messages
There are two improvements to Scheduler Mode: