Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Jan 22, 2023

What changes were proposed in this pull request?

Like our documentation, spark.sheduler.mode=FAIR should provide a FAIR Scheduling Within an Application.

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

Screenshot 2023-01-22 at 2 59 22 PM

This bug is hidden in our CI because we have fairscheduler.xml always as one of test resources.

Why are the changes needed?

Currently, when spark.scheduler.mode=FAIR is given without scheduler allocation file, Spark creates Fair Scheduler Pools with FIFO scheduler which is wrong. We need to switch the mode to FAIR from FIFO.

BEFORE

$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/22 14:47:38 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
Spark context Web UI available at http://localhost:4040

Screenshot 2023-01-22 at 2 50 38 PM

AFTER

$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:48:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040

Screenshot 2023-01-22 at 2 50 14 PM

Does this PR introduce any user-facing change?

Yes, but this is a bug fix to match with Apache Spark official documentation.

How was this patch tested?

Pass the CIs.

@github-actions github-actions bot added the CORE label Jan 22, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-42157][CORE] spark.scheduler.mode=FAIR should provide FAIR scheduler [SPARK-42157][CORE] spark.scheduler.mode=FAIR should provide FAIR scheduler Jan 22, 2023
-->

<allocations>
<pool name="default">
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<allocations>
<pool name="default">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<pool name="default">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>0</minShare>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


val schedulerAllocFile = sc.conf.get(SCHEDULER_ALLOCATION_FILE)
val DEFAULT_SCHEDULER_FILE = "fairscheduler.xml"
val DEFAULT_SCHEDULER_TEMPLATE_FILE = "fairscheduler-default.xml.template"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid any conflicts in the existing production jobs, this PR provide and use new file as .xml.template.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you review this, @mridulm ? This bug was hidden and difficult to test in the unit test environment because we have fairscheduler.xml test resource.

https://github.com/apache/spark/blob/master/core/src/test/resources/fairscheduler.xml

<weight>1</weight>
<minShare>0</minShare>
</pool>
</allocations>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a conf/fairscheduler.xml.template - why do we need this ?
If it is for testing, move it as a resource there instead of in conf ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not for testing, @mridulm . As mentioned in #39703 (review), we already have a testing resource, fairscheduler.xml, not a template.

In addition, the content of conf/fairscheduler.xml.template is not matched with the expected default behavior.

s"FIFO order. To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE " +
s"or set ${SCHEDULER_ALLOCATION_FILE.key} to a file that contains the configuration.")
None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be relying on template file - in deployments, template file can be invalid - admin's are not expecting it to be read by spark.

Instead, why not simply rely on returning None here ?

Note - if this is only for testing, we can special case it that way via spark.testing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, this is not a testing issue. As I wrote in the PR description, our documentation is wrong. It says spark.scheduler.mode=FAIR will return a FAIR scheduler. However, we are getting FIFO scheduler now.

Note - if this is only for testing, we can special case it that way via spark.testing

None is the previous behavior which ends up with FIFO scheduler with the WARNING message, 23/01/22 14:47:38 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.

Instead, why not simply rely on returning None here ?

Got it. I understand your point about the template file. The reason why I tried to use template file is that I cannot put the real fairscheduler.xml file because it can be used already in the production.

We should not be relying on template file - in deployments, template file can be invalid - admin's are not expecting it to be read by spark.

@mridulm
Copy link
Contributor

mridulm commented Jan 23, 2023

Looks like I misunderstood the PR, I see what you mean @dongjoon-hyun.
I am not sure what is a good way to make progress here ... let me think about it more.

+CC @tgravescs, @Ngone51 in case you have thoughts.

@dongjoon-hyun
Copy link
Member Author

No problem. I totally understand your concern on the usage of template file. I'll also think about a new way. Thank you for your thoughtful review, @mridulm .

@tgravescs
Copy link
Contributor

I haven't used FAIR Scheduler much, but was wondering can we just have the defaults be in the code vs having to read a separate template file?

ie if no file
rootPool.addSchedulable(new Pool("default", DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))

@dongjoon-hyun
Copy link
Member Author

Thank you, @tgravescs . Yes, I agree with you to have the defaults in the code.

@dongjoon-hyun
Copy link
Member Author

I address the comments. Could you review this once more, @mridulm and @tgravescs ?

@dongjoon-hyun
Copy link
Member Author

All tests passed.
Screenshot 2023-01-23 at 8 04 29 PM

@dongjoon-hyun
Copy link
Member Author

Could you review this, @Ngone51 ?

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I was essentially toying with the same idea Tom had - but wanted to explore alternatives.
Unfortunately, could not come up with anything better

@dongjoon-hyun
Copy link
Member Author

Thank you, @mridulm !

dongjoon-hyun added a commit that referenced this pull request Jan 24, 2023
…cheduler

### What changes were proposed in this pull request?

Like our documentation, `spark.sheduler.mode=FAIR` should provide a `FAIR Scheduling Within an Application`.

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

![Screenshot 2023-01-22 at 2 59 22 PM](https://user-images.githubusercontent.com/9700541/213944956-931e3a3c-d094-4455-8990-233c7966194b.png)

This bug is hidden in our CI because we have `fairscheduler.xml` always as one of test resources.
- https://github.com/apache/spark/blob/master/core/src/test/resources/fairscheduler.xml

### Why are the changes needed?

Currently, when `spark.scheduler.mode=FAIR` is given without scheduler allocation file, Spark creates `Fair Scheduler Pools` with `FIFO` scheduler which is wrong. We need to switch the mode to `FAIR` from `FIFO`.

**BEFORE**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/22 14:47:38 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 38 PM](https://user-images.githubusercontent.com/9700541/213944555-6e367a33-ca58-4daf-9ba4-b0319fbe4516.png)

**AFTER**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:48:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 14 PM](https://user-images.githubusercontent.com/9700541/213944551-660aa298-638b-450c-ad61-db9e42a624b0.png)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug fix to match with Apache Spark official documentation.

### How was this patch tested?

Pass the CIs.

Closes #39703 from dongjoon-hyun/SPARK-42157.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4d51bfa)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jan 24, 2023
…cheduler

### What changes were proposed in this pull request?

Like our documentation, `spark.sheduler.mode=FAIR` should provide a `FAIR Scheduling Within an Application`.

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

![Screenshot 2023-01-22 at 2 59 22 PM](https://user-images.githubusercontent.com/9700541/213944956-931e3a3c-d094-4455-8990-233c7966194b.png)

This bug is hidden in our CI because we have `fairscheduler.xml` always as one of test resources.
- https://github.com/apache/spark/blob/master/core/src/test/resources/fairscheduler.xml

### Why are the changes needed?

Currently, when `spark.scheduler.mode=FAIR` is given without scheduler allocation file, Spark creates `Fair Scheduler Pools` with `FIFO` scheduler which is wrong. We need to switch the mode to `FAIR` from `FIFO`.

**BEFORE**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/22 14:47:38 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 38 PM](https://user-images.githubusercontent.com/9700541/213944555-6e367a33-ca58-4daf-9ba4-b0319fbe4516.png)

**AFTER**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:48:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 14 PM](https://user-images.githubusercontent.com/9700541/213944551-660aa298-638b-450c-ad61-db9e42a624b0.png)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug fix to match with Apache Spark official documentation.

### How was this patch tested?

Pass the CIs.

Closes #39703 from dongjoon-hyun/SPARK-42157.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4d51bfa)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member Author

Merged to master/3.3/3.2.

@dongjoon-hyun
Copy link
Member Author

Thank you again, @mridulm and @tgravescs .

@dongjoon-hyun
Copy link
Member Author

cc @kazuyukitanimura since this lands at branch-3.2

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…cheduler

### What changes were proposed in this pull request?

Like our documentation, `spark.sheduler.mode=FAIR` should provide a `FAIR Scheduling Within an Application`.

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

![Screenshot 2023-01-22 at 2 59 22 PM](https://user-images.githubusercontent.com/9700541/213944956-931e3a3c-d094-4455-8990-233c7966194b.png)

This bug is hidden in our CI because we have `fairscheduler.xml` always as one of test resources.
- https://github.com/apache/spark/blob/master/core/src/test/resources/fairscheduler.xml

### Why are the changes needed?

Currently, when `spark.scheduler.mode=FAIR` is given without scheduler allocation file, Spark creates `Fair Scheduler Pools` with `FIFO` scheduler which is wrong. We need to switch the mode to `FAIR` from `FIFO`.

**BEFORE**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/22 14:47:38 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 38 PM](https://user-images.githubusercontent.com/9700541/213944555-6e367a33-ca58-4daf-9ba4-b0319fbe4516.png)

**AFTER**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:48:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 14 PM](https://user-images.githubusercontent.com/9700541/213944551-660aa298-638b-450c-ad61-db9e42a624b0.png)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug fix to match with Apache Spark official documentation.

### How was this patch tested?

Pass the CIs.

Closes apache#39703 from dongjoon-hyun/SPARK-42157.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4d51bfa)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants