-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-47051][INFRA] Create a new test pipeline for yarn and connect
#45107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Could you review this test pipeline adjustment PR, @HyukjinKwon ? |
| connect, protobuf | ||
| kubernetes, hadoop-cloud, spark-ganglia-lgpl, protobuf | ||
| - >- | ||
| yarn, connect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm ... Does separating the build help the stabilization? I am not against this change but splitting this will cause 30mins time for building. Although it runs in parallel, individual fork has some resource limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, I also was reluctant due to that building time. Specifically, It will take 34 minutes in total run time. The second goal is to reduce the re-trigger time. Previously, YARN/Connect/Kafka modules build a bad synergy because their failure rates are multiplied. And, I needed to re-trigger and wait over 70 ~ 90minutes for this pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm.. I think we should actually better fix the flakiness .. I am fine with this as a temporary change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I hoped so too. For now, SparkSessionE2ESuite flakiness and YarnClusterSuite seems to be abandoned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side, we don't use YARN and Spark Connect.
|
Merged to master for now. |
### What changes were proposed in this pull request? We have been providing a dedicated test environment for `yarn` and `connect` module because they are flaky. - #45107 However, they are still flaky. So, this PR aims to run `yarn` test only in PR builders (if needed) and Daily CIs (always). - Reduce the irrelevant re-tries by triggering `YARN CI` only when we need to test `YARN` module. - Protect YARN CI from `connect` flakiness by providing an independent GitHub Action environment in PR Builders and Daily CIs. - Lastly, commit builder will offload YARN module tests to the daily CIs ### Why are the changes needed? - PR builders provide an extensive test coverage with YARN testing. - Daily CIs with YARN tests - NON-ANSI CI: https://github.com/apache/spark/actions/workflows/build_non_ansi.yml (1AM) - Java 21 SBT CI: https://github.com/apache/spark/actions/workflows/build_java21.yml (4AM) - RockDB UI CI: https://github.com/apache/spark/actions/workflows/build_rockdb_as_ui_backend.yml (6AM) - Maven Java 17 CI: https://github.com/apache/spark/actions/workflows/build_maven.yml (1PM) - Maven Java 21 CI: https://github.com/apache/spark/actions/workflows/build_maven_java21.yml (2PM) - Maven Java 21 on AppleSilicon CI: https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml (8PM every two days) - YARN CI has been flaky in GitHub Action environment and requires irrelevant re-tries very frequently. - https://github.com/apache/spark/actions/runs/8962451417/job/24611353908 (2024-05-05) - https://github.com/apache/spark/actions/runs/8962440192/job/24611326971 (2024-05-05) ``` [info] *** 6 TESTS FAILED *** [error] Failed tests: [error] org.apache.spark.deploy.yarn.YarnClusterSuite [error] (yarn / Test / test) sbt.TestsFailedException: Tests unsuccessful ``` <img width="544" alt="Screenshot 2024-05-05 at 20 12 28" src="https://github.com/apache/spark/assets/9700541/cbf9fb03-fc4c-4513-b5e5-158c3c9a085a"> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46395 from dongjoon-hyun/SPARK-48137. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to merge connect back to the original test pipeline to reduce the maximum concurrency of GitHub Action by one. - https://infra.apache.org/github-actions-policy.html > All workflows SHOULD have a job concurrency level less than or equal to 15. ### Why are the changes needed? This is a partial recover from the following. - #45107 We stabilized the root cause of #45107 via the following PRs. In addition we will disable a flaky test case if exists. - #46395 - #46396 - #46425 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46441 from dongjoon-hyun/SPARK-48174. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>


What changes were proposed in this pull request?
This PR aims to spin off
yarnandconnectas a new test pipeline for the following:yarnmodule change and avoid triggering other module's tests like Kafka module.connectmodule change and avoid triggering other module's tests like Kafka module.Why are the changes needed?
These two modules are known to be flaky in various GitHub Action CI pipelines.
YarnClusterSuite)SparkSessionE2ESuite)SparkSessionE2ESuite)SparkSessionE2ESuite)SparkSessionE2ESuite)SparkConnectServiceSuite)Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass the CIs in this PR.
Was this patch authored or co-authored using generative AI tooling?
No.