[SPARK-51243][CORE][ML] Configurable allow native BLAS #49986
Conversation
The current approach works, but users would have to modify their Java command options themselves for cases that create embedded …

cc @zhengruifeng @panbingkun, could you please take a look? And do you have a better idea of how to implement the configuration?

I think this PR needs reviews from @srowen @WeichenXu123 and @luhenry
```scala
}

private def supplementBlasOptions(conf: SparkConf): Unit = {
  conf.getOption("spark.ml.allowNativeBlas").foreach { allowNativeBlas =>
```
I am not sure whether we can use an env variable instead, like `MKL_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1`.
I skimmed the codebase https://github.com/luhenry/netlib and found neither a sys prop nor an env var to disable native BLAS loading, so I need to introduce a new one. Do you mean an env var is preferred over a sys prop?
I guess I mean: does this have to propagate via a sys property at all, versus just being used as a config in the code directly? But it is probably hard to plumb through access to the config object.
@srowen I replied in #49986 (comment)
```scala
private def supplementBlasOptions(conf: SparkConf): Unit = {
  conf.getOption("spark.ml.allowNativeBlas").foreach { allowNativeBlas =>
    def supplement(key: OptionalConfigEntry[String]): Unit = {
```
This is repeated so many times now - I wonder if a simple refactor is in order to have one 'supplement' function?
Thanks for the suggestion, will do.
Refactored by extracting a common method, `supplementJavaOpts`.
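For illustration, the extracted helper could look roughly like the sketch below. This is an assumption about the shape of `supplementJavaOpts`, not the PR's actual code: the real method works against a `SparkConf` and a mutable `javaOpts` buffer, which are modeled here with a plain `Map` and a returned `Seq` so the example is self-contained.

```scala
// Hedged sketch of a common 'supplement' helper: for each listed conf key
// that is set, emit a matching "-Dkey=value" Java option. The Map stands in
// for SparkConf; key names are illustrative.
object SupplementJavaOptsSketch {
  def supplementJavaOpts(conf: Map[String, String], keys: Seq[String]): Seq[String] =
    keys.flatMap(key => conf.get(key).map(v => s"-D$key=$v"))
}
```

For example, calling it with `Map("spark.ml.allowNativeBlas" -> "false")` and the single key `"spark.ml.allowNativeBlas"` would produce `Seq("-Dspark.ml.allowNativeBlas=false")`, while unset keys contribute nothing.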
```scala
javaOpts += s"-Djava.net.preferIPv6Addresses=${Utils.preferIPv6}"

sparkConf.getOption("spark.ml.allowNativeBlas").foreach { allowNativeBlas =>
  javaOpts += s"-Dspark.ml.allowNativeBlas=$allowNativeBlas"
```
Do other resource managers like K8s need this? Not sure.
The implementation mostly follows how we process `java.net.preferIPv6Addresses`; I will test on K8s and report back here.
> Do other resource managers like K8s need this? Not sure.

K8s does not need that change.

Spark on YARN: the code appends `-Dspark.ml.allowNativeBlas=...` to the YARN AM process command; we have to assemble a java command so the YARN RM knows how to bootstrap the process.

Spark on K8s:
- client mode: there is no driver Pod
- cluster mode: runs `spark-submit` (which carries all Java options from the local `spark-submit`) in the driver Pod

So K8s does not need to append those Java options again.
```diff
 if (_nativeBLAS == null) {
-  _nativeBLAS =
-    try { NetlibNativeBLAS.getInstance } catch { case _: Throwable => javaBLAS }
+  _nativeBLAS = System.getProperty("spark.ml.allowNativeBlas", "true") match {
```
This has to be a sys property because of how early it has to be initialized?
Ideally, I think we should propagate the conf via `SparkConf` and change the method signature:

```diff
- def nativeBLAS: NetlibBLAS
+ def nativeBLAS(allowNative: Boolean): NetlibBLAS
```

but I found many places call `BLAS.nativeBLAS` where `SparkConf` is unavailable, so I propose to use a sys property.
```diff
-sudo apt-get install libopenblas-base
-sudo update-alternatives --config libblas.so.3
+sudo apt-get install libopenblas-dev
```
`libopenblas-base` was removed in Debian 12 and Ubuntu 24.04; `libopenblas-dev` should be used instead.

Also, `update-alternatives --config libblas.so.3` does not work, and it varies across CPU architectures and OS distributions:

```
root@0bef5c80cdaa:/# update-alternatives --config libblas.so.3
update-alternatives: error: no alternatives for libblas.so.3
```

Given that it already allows using `-Ddev.ludovic.netlib.lapack.nativeLib=...` to choose the native libraries, I would suggest removing from our docs the instructions on using alternatives to manage the OS default library.
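To make that suggestion concrete, selecting a library via the netlib property rather than OS alternatives might look like the fragment below. This is a sketch, not from the PR: the `.so` file name and the use of `spark.driver.extraJavaOptions` as the delivery mechanism are assumptions that depend on the installed package and deployment mode.

```shell
# Illustrative only: point netlib at a specific native LAPACK library
# instead of relying on update-alternatives. The library name is an
# assumption; adjust it to whatever the installed package provides.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Ddev.ludovic.netlib.lapack.nativeLib=liblapack.so.3" \
  ...
```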
```scala
private[spark] object BLAS extends Serializable with Logging {

  @transient private var _javaBLAS: NetlibBLAS = _
  @transient private var _nativeBLAS: NetlibBLAS = _
```
Remove the duplicated instance creation and call `org.apache.spark.ml.linalg.BLAS` instead.
```diff
  * ARPACK routines for MLlib's vectors and matrices.
  */
-private[spark] object ARPACK extends Serializable {
+private[spark] object ARPACK extends Serializable with Logging {
```
Should I move ARPACK and LAPACK to mllib-local, to align with BLAS?
@srowen I addressed your previous comments (replied inline in each thread) and found some other issues while working on this PR. It would be great if you could have another look; thank you in advance.
luhenry
left a comment
It's been a while since I last looked at this code, but overall it LGTM. Thanks for taking care of that!
@luhenry thanks for your review and approval.
mllib-local/pom.xml (outdated)

```xml
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-tags_${scala.binary.version}</artifactId>
</dependency>
<dependency>
```
I am still a bit worried about the dependency change. Deferring to @WeichenXu123's review.
The dependency change can be eliminated if logging is unnecessary.
Kindly ping @WeichenXu123
Force-pushed from 415443a to f1b4f65.
What changes were proposed in this pull request?

This PR proposes introducing a new configuration, `spark.ml.allowNativeBlas`. When it is set to `false`, Spark always uses Java BLAS even when a native library (`openblas` or `mkl`) is available on the machine.

Why are the changes needed?

Currently, many places in the Spark codebase are hardcoded to call `BLAS.nativeBLAS`, and when `NativeBLAS` is available it always uses `JNIBLAS`. This is generally a good idea, but I found some negative cases in our internal ML workloads where `JavaBLAS` is faster than `JNIBLAS`; this might be caused by bugs in the native library (e.g. `mkl` or `openblas`) or by the library not being optimized for some hardware. Given that, I think we should allow users to disable `NativeBLAS` explicitly.

The proposed `spark.ml.allowNativeBlas` configuration does not strictly follow the Spark configuration system because the `sparkConf` is not always available on the caller side of `BLAS.nativeBLAS`.

Does this PR introduce any user-facing change?

It adds a new feature, but the default value keeps the current behavior; the docs are also updated to mention the added conf.

How was this patch tested?

Manual tests; more context will be supplied after reaching a consensus on how to expose the configuration to end users.
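For illustration, opting out of native BLAS for a given application could look like the fragment below. This is a sketch of how the proposed conf would be passed, with the application jar and remaining options elided; it is not taken from the PR itself.

```shell
# Force pure-Java BLAS even when openblas/mkl is installed on the machine
spark-submit \
  --conf spark.ml.allowNativeBlas=false \
  ...
```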
Was this patch authored or co-authored using generative AI tooling?
No.