SPARK-5665 [DOCS] Update netlib-java documentation #4448
|
Can one of the admins verify this patch? |
|
@fommil Generally speaking you're best placed to document this aspect. I think some of the changes need to be backed out though:
|
|
@srowen can you please create the JIRA? I will inevitably fill out all the wrong fields anyway. I'll push a change to this PR in a moment. Re: JBLAS, oh, I didn't realise that. Why? Does it do anything that netlib-java/Breeze doesn't? My understanding is that it absolutely requires natives to be available (which is kind of weird for a Java application and hence makes Spark a lot harder to set up and use), and packages its own builds of an unspecified version of BLAS (possibly ATLAS? which is even weirder). That is, it doesn't use system optimised binaries (see my talk to find out why this is absolutely critical for high performance). FYI, I have Intel MKL on all my machines because it is ridiculously faster than the alternatives, and I am experimenting with GPU/APU/FPGA backends. Re: LGPL flag. OK, I'll add this back, but it does sound like the flag is incorrectly named. "natives" would perhaps have been better, but, legacy, meh. Re: "OpenBLAS", no, JBLAS packages its own (suboptimal) binaries. No runtime link to |
I am the author of netlib-java and I found this documentation to be out of date. Some main points:
1. Breeze has not depended on jBLAS for some time
2. netlib-java provides a pure JVM implementation
3. The licensing issue is not just about LGPL: optimised natives have proprietary licenses.
4. I really think it's best to direct people to my detailed setup guide instead of trying to compress it into one sentence. It is different for each architecture, each OS, and for each backend.
I hope this helps to clear things up 😄
|
OK, I've just force pushed (or, I also had a look at mllib's use of JBLAS, and, well... I really wish somebody would pay me to look at this and optimise it because it could be a lot easier to set up AND faster by using Breeze / |
|
I opened https://issues.apache.org/jira/browse/SPARK-5665. If you would, update the title of this PR to JBLAS is used for |
|
@fommil big talk, I like it! I don't think pay is on offer, but glory is. Why not try a small improvement to demonstrate? |
|
@srowen updated title. Re: natives, absolutely not! The whole point of netlib-java is that it is a common API that is swapped at runtime for Java (never leave the JVM), Natives (self contained) or System Optimised Natives (requiring runtime I'd love to work on this, I really feel I could make a difference and I see no reason for JBLAS unless it is part of your API, but my list of projects is ever-increasing and I recently had to take on another project that I don't want because of a decision by typesafe to close down some sources that I was about to roll out to production. |
|
My primary FOSS focus right now is getting ENSIME to 1.0 and after that I have a few private projects demanding my time. |
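To make the "common API that is swapped at runtime" idea above concrete, here is a minimal, self-contained sketch of the fallback pattern. It is not netlib-java's actual code: the `Blas` interface, the class names, and the simulated missing native library are all hypothetical stand-ins chosen for illustration. The only point it demonstrates is that callers program against one interface, native backends are tried first, and a pure-JVM implementation is always available as the last resort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class BackendFallbackSketch {

    /** Hypothetical minimal "common API"; a real BLAS interface is far larger. */
    interface Blas {
        double ddot(int n, double[] x, double[] y);
        String name();
    }

    /** Pure-JVM implementation: always available, never leaves the JVM. */
    static final class PureJavaBlas implements Blas {
        public double ddot(int n, double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += x[i] * y[i];
            }
            return sum;
        }
        public String name() { return "pure-java"; }
    }

    /** Stand-in for a native backend whose shared library is absent (simulated). */
    static final class MissingNativeBlas implements Blas {
        MissingNativeBlas() {
            // A real native backend would fail like this when its library cannot be loaded.
            throw new UnsatisfiedLinkError("native library not found (simulated)");
        }
        public double ddot(int n, double[] x, double[] y) {
            throw new IllegalStateException("never constructed");
        }
        public String name() { return "system-native"; }
    }

    /** Preference order assumed for this sketch: native first, pure Java last. */
    static List<Supplier<Blas>> defaultPreference() {
        List<Supplier<Blas>> order = new ArrayList<>();
        order.add(MissingNativeBlas::new);
        order.add(PureJavaBlas::new);
        return order;
    }

    /** Returns the first backend that can be constructed, else the pure-JVM one. */
    static Blas selectWithFallback(List<Supplier<Blas>> candidates) {
        for (Supplier<Blas> candidate : candidates) {
            try {
                return candidate.get();
            } catch (LinkageError | RuntimeException e) {
                // This backend is unavailable on this machine; try the next one.
            }
        }
        return new PureJavaBlas();
    }

    public static void main(String[] args) {
        Blas blas = selectWithFallback(defaultPreference());
        double dot = blas.ddot(3, new double[] {1, 2, 3}, new double[] {4, 5, 6});
        System.out.println(blas.name() + " ddot = " + dot);
    }
}
```

Because the calling code only ever sees `Blas`, installing a faster native library later changes which implementation is selected without any change to the application.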
|
It's not part of the API. I can't evaluate your claims, though you seem pretty sure about them, and I'm not aware of the history of why netlib-java wasn't used. I suppose these things don't change unless someone thinks they are worth changing. Will let @mengxr weigh in on the changes and the ad for your presentation in the docs. |
|
Glory, heh, yeah right. I think it'd just get me in more trouble with my wife: I already spend too much of my personal time writing FOSS. In light of slick/slick#1052 I really don't feel like giving any more of my personal time to advancing the commercial gains of Typesafe without remuneration (recently I've lost two full weekends and two weeknights working around this). Perhaps @derekhenninger / @dickwall would be able to find a funder to pay for this work, which I would be happy to do in stages. |
|
@srowen I can guess at the history. I've done a lot of work benchmarking different BLAS/LAPACK routines (and Intel provide me with a gratis license, bless their souls, to aid with this) as well as other mathematics routines. I strongly recommend you watch my ScalaX talk to get a feel for what is behind this. I would also like to use the same maven build process to make more natively-enhanced mathematics routines available on the JVM (with a Java fallback, always), such as the random number generators that we use extensively in the financial services / investment sector. That's typically the bottleneck in option pricing (i.e. Black-Scholes) for example. |
|
@fommil Thanks for the update! As @srowen pointed out, MLlib still has some JBLAS code in the NNLS solver and data generators. We should be able to remove it in the next release to simplify our dependency. Does I would recommend removing the Sorry about the |
|
@mengxr I'm pretty sure that talk link is supposed to be permanent. If it isn't, I'll be having words with skillsmatters! 😄 I don't have a youtube link or anything for it, they want people to sign up for it. Re: "ad" for my talk, I definitely didn't make any money out of that talk, nor will I! I lost 3 days of good contracting rate to attend ScalaX 😉 You really want me to make that change or have I changed your mind? 😄 |
|
@mengxr sorry, forgot to answer re: gfortran. That's needed only for natives. All is documented on the netlib-java webpage. If I get a lot of Spark traffic, I'll create a custom wiki page for them. |
|
@mengxr "warn people about LGPL"! Ha! That old Apache FUD. Without being aware of the licenses in place, distributors should be more concerned about Intel / Apple / AMD / NVIDIA suing them for distributing their IPR... not worried about bundling some free software. BTW, are you aware that JBLAS links to libgfortran at runtime, which is also LGPL? |
|
@fommil I think you're misrepresenting the nature of the LGPL issue. It is materially different from the Apache License 2, and so bundling LGPL with Apache Licensed code means downstream consumers of Spark suddenly are bound by more than the restrictions of the AL2, and that would be surprising. It's a recipe for helping people accidentally violate the license of the LGPL code that would then be included, as it sure looks like the whole thing is AL2 (or work-alike licenses). It's out of respect for others' choice of licensing terms, which put restrictions on redistribution that an Apache project does not meet, that LGPL code can't be redistributed. Spark does not redistribute libgfortran. That's fine, the LGPL allows linking. Clearly there's no aversion to using LGPL software per se here. (Or do you think that the LGPL permits redistribution in a derivative work on the same terms as the AL2 -- is that what you mean the "FUD" is?) People should be aware of licenses and either comply with software's terms or not use it. As a vendor, we most certainly care quite a lot about getting it right. I can't agree that we should all bundle whatever free software we want, because you think it's only commercial software we have to worry about. I hope that people out there wouldn't disregard the terms under which I license my free software. |
|
@fommil I didn't mean that you were advertising netlib-java. I'm only worried about the link being "permanent" or not. We put two contracts here: 1) skillsmatters won't change the talk link, 2) you won't change the slides link. I would rather keep only one, given the fact that the talk page doesn't provide more content than what is in your slides. If later it hosts talk videos, you can simply put that link in your slides and redirect people there. Is it okay? For gfortran, is it true that if a user has native blas/lapack installed they must have gfortran installed already? If this is true, the current statement looks good to me. |
|
@srowen I've had a big discussion with Apache about this and it went nowhere, even though IP lawyers stepped up to back up my stance. I firmly believe the Apache stance on LGPL is FUD and driven entirely by politics. In any case, my point in this particular case is that I find it truly ridiculous that Apache cared more about the LGPL than the genuine threat of distributing proprietary binaries with Spark... which somebody might do inadvertently because they "just copied all the libs that were needed for it to run on my machine" after checking that the Spark licensing rules were solid. @mengxr I'm teasing re: advertising 😛 I'll update to have only one link, but it'll be a real PITA for me to update the talk slides to point to the video (the org-mode/revealjs format doesn't allow arbitrary fields in the title slide). I should probably do it anyway. Re: gfortran... no, there is absolutely no requirement for a system BLAS/LAPACK to depend on the GNU Fortran library. This is because JBLAS uses GFORTRAN-compiled binaries. I have Intel MKL and it definitely doesn't require gfortran to operate. |
|
@srowen btw, have you looked inside the JBLAS jars? You know there are LGPL binaries in there, right? Look for the 32 and 64 bit versions of |
|
@fommil I don't agree with how you think about licenses, but it's off topic for purposes here. But it's important if you're saying the JBLAS distro contains LGPL code. Are these shims that link to libgfortran, or libgfortran itself? I had assumed it is the former, or else, why does one have to install the lib locally? |
|
@srowen nope, it's the full LGPL thing. Those aren't the only LGPL DLLs either, there are a few others from GCC. If you remove the JBLAS dependency, this "problem" goes away and our mutual concern for bad licenses is already taken care of in netlib-java since JNILoader is a runtime dependency (my native layers don't actually have any LGPL code in them, although they do potentially link to LGPL code at runtime). |
|
I agree, looks like these are just for Windows, but they are separate from the jblas shims. They need to be removed since they can't be redistributed. I will make a JIRA for this. I don't know if that means it can't run on Windows but in any event it can't be left like that IMHO. Yes it's another reason to try to standardize on one framework. I don't think anyone opposes that; someone just has to investigate and do the legwork. |
|
It's trivial work, just needs time. I'd actually really like to do it and get more involved in optimising spark, but I really can't justify it on top of other FOSS/private commitments 😭
|
|
@fommil Yes. I mean that it can't be distributed at all in the Spark assembly. Spark does not literally redistribute the jblas artifacts. |
|
The current version looks good to me. I'm going to merge it into master and branch-1.3. Thanks for the discussion about JBLAS' license. I think we should remove JBLAS from the Spark dependency. @srowen We use JBLAS in NNLS and data generators. Those should be easy to replace. But we might also use it in examples. Let's make a JIRA to track the progress. |
I am the author of netlib-java and I found this documentation to be out of date. Some main points:
1. Breeze has not depended on jBLAS for some time
2. netlib-java provides a pure JVM implementation as the fallback (the original docs did not appear to be aware of this, claiming that gfortran was necessary)
3. The licensing issue is not just about LGPL: optimised natives have proprietary licenses. Building with the LGPL flag turned on really doesn't help you get past this.
4. I really think it's best to direct people to my detailed setup guide instead of trying to compress it into one sentence. It is different for each architecture, each OS, and for each backend.
I hope this helps to clear things up 😄
Author: Sam Halliday <[email protected]>
Author: Sam Halliday <[email protected]>
Closes #4448 from fommil/patch-1 and squashes the following commits:
18cda11 [Sam Halliday] remove link to skillsmatters at request of @mengxr
a35e4a9 [Sam Halliday] reword netlib-java/breeze docs
(cherry picked from commit 56aff4b)
Signed-off-by: Xiangrui Meng <[email protected]>
|
@mengxr NNLS is added to Breeze in this PR scalanlp/breeze#321 once David merges it, I believe you can clean up jblas dependencies completely... |
|
@debasish83 Thanks for the update! Please ping me when it gets merged. |
|
Sweet! 😄 |
|
I was looking into using openBLAS and found that the doc has been changed, especially on the sentence for
Is this no longer true since Spark 1.3.0 that I don't need to build OpenBLAS with Can you also comment on the usage on Intel MKL? |
|
It's not needed because it's all covered in the linked netlib-java instructions. I recommend you click through. |
|
I couldn't find any specific information regarding this change in the netlib-java instructions. I thought there was a specific reason to build OpenBLAS so that it does not use as many threads as cores, because of how Spark works in terms of N tasks (threads) per executor, to avoid the stampede problem? Can one of the Spark developers comment on this? |
|
This is an OpenBLAS problem, it's not specific to Spark. |
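The "stampede" concern raised above comes down to simple arithmetic, sketched below. The numbers (8 cores, 8 concurrent tasks) are illustrative assumptions, and the model assumes every concurrent task calls into a BLAS that starts a fixed number of threads per call; real thread pools may behave differently. The class and method names are hypothetical.

```java
final class ThreadOversubscriptionSketch {

    /** Threads competing for CPU when every concurrent task is inside a multithreaded BLAS call. */
    static int totalBlasThreads(int concurrentTasks, int blasThreadsPerCall) {
        return concurrentTasks * blasThreadsPerCall;
    }

    /** Ratio of competing threads to physical cores; values above 1.0 mean oversubscription. */
    static double oversubscription(int cores, int concurrentTasks, int blasThreadsPerCall) {
        return (double) totalBlasThreads(concurrentTasks, blasThreadsPerCall) / cores;
    }

    public static void main(String[] args) {
        int cores = 8;  // assumed machine size
        int tasks = 8;  // assumed number of concurrent tasks per executor

        // BLAS uses as many threads as cores: tasks * cores threads in total.
        System.out.println("threaded BLAS ratio: " + oversubscription(cores, tasks, cores));

        // BLAS limited to a single thread per call: one thread per task.
        System.out.println("single-threaded BLAS ratio: " + oversubscription(cores, tasks, 1));
    }
}
```

Under these assumptions the contention grows with any framework that already runs one task per core, which is consistent with the reply that the issue is not specific to Spark.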