Add safe scaling (c,z)ROT routines from https://doi.org/10.1145/3061665 #496
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##           master     #496    +/-   ##
==========================================
- Coverage   83.36%   83.33%   -0.03%
==========================================
  Files        1820     1820
  Lines      185545   170847   -14698
==========================================
- Hits       154673   142378   -12295
+ Misses      30872    28469    -2403
```

Continue to review full report at Codecov.
zerothi
left a comment
Please see the comment in the code. I think introducing f90 modules requires more careful thinking in terms of the namespace etc.
SRC/la_constants.f90 (outdated):

```fortran
!
! Standard constants
!
integer, parameter :: wp = 8
```
I don't see a good reason for having two different modules for precision constants.
Also, some compilers may not adhere to double = 8. Although this is BLAS nomenclature, I think it should be adopted in case f90 is introduced anyway.
Why not go with the same BLAS/LAPACK naming convention:
```fortran
integer, parameter :: sp = kind(1.)
integer, parameter :: dp = kind(1.d0)
real(sp), parameter :: szero = 0.0_sp
real(dp), parameter :: dzero = 0.0_dp
```

It becomes a bit messy when using the same variables for different values when mixing modules.
There is no such naming convention that I know of in the BLAS and LAPACK. Zero means a single precision zero in 32-bit routines and a double precision zero in 64-bit routines. The modules cannot be combined without making extensive changes to all the code that includes them, and doing so would make it more difficult to produce new routines in other precisions.
Thanks for the feedback! I commented about the modules and double precision nomenclature at #493 (comment). Note the similarity between #493, #494 and #496. The 3 PRs need similar fixes.
> There is no such naming convention that I know of in the BLAS and LAPACK. Zero means a single precision zero in 32-bit routines and a double precision zero in 64-bit routines. The modules cannot be combined without making extensive changes to all the code that includes them, and doing so would make it more difficult to produce new routines in other precisions.
Agreed, the naming convention was just a suggestion, but based on the routine calls of LAPACK itself.
I only just saw this PR adding the la_constants module; I think it would be appropriate to do it right the first time.
Agreed, this would require changes to the issues @weslleyspereira referenced as well.
Consider the LAPACK zcgesv.f with mixed precision. That would require an interface:
```fortran
use la_constants, only: dp => wp
use la_constants32, only: sp => wp
```

which just obfuscates even more.
I don't really mind any other nomenclature for the real parameters. But the kind parameters I would really not like to be the same in two different modules. And also, I hope that they could get merged into one interface module.
LAPACK95 had an LA_PRECISION module which defined sp=kind(1.0), dp=kind(1.0d0). If LA_PRECISION were included, LA_CONSTANTS could use LA_PRECISION :: wp => dp, LA_CONSTANTS32 could use LA_PRECISION :: wp => sp, and the program units needing both could get them from LA_PRECISION directly. When LAPACK95 came out, this abstraction was needed because kind(1.0) was 8 on a Cray vector system. Defining sp and dp in terms of kind implies that testing was done on systems where kind is defined differently. No such testing was done because to my knowledge no such systems exist anymore, so it seems more honest to just define sp=4 and dp=8 and inline.
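A minimal sketch of that layering, assuming the module names from the comment (LA_PRECISION is the LAPACK95 module; the constants are illustrative, not code from this PR):

```fortran
module la_precision
   ! Kind parameters as in LAPACK95's LA_PRECISION module
   implicit none
   integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
end module la_precision

module la_constants
   ! 64-bit constants: rename dp to the working precision wp
   use la_precision, only: wp => dp
   implicit none
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants

module la_constants32
   ! 32-bit constants: same names, different working precision
   use la_precision, only: wp => sp
   implicit none
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants32
```

A program unit needing both precisions would then pull sp and dp from la_precision directly instead of renaming wp from two constants modules at once.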
Also note that lapack ships a make.inc enabling a quad precision library using -freal-8-real-16 etc.
Here the la_constants* do not follow the kind specification of the used variables. For instance ulp is hard-coded to epsilon(..) for the 32- and 64-bit reals. If these variables are used in a quad library, bad stuff could happen ;)
Please see this small snippet which tests the flags + kind output + epsilon
```bash
#!/bin/bash
cat <<EOF > test.f90
program test
real :: r1
real*4 :: r2
real(4) :: r3
real(selected_real_kind(p=6)) :: r4
double precision :: d1
real*8 :: d2
real(8) :: d3
real(kind(1.d0)) :: d4
real(selected_real_kind(p=15)) :: d5
print '(a10,10(tr1,i2))', 'single', kind(r1), kind(r2), kind(r3), kind(r4)
print '(a10,10(tr1,i2))', 'double', kind(d1), kind(d2), kind(d3), kind(d4), kind(d5)
print *, epsilon(r1), epsilon(d1)
end program test
EOF
function crun {
  echo "$@"
  gfortran $@ -o a.out test.f90 && ./a.out
  shift $#
  echo
}
crun
crun -freal-4-real-8
crun -freal-8-real-16
crun -freal-8-real-16 -freal-4-real-8
crun -fdefault-real-8
crun -fdefault-real-16
```
I agree that modules are the best way to import parameter values consistently.
Various kind values are predefined in the standard iso_fortran_env module, which could be used to set the precision. Using 4 instead of real32 and 8 instead of real64 in the prototype la_constants modules, as I did, simply states the obvious that 4 bytes = 32 bits.
If the constants module defines each constant with a unique name akin to FLT_EPSILON and DBL_EPSILON in C's float.h, then as @zerothi says, we would only need one constants module, and the variable names in the existing code could be maintained with assignments like wp => sp on the use line. There is nothing wrong with that except that if the constant names come out of committee as something like real32_one, then those use lines will get really long really fast.
I would strongly advise against renaming the variables in all the existing code to match the new type-specific constant values. The advantages of having common code between precisions should not be underestimated.
Ugh, double precision and double complex are so last century. Please reconsider using this archaic construction.
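For illustration, a single-module variant along those lines, using iso_fortran_env kinds and type-specific constant names in the spirit of FLT_EPSILON/DBL_EPSILON (the module and constant names here are my own, not proposals from this PR):

```fortran
module la_constants_iso
   use iso_fortran_env, only: real32, real64
   implicit none
   ! Machine constants computed from intrinsics, one unique name per precision
   real(real32), parameter :: sulp    = epsilon(0.0_real32)
   real(real64), parameter :: dulp    = epsilon(0.0_real64)
   real(real32), parameter :: ssafmin = tiny(0.0_real32)
   real(real64), parameter :: dsafmin = tiny(0.0_real64)
   real(real32), parameter :: ssafmax = huge(0.0_real32)
   real(real64), parameter :: dsafmax = huge(0.0_real64)
end module la_constants_iso
```

Existing double-precision code could then keep its local names with `use la_constants_iso, only: ulp => dulp`, at the cost of longer use lines.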
Hi.
@zerothi :
> 2. Could you give an example? I haven't experienced any architectures that require fortran modules to be handled specifically if using `kind`, or are you here talking about the machine precision etc. constants? Specifying a kind via `selected_real_kind` will use the compiler's architecture knowledge to ensure the precision specification requested, i.e. operate at least as a float/double.
I was talking about the machine precision constants specifically, which depend on the machine floating-point radix and min and max exponents. I don't have a practical example of a non-IEEE machine still in use for linear algebra, so maybe I am being too cautious here. But I am not a specialist on this subject.
Nice code snippet in #496 (comment). Thanks! Based on that, I am not sure how to use the hard-coded scaling constants. I think it does not matter if we use a single or multiple la_constants* files. Any suggestions?
@ecanesc :
> I would strongly advise against renaming the variables in all the existing code to match the new type-specific constant values. The advantages of having common code between precisions should not be underestimated.
Yes, I am also against renaming all variables.
Hi everybody, does anyone have a good way to allow users to use flags such as -freal-4-real-8 (see @zerothi's comment above) in LAPACK? @weslleyspereira and I are brainstorming right now and we are not sure how to do this in a nice way. Suggestions welcome. Julien.
> I was talking about the machine precision constants specifically, which depend on the machine floating-point radix and min and max exponents. I don't have a practical example of a non-IEEE machine still in use for linear algebra, so maybe I am being too cautious here. But I am not a specialist on this subject.
>
> Nice code snippet in #496 (comment). Thanks! Based on that, I am not sure how to use the hard-coded scaling constants. I think it does not matter if we use a single or multiple la_constants* files. Any suggestions?
This would go against hard-coding the constants in the la_constants* files. The problem is that:

```fortran
real(wp), parameter :: eps = 0.11102230246251565404E-015_wp
real(wp), parameter :: ulp = 0.22204460492503130808E-015_wp
real(wp), parameter :: safmin = 0.22250738585072013831E-307_wp
real(wp), parameter :: safmax = 0.44942328371557897693E+308_wp
real(wp), parameter :: smlnum = 0.10020841800044863890E-291_wp
real(wp), parameter :: bignum = 0.99792015476735990583E+292_wp
real(wp), parameter :: rtmin = 0.10010415475915504622E-145_wp
real(wp), parameter :: rtmax = 0.99895953610111751404E+146_wp
```

are hard-coded. So you can't use these in a quad-precision library. You really want to compute these at compile time.
Something like this:

```fortran
program test
   integer, parameter :: dp = selected_real_kind(p=15)
   real(dp), parameter :: ulp = epsilon(0._dp)
   real(dp), parameter :: eps = ulp * 0.5_dp
   real(dp), parameter :: safmin = tiny(0._dp)
   real(dp), parameter :: safmax = huge(0._dp)
   real(dp), parameter :: smlnum = safmin / ulp
   real(dp), parameter :: bignum = safmax * ulp ! have no clue how this one is calculated
   real(dp), parameter :: rtmin = sqrt(smlnum)
   real(dp), parameter :: rtmax = sqrt(bignum)
   real(dp), parameter :: tsml = sqrt(safmin)
   real(dp), parameter :: tbig = sqrt(safmax)
   real(dp), parameter :: ssml = rtmax / eps ! looks reversed? Is this correct?
   real(dp), parameter :: sbig = rtmin * eps
   print *, ulp
   print *, eps
   print *, safmin
   print *, safmax
   print *, smlnum
   print *, bignum
   print *, rtmin
   print *, rtmax
   print *, 'blue'
   print *, tsml
   print *, tbig
   print *, ssml
   print *, sbig
end program test
```

Everything is computed at compile time. That would also make the 32 vs 64 go away if you want low/high?
Wouldn't it then make sense to have the new precisions as separate codes? For instance, I could easily foresee use cases where some parts of an algorithm require quad precision, while the rest requires single/double. Linking such programs/libraries would require fancy linker lines, with groups for each object requiring quad precision, while also forcing this library to be placed in two different locations.
Even this calls for adding all the code in more precisions.
If what you want is all precisions, I think it would be safer to add all codes as separate precisions. And then disable compiling lapack with
```fortran
real(wp), parameter :: tsml = 0.1084202172E-18_wp
real(wp), parameter :: tbig = 0.4503599627E+16_wp
real(wp), parameter :: ssml = 0.3777893186E+23_wp
real(wp), parameter :: sbig = 0.1323488980E-22_wp
```
Are these reversed? All the other small/large min/max constants refer to E-X and E+X respectively, whereas ssml and sbig have opposite exponent signs. This is confusing to me, at least without more proper comments.
No, they are not reversed. They are based on Blue's constants s (small number) and S (large number) (https://doi.org/10.1145/355769.355771). The main reference of this PR (https://doi.org/10.1145/3061665) uses ssml = 1/s and sbig = 1/S.
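As a sketch of where the four values come from, they can be derived at compile time from the floating-point model intrinsics; the expressions below follow the formulas for Blue's constants in Anderson's paper, and the printed values should match the single-precision constants quoted above (this is my own test program, not code from this PR):

```fortran
program blue_constants
   implicit none
   integer, parameter :: wp = kind(1.0)  ! single precision, to match the quoted values
   real(wp), parameter :: zero = 0.0_wp
   real(wp), parameter :: rb = real(radix(zero), wp)
   ! Thresholds: summands below tsml (above tbig) are scaled up (down)
   real(wp), parameter :: tsml = rb**ceiling((minexponent(zero) - 1) * 0.5)
   real(wp), parameter :: tbig = rb**floor((maxexponent(zero) - digits(zero) + 1) * 0.5)
   ! Scaling factors: ssml = 1/s is large and sbig = 1/S is small, hence the
   ! "opposite" exponent signs relative to tsml and tbig
   real(wp), parameter :: ssml = rb**(-floor((minexponent(zero) - digits(zero)) * 0.5))
   real(wp), parameter :: sbig = rb**(-ceiling((maxexponent(zero) + digits(zero) - 1) * 0.5))
   print *, tsml, tbig  ! ~ 0.1084E-18, 0.4504E+16
   print *, ssml, sbig  ! ~ 0.3778E+23, 0.1323E-22
end program blue_constants
```

For IEEE single precision (radix 2, minexponent -125, maxexponent 128, 24 digits) these evaluate to 2^-63, 2^52, 2^75 and 2^-76, which are the four hard-coded constants in the diff.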
Ok.
Thanks @zerothi ! I like your solution here #496 (comment). It satisfies the low/high precision standard that @langou mentioned here #496 (comment). I will try it. A routine needing more than two precisions, e.g., https://epubs.siam.org/doi/abs/10.1137/17M1140819, could verify the Fortran single/double precisions at runtime and should have its own constants in place.
If you want this, I would suggest you mark #493 #494 and #495 as work in progress, make a new pr fixing the
Ok, but how should it be handled in case the library is compiled in quad precision? Then the code basically doesn't do what it is told to do. :)
Sure! It sounds like a good idea 👍
I see your point. Maybe it could be a good idea to track the compiled precision and inform it to a preprocessor variable. For instance, a code needing the third precision as "one precision above the double precision" would use:

```fortran
      real p3
#ifdef _dp_eq4
      parameter ( p3 = 8 )
#else
      parameter ( p3 = 16 )
#endif
      real u, a, ...
      double precision u_dp, a_dp, ...
      real(p3) u_p3, a_p3, ...
```

Preprocessor variables like
While pre-processors may be useful in fortran, I would heavily suggest this gets careful input from many more persons. :) Pre-processors are nice, since you can do lots of fancy stuff with them, but for more advanced stuff one will easily fall into incompatibilities. I say pre-processors should be left out for now. Alternatively, I guess the check in the code could be something like:

```fortran
integer, parameter :: qp = selected_real_kind(p=33)
real :: r_sp
double precision :: r_dp
if ( kind(r_dp) == qp ) then
   ! something should be done
end if
```

Here is a small kind explorer:

```fortran
program kind_prog
   integer :: i, k
   k = -1
   do i = 100, 1, -1
      if ( selected_real_kind(p=i) /= k ) then
         k = selected_real_kind(p=i)
         print *, 'real', i, k
      end if
   end do
   k = -1
   do i = 100, 1, -1
      if ( selected_int_kind(i) /= k ) then
         k = selected_int_kind(i)
         print *, 'int', i, k
      end if
   end do
end program kind_prog
```

A small test-program at compile time could also error out if these kinds are not unique... I don't really mind :)
I have just added Doxygen preambles to all new files.
Still missing the ROTG subroutines. Thanks to Professor James Demmel for noticing that!
I included the ROTG routines in #527, and the
Edward Anderson stated in his paper https://doi.org/10.1145/3061665 the following:

> The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly avoids the problem, but it can still occur at extreme values in the range of representable numbers. More careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite for all the Level 1 BLAS is included.
This PR replaces the original cROT and zROT with Edward's safe scaling codes.
Since the codes were written in the Fortran90 standard, I removed
from the
CMakeLists.txt. The compiler now determines the Fortran layout by the file extension.