
Conversation


@weslleyspereira weslleyspereira commented Feb 22, 2021

Edward Anderson stated in his paper https://doi.org/10.1145/3061665 the following:

The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling
of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly
avoids the problem, but it can still occur at extreme values in the range of representable numbers. More
careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the
expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens
rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative
increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite
for all the Level 1 BLAS is included.

  • This PR replaces the original cROT and zROT with Edward's safe scaling codes.

  • Since the codes were written in the Fortran 90 standard, I removed

# Tell CMake that our Fortran sources are written in fixed format.
set(CMAKE_Fortran_FORMAT FIXED)

from the CMakeLists.txt. The compiler now determines the Fortran source form from the file extension.
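For context, CMake infers fixed form for `.f` files and free form for `.f90` files by default, so the global setting is unnecessary once the sources carry the right extensions. If a single fixed-form file ever needed an explicit override, it could be set per file instead of globally (a sketch; the file name is hypothetical):

```cmake
# CMake picks the Fortran source form from the extension by default.
# Override it only for the one file that needs it:
set_source_files_properties(legacy_routine.f PROPERTIES Fortran_FORMAT FIXED)
```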

@weslleyspereira weslleyspereira changed the title Move original xLARTG to DEPRECATED; Add new code Add safe scaling (c,z)ROT routines from https://doi.org/10.1145/3061665 Feb 22, 2021

codecov bot commented Feb 22, 2021

Codecov Report

Merging #496 (6f44e83) into master (77a0ceb) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 6f44e83 differs from pull request most recent head d8b364d. Consider uploading reports for the commit d8b364d to get more accurate results

@@            Coverage Diff             @@
##           master     #496      +/-   ##
==========================================
- Coverage   83.36%   83.33%   -0.03%     
==========================================
  Files        1820     1820              
  Lines      185545   170847   -14698     
==========================================
- Hits       154673   142378   -12295     
+ Misses      30872    28469    -2403     
Impacted Files Coverage Δ
SRC/crot.f90 100.00% <100.00%> (ø)
SRC/zrot.f90 100.00% <100.00%> (ø)
SRC/dlaqr1.f 47.61% <0.00%> (-6.23%) ⬇️
SRC/slaqr1.f 47.61% <0.00%> (-6.23%) ⬇️
SRC/dlals0.f 74.73% <0.00%> (-4.76%) ⬇️
SRC/slals0.f 74.73% <0.00%> (-4.76%) ⬇️
SRC/chegst.f 86.44% <0.00%> (-4.58%) ⬇️
SRC/zhegst.f 86.44% <0.00%> (-4.58%) ⬇️
SRC/dsygst.f 86.44% <0.00%> (-4.04%) ⬇️
SRC/ssygst.f 86.44% <0.00%> (-4.04%) ⬇️
... and 1812 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@zerothi zerothi (Contributor) left a comment

Please see the comment in the code. I think introducing f90 modules requires more careful thinking in terms of the namespace etc.

!
! Standard constants
!
integer, parameter :: wp = 8
Contributor

I don't see a good reason for having two different modules for precision constants.

Also, some compilers may not adhere to double precision being kind 8. Although this is the BLAS nomenclature, I think a portable kind definition should be adopted in case f90 is introduced anyway.

Why not go with the same BLAS/LAPACK naming convention:

integer, parameter :: sp = kind(1.)
integer, parameter :: dp = kind(1.d0)
real(sp), parameter :: szero = 0.0_sp
real(dp), parameter :: dzero = 0.0_dp

It becomes a bit messy when using the same variables for different values when mixing modules.


There is no such naming convention that I know of in the BLAS and LAPACK. Zero means a single precision zero in 32-bit routines and a double precision zero in 64-bit routines. The modules can not be combined without making extensive changes to all the code that includes them, and would make it more difficult to produce new routines in other precisions.
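To illustrate the existing convention, a minimal sketch modeled on the prototype modules in this PR (names and values are illustrative, not the exact file contents): the same names carry a different kind in each module, so a routine's body can be shared between precisions.

```fortran
! 64-bit constants: here zero is a double precision zero...
module la_constants
   integer, parameter :: wp = 8
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants

! ...and in the 32-bit module the same name is a single precision zero,
! so the body of a routine written against zero/one is identical in both.
module la_constants32
   integer, parameter :: wp = 4
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants32
```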

Collaborator Author

Thanks for the feedback! I commented about the modules and double precision nomenclature at #493 (comment). Note the similarity between #493, #494 and #496. The 3 PRs need similar fixes.

Contributor

There is no such naming convention that I know of in the BLAS and LAPACK. Zero means a single precision zero in 32-bit routines and a double precision zero in 64-bit routines. The modules can not be combined without making extensive changes to all the code that includes them, and would make it more difficult to produce new routines in other precisions.

Agreed, the naming convention was just a suggestion, but it is based on the routine naming of LAPACK itself.
I only saw this PR adding the la_constants module; I think it would be appropriate to do it right the first time.
Agreed, this would require changes to the issues @weslleyspereira referenced as well.

Consider the LAPACK zcgesv.f with mixed precision. That would require an interface:

use la_constants, only: dp => wp
use la_constants32, only: sp => wp

which just obfuscates even more.

I don't really mind any other nomenclature for the real parameters. But I would really not like the kind parameters to have the same name in two different modules. Also, I hope that they could get merged into one interface module.


LAPACK95 had an LA_PRECISION module which defined sp=kind(1.0), dp=kind(1.0d0). If LA_PRECISION were included, LA_CONSTANTS could use LA_PRECISION, only: wp => dp, LA_CONSTANTS32 could use LA_PRECISION, only: wp => sp, and the program units needing both could get them from LA_PRECISION directly.

When LAPACK95 came out, this abstraction was needed because kind(1.0) was 8 on a Cray vector system. Defining sp and dp in terms of kind implies that testing was done on systems where kind is defined differently. No such testing was done because, to my knowledge, no such systems exist anymore, so it seems more honest to just define sp=4 and dp=8 inline.
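Concretely, that layering might look like the following sketch (module and constant names are illustrative; the real modules would carry the full set of scaling constants):

```fortran
module la_precision
   ! Single point of truth for the kind values.
   integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
end module la_precision

module la_constants
   ! 64-bit constants: everything downstream is expressed in terms of wp.
   use la_precision, only: wp => dp
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants

module la_constants32
   ! 32-bit constants: same names, different wp.
   use la_precision, only: wp => sp
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants32
```

A mixed-precision routine such as zcgesv could then use la_precision, only: sp, dp directly, avoiding the double renaming discussed above.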

Contributor

Also note that LAPACK ships a make.inc enabling a quad precision library using -freal-8-real-16 etc.

Here the la_constants* modules do not follow the kind specification of the variables they are used with. For instance, ulp is hard-coded to epsilon(..) for the 32- and 64-bit reals. If these variables are used in a quad library, bad stuff could happen ;)

Please see this small snippet which tests the flags + kind output + epsilon

#!/bin/bash

cat <<EOF > test.f90
program test
  real :: r1
  real*4:: r2
  real(4) :: r3
  real(selected_real_kind(p=6)) :: r4

  double precision :: d1
  real*8 :: d2
  real(8) :: d3
  real(kind(1.d0)) :: d4
  real(selected_real_kind(p=15)) :: d5

  print '(a10,10(tr1,i2))', 'single', kind(r1), kind(r2), kind(r3), kind(r4)
  print '(a10,10(tr1,i2))', 'double', kind(d1), kind(d2), kind(d3), kind(d4), kind(d5)
print *, epsilon(r1), epsilon(d1)
end program test
EOF

function crun {
    echo "$@"
    gfortran "$@" -o a.out test.f90 && ./a.out
    shift $#
    echo
}

crun
crun -freal-4-real-8
crun -freal-8-real-16
crun -freal-8-real-16 -freal-4-real-8
crun -fdefault-real-8
crun -fdefault-real-16


I agree that modules are the best way to import parameter values consistently.

Various kind values are predefined in the standard iso_fortran_env module, which could be used to set the precision. Using 4 instead of real32 and 8 instead of real64, as I did in the prototype la_constants modules, simply states the obvious that 4 bytes = 32 bits.

If the constants module defines each constant with a unique name akin to FLT_EPSILON and DBL_EPSILON in C's float.h, then as @zerothi says, we would only need one constants module, and the variable names in the existing code could be maintained with assignments like wp => sp on the use line. There is nothing wrong with that, except that if the constant names come out of committee as something like real32_one, then those use lines will get really long really fast.

I would strongly advise against renaming the variables in all the existing code to match the new type-specific constant values. The advantages of having common code between precisions should not be underestimated.

Ugh, double precision and double complex are so last century. Please reconsider using this archaic construction.

Collaborator Author

Hi.

@zerothi :

2. Could you give an example? I haven't experienced any architectures that require Fortran modules to be handled specially when using `kind`, or are you here talking about the machine precision etc. constants? Specifying a kind via `selected_real_kind` will use the compiler's architecture knowledge to ensure the requested precision, i.e. at least operate as a float/double.

I was talking about the machine precision constants specifically, which depend on the machine floating-point radix and min and max exponents. I don't have a practical example of a non-IEEE machine still in use for linear algebra, so maybe I am being too cautious here. But I am not a specialist on this subject.

Nice code snippet in #496 (comment). Thanks! Based on that, I am not sure how to use the hard-coded scaling constants. I think it does not matter if we use a single or multiple la_constants* files. Any suggestions?

@ecanesc :

I would strongly advise against renaming the variables in all the existing code to match the new type-specific constant values. The advantages of having common code between precisions should not be underestimated.

Yes, I am also against renaming all variables.

Contributor

Hi everybody, does anyone have a good way to allow users to use flags such as -freal-4-real-8 (see @zerothi's comment above) in LAPACK? @weslleyspereira and I are brainstorming right now and we are not sure how to do this in a nice way. Suggestions welcome. Julien.

Contributor

I was talking about the machine precision constants specifically, which depend on the machine floating-point radix and min and max exponents. I don't have a practical example of a non-IEEE machine still in use for linear algebra, so maybe I am being too cautious here. But I am not a specialist on this subject.

Nice code snippet in #496 (comment). Thanks! Based on that, I am not sure how to use the hard-coded scaling constants. I think it does not matter if we use a single or multiple la_constants* files. Any suggestions?

This would go against hard-coding the constants in the la_constants* files. The problem is that:

   real(wp), parameter :: eps =    0.11102230246251565404E-015_wp
   real(wp), parameter :: ulp =    0.22204460492503130808E-015_wp
   real(wp), parameter :: safmin = 0.22250738585072013831E-307_wp
   real(wp), parameter :: safmax = 0.44942328371557897693E+308_wp
   real(wp), parameter :: smlnum = 0.10020841800044863890E-291_wp
   real(wp), parameter :: bignum = 0.99792015476735990583E+292_wp
   real(wp), parameter :: rtmin =  0.10010415475915504622E-145_wp
   real(wp), parameter :: rtmax =  0.99895953610111751404E+146_wp

are hard-coded. So you can't use these in a quad-precision library. You really want to compute these at compile time.
Something like this:

program test
  integer, parameter :: dp = selected_real_kind(p=15)
  real(dp), parameter :: ulp = epsilon(0._dp)
  real(dp), parameter :: eps = ulp * 0.5_dp
  real(dp), parameter :: safmin = tiny(0._dp)
  real(dp), parameter :: safmax = huge(0._dp)
  real(dp), parameter :: smlnum = safmin / ulp
  real(dp), parameter :: bignum = safmax * ulp ! have no clue how this one is calculated
  real(dp), parameter :: rtmin = sqrt(smlnum)
  real(dp), parameter :: rtmax = sqrt(bignum)

  real(dp), parameter :: tsml = sqrt(safmin)
  real(dp), parameter :: tbig = sqrt(safmax)
  real(dp), parameter :: ssml = rtmax / eps ! looks reversed? Is this correct?
  real(dp), parameter :: sbig = rtmin * eps

  print *, ulp
  print *, eps
  print *, safmin
  print *, safmax
  print *, smlnum
  print *, bignum
  print *, rtmin
  print *, rtmax

  print *, 'blue'
  print *, tsml
  print *, tbig
  print *, ssml
  print *, sbig

end program test

Everything is computed at compile time. That would also make the 32 vs 64 distinction go away if you want low/high.


langou commented Feb 26, 2021

Comments:

  1. Need to support single, double, want to support quad, want to support 16-bit arithmetic (half) and such

  2. Need to support mixed precision (single/double). There are some three-precisions algorithms out there. (See, for example, "Accelerating the solution of linear systems by iterative refinement in three precisions," from Erin Carson and Nicholas J. Higham, SIAM J. Sci. Comput., 40(2), A817–A847, 2018.)

  3. If we abstract for mixed precision, my preference would be something like "low precision" / "high precision", as opposed to "single" / "double".


zerothi commented Mar 1, 2021

  1. Need to support single, double, want to support quad, want to support 16-bit arithmetic (half) and such

Wouldn't it then make sense to have the new precisions as separate codes? For instance, I could easily foresee use cases where some parts of an algorithm require quad precision while the rest requires single/double. Linking such programs/libraries would require fancy linker lines with groups for each object requiring quad precision, while also forcing this library to be placed in two different locations.

  2. Need to support mixed precision (single/double). There are some three-precisions algorithms out there. (See, for example, "Accelerating the solution of linear systems by iterative refinement in three precisions," from Erin Carson and Nicholas J. Higham, SIAM J. Sci. Comput., 40(2), A817–A847, 2018.)

Even this calls for adding all the code in more precisions.

  3. If we abstract for mixed precision, my preference would be something like "low precision" / "high precision", as opposed to "single" / "double".

If what you want is all precisions, I think it would be safer to add all codes as separate precisions, and then disable compiling LAPACK with -freal-4-real-8 etc. That would be more stable, I guess.

real(wp), parameter :: tsml = 0.1084202172E-18_wp
real(wp), parameter :: tbig = 0.4503599627E+16_wp
real(wp), parameter :: ssml = 0.3777893186E+23_wp
real(wp), parameter :: sbig = 0.1323488980E-22_wp
Contributor

Are these reversed? All the other small/big and min/max pairs refer to e-X and e+X, whereas ssml and sbig have opposite exponent signs. This is confusing to me, at least without more thorough comments.

Collaborator Author

No, they are not reversed. They are based on Blue's constants s (small number) and S (large number) (https://doi.org/10.1145/355769.355771). The main reference of this PR (https://doi.org/10.1145/3061665) uses ssml = 1/s and sbig = 1/S.
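For reference, these thresholds need not be hard-coded: they follow from the radix, precision, and exponent range, so they can be computed in constant expressions with standard intrinsics. This is only a sketch under the assumption that these exponent formulas are the ones intended; for IEEE single precision they reproduce the hard-coded values quoted above (tsml = 2^-63, tbig = 2^52, ssml = 2^75, sbig = 2^-76):

```fortran
program blue_constants
   implicit none
   integer, parameter :: wp = kind(1.0)   ! single precision; use kind(1.d0) for double
   real(wp), parameter :: rdx = real(radix(0._wp), wp)
   ! tsml/tbig: thresholds below/above which squaring under-/overflows
   real(wp), parameter :: tsml = rdx**ceiling((minexponent(0._wp) - 1) * 0.5)
   real(wp), parameter :: tbig = rdx**floor((maxexponent(0._wp) - digits(0._wp) + 1) * 0.5)
   ! ssml/sbig: the scaling factors 1/s and 1/S from Blue's paper
   real(wp), parameter :: ssml = rdx**(-floor((minexponent(0._wp) - digits(0._wp)) * 0.5))
   real(wp), parameter :: sbig = rdx**(-ceiling((maxexponent(0._wp) + digits(0._wp) - 1) * 0.5))

   print *, tsml, tbig   ! ~ 1.08E-19 and 4.50E+15 for IEEE single
   print *, ssml, sbig   ! ~ 3.78E+22 and 1.32E-23 for IEEE single
end program blue_constants
```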

Contributor

Ok.

@weslleyspereira
Collaborator Author

Thanks @zerothi ! I like your solution here #496 (comment). It satisfies the low/high precision standard that @langou mentioned here #496 (comment) . I will try it.

A routine needing more than two precisions, e.g., https://epubs.siam.org/doi/abs/10.1137/17M1140819, could verify the Fortran single/double precisions at runtime and should have its own constants in place.


zerothi commented Mar 1, 2021

Thanks @zerothi ! I like your solution here #496 (comment). It satisfies the low/high precision standard that @langou mentioned here #496 (comment) . I will try it.

If you want this, I would suggest you mark #493, #494 and #495 as work in progress, make a new PR fixing the la_constants and precision modules, and then re-adjust the PRs against that PR. These three contain duplicate changes. :)

A routine needing more than two precisions, e.g., https://epubs.siam.org/doi/abs/10.1137/17M1140819, could verify the Fortran single/double precisions at runtime and should have its own constants in place.

Ok, but how should it handle the case where the library is compiled in quad? Then the code basically doesn't do what it says it does. :)

@weslleyspereira
Collaborator Author

If you want this, I would suggest you mark #493, #494 and #495 as work in progress, make a new PR fixing the la_constants and precision modules, and then re-adjust the PRs against that PR. These three contain duplicate changes. :)

Sure! It sounds like a good idea 👍

Ok, but how should it handle the case where the library is compiled in quad? Then the code basically doesn't do what it says it does. :)

I see your point. Maybe it could be a good idea to track the compiled precision and expose it through a preprocessor variable. For instance, a code needing a third precision as "one precision above double precision" would use:

integer p3
#ifdef _dp_eq4
    parameter ( p3 = 8 )
#else
    parameter ( p3 = 16 )
#endif

real u, a, ...
double precision u_dp, a_dp, ...
real(p3) u_p3, a_p3, ...

Preprocessor variables like _dp_eq4 could be included in the Makefile and CMakeLists as soon as we really need them.
What do you think?


zerothi commented Mar 1, 2021

Ok, but how should it handle the case where the library is compiled in quad? Then the code basically doesn't do what it says it does. :)

I see your point. Maybe it could be a good idea to track the compiled precision and expose it through a preprocessor variable. For instance, a code needing a third precision as "one precision above double precision" would use:

integer p3
#ifdef _dp_eq4
    parameter ( p3 = 8 )
#else
    parameter ( p3 = 16 )
#endif

real u, a, ...
double precision u_dp, a_dp, ...
real(p3) u_p3, a_p3, ...

Preprocessor variables like _dp_eq4 could be included in the Makefile and CMakeLists as soon as we really need them.
What do you think?

While pre-processors may be useful in Fortran, I would strongly suggest that this gets careful input from many more people. :)

Pre-processors are nice, since you can do lots of fancy stuff with them, but for more advanced stuff one will easily run into incompatibilities. I say pre-processors should be left out for now; alternatively, I think fypp could be used in a more stable, compiler-independent way.

I guess the check in the code could be something like:

integer, parameter :: qp = selected_real_kind(p=33)
real :: r_sp
double precision :: r_dp

if ( kind(r_dp) == qp ) then
    ! something should be done
end if

here is a small kind explorer:

program kind_prog
  integer :: i, k

  k = -1
  do i = 100, 1, -1
    if ( selected_real_kind(p=i) /= k ) then
      k = selected_real_kind(p=i)
      print *,'real', i, k
    end if
  end do
  
  k = -1
  do i = 100, 1, -1
    if ( selected_int_kind(i) /= k ) then
      k = selected_int_kind(i)
      print *, 'int', i, k
    end if
  end do
end program kind_prog

A small test-program at compile time could also error out if these kinds are not unique... I don't really mind :)

@weslleyspereira weslleyspereira marked this pull request as draft March 1, 2021 15:13
@weslleyspereira
Collaborator Author

Hi! As @zerothi suggested, I created PR #501 to include the module(s) for the scaling constants.

@weslleyspereira weslleyspereira marked this pull request as ready for review March 3, 2021 21:52
@weslleyspereira
Collaborator Author

I have just added Doxygen preambles to all new files.

@langou langou added this to the LAPACK 3.10.0 milestone Mar 18, 2021
@weslleyspereira weslleyspereira marked this pull request as draft March 25, 2021 20:21
@weslleyspereira
Collaborator Author

Still missing the ROTG subroutines. Thanks to Professor James Demmel for noticing that!

@weslleyspereira weslleyspereira force-pushed the try-xROT-from-Edward-Anderson branch from 6f44e83 to f25e56d Compare March 31, 2021 21:43
@weslleyspereira weslleyspereira force-pushed the try-xROT-from-Edward-Anderson branch from f25e56d to d8b364d Compare April 1, 2021 12:58
@weslleyspereira
Collaborator Author

Still missing the ROTG subroutines. Thanks to Professor James Demmel for noticing that!

I included the ROTG routines in #527, and the la_constants module is in #501. Since the modifications in crot.f and zrot.f didn't introduce anything new regarding the safe scaling algorithms, I am closing this PR without merging.

@weslleyspereira weslleyspereira removed this from the LAPACK 3.10.0 milestone Apr 1, 2021