
Conversation


@weslleyspereira weslleyspereira commented Feb 22, 2021

Edward Anderson stated in his paper https://doi.org/10.1145/3061665 the following:

The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling
of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly
avoids the problem, but it can still occur at extreme values in the range of representable numbers. More
careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the
expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens
rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative
increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite
for all the Level 1 BLAS is included.

  • This PR replaces the original cROT and zROT with Edward's safe scaling codes.

  • Since the codes were written in the Fortran 90 standard, I removed

# Tell CMake that our Fortran sources are written in fixed format.
set(CMAKE_Fortran_FORMAT FIXED)

from the CMakeLists.txt. The compiler now determines the Fortran source form from the file extension.
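For context, CMake infers fixed form for `.f` files and free form for `.f90` files by default, so the global setting is unnecessary once the sources carry the right extensions. If a single fixed-form file ever needed an explicit override, it could be set per file instead of globally (a sketch; the file name is hypothetical):

```cmake
# CMake picks the Fortran source form from the extension by default.
# Override it only for the one file that needs it:
set_source_files_properties(legacy_routine.f PROPERTIES Fortran_FORMAT FIXED)
```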

@weslleyspereira weslleyspereira changed the title Move original xLARTG to DEPRECATED; Add new code Add safe scaling (c,z)ROT routines from https://doi.org/10.1145/3061665 Feb 22, 2021

codecov bot commented Feb 22, 2021

Codecov Report

Merging #496 (6f44e83) into master (77a0ceb) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 6f44e83 differs from pull request most recent head d8b364d. Consider uploading reports for the commit d8b364d to get more accurate results

@@            Coverage Diff             @@
##           master     #496      +/-   ##
==========================================
- Coverage   83.36%   83.33%   -0.03%     
==========================================
  Files        1820     1820              
  Lines      185545   170847   -14698     
==========================================
- Hits       154673   142378   -12295     
+ Misses      30872    28469    -2403     
Impacted Files Coverage Δ
SRC/crot.f90 100.00% <100.00%> (ø)
SRC/zrot.f90 100.00% <100.00%> (ø)
SRC/dlaqr1.f 47.61% <0.00%> (-6.23%) ⬇️
SRC/slaqr1.f 47.61% <0.00%> (-6.23%) ⬇️
SRC/dlals0.f 74.73% <0.00%> (-4.76%) ⬇️
SRC/slals0.f 74.73% <0.00%> (-4.76%) ⬇️
SRC/chegst.f 86.44% <0.00%> (-4.58%) ⬇️
SRC/zhegst.f 86.44% <0.00%> (-4.58%) ⬇️
SRC/dsygst.f 86.44% <0.00%> (-4.04%) ⬇️
SRC/ssygst.f 86.44% <0.00%> (-4.04%) ⬇️
... and 1812 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@zerothi zerothi (Contributor) left a comment

Please see the comment in the code. I think introducing f90 modules requires more careful thinking in terms of the namespace etc.

!
! Standard constants
!
integer, parameter :: wp = 8
Contributor

I don't see a good reason for having two different modules for precision constants.

Also, some compilers may not adhere to double precision being kind 8. Although this is the BLAS nomenclature, I think a portable kind definition should be adopted in case f90 is introduced anyway.

Why not go with the same BLAS/LAPACK naming convention:

integer, parameter :: sp = kind(1.)
integer, parameter :: dp = kind(1.d0)
real(sp), parameter :: szero = 0.0_sp
real(dp), parameter :: dzero = 0.0_dp

It becomes a bit messy when using the same variables for different values when mixing modules.


There is no such naming convention that I know of in the BLAS and LAPACK. Zero means a single precision zero in 32-bit routines and a double precision zero in 64-bit routines. The modules can not be combined without making extensive changes to all the code that includes them, and would make it more difficult to produce new routines in other precisions.
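To illustrate the existing convention, a minimal sketch modeled on the prototype modules in this PR (names and values are illustrative, not the exact file contents): the same names carry a different kind in each module, so a routine's body can be shared between precisions.

```fortran
! 64-bit constants: here zero is a double precision zero...
module la_constants
   integer, parameter :: wp = 8
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants

! ...and in the 32-bit module the same name is a single precision zero,
! so the body of a routine written against zero/one is identical in both.
module la_constants32
   integer, parameter :: wp = 4
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants32
```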

Collaborator Author

Thanks for the feedback! I commented about the modules and double precision nomenclature at #493 (comment). Note the similarity between #493, #494 and #496. The 3 PRs need similar fixes.

Contributor

There is no such naming convention that I know of in the BLAS and LAPACK. Zero means a single precision zero in 32-bit routines and a double precision zero in 64-bit routines. The modules can not be combined without making extensive changes to all the code that includes them, and would make it more difficult to produce new routines in other precisions.

Agreed, the naming convention was just a suggestion, but it is based on the routine naming of LAPACK itself.
I only saw this PR adding the la_constants module; I think it would be appropriate to do it right the first time.
Agreed, this would require changes to the issues @weslleyspereira referenced as well.

Consider the LAPACK zcgesv.f with mixed precision. That would require an interface:

use la_constants, only: dp => wp
use la_constants32, only: sp => wp

which just obfuscates even more.

I don't really mind any other nomenclature for the real parameters. But I would really not like the kind parameters to have the same name in two different modules. Also, I hope that they could get merged into one interface module.


LAPACK95 had an LA_PRECISION module which defined sp=kind(1.0), dp=kind(1.0d0). If LA_PRECISION were included, LA_CONSTANTS could use LA_PRECISION, only: wp => dp, LA_CONSTANTS32 could use LA_PRECISION, only: wp => sp, and the program units needing both could get them from LA_PRECISION directly.

When LAPACK95 came out, this abstraction was needed because kind(1.0) was 8 on a Cray vector system. Defining sp and dp in terms of kind implies that testing was done on systems where kind is defined differently. No such testing was done because, to my knowledge, no such systems exist anymore, so it seems more honest to just define sp=4 and dp=8 inline.
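Concretely, that layering might look like the following sketch (module and constant names are illustrative; the real modules would carry the full set of scaling constants):

```fortran
module la_precision
   ! Single point of truth for the kind values.
   integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
end module la_precision

module la_constants
   ! 64-bit constants: everything downstream is expressed in terms of wp.
   use la_precision, only: wp => dp
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants

module la_constants32
   ! 32-bit constants: same names, different wp.
   use la_precision, only: wp => sp
   real(wp), parameter :: zero = 0.0_wp, one = 1.0_wp
end module la_constants32
```

A mixed-precision routine such as zcgesv could then use la_precision, only: sp, dp directly, avoiding the double renaming discussed above.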

Contributor

Also note that LAPACK ships a make.inc enabling a quad precision library using -freal-8-real-16 etc.

Here the la_constants* modules do not follow the kind specification of the variables they are used with. For instance, ulp is hard-coded to epsilon(..) for the 32- and 64-bit reals. If these variables are used in a quad library, bad stuff could happen ;)

Please see this small snippet which tests the flags + kind output + epsilon

#!/bin/bash

cat <<EOF > test.f90
program test
  real :: r1
  real*4:: r2
  real(4) :: r3
  real(selected_real_kind(p=6)) :: r4

  double precision :: d1
  real*8 :: d2
  real(8) :: d3
  real(kind(1.d0)) :: d4
  real(selected_real_kind(p=15)) :: d5

  print '(a10,10(tr1,i2))', 'single', kind(r1), kind(r2), kind(r3), kind(r4)
  print '(a10,10(tr1,i2))', 'double', kind(d1), kind(d2), kind(d3), kind(d4), kind(d5)
print *, epsilon(r1), epsilon(d1)
end program test
EOF

function crun {
    echo "$@"
    gfortran "$@" -o a.out test.f90 && ./a.out
    shift $#
    echo
}

crun
crun -freal-4-real-8
crun -freal-8-real-16
crun -freal-8-real-16 -freal-4-real-8
crun -fdefault-real-8
crun -fdefault-real-16


I agree that modules are the best way to import parameter values consistently.

Various kind values are predefined in the standard iso_fortran_env module, which could be used to set the precision. Using 4 instead of real32 and 8 instead of real64, as I did in the prototype la_constants modules, simply states the obvious that 4 bytes = 32 bits.

If the constants module defines each constant with a unique name akin to FLT_EPSILON and DBL_EPSILON in C's float.h, then as @zerothi says, we would only need one constants module, and the variable names in the existing code could be maintained with assignments like wp => sp on the use line. There is nothing wrong with that, except that if the constant names come out of committee as something like real32_one, then those use lines will get really long really fast.

I would strongly advise against renaming the variables in all the existing code to match the new type-specific constant values. The advantages of having common code between precisions should not be underestimated.

Ugh, double precision and double complex are so last century. Please reconsider using this archaic construction.

Collaborator Author

Hi.

@zerothi :

2. Could you give an example? I haven't experienced any architectures that require Fortran modules to be handled specially when using `kind`, or are you here talking about the machine precision etc. constants? Specifying a kind via `selected_real_kind` will use the compiler's architecture knowledge to ensure the requested precision, i.e. at least operate as a float/double.

I was talking about the machine precision constants specifically, which depend on the machine floating-point radix and min and max exponents. I don't have a practical example of a non-IEEE machine still in use for linear algebra, so maybe I am being too cautious here. But I am not a specialist on this subject.

Nice code snippet in #496 (comment). Thanks! Based on that, I am not sure how to use the hard-coded scaling constants. I think it does not matter if we use a single or multiple la_constants* files. Any suggestions?

@ecanesc :

I would strongly advise against renaming the variables in all the existing code to match the new type-specific constant values. The advantages of having common code between precisions should not be underestimated.

Yes, I am also against renaming all variables.

Contributor

Hi everybody, does anyone have a good way to allow users to use flags such as -freal-4-real-8 (see @zerothi's comment above) in LAPACK? @weslleyspereira and I are brainstorming right now and we are not sure how to do this in a nice way. Suggestions welcome. Julien.

Contributor

I was talking about the machine precision constants specifically, which depend on the machine floating-point radix and min and max exponents. I don't have a practical example of a non-IEEE machine still in use for linear algebra, so maybe I am being too cautious here. But I am not a specialist on this subject.

Nice code snippet in #496 (comment). Thanks! Based on that, I am not sure how to use the hard-coded scaling constants. I think it does not matter if we use a single or multiple la_constants* files. Any suggestions?

This would go against hard-coding the constants in the la_constants* files. The problem is that:

   real(wp), parameter :: eps =    0.11102230246251565404E-015_wp
   real(wp), parameter :: ulp =    0.22204460492503130808E-015_wp
   real(wp), parameter :: safmin = 0.22250738585072013831E-307_wp
   real(wp), parameter :: safmax = 0.44942328371557897693E+308_wp
   real(wp), parameter :: smlnum = 0.10020841800044863890E-291_wp
   real(wp), parameter :: bignum = 0.99792015476735990583E+292_wp
   real(wp), parameter :: rtmin =  0.10010415475915504622E-145_wp
   real(wp), parameter :: rtmax =  0.99895953610111751404E+146_wp

are hard-coded. So you can't use these in a quad-precision library. You really want to compute these at compile time.
Something like this:

program test
  integer, parameter :: dp = selected_real_kind(p=15)
  real(dp), parameter :: ulp = epsilon(0._dp)
  real(dp), parameter :: eps = ulp * 0.5_dp
  real(dp), parameter :: safmin = tiny(0._dp)
  real(dp), parameter :: safmax = huge(0._dp)
  real(dp), parameter :: smlnum = safmin / ulp
  real(dp), parameter :: bignum = safmax * ulp ! have no clue how this one is calculated
  real(dp), parameter :: rtmin = sqrt(smlnum)
  real(dp), parameter :: rtmax = sqrt(bignum)

  real(dp), parameter :: tsml = sqrt(safmin)
  real(dp), parameter :: tbig = sqrt(safmax)
  real(dp), parameter :: ssml = rtmax / eps ! looks reversed? Is this correct?
  real(dp), parameter :: sbig = rtmin * eps

  print *, ulp
  print *, eps
  print *, safmin
  print *, safmax
  print *, smlnum
  print *, bignum
  print *, rtmin
  print *, rtmax

  print *, 'blue'
  print *, tsml
  print *, tbig
  print *, ssml
  print *, sbig

end program test

Everything is computed at compile time. That would also make the 32 vs 64 distinction go away if you want low/high.


langou commented Feb 26, 2021

Comments:

  1. Need to support single, double, want to support quad, want to support 16-bit arithmetic (half) and such

  2. Need to support mixed precision (single/double). There are some three-precisions algorithms out there. (See, for example, "Accelerating the solution of linear systems by iterative refinement in three precisions," from Erin Carson and Nicholas J. Higham, SIAM J. Sci. Comput., 40(2), A817–A847, 2018.)

  3. If we abstract for mixed precision, my preference would be something like "low precision" / "high precision", as opposed to "single" / "double".


zerothi commented Mar 1, 2021

  1. Need to support single, double, want to support quad, want to support 16-bit arithmetic (half) and such

Wouldn't it then make sense to have the new precisions as separate codes? For instance, I could easily foresee use cases where some parts of an algorithm require quad precision while the rest requires single/double. Linking such programs/libraries would require fancy linker lines with groups for each object requiring quad precision, while also forcing this library to be placed in two different locations.

  2. Need to support mixed precision (single/double). There are some three-precisions algorithms out there. (See, for example, "Accelerating the solution of linear systems by iterative refinement in three precisions," from Erin Carson and Nicholas J. Higham, SIAM J. Sci. Comput., 40(2), A817–A847, 2018.)

Even this calls for adding all the code in more precisions.

  3. If we abstract for mixed precision, my preference would be something like "low precision" / "high precision", as opposed to "single" / "double".

If what you want is all precisions, I think it would be safer to add all codes as separate precisions, and then disable compiling LAPACK with -freal-4-real-8 etc. That would be more stable, I guess.

real(wp), parameter :: tsml = 0.1084202172E-18_wp
real(wp), parameter :: tbig = 0.4503599627E+16_wp
real(wp), parameter :: ssml = 0.3777893186E+23_wp
real(wp), parameter :: sbig = 0.1323488980E-22_wp
Contributor

Are these reversed? All the other small/big and min/max pairs refer to e-X and e+X, whereas ssml and sbig have opposite exponent signs. This is confusing to me, at least without more thorough comments.

Collaborator Author

No, they are not reversed. They are based on Blue's constants s (small number) and S (large number) (https://doi.org/10.1145/355769.355771). The main reference of this PR (https://doi.org/10.1145/3061665) uses ssml = 1/s and sbig = 1/S.
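For reference, these thresholds need not be hard-coded: they follow from the radix, precision, and exponent range, so they can be computed in constant expressions with standard intrinsics. This is only a sketch under the assumption that these exponent formulas are the ones intended; for IEEE single precision they reproduce the hard-coded values quoted above (tsml = 2^-63, tbig = 2^52, ssml = 2^75, sbig = 2^-76):

```fortran
program blue_constants
   implicit none
   integer, parameter :: wp = kind(1.0)   ! single precision; use kind(1.d0) for double
   real(wp), parameter :: rdx = real(radix(0._wp), wp)
   ! tsml/tbig: thresholds below/above which squaring under-/overflows
   real(wp), parameter :: tsml = rdx**ceiling((minexponent(0._wp) - 1) * 0.5)
   real(wp), parameter :: tbig = rdx**floor((maxexponent(0._wp) - digits(0._wp) + 1) * 0.5)
   ! ssml/sbig: the scaling factors 1/s and 1/S from Blue's paper
   real(wp), parameter :: ssml = rdx**(-floor((minexponent(0._wp) - digits(0._wp)) * 0.5))
   real(wp), parameter :: sbig = rdx**(-ceiling((maxexponent(0._wp) + digits(0._wp) - 1) * 0.5))

   print *, tsml, tbig   ! ~ 1.08E-19 and 4.50E+15 for IEEE single
   print *, ssml, sbig   ! ~ 3.78E+22 and 1.32E-23 for IEEE single
end program blue_constants
```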

Contributor

Ok.

@weslleyspereira
Collaborator Author

Thanks @zerothi ! I like your solution here #496 (comment). It satisfies the low/high precision standard that @langou mentioned here #496 (comment) . I will try it.

A routine needing more than two precisions, e.g., https://epubs.siam.org/doi/abs/10.1137/17M1140819, could verify the Fortran single/double precisions at runtime and should have its own constants in place.


zerothi commented Mar 1, 2021

Thanks @zerothi ! I like your solution here #496 (comment). It satisfies the low/high precision standard that @langou mentioned here #496 (comment) . I will try it.

If you want this, I would suggest you mark #493, #494 and #495 as work in progress, make a new PR fixing the la_constants and precision modules, and then re-adjust the PRs against that PR. These three contain duplicate changes. :)

A routine needing more than two precisions, e.g., https://epubs.siam.org/doi/abs/10.1137/17M1140819, could verify the Fortran single/double precisions at runtime and should have its own constants in place.

Ok, but how should it handle the case where the library is compiled in quad? Then the code basically doesn't do what it says it does. :)

@weslleyspereira
Collaborator Author

If you want this, I would suggest you mark #493, #494 and #495 as work in progress, make a new PR fixing the la_constants and precision modules, and then re-adjust the PRs against that PR. These three contain duplicate changes. :)

Sure! It sounds like a good idea 👍

Ok, but how should it handle the case where the library is compiled in quad? Then the code basically doesn't do what it says it does. :)

I see your point. Maybe it could be a good idea to track the compiled precision and expose it through a preprocessor variable. For instance, a code needing a third precision as "one precision above double precision" would use:

integer p3
#ifdef _dp_eq4
    parameter ( p3 = 8 )
#else
    parameter ( p3 = 16 )
#endif

real u, a, ...
double precision u_dp, a_dp, ...
real(p3) u_p3, a_p3, ...

Preprocessor variables like _dp_eq4 could be included in the Makefile and CMakeLists as soon as we really need them.
What do you think?


zerothi commented Mar 1, 2021

Ok, but how should it handle the case where the library is compiled in quad? Then the code basically doesn't do what it says it does. :)

I see your point. Maybe it could be a good idea to track the compiled precision and expose it through a preprocessor variable. For instance, a code needing a third precision as "one precision above double precision" would use:

integer p3
#ifdef _dp_eq4
    parameter ( p3 = 8 )
#else
    parameter ( p3 = 16 )
#endif

real u, a, ...
double precision u_dp, a_dp, ...
real(p3) u_p3, a_p3, ...

Preprocessor variables like _dp_eq4 could be included in the Makefile and CMakeLists as soon as we really need them.
What do you think?

While pre-processors may be useful in Fortran, I would strongly suggest that this gets careful input from many more people. :)

Pre-processors are nice, since you can do lots of fancy stuff with them, but for more advanced stuff one will easily run into incompatibilities. I say pre-processors should be left out for now; alternatively, I think fypp could be used in a more stable, compiler-independent way.

I guess the check in the code could be something like:

integer, parameter :: qp = selected_real_kind(p=33)
real :: r_sp
double precision :: r_dp

if ( kind(r_dp) == qp ) then
    ! something should be done
end if

here is a small kind explorer:

program kind_prog
  integer :: i, k

  k = -1
  do i = 100, 1, -1
    if ( selected_real_kind(p=i) /= k ) then
      k = selected_real_kind(p=i)
      print *,'real', i, k
    end if
  end do
  
  k = -1
  do i = 100, 1, -1
    if ( selected_int_kind(i) /= k ) then
      k = selected_int_kind(i)
      print *, 'int', i, k
    end if
  end do
end program kind_prog

A small test-program at compile time could also error out if these kinds are not unique... I don't really mind :)

@weslleyspereira weslleyspereira marked this pull request as draft March 1, 2021 15:13
@weslleyspereira
Collaborator Author

Hi! As @zerothi suggested, I created PR #501 to include the module(s) for the scaling constants.

@weslleyspereira weslleyspereira marked this pull request as ready for review March 3, 2021 21:52
@weslleyspereira
Collaborator Author

I have just added Doxygen preambles to all new files.

@langou langou added this to the LAPACK 3.10.0 milestone Mar 18, 2021
@weslleyspereira weslleyspereira marked this pull request as draft March 25, 2021 20:21
@weslleyspereira
Collaborator Author

Still missing the ROTG subroutines. Thanks to Professor James Demmel for noticing that!

@weslleyspereira weslleyspereira force-pushed the try-xROT-from-Edward-Anderson branch from 6f44e83 to f25e56d Compare March 31, 2021 21:43
@weslleyspereira weslleyspereira force-pushed the try-xROT-from-Edward-Anderson branch from f25e56d to d8b364d Compare April 1, 2021 12:58
@weslleyspereira
Collaborator Author

Still missing the ROTG subroutines. Thanks to Professor James Demmel for noticing that!

I included the ROTG routines in #527, and the la_constants module is in #501. Since the modifications in crot.f and zrot.f didn't introduce anything new regarding the safe scaling algorithms, I am closing this PR without merging.

@weslleyspereira weslleyspereira removed this from the LAPACK 3.10.0 milestone Apr 1, 2021