-
Notifications
You must be signed in to change notification settings - Fork 480
Description
There's an accuracy issue in the LAPACK and ScaLAPACK Frobenius norms, which exhibits at large sizes, particularly in single precision. Ignoring the scaling for the moment, it computes a sum:
sum = 0;
for j = 1:n
for i = 1:m
sum = sum + abs(A(i,j))^2;
end
end
For large matrices, clearly sum >> abs(A(i,j))^2, making the addition inaccurate. A better, though not perfect, solution would be to sum each column individually, and accumulate the column sums:
sum = 0;
for j = 1:n
colsum = 0;
for i = 1:m
colsum = colsum + abs(A(i,j))^2;
end
sum = sum + colsum;
end
For extremely tall matrices (say, 1e6 rows), there would still be an issue. I've implemented this fix in ScaLAPACK *lange, *lansy, *lantr, and can easily do the same changes for LAPACK. It adds only a few multiply-adds and a divide per column to deal with the scaling, so has trivial performance impact.
We noticed this issue in developing SLATE. Because SLATE computes results per tile and then sums up tile results, its answer is more accurate than ScaLAPACK's. LAPACK computes the same answer as single node ScaLAPACK (except Intel MKL appears to have fixed the issue already). Below is output for genorm (i.e., lange) before and after changes. The relative error is computed as
(||A||_lapack - ||A||_slate) / (sqrt(m n) ||A||_lapack),
with a tolerance of error < 3*eps to pass. Note that the error grows with the size of the matrix. Also, the same matrix is used in single and double, so the norms should match.
[note: output modified for clarity and brevity]
slate/test> ./test genorm --norm fro --dim 500:4000:500,10000 --verbose 1 --type s,d
SLATE ScaLAPACK relative
type norm m n norm norm error status
single fro 500 500 2.88898743e+02 2.88892822e+02 4.10e-08 pass
single fro 1000 1000 5.77679993e+02 5.77577881e+02 1.77e-07 pass
single fro 1500 1500 8.66120239e+02 8.65607910e+02 3.95e-07 FAILED
single fro 2000 2000 1.15464795e+03 1.15303625e+03 6.99e-07 FAILED
single fro 2500 2500 1.44320764e+03 1.44007178e+03 8.71e-07 FAILED
single fro 3000 3000 1.73191736e+03 1.72434570e+03 1.46e-06 FAILED
single fro 3500 3500 2.02049438e+03 2.00892126e+03 1.65e-06 FAILED
single fro 4000 4000 2.30940161e+03 2.28700610e+03 2.45e-06 FAILED
single fro 10000 10000 5.77353906e+03 4.09599976e+03 4.10e-05 FAILED
double fro 500 500 2.88898764e+02 2.88898764e+02 8.26e-18 pass
double fro 1000 1000 5.77679957e+02 5.77679957e+02 1.18e-17 pass
double fro 1500 1500 8.66120383e+02 8.66120383e+02 1.29e-17 pass
double fro 2000 2000 1.15464767e+03 1.15464767e+03 2.73e-17 pass
double fro 2500 2500 1.44320751e+03 1.44320751e+03 2.52e-19 pass
double fro 3000 3000 1.73191680e+03 1.73191680e+03 1.14e-18 pass
double fro 3500 3500 2.02049402e+03 2.02049402e+03 5.21e-18 pass
double fro 4000 4000 2.30940433e+03 2.30940433e+03 1.72e-18 pass
double fro 10000 10000 5.77356340e+03 5.77356340e+03 1.67e-17 pass
7 tests FAILED.
With updated ScaLAPACK code, all the tests pass.
slate/test> ./test genorm --norm fro --dim 500:4000:500,10000 --verbose 1 --type s,d
SLATE (Sca)LAPACK relative
type norm m n norm norm error status
single fro 500 500 2.88898743e+02 2.88898773e+02 2.11e-10 pass
single fro 1000 1000 5.77679993e+02 5.77679932e+02 1.06e-10 pass
single fro 1500 1500 8.66120300e+02 8.66120544e+02 1.88e-10 pass
single fro 2000 2000 1.15464771e+03 1.15464832e+03 2.64e-10 pass
single fro 2500 2500 1.44320740e+03 1.44320703e+03 1.01e-10 pass
single fro 3000 3000 1.73191687e+03 1.73191687e+03 0.00e+00 pass
single fro 3500 3500 2.02049390e+03 2.02049280e+03 1.55e-10 pass
single fro 4000 4000 2.30940112e+03 2.30940430e+03 3.44e-10 pass
single fro 10000 10000 5.77354053e+03 5.77353467e+03 1.01e-10 pass
double fro 500 500 2.88898764e+02 2.88898764e+02 3.94e-19 pass
double fro 1000 1000 5.77679957e+02 5.77679957e+02 3.94e-19 pass
double fro 1500 1500 8.66120383e+02 8.66120383e+02 8.75e-20 pass
double fro 2000 2000 1.15464767e+03 1.15464767e+03 9.85e-20 pass
double fro 2500 2500 1.44320751e+03 1.44320751e+03 1.89e-19 pass
double fro 3000 3000 1.73191680e+03 1.73191680e+03 4.81e-19 pass
double fro 3500 3500 2.02049402e+03 2.02049402e+03 0.00e+00 pass
double fro 4000 4000 2.30940433e+03 2.30940433e+03 0.00e+00 pass
double fro 10000 10000 5.77356340e+03 5.77356340e+03 4.10e-19 pass
All tests passed.
NB: there are other issues with ScaLAPACK's *lantr due to incorrect indexing. We also have fixes for those. They don't affect LAPACK.