Skip to content

Accuracy issue in norms (lange, lansy, lantr, etc.) #261

@mgates3

Description

@mgates3

There's an accuracy issue in the LAPACK and ScaLAPACK Frobenius norms, which exhibits at large sizes, particularly in single precision. Ignoring the scaling for the moment, it computes a sum:

sum = 0;
for j = 1:n
    for i = 1:m
        sum = sum + abs(A(i,j))^2;
    end
end

For large matrices, clearly sum >> abs(A(i,j))^2, making the addition inaccurate. A better, though not perfect, solution would be to sum each column individually, and accumulate the column sums:

sum = 0;
for j = 1:n
    colsum = 0;
    for i = 1:m
        colsum = colsum + abs(A(i,j))^2;
    end
    sum = sum + colsum;
end

For extremely tall matrices (say, 1e6 rows), there would still be an issue. I've implemented this fix in ScaLAPACK *lange, *lansy, *lantr, and can easily do the same changes for LAPACK. It adds only a few multiply-adds and a divide per column to deal with the scaling, so has trivial performance impact.

We noticed this issue in developing SLATE. Because SLATE computes results per tile and then sums up tile results, its answer is more accurate than ScaLAPACK's. LAPACK computes the same answer as single node ScaLAPACK (except Intel MKL appears to have fixed the issue already). Below is output for genorm (i.e., lange) before and after changes. The relative error is computed as
(||A||_lapack - ||A||_slate) / (sqrt(m n) ||A||_lapack),
with a tolerance of error < 3*eps to pass. Note that the error grows with the size of the matrix. Also, the same matrix is used in single and double, so the norms should match.

[note: output modified for clarity and brevity]
slate/test> ./test genorm --norm fro --dim 500:4000:500,10000 --verbose 1 --type s,d

                                        SLATE       ScaLAPACK  relative
type     norm       m       n            norm            norm     error  status
single    fro     500     500  2.88898743e+02  2.88892822e+02  4.10e-08  pass
single    fro    1000    1000  5.77679993e+02  5.77577881e+02  1.77e-07  pass
single    fro    1500    1500  8.66120239e+02  8.65607910e+02  3.95e-07  FAILED
single    fro    2000    2000  1.15464795e+03  1.15303625e+03  6.99e-07  FAILED
single    fro    2500    2500  1.44320764e+03  1.44007178e+03  8.71e-07  FAILED
single    fro    3000    3000  1.73191736e+03  1.72434570e+03  1.46e-06  FAILED
single    fro    3500    3500  2.02049438e+03  2.00892126e+03  1.65e-06  FAILED
single    fro    4000    4000  2.30940161e+03  2.28700610e+03  2.45e-06  FAILED
single    fro   10000   10000  5.77353906e+03  4.09599976e+03  4.10e-05  FAILED

double    fro     500     500  2.88898764e+02  2.88898764e+02  8.26e-18  pass
double    fro    1000    1000  5.77679957e+02  5.77679957e+02  1.18e-17  pass
double    fro    1500    1500  8.66120383e+02  8.66120383e+02  1.29e-17  pass
double    fro    2000    2000  1.15464767e+03  1.15464767e+03  2.73e-17  pass
double    fro    2500    2500  1.44320751e+03  1.44320751e+03  2.52e-19  pass
double    fro    3000    3000  1.73191680e+03  1.73191680e+03  1.14e-18  pass
double    fro    3500    3500  2.02049402e+03  2.02049402e+03  5.21e-18  pass
double    fro    4000    4000  2.30940433e+03  2.30940433e+03  1.72e-18  pass
double    fro   10000   10000  5.77356340e+03  5.77356340e+03  1.67e-17  pass
7 tests FAILED.

With updated ScaLAPACK code, all the tests pass.

slate/test> ./test genorm --norm fro --dim 500:4000:500,10000 --verbose 1 --type s,d

                                        SLATE     (Sca)LAPACK  relative
type     norm       m       n            norm            norm     error  status
single    fro     500     500  2.88898743e+02  2.88898773e+02  2.11e-10  pass
single    fro    1000    1000  5.77679993e+02  5.77679932e+02  1.06e-10  pass
single    fro    1500    1500  8.66120300e+02  8.66120544e+02  1.88e-10  pass
single    fro    2000    2000  1.15464771e+03  1.15464832e+03  2.64e-10  pass
single    fro    2500    2500  1.44320740e+03  1.44320703e+03  1.01e-10  pass
single    fro    3000    3000  1.73191687e+03  1.73191687e+03  0.00e+00  pass
single    fro    3500    3500  2.02049390e+03  2.02049280e+03  1.55e-10  pass
single    fro    4000    4000  2.30940112e+03  2.30940430e+03  3.44e-10  pass
single    fro   10000   10000  5.77354053e+03  5.77353467e+03  1.01e-10  pass

double    fro     500     500  2.88898764e+02  2.88898764e+02  3.94e-19  pass
double    fro    1000    1000  5.77679957e+02  5.77679957e+02  3.94e-19  pass
double    fro    1500    1500  8.66120383e+02  8.66120383e+02  8.75e-20  pass
double    fro    2000    2000  1.15464767e+03  1.15464767e+03  9.85e-20  pass
double    fro    2500    2500  1.44320751e+03  1.44320751e+03  1.89e-19  pass
double    fro    3000    3000  1.73191680e+03  1.73191680e+03  4.81e-19  pass
double    fro    3500    3500  2.02049402e+03  2.02049402e+03  0.00e+00  pass
double    fro    4000    4000  2.30940433e+03  2.30940433e+03  0.00e+00  pass
double    fro   10000   10000  5.77356340e+03  5.77356340e+03  4.10e-19  pass
All tests passed.

NB: there are other issues with ScaLAPACK's *lantr due to incorrect indexing. We also have fixes for those. They don't affect LAPACK.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions