Accuracy issue in norms (lange, lansy, lantr, etc.)

There's an accuracy issue in the LAPACK and ScaLAPACK Frobenius norms, which exhibits at large sizes, particularly in single precision. Ignoring the scaling for the moment, it computes a sum:

    sum = 0;
    for j = 1:n
        for i = 1:m
            sum = sum + abs(A(i,j))^2;
        end
    end

For large matrices, clearly sum >> abs(A(i,j))^2, making the addition inaccurate. A better, though not perfect, solution would be to sum each column individually, and accumulate the column sums:

    sum = 0;
    for j = 1:n
        colsum = 0;
        for i = 1:m
            colsum = colsum + abs(A(i,j))^2;
        end
        sum = sum + colsum;
    end

For extremely tall matrices (say, 1e6 rows), there would still be an issue. I've implemented this fix in ScaLAPACK *lange, *lansy, *lantr, and can easily do the same changes for LAPACK. It adds only a few multiply-adds and a divide per column to deal with the scaling, so has trivial performance impact.

We noticed this issue in developing SLATE. Because SLATE computes results per tile and then sums up tile results, its answer is more accurate than ScaLAPACK's. LAPACK computes the same answer as single node ScaLAPACK (except Intel MKL appears to have fixed the issue already). Below is output for genorm (i.e., lange) before and after changes. The relative error is computed as
    (||A||_lapack - ||A||_slate) / (sqrt(m n) ||A||_lapack),
with a tolerance of error < 3*eps to pass. Note that the error grows with the size of the matrix. Also, the same matrix is used in single and double, so the norms should match.


    [note: output modified for clarity and brevity]
    slate/test> ./test genorm --norm fro --dim 500:4000:500,10000 --verbose 1 --type s,d

                                            SLATE       ScaLAPACK  relative
    type     norm       m       n            norm            norm     error  status
    single    fro     500     500  2.88898743e+02  2.88892822e+02  4.10e-08  pass
    single    fro    1000    1000  5.77679993e+02  5.77577881e+02  1.77e-07  pass
    single    fro    1500    1500  8.66120239e+02  8.65607910e+02  3.95e-07  FAILED
    single    fro    2000    2000  1.15464795e+03  1.15303625e+03  6.99e-07  FAILED
    single    fro    2500    2500  1.44320764e+03  1.44007178e+03  8.71e-07  FAILED
    single    fro    3000    3000  1.73191736e+03  1.72434570e+03  1.46e-06  FAILED
    single    fro    3500    3500  2.02049438e+03  2.00892126e+03  1.65e-06  FAILED
    single    fro    4000    4000  2.30940161e+03  2.28700610e+03  2.45e-06  FAILED
    single    fro   10000   10000  5.77353906e+03  4.09599976e+03  4.10e-05  FAILED

    double    fro     500     500  2.88898764e+02  2.88898764e+02  8.26e-18  pass
    double    fro    1000    1000  5.77679957e+02  5.77679957e+02  1.18e-17  pass
    double    fro    1500    1500  8.66120383e+02  8.66120383e+02  1.29e-17  pass
    double    fro    2000    2000  1.15464767e+03  1.15464767e+03  2.73e-17  pass
    double    fro    2500    2500  1.44320751e+03  1.44320751e+03  2.52e-19  pass
    double    fro    3000    3000  1.73191680e+03  1.73191680e+03  1.14e-18  pass
    double    fro    3500    3500  2.02049402e+03  2.02049402e+03  5.21e-18  pass
    double    fro    4000    4000  2.30940433e+03  2.30940433e+03  1.72e-18  pass
    double    fro   10000   10000  5.77356340e+03  5.77356340e+03  1.67e-17  pass
    7 tests FAILED.


With updated ScaLAPACK code, all the tests pass.

    slate/test> ./test genorm --norm fro --dim 500:4000:500,10000 --verbose 1 --type s,d

                                            SLATE     (Sca)LAPACK  relative
    type     norm       m       n            norm            norm     error  status
    single    fro     500     500  2.88898743e+02  2.88898773e+02  2.11e-10  pass
    single    fro    1000    1000  5.77679993e+02  5.77679932e+02  1.06e-10  pass
    single    fro    1500    1500  8.66120300e+02  8.66120544e+02  1.88e-10  pass
    single    fro    2000    2000  1.15464771e+03  1.15464832e+03  2.64e-10  pass
    single    fro    2500    2500  1.44320740e+03  1.44320703e+03  1.01e-10  pass
    single    fro    3000    3000  1.73191687e+03  1.73191687e+03  0.00e+00  pass
    single    fro    3500    3500  2.02049390e+03  2.02049280e+03  1.55e-10  pass
    single    fro    4000    4000  2.30940112e+03  2.30940430e+03  3.44e-10  pass
    single    fro   10000   10000  5.77354053e+03  5.77353467e+03  1.01e-10  pass

    double    fro     500     500  2.88898764e+02  2.88898764e+02  3.94e-19  pass
    double    fro    1000    1000  5.77679957e+02  5.77679957e+02  3.94e-19  pass
    double    fro    1500    1500  8.66120383e+02  8.66120383e+02  8.75e-20  pass
    double    fro    2000    2000  1.15464767e+03  1.15464767e+03  9.85e-20  pass
    double    fro    2500    2500  1.44320751e+03  1.44320751e+03  1.89e-19  pass
    double    fro    3000    3000  1.73191680e+03  1.73191680e+03  4.81e-19  pass
    double    fro    3500    3500  2.02049402e+03  2.02049402e+03  0.00e+00  pass
    double    fro    4000    4000  2.30940433e+03  2.30940433e+03  0.00e+00  pass
    double    fro   10000   10000  5.77356340e+03  5.77356340e+03  4.10e-19  pass
    All tests passed.

NB: there are other issues with ScaLAPACK's *lantr due to incorrect indexing. We also have fixes for those. They don't affect LAPACK.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Accuracy issue in norms (lange, lansy, lantr, etc.) #261

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Accuracy issue in norms (lange, lansy, lantr, etc.) #261

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions