[ML] AucRoc gives misleading results when num_top_classes is set too low. #63306

@przemekwitek

Description

For multiclass classification, the AucRoc calculation should require that the class in question appears in every document's top classes array, so that we know its probability for every document.
Otherwise the results are incorrect or, in some cases, as pointed out by @wwang500, the evaluation request fails because it cannot find a single document with the class in question listed in its top classes.

The solution is to set num_top_classes so that it is greater than or equal to the total number of classes. We should minimize the surprise for users, though, and possibly apply a sensible default ourselves.
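
To illustrate the effect, here is a minimal Python sketch. It is not the Elasticsearch implementation: the documents, class names, and the helpers `auc_roc`, `top_classes`, and `auc_for_class` are all made up for this example. It computes a one-vs-rest AUC ROC for one class, using only the documents whose truncated top classes list still contains that class.

```python
from typing import Dict, List, Tuple

# Hypothetical data: (actual class, full predicted probability distribution).
Doc = Tuple[str, Dict[str, float]]
DOCS: List[Doc] = [
    ("fox", {"cat": 0.5, "dog": 0.3, "fox": 0.2}),
    ("fox", {"fox": 0.7, "cat": 0.2, "dog": 0.1}),
    ("cat", {"cat": 0.6, "fox": 0.3, "dog": 0.1}),
    ("dog", {"dog": 0.8, "cat": 0.15, "fox": 0.05}),
    ("cat", {"cat": 0.5, "fox": 0.4, "dog": 0.1}),
]


def auc_roc(scores: List[float], labels: List[bool]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative document")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_classes(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep only the k most probable classes, mimicking a truncated top classes array."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def auc_for_class(docs: List[Doc], class_name: str, k: int) -> float:
    """One-vs-rest AUC ROC using only documents that list class_name in their top k."""
    scores: List[float] = []
    labels: List[bool] = []
    for actual, probs in docs:
        kept = top_classes(probs, k)
        if class_name not in kept:
            continue  # probability unknown for this document, so it is silently dropped
        scores.append(kept[class_name])
        labels.append(actual == class_name)
    return auc_roc(scores, labels)


if __name__ == "__main__":
    print(auc_for_class(DOCS, "fox", 3))  # all classes kept: every document contributes
    print(auc_for_class(DOCS, "fox", 2))  # truncated: two documents are dropped
```

With all three classes kept, the AUC for `fox` on this toy data is 2/3. With `k=2`, the two documents where `fox` ranks last are dropped and the result becomes a misleading 1.0. With `k=1`, no negative document lists `fox` at all, so the sketch raises a `ValueError`, which is analogous to the failing evaluation request described above.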

Labels: :ml (Machine learning), >bug