Add resource (CPU, RAM, GPU, thread count) monitoring to AutoML experiments #6320

@andrasfuchs

Description

Is your feature request related to a problem? Please describe.
As others have also experienced, AutoML training is heavy on CPU and RAM, and it can cause slowdowns and crashes (#6175, #6286, #6288, #6297). I sometimes run into an issue where some of my trials run longer than expected, potentially because my system ran out of one of its resources. I have also had a few system crashes, when running AutoML forced Windows to start closing other applications.

Describe the solution you'd like
It would be great to have more information about the running AutoML trials, including how much CPU, RAM, and GPU they are using, and on how many threads. Ideally this information would be delivered through a new, periodically called method on AutoML's IMonitor interface.
If this were combined with extended experiment control (#5736), we could make clever decisions about a trial or experiment depending on its resource usage. For example, we could pause the experiment if the system is out of resources, or even cancel a trial if it uses a suspiciously large amount of RAM, to prevent system failure. (This sometimes happens with my experiments.)
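
To make the idea concrete, here is a minimal sketch of the kind of decision logic such a periodic resource report could enable. It is written in Python purely for illustration. `ResourceReport`, `Action`, and `decide` are hypothetical names, not part of ML.NET or its IMonitor interface, and the thresholds are arbitrary placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical decisions a monitor could hand back to the experiment."""
    CONTINUE = "continue"
    PAUSE_EXPERIMENT = "pause_experiment"
    CANCEL_TRIAL = "cancel_trial"


@dataclass(frozen=True)
class ResourceReport:
    """Hypothetical per-trial snapshot delivered periodically to a monitor."""
    trial_id: int
    cpu_percent: float          # CPU used by the trial, 0-100
    trial_ram_mb: float         # RAM used by the trial itself
    system_free_ram_mb: float   # RAM still free on the whole machine
    thread_count: int


def decide(report: ResourceReport,
           trial_ram_limit_mb: float = 8_000.0,
           min_free_ram_mb: float = 1_000.0) -> Action:
    """Map one resource snapshot to an action (thresholds are assumptions)."""
    # A trial using a suspiciously large amount of RAM is cancelled first,
    # because it is the most likely cause of memory pressure.
    if report.trial_ram_mb > trial_ram_limit_mb:
        return Action.CANCEL_TRIAL
    # Otherwise, if the machine as a whole is short on memory,
    # pause the experiment until resources free up.
    if report.system_free_ram_mb < min_free_ram_mb:
        return Action.PAUSE_EXPERIMENT
    return Action.CONTINUE
```

The important part is that the report separates the trial's own usage from the system-wide figure, so the policy can tell a runaway trial apart from pressure caused by something else on the machine.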

Describe alternatives you've considered
In theory, I could monitor my system resources constantly on a separate thread, but I still couldn't determine whether elevated CPU, RAM, or GPU usage is caused by AutoML or by something else running on the system independently of AutoML.

Additional context
This issue is related to AutoML experiment resource usage limiting (#6061) and AutoML experiment control (#5736).

Labels: enhancement (New feature or request)
