ModelCheckpoint with custom filepath doesn't support training on multiple nodes #2916

@angshine

🐛 Bug

When training on multiple nodes with a ModelCheckpoint that uses a custom filepath, a FileExistsError is raised. It is caused by the following line of code: model_checkpoint.py#L127.

Maybe a try-except block is needed?
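
To make the suggestion concrete, here is a minimal sketch of what such a guard could look like. It assumes the failing line is a directory-creation call that several processes reach at about the same time; I have not confirmed that this is exactly what model_checkpoint.py#L127 does. `ensure_checkpoint_dir` is a hypothetical helper name, not something that exists in the library.

```python
import os


def ensure_checkpoint_dir(dirpath: str) -> None:
    """Create dirpath, tolerating another process creating it first.

    Hypothetical helper for illustration only; not part of the library.
    """
    try:
        os.makedirs(dirpath)
    except FileExistsError:
        # Another process (or an earlier run) already created the path.
        # Only swallow the error if the existing path really is a directory.
        if not os.path.isdir(dirpath):
            raise
```

The standard library's `os.makedirs(dirpath, exist_ok=True)` gives the same behaviour in one call, so that may be the simpler fix if it fits the surrounding code.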

Labels: bug (something isn't working), help wanted (open to be worked on), priority: 0 (high priority task)