Optimise Attention Mechanisms #145
Conversation
Signed-off-by: Walter Hugo Lopez Pinaya <[email protected]>
Waiting for version 0.0.16 from xformers
Adopting Linear layers is more efficient than the 1x1 convs.
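A minimal sketch of the idea (names and structure are illustrative, not taken from the PR code): the q/k/v and output projections of a spatial self-attention block can be written as `nn.Linear` layers over the flattened `(B, H*W, C)` sequence instead of `nn.Conv2d(channels, channels, kernel_size=1)` over `(B, C, H, W)`.

```python
import torch
import torch.nn as nn


class LinearSelfAttention(nn.Module):
    """Hypothetical attention block using Linear projections instead of 1x1 convs."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        # Each nn.Linear here replaces an nn.Conv2d(channels, channels, kernel_size=1)
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)
        self.proj = nn.Linear(channels, channels)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)                # (B, H*W, C)
        q, k, v = self.to_q(seq), self.to_k(seq), self.to_v(seq)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = self.proj(attn @ v)                         # (B, H*W, C)
        return out.transpose(1, 2).reshape(b, c, h, w)
```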
Signed-off-by: Walter Hugo Lopez Pinaya <[email protected]>
On my TITAN RTX, the new attention layers make the 2D DDPM tutorial consume 20 GB of memory, with 33 s per training epoch and 15 s to sample 1 image. When using xformers, it consumes a little less memory (18 GB), with 38 s per training epoch and 10 s to sample 1 image. When I tested the AutoencoderKL, there was no significant difference between with and without xformers. I will still try it on an A100.
Signed-off-by: Walter Hugo Lopez Pinaya <[email protected]>
On an A100, the 2D DDPM tutorial takes 15-16 s per training epoch, 19 GB of memory, and 8 s to sample 1 image. With xformers, it takes 18-19 s per training epoch, 16.4 GB of memory, and 8 s to generate 1 sample.
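For reference, a minimal sketch of the kind of toggle being benchmarked above: fall back to plain scaled dot-product attention when xformers is not installed, and use `xformers.ops.memory_efficient_attention` when it is. The function and module names besides the xformers call are assumptions, not the PR's actual code.

```python
import torch

try:
    import xformers.ops as xops
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False


def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, seq_len, dim). Returns (batch, seq_len, dim)."""
    if HAS_XFORMERS:
        # Memory-efficient attention from xformers: lower peak memory,
        # with speed that depends on the GPU, as the benchmarks above show.
        return xops.memory_efficient_attention(q, k, v)
    scale = q.shape[-1] ** -0.5
    weights = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return weights @ v
```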
…attention-mechanisms # Conflicts: # generative/networks/nets/diffusion_model_unet.py
Signed-off-by: Walter Hugo Lopez Pinaya <[email protected]>
danieltudosiu left a comment
The pull request is good, but we should try not to create as much duplicate code as in this PR. If time allows, please try to create an attention utils module, or something similar, and aggregate the reusable methods there.
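One possible shape for such a refactor, purely as a sketch (the module path and class name are hypothetical): a single shared attention helper that both the diffusion UNet and the autoencoder import, instead of each file carrying its own copy of the attention math.

```python
# generative/networks/nets/attention_utils.py  (hypothetical module, not from this PR)
import torch
import torch.nn as nn


class QKVAttention(nn.Module):
    """Reusable scaled dot-product attention over (B, seq_len, dim) tensors."""

    def __init__(self, dim: int, use_xformers: bool = False) -> None:
        super().__init__()
        self.scale = dim ** -0.5
        self.use_xformers = use_xformers

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        if self.use_xformers:
            import xformers.ops as xops
            return xops.memory_efficient_attention(q, k, v)
        weights = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return weights @ v


# diffusion_model_unet.py and the autoencoder could then both do:
#   from generative.networks.nets.attention_utils import QKVAttention
# rather than defining the same attention logic twice.
```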
…attention-mechanisms
Signed-off-by: Walter Hugo Lopez Pinaya <[email protected]>