Moving all tensor allocation from CPU to meta device in SplitTableBatchedEmbeddingBagsCodegen
Summary: Used profiler logs from `embedding_bag_wprofiler_gpu_test.py`, sorted by `cpu_memory_usage`, to find every tensor allocation site in `split_table_batched_embeddings_ops.py` and add a `device` kwarg to each one, so that the tensors can be materialized on the meta device. Where appropriate, `torch.tensor` calls were replaced with `torch.zeros` calls to avoid a temporary allocation in CPU memory. The remaining `torch.tensor` calls still allocate temporarily in CPU memory, but the final tensor is materialized on the meta device.
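The two allocation patterns described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper `build_buffers` with made-up buffer names; it is not the actual code in `split_table_batched_embeddings_ops.py`. A buffer whose size is known up front can be created with `torch.zeros(..., device=device)` and never touches CPU memory when `device="meta"`, while a buffer built from Python data with `torch.tensor(data, device=device)` first holds the values in CPU memory before the result is placed on `device`.

```python
import torch


def build_buffers(rows_per_table, embedding_dim, device):
    """Hypothetical helper illustrating the two allocation patterns."""
    # Size is known up front: torch.zeros with device= allocates directly on
    # the target device, so no CPU memory is used when device is "meta".
    total_rows = sum(rows_per_table)
    weights = torch.zeros(
        total_rows * embedding_dim, dtype=torch.float32, device=device
    )

    # Values come from Python data: torch.tensor first materializes them in
    # CPU memory, then the final tensor is placed on `device`.
    offsets = [0]
    for rows in rows_per_table[:-1]:
        offsets.append(offsets[-1] + rows * embedding_dim)
    weights_offsets = torch.tensor(offsets, dtype=torch.int64, device=device)
    return weights, weights_offsets


# Meta tensors carry shape and dtype but no storage.
w, o = build_buffers([10, 20, 5], embedding_dim=4, device="meta")
print(w.is_meta, w.shape, o.is_meta, o.shape)
```

Passing the device through as a kwarg keeps the same code path usable for `"cpu"`, `"cuda"` or `"meta"`; only the target of each allocation changes.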
Reviewed By: xush6528
Differential Revision: D29566376
fbshipit-source-id: c01575127cb2392f95ec1d3712ad43803a373db5