🚀 Feature
Support for enabling sparse gradients in EmbeddingBag.
Motivation
Adding support for sparse gradients would make it possible to fit larger embedding tables on the TPU.
Pitch
I encountered the following error when turning on sparse=True in the EmbeddingBag API:
```
NotImplementedError: Could not run 'aten::_sparse_coo_tensor_with_dims_and_tensors' with arguments from the 'SparseXLA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_sparse_coo_tensor_with_dims_and_tensors' is only available for these backends: [XLA, Meta, SparseCPU, SparseCUDA, SparseMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastXLA, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
```
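For context, a minimal repro sketch (assuming a standard torch_xla install; the table sizes below are just illustrative):

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# sparse=True asks autograd to return the weight gradient as a sparse
# COO tensor holding rows only for the indices actually looked up.
bag = nn.EmbeddingBag(num_embeddings=100_000, embedding_dim=128,
                      mode='sum', sparse=True).to(device)

# 2D input: each row is one bag of 10 indices.
indices = torch.randint(0, 100_000, (32, 10), device=device)
loss = bag(indices).sum()
loss.backward()  # raises: no SparseXLA kernel for
                 # aten::_sparse_coo_tensor_with_dims_and_tensors
```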
Without this flag, memory consumption doubles during training (the embedding table plus a dense gradient of the same size). With sparse gradients, the gradient only stores rows for the indices actually used in a batch, so we could almost double the embedding_dim of a model on the same hardware (provided the table itself still fits in HBM).
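To make the savings concrete, a back-of-the-envelope estimate (the vocabulary size, embedding_dim, and per-step row count below are illustrative, not from a specific model):

```python
# Illustrative memory estimate: dense vs. sparse gradient for an fp32 table.
V, D = 10_000_000, 128          # vocabulary rows, embedding_dim (example values)
bytes_per_elem = 4              # fp32

table_gib = V * D * bytes_per_elem / 2**30
print(f"table:       {table_gib:.2f} GiB")   # ~4.77 GiB
print(f"dense grad:  {table_gib:.2f} GiB")   # same size again -> 2x total

rows_touched = 320_000          # unique indices in one step (example)
sparse_gib = rows_touched * D * bytes_per_elem / 2**30
print(f"sparse grad: {sparse_gib:.2f} GiB")  # ~0.15 GiB
```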
Alternatives
The alternative is to keep using EmbeddingBag with sparse=False and accept the doubled memory cost.
Additional context