
Support for enabling sparse gradients in EmbeddingBag #8719

Open
chandrasekhard2 opened this issue Feb 18, 2025 · 1 comment · May be fixed by #8905
Labels: enhancement (New feature or request), lowering (ATen Operation lowering)

Comments

@chandrasekhard2
Collaborator

🚀 Feature

Support for enabling sparse gradients in EmbeddingBag.

Motivation

Adding support for sparse gradients would make it possible to fit larger embedding tables on the TPU.

Pitch

I encountered the following error when setting sparse=True in the EmbeddingBag API.

NotImplementedError: Could not run 'aten::_sparse_coo_tensor_with_dims_and_tensors' with arguments from the 'SparseXLA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_sparse_coo_tensor_with_dims_and_tensors' is only available for these backends: [XLA, Meta, SparseCPU, SparseCUDA, SparseMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastXLA, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
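For reference, a minimal sketch of the kind of code that triggers the error above. This is illustrative, not the original repro: it assumes a standard torch_xla setup and uses arbitrary table sizes and batch shapes.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# sparse=True asks autograd to produce a sparse COO gradient for the weight.
bag = nn.EmbeddingBag(num_embeddings=100_000, embedding_dim=128,
                      mode="sum", sparse=True).to(device)

indices = torch.randint(0, 100_000, (32, 16), device=device)
out = bag(indices)
out.sum().backward()  # backward constructs a sparse COO grad, which fails on the SparseXLA backend
xm.mark_step()
```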

Without this flag, memory consumption doubles during training (the embedding table plus a dense gradient of the same size). With support for sparse gradients, we could almost double the embedding_dim of any model on the same hardware (provided it does not exceed the HBM).
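As a rough back-of-the-envelope illustration (the sizes below are hypothetical, not taken from this issue): a dense gradient mirrors the weight shape and so duplicates the whole table, while a sparse gradient only materializes the rows touched in a batch.

```python
# Hypothetical sizes for illustration only.
num_embeddings, embedding_dim, bytes_per_elem = 10_000_000, 512, 4  # fp32 table

weight_gib = num_embeddings * embedding_dim * bytes_per_elem / 1024**3
dense_grad_gib = weight_gib                 # dense grad has the same shape as the weight
rows_in_batch = 8_192                       # assumed unique indices per training step
sparse_grad_mib = rows_in_batch * embedding_dim * bytes_per_elem / 1024**2

print(f"weight ~{weight_gib:.1f} GiB, dense grad ~{dense_grad_gib:.1f} GiB, "
      f"sparse grad ~{sparse_grad_mib:.1f} MiB")
# weight ~19.1 GiB, dense grad ~19.1 GiB, sparse grad ~16.0 MiB
```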

Alternatives

The alternative is to keep using EmbeddingBag with sparse=False and accept the dense-gradient memory overhead.

Additional context

@miladm
Collaborator

miladm commented Feb 19, 2025

@ysiraichi do we have bandwidth to get started on this work this week?

cc @qihqi

@ysiraichi ysiraichi added the enhancement New feature or request label Mar 19, 2025
@amjames amjames linked a pull request Mar 28, 2025 that will close this issue