When evaluating with TorchRec ShardedEmbeddingCollection in a distributed environment, some ranks may receive zero input for the last global batch. However, torch.ops.fbgemm.block_bucketize_sparse_features throws an error when the lengths tensor contains no elements:
Traceback (most recent call last):
    x = torch.ops.fbgemm.block_bucketize_sparse_features(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1120, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Bug description
Could you add support for empty input? Thanks!
Reproducer
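Until empty inputs are supported, one possible workaround is to skip the kernel launch entirely when the rank received no data. The sketch below is illustrative, not part of TorchRec or fbgemm_gpu: the wrapper name `bucketize_or_skip` is hypothetical, and its argument handling and return value are simplified relative to the real `block_bucketize_sparse_features` operator (which takes additional flags and returns several tensors).

```python
import torch


def bucketize_or_skip(lengths, indices, block_sizes, my_size):
    """Hypothetical guard around fbgemm bucketization for empty batches.

    When `lengths` has no elements, the CUDA kernel would be launched
    with a zero-sized grid, which raises
    "CUDA error: invalid configuration argument". Returning early with
    empty tensors of the same dtypes avoids the launch entirely.
    """
    if lengths.numel() == 0:
        # Empty batch on this rank: nothing to bucketize.
        # (Simplified: the real op returns more than two tensors.)
        return lengths.clone(), indices.clone()
    # Argument order follows the fbgemm_gpu operator; the boolean flags
    # (bucketize_pos, sequence) are set to False for this sketch.
    return torch.ops.fbgemm.block_bucketize_sparse_features(
        lengths, indices, False, False, block_sizes, my_size,
    )
```

With this guard, a rank that receives zero rows in the last global batch returns empty outputs instead of crashing, while non-empty ranks take the normal fbgemm path.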
ENV