
Manually register einsum xla #8787


Closed · wants to merge 34 commits

Conversation

@pgmoka (Collaborator) commented Mar 4, 2025

Do manual registration of XLANativeFunctions::einsum for XLA.

This is necessary because PyTorch currently overwrites the AutogradXLA key registration with its XLA key registration. While ideally we would resolve the underlying problem, this workaround fixes the issue from our end. It is also not possible to use full code generation due to #8739.

This manual registration relies on the XLANativeFunctions::einsum function from xla/torch_xla/csrc/aten_xla_type.cpp.
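For context, a minimal sketch of what such a manual dispatcher registration looks like is below. The header path and exact signature here are assumptions based on the description above, not necessarily the code in this PR:

```cpp
// Sketch: point the dispatcher's XLA entry for aten::einsum at the
// hand-written kernel so it is not shadowed by the registration PyTorch
// generates. The header path is an assumption.
#include <torch/library.h>

#include "torch_xla/csrc/XLANativeFunctions.h"  // assumed header declaring XLANativeFunctions

TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl("einsum", TORCH_FN(torch_xla::XLANativeFunctions::einsum));
}
```

Registering directly under the XLA key sidesteps the overwritten AutogradXLA entry, at the cost of keeping this registration in sync by hand instead of through codegen.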

@pgmoka pgmoka self-assigned this Mar 4, 2025
@pgmoka pgmoka requested review from tengyifei and ysiraichi March 4, 2025 20:23
@pgmoka (Collaborator, Author) commented Mar 4, 2025

As the overwrite is written, there is no meaningful unit test we can add, since we rely on the generated XLANativeFunctions function. I could do something like what nms_kernel does and refer to tensor_methods::einsum, but that would require rewriting the conditions from:

at::Tensor XLANativeFunctions::einsum(std::string_view equation,

The cleanest way to do something like that would be to create a utility function shared by both XLANativeFunctions::einsum and our overwrite (a rough sketch of this idea is below), which might then let us test the overwrite. That can be the next step here, or, if we think it is unnecessary, we can just keep the implementation from this PR.
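As an illustration of that shared-utility idea (the helper name, header path, and trailing parameters are hypothetical, assumed from the aten einsum schema):

```cpp
// Hypothetical sketch: both the codegen'd entry point and the manual
// dispatcher registration forward to one shared function that a unit test
// could call directly, without going through the dispatcher.
#include <string_view>

#include <ATen/ATen.h>

#include "torch_xla/csrc/XLANativeFunctions.h"  // assumed header

namespace torch_xla {

// Shared implementation; the name and exact parameter list are illustrative.
at::Tensor einsum_shared_impl(std::string_view equation, at::TensorList tensors,
                              at::OptionalIntArrayRef path);

at::Tensor XLANativeFunctions::einsum(std::string_view equation,
                                      at::TensorList tensors,
                                      at::OptionalIntArrayRef path) {
  return einsum_shared_impl(equation, tensors, path);
}

}  // namespace torch_xla
```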

nms_kernel itself is not tested directly, as far as I can tell. Perhaps that is a larger issue we can track separately from this PR.

@tengyifei @ysiraichi: Do y'all have any opinions on this?

@tengyifei (Collaborator)

@pgmoka are you looking for a unit test? A good test, IMO, is what we wrote in the https://github.com/tengyifei/playground/blob/master/aot-einsum-3.ipynb notebook. We could verify the lowering of einsum in a custom op.

Another test: we should remove the two workarounds referenced in https://github.com/search?q=repo%3Apytorch%2Fxla+8713&type=code, and the unit test for XLAPatchedLinear should still pass, because we also check its lowering there.

@pgmoka (Collaborator, Author) commented Mar 4, 2025

CC: @lsy323

@tengyifei (Collaborator) left a comment


Let's add a unit test in Python and also remove the _xla_einsum workaround in this PR (which will also test that this registration worked).

@tengyifei (Collaborator)

Not sure how all these commits got into this branch. Usually I rebase the branch on top of the latest master and then force push. This way I only have a single commit in the PR.

@ysiraichi (Collaborator)

One thing you can do is to check for xla::einsum in the XLA counters. I believe that, before your PR, it wouldn't be in there, since the CompositeImplicitAutograd kernel was called.

@pgmoka (Collaborator, Author) commented Mar 5, 2025

> Not sure how all these commits got into this branch. Usually I rebase the branch on top of the latest master and then force push. This way I only have a single commit in the PR.

I honestly don't know how this happened either. I think I messed something up while fetching the current master to rebase the branch with. I needed to do that to get the latest changes related to _einsum. The final state is what I wanted, but it leaves this unfortunate commit history on the PR.

@tengyifei (Collaborator)

> I honestly don't know how this happened either. I think I messed something up while fetching the current master to rebase the branch with. I needed to do that to get the latest changes related to _einsum. The final state is what I wanted, but it leaves this unfortunate commit history on the PR.

Gotcha. In that case could you squash the commits from git and reset the commit message so that it hopefully doesn't confuse future readers? Thanks!

@pgmoka pgmoka enabled auto-merge (squash) March 5, 2025 22:41
Use .backward() with in-place grad mutations for the GA API (#8768)

Use placeholder tensor in scan (#8785)

Pin update to 20250303 (#8788)

Co-authored-by: Chengji Yao <[email protected]>

correct linter
@pgmoka pgmoka force-pushed the manually_register_einsum_XLA branch from e2aace1 to b922fa0 on March 5, 2025 23:02
@pgmoka pgmoka closed this Mar 5, 2025
auto-merge was automatically disabled March 5, 2025 23:32 (pull request was closed)

@pgmoka (Collaborator, Author) commented Mar 5, 2025

Too many conflicts. I accidentally merged from master rather than rebasing, and it caused a bunch of issues. My changes are small enough that I will just carry on in a separate PR. I apologize to the reviewers for the noise.
