Use .backward() with in-place grad mutations for the GA API #8768
Conversation
Force-pushed from c49a1fd to 5922c4d
Canceled the workflow, this is pending on #8758.
Force-pushed from 5922c4d to ecc92bd
        *acc_grads)
    loss.backward()
    grads = [param.grad for param in params]
    return (iteri, loss, *iterable_tensors, *carried_tensors, *params, *grads)
Why do we need the grads variable in body_fn if we are doing an in-place update? The params argument should have everything we need, right?
Indeed. When tracing the body, the lowering context identifies the XLA device data inputs for the gradients, so we need to make sure those are part of the output (T -> T) for both the body and the condition. I added them explicitly since that was clearer when working with torch.autograd.grad. Now that we have switched to .backward() with in-place accumulation, we should ideally leverage the hoisted vars to achieve this automatically. That can stay out of scope for this PR (which unblocks checkpointing), since we should eventually revamp/unify most of these internals with while_loop/fori_loop/scan as we better understand the existing LTC limitations and complications with the XLA while op. I can create a follow-up issue for this, thanks for raising it.
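For illustration only, here is a minimal sketch of the T -> T constraint discussed above, written in plain PyTorch with hypothetical names (body_fn, params, iteri); it is not the actual gradient accumulation internals. Every carried tensor the body reads, including the gradient buffers, is returned again so the body's inputs and outputs line up one-to-one.

```python
import torch

# Hypothetical standalone example: two parameters with pre-allocated
# gradient buffers that the loop body accumulates into.
params = [torch.nn.Parameter(torch.randn(3)) for _ in range(2)]
for p in params:
    p.grad = torch.zeros_like(p)

def body_fn(iteri, x, *flat):
    n = len(params)
    ps = list(flat[:n])  # carried parameters; flat[n:] are the carried grads
    loss = sum((p * x).sum() for p in ps)
    loss.backward()  # accumulates into p.grad in place
    # Re-emit every carried tensor, including the gradients, so the body's
    # inputs and outputs keep the same structure (T -> T).
    return (iteri - 1, x, *ps, *[p.grad for p in ps])

state = (torch.tensor(3), torch.randn(3), *params, *[p.grad for p in params])
while state[0] > 0:
    state = body_fn(*state)
```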
@bhavya01 Can you PTAL once you find the time? Thanks
Kind reminder @tengyifei @bhavya01, so I have time to include a couple of follow-ups in time for 2.7.0. One of them is to leverage @tengyifei's #8785, so the clone would serve purely for staging (separating the IR that is fed to the body context from the one used in the mapping).
Use .backward() with in-place grad mutations for the GA API (#8768)
Use placeholder tensor in scan (#8785)
Pin update to 20250303 (#8788)
Co-authored-by: Chengji Yao <[email protected]>
correct linter
In this PR, we switch to using .backward() instead of torch.autograd.grad due to #8729. We resolve this similarly to scan (https://github.com/rpsilva-aws/xla/blob/master/torch_xla/experimental/scan.py#L232), except that in our case, since we have full control over the backward pass, we can explicitly clone the input gradients, knowing that the underlying tensors will be updated in place.

In short, my understanding of why we need the clone is that once we perform an in-place mutation, the resulting IR node is no longer device data. Since we rely on the lowering context to collect all parameters in the mapping (device_parameter_id_tensor_mapping), that IR node will be excluded, because it is now an IR op (e.g. %add). This prevents us from capturing it as a parameter to the other XLA while computations (condition and init). This is a known issue with both this and the scan API, and we should eventually find a more robust way around it. If we did, we could offer a more general, all-purpose while_loop/fori_loop API for users (and use it as part of scan and this gradient accumulation API).
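As a rough illustration of the clone-before-mutate pattern described above (the helper name accumulate_step and its structure are hypothetical, not the actual GA implementation; plain CPU tensors are used, no XLA device required):

```python
import torch

def accumulate_step(params, loss):
    # Snapshot the incoming gradient buffers before the in-place mutation
    # below; in the XLA setting this is the point where the gradients are
    # still plain device data.
    staged_grads = [
        p.grad.clone() if p.grad is not None else torch.zeros_like(p)
        for p in params
    ]
    # In-place accumulation into param.grad; after this, the live IR node
    # for each gradient is an op (e.g. %add) rather than device data, which
    # is why the snapshot above is taken first.
    loss.backward()
    return staged_grads

# Usage sketch.
params = [torch.nn.Parameter(torch.randn(4)) for _ in range(2)]
loss = sum((p ** 2).sum() for p in params)
snapshots = accumulate_step(params, loss)
```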