Extend buffer donation aliasing APIs #8721

Merged: 1 commit into pytorch:master on Feb 26, 2025

Conversation

@rpsilva-aws (Collaborator) commented on Feb 19, 2025:

Enables #8711. In particular, it relaxes the fallback aliasing logic for the user buffer-donation config. Previously, explicit donations were applied only when auto LTC inferred no needed aliasing of its own, which is a semantically suboptimal surface area. In this PR, we modify this behavior to always accommodate explicitly donated buffers, working simultaneously with LTC aliasing, if enabled.

Out of scope for this PR are the per-device concerns of keeping global (DeviceArenaContext) metadata for all devices with respect to aliasing and other dynamo-execution-specific logic.
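To make the change concrete, here is a minimal illustrative sketch in plain Python (not the actual C++ implementation; the function and parameter names are hypothetical) of how the donated-buffer set is selected before and after this PR:

```python
def donor_indices(ltc_inferred, user_marked, config_enabled):
    # Old fallback: user-marked donations were honored only when auto LTC
    # aliasing inferred nothing on its own.
    if config_enabled and not ltc_inferred:
        old = list(user_marked)
    else:
        old = list(ltc_inferred)

    # New behavior: explicit donations always apply (when the config is
    # enabled) and are merged with whatever LTC inferred.
    new = list(ltc_inferred)
    if config_enabled:
        new = sorted(set(new) | set(user_marked))
    return old, new

print(donor_indices(ltc_inferred=[0], user_marked=[2], config_enabled=True))
# -> ([0], [0, 2]): previously, the explicit donation of buffer 2 was dropped.
```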

@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch 8 times, most recently from 0dc6476 to 03e26ac Compare February 19, 2025 03:03
@rpsilva-aws rpsilva-aws marked this pull request as ready for review February 19, 2025 03:12
@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch 2 times, most recently from 019f145 to 5cb4a6c Compare February 19, 2025 18:45
@rpsilva-aws (Collaborator, Author) commented on Feb 19, 2025:

I had originally included:

  • Remove the unnecessary force_ltc_data effect on the cache hash, since the flag may be set without any inferred aliasing; in that case it needlessly triggers a recompilation even though the HLO is virtually unchanged.

but that change failed test_fallback_multiple_submodules, which expects the graph hash to change (and hence a recompilation to trigger) even though the underlying computation is exactly the same. This seems to be because, during cache warm-up, we reflect force_ltc_data in the hash instead of having it only refer to the resulting aliasing. IMO, it's not correct to include an aliasing factor in the resulting hash, as opposed to only the resulting aliasing (e.g. warming up the cache with a computation without aliasing should not retrigger a recompilation for the following identical computation without aliasing). Since fixing that is out of scope for this PR, I removed the change for now.
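For illustration only, a toy sketch (hypothetical names, not the actual hashing code) of the distinction being argued here: hashing the force_ltc_data flag itself, versus hashing only the resulting aliasing:

```python
import hashlib

def graph_hash(hlo_text, resulting_aliasing, force_ltc_data=None):
    # Hash the computation and the *resulting* aliasing decisions.
    h = hashlib.sha256(hlo_text.encode())
    h.update(repr(sorted(resulting_aliasing)).encode())
    # Folding in the force_ltc_data flag (the warm-up behavior described
    # above) makes two otherwise identical computations hash differently.
    if force_ltc_data is not None:
        h.update(bytes([int(force_ltc_data)]))
    return h.hexdigest()

hlo, no_aliasing = "HloModule m ...", []
# Same HLO, same (empty) aliasing, different flag values -> cache miss.
assert graph_hash(hlo, no_aliasing, True) != graph_hash(hlo, no_aliasing, False)
# Hashing only the resulting aliasing keeps the hashes equal.
assert graph_hash(hlo, no_aliasing) == graph_hash(hlo, no_aliasing)
```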

@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch from 5cb4a6c to ed28bcd Compare February 19, 2025 18:51
@ysiraichi (Collaborator) left a comment:

Overall, LGTM. I do think it would be better to leave non-trivial refactorings for another PR. It would make reviewing easier.

That said, I'm not entirely familiar with this buffer donor system. Maybe @tengyifei @lsy323 @vanbasten23 @bhavya01 could help review.

@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch from ed28bcd to 6016023 Compare February 20, 2025 20:03
@rpsilva-aws rpsilva-aws reopened this Feb 20, 2025
@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch 3 times, most recently from 3dac1fd to b6a9371 Compare February 20, 2025 22:29
@rpsilva-aws (Collaborator, Author) commented:

As suggested by @ysiraichi, to avoid blocking on this needed PR, I'm deferring the following:

  • Switch to a batch call for the donation, since in practice a large number of tensors can be collected and annotated simultaneously. Note that the dynamo execution APIs remain unchanged, as these are intentionally not modified in this PR.

In addition, I'll split the following out as separate PRs (I'll do so shortly after this one is closed):

@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch from b6a9371 to 3904c2c Compare February 20, 2025 22:35
@ysiraichi (Collaborator) commented:

> I'll do so shortly after closing this one

Just so we are clear: are you closing this one? Or do you mean after this gets merged?

@rpsilva-aws (Collaborator, Author) commented:

> > I'll do so shortly after closing this one
>
> Just so we are clear: are you closing this one? Or do you mean after this gets merged?

I will do these as separate PRs after this one is merged.

@ysiraichi (Collaborator) left a comment:

Just one comment. Looks good otherwise.
Could you also change the PR description so we know exactly what changes this PR is introducing?

@tengyifei (Collaborator) left a comment:

Apologies for the delay. I'll be frank: I don't understand the internal details that well; in fact, nobody on the team does. So I've asked some questions on the PR.

    self.assertEqual(met.metric_data("InputOutputAliasCount")[1], 2.0)

  def test_user_config_donation_with_ltc_donation_overlap(self):
    with alias_with_buffer_donor_config_context(True):
Collaborator:

What does this config context do?

@rpsilva-aws (Collaborator, Author) replied on Feb 22, 2025:

It enables the general behavior of buffer donation. In other words, this config (_xla_set_enable_alias_with_buffer_donor_config) needs to be set so that tensors (with the associated XLA device data backend) that were explicitly marked for donation (_set_buffer_donation) can actually be donated in the next execution (i.e. mark step) [1]. It's effectively a general opt-in for the entire donation behavior (arguably we wouldn't need it, since _set_buffer_donation already assumes that the user intends to donate explicitly, but I kept it as is).

#8711 describes the problem and use case, but Jack introduced this in the context of dynamo execution (#6587). Since it's an opt-in, we want to give users the flexibility to explicitly annotate the donation (precisely when they don't need the tensor after mark step). This is particularly helpful with gradient accumulation (and scan), since we can't do in-place mutations of the fn parameters (that aliasing is uncaptured by LTC), so this allows us to mitigate/close that gap. The main change in this PR, to resolve the PFR, is that we enable LTC aliasing and buffer donation to coexist, relaxing the former constrained fallback behavior (donate the buffers only if there are no LTC aliasings and the user config is enabled). It doesn't alter the original intended behavior, since that heuristic is still directly maintained; we only widen the applications.

[1] XLA HLO donation ref: https://github.com/openxla/xla/blob/1c7d5f62e3b4c90a4b3484429c5042c69ceb5c5b/xla/service/hlo.proto#L479

Collaborator:

Got it. So we need both the config and an explicit donation call.
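For readers following along, a minimal sketch of that two-call flow, based on the bindings named in this thread (exact usage may vary across torch_xla releases):

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t0 = torch.randn(4, 4, device=device)

# Step 1: global opt-in allowing explicitly marked buffers to be donated.
torch_xla._XLAC._xla_set_enable_alias_with_buffer_donor_config(True)

# Step 2: explicitly mark t0's underlying buffer for donation. Without the
# config above, this mark alone would not result in a donation.
torch_xla._XLAC._set_buffer_donation(t0, True)

# The donation takes effect at the next execution (mark step); t0's buffer
# may then be reused for an output, so t0 must not be read afterwards.
t1 = t0 + 1
xm.mark_step()
```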

    self.assertIn('XlaSetBufferDonation', met.counter_names())
    self.assertEqual(met.counter_value('XlaSetBufferDonation'), 1)
    t1 = t0 + 1
    torch_xla._XLAC._xla_sync_multi([t0, t1], [str(xla_device)], True, False)
Collaborator:

What does sync_multi do? Is it the same as mark_step?

@rpsilva-aws (Collaborator, Author) replied on Feb 22, 2025:

Mostly, but it's more flexible. It allows me to specify sync_xla_data=False, so I can ensure that the aliasing comes from the user config (explicit buffer donation) and not from auto LTC aliasing. With it, sync_ltc_data in SyncTensorsConfig will be false, so the sync won't include the default LTC aliasing.

I'll add a comment, since that's the motivation behind using it over mark_step.
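For reference, here is the call from the test above with each positional argument annotated per this explanation (a sketch of the argument order as used in the test, not canonical documentation):

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

xla_device = xm.xla_device()
t0 = torch.zeros(2, 2, device=xla_device)
t1 = t0 + 1

# Positional arguments: tensors, devices, wait, sync_xla_data.
# Passing sync_xla_data=False skips the default LTC data sync/aliasing, so
# any input/output aliasing observed afterwards must come from the explicit
# buffer donation (user config) path rather than from auto LTC inference.
torch_xla._XLAC._xla_sync_multi([t0, t1], [str(xla_device)], True, False)
```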

@@ -95,8 +95,7 @@ class XLAGraphExecutor : public torch::lazy::LazyGraphExecutor {
   torch::lazy::BackendDataPtr GetBaseSeedData(
       const torch::lazy::BackendDevice& device);

-  void SetAliasWithBufferDonorConfig(bool should_alias);
-
+  void SetAliasWithBufferDonorConfig(bool enable_alias);
Collaborator:

I see you've renamed "should" to "enable". Does this imply some feature change?

@rpsilva-aws (Collaborator, Author) replied on Feb 22, 2025:

Only in the sense that we now allow it to run simultaneously with LTC aliasing, hence it should always be in effect. The "should" for donating buffers (_set_buffer_donation) can be kept, because marking a buffer may not actually donate it; that depends on whether this config is enabled (_xla_set_enable_alias_with_buffer_donor_config). I renamed the parameter of _xla_set_enable_alias_with_buffer_donor_config/SetAliasWithBufferDonorConfig since we are enabling/allowing explicitly marked tensors (_set_buffer_donation) to be donated simultaneously with LTC aliasing, so "enable" is more suitable now.

@rpsilva-aws (Collaborator, Author) commented:

> Apologies for the delay. I'll be frank: I don't understand the internal details that well; in fact, nobody on the team does. So I've asked some questions on the PR.

No worries, thanks for taking a look. Let me know if I was able to answer these. I'll revise with the test change ASAP.

@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch from 3904c2c to 1d82b93 Compare February 24, 2025 18:22
@rpsilva-aws rpsilva-aws force-pushed the rpsilva_aliasing_extension branch from 1d82b93 to 575d0c5 Compare February 24, 2025 20:43
@rpsilva-aws (Collaborator, Author) commented:

@tengyifei Done, PTAL

@rpsilva-aws (Collaborator, Author) commented on Feb 25, 2025:

@tengyifei @ysiraichi Hey folks, can you PTAL? Thanks!

        tensors, coll.indices, parameters_data);
  }

  std::vector<size_t> user_config_buffer_donor_indices;
Collaborator:

This "user config" name is really unfortunate. It should just be called "explicit donation". I was going to suggest renaming it until I realized this is an existing feature that you exposed.

@rpsilva-aws (Collaborator, Author) replied:

Yes, unfortunately these are tied to existing API bindings, so users may already rely on them. We could rename them, but we'd have to do it over a few release cycles with a deprecation warning.

@ysiraichi (Collaborator) left a comment:

LGTM.

@rpsilva-aws rpsilva-aws merged commit 0797642 into pytorch:master Feb 26, 2025
23 checks passed
@rpsilva-aws rpsilva-aws deleted the rpsilva_aliasing_extension branch February 26, 2025 20:07
pgmoka pushed a commit that referenced this pull request Mar 5, 2025