Add runtime assert #98878
Conversation
This PR introduces a new operator called `aten._assert_async.msg`, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of `torch._assert` with this new operator so that `make_fx` also knows how to handle assertions. Originally, we planned to create a dependency chain to introduce a fake control dependency, but this new implementation seems to work with AOTAutograd and friends, which will be demonstrated in the next pull request.

Future work:
1. Assess whether we still need to introduce a fake control dependency
2. Convert our inline constraints into runtime asserts
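For readers of the stack, a minimal sketch of what the rewrite amounts to (assuming a build where the op from this stack has landed; `f_rewritten` is an illustration, not the pass's actual output):

```python
import torch

def f(x):
    assert x.shape[0] > 2, "batch too small"
    return x * 2

# Roughly what the captured graph computes after Dynamo's rewrite: the
# condition is materialized as a boolean tensor and the message travels
# with the new op instead of a Python `assert`.
def f_rewritten(x):
    torch.ops.aten._assert_async.msg(torch.tensor(x.shape[0] > 2), "batch too small")
    return x * 2

f_rewritten(torch.randn(4))    # passes
# f_rewritten(torch.randn(2))  # would raise "batch too small"
```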
@@ -245,12 +245,25 @@ def inner(self: "InstructionTranslatorBase", inst: Instruction):
        self.jump(inst)
        return

-    # Manually insert torch._assert instead of python assert and jump over
+    # Manually insert torch._assert_async instead of python assert and jump over
One interesting question: even if we capture the assert, should we also transform it into an async one? If the original assert was `assert XXX`, the user might be expecting a DtoH sync to happen, and maybe we should respect that instead of silently turning it into async asserts that can trash the CUDA context.
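To make the concern concrete, a small sketch of the difference (using the pre-existing `aten._assert_async` op; not code from this PR):

```python
import torch

x = torch.randn(1024, device="cuda" if torch.cuda.is_available() else "cpu")
cond = (x == x).all()  # NaN check as a 0-dim bool tensor

# A plain Python assert forces a device-to-host sync: bool(cond) must copy
# the result back before Python can evaluate the assertion.
assert bool(cond), "found NaNs"

# The async variant only enqueues the check. On CUDA, a failure surfaces
# later as a device-side error rather than a Python AssertionError, which
# is the "trash the CUDA context" behavior discussed here.
torch.ops.aten._assert_async(cond)
```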
Yeah, that seems like a problem, but we don't have an assert that is blocking; maybe we should add one instead of using `assert_async`. A minimal sketch of what that could look like:
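```python
import torch

# Hypothetical helper, not an existing op: forcing the condition to a
# Python bool performs the DtoH sync, so a failure raises eagerly with a
# clean Python stack instead of corrupting the CUDA context.
def assert_blocking(cond: torch.Tensor, msg: str) -> None:
    if not bool(cond):
        raise AssertionError(msg)
```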
I see. I am not very familiar with CUDA, so it will take me some time to get it right :). I can do it in the follow-up diff.
torch/_dynamo/eval_frame.py (outdated)
super().__init__(m)
self.count = 0
self.constraints_id_to_constraint = defaultdict(list)
if constraints is not None:
Is this captured from above? If so, then why bother making `self.constraints_id_to_constraint` here; maybe compute it above too...
Yeah I can :)
Actually, I left it here because this map is only needed within this pass.
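For context, a sketch of the grouping being discussed (the field name `t_id` is an assumption about the `Constraint` struct):

```python
from collections import defaultdict

# Bucket constraints by the id of the tensor they target, so the pass can
# fetch every dimension constraint for a given input in one lookup.
def group_constraints(constraints):
    constraints_id_to_constraint = defaultdict(list)
    for constraint in constraints:
        constraints_id_to_constraint[constraint.t_id].append(constraint)
    return constraints_id_to_constraint
```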
torch/_dynamo/eval_frame.py (outdated)
min_int_val = _convert_to_int(constraint_range.vr.lower)
max_int_val = _convert_to_int(constraint_range.vr.upper)

if min_int_val is None and max_int_val is None:
I don't think these can be None? You should be able to trust upstream and assert in `_convert_to_int`, I think...
It will be None if the expression is not something we can convert to an integer (e.g., a complicated guard expression).
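A sketch matching that description (assuming sympy-valued bounds, as in ShapeEnv value ranges; not the PR's exact code):

```python
import sympy

def _convert_to_int(val):
    # Only plain integer bounds become runtime checks; infinities and
    # symbolic guard expressions yield None, and that bound is skipped.
    if val in (sympy.oo, -sympy.oo):
        return None
    if isinstance(val, sympy.Integer):
        return int(val)
    return None
```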
torch/_dynamo/eval_frame.py (outdated)
dim = self.tracer.create_proxy('call_function', torch.ops.aten.sym_size, (arg, constraint.dim), {})
assert_msg = f"Input #{self.count}'s dimension #{constraint.dim} size is outside of supported dynamic range"

if min_int_val:
Why this check?
If the min_val is somehow None, we want to skip asserting.
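Written out as eager code, the per-dimension check under discussion looks roughly like this (the real pass builds fx proxies; names are illustrative):

```python
import torch

def assert_dim_in_range(arg, count, dim, min_int_val, max_int_val):
    size = arg.size(dim)  # emitted as an aten.sym_size proxy in the pass
    assert_msg = (f"Input #{count}'s dimension #{dim} size is outside "
                  f"of supported dynamic range")
    if min_int_val is not None:  # skip bounds that could not be converted
        torch.ops.aten._assert_async.msg(torch.tensor(size >= min_int_val), assert_msg)
    if max_int_val is not None:
        torch.ops.aten._assert_async.msg(torch.tensor(size <= max_int_val), assert_msg)

assert_dim_in_range(torch.randn(4, 3), 0, 0, 2, 10)  # size 4 is in [2, 10]
```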
torch/_dynamo/eval_frame.py (outdated)
# TODO ignore expressions for now
return None

for constraint in constraints:
It doesn't seem like a good idea to do each of these transformations by hand... rather, can't we generate Python code and trace it in? In the future there may be more kinds of constraints...
Yep, I can explore this option in the follow-up diff.
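For the record, a hedged sketch of the generate-and-trace idea (the helper and tuple format are entirely illustrative):

```python
import torch

def build_checker(dim_bounds):
    # dim_bounds: list of (dim, lo, hi) tuples. Emit the checks as Python
    # source and compile it, so a tracer can inline ordinary code instead
    # of the pass hand-building fx nodes.
    lines = ["def _check(x):"]
    for i, (dim, lo, hi) in enumerate(dim_bounds):
        lines.append(
            f"    torch.ops.aten._assert_async.msg("
            f"torch.tensor({lo} <= x.size({dim}) <= {hi}), "
            f"'constraint {i} violated')"
        )
    ns = {"torch": torch}
    exec("\n".join(lines), ns)
    return ns["_check"]

check = build_checker([(0, 2, 10)])
check(torch.randn(4, 3))  # passes; a dim-0 size outside [2, 10] would raise
```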
Description updated: in addition, we now also turn input constraints into runtime assertions utilizing `aten._assert_async.msg`, one per constrained dimension.
@@ -643,6 +645,177 @@ def __bool__(self):
        )


class _AddRuntimeAssertsInInputConstraint(torch.fx.interpreter.Transformer):
Hmmph, I still am failing to get the big picture here.
Let's suppose that I trace and export a model that does `if x.size(3) > 10: do something; else: do something else`. Furthermore, let's suppose I didn't pass in any explicit constraints (which I assume means "try to make everything as dynamic as possible"). Then you will end up with a model which requires `x.size(3) > 10` to be sound. But you aren't going to have a constraint for it (because it's dynamic by default). Do you still want to put the tests into the graph?
More generally, it seems to me that you are trying to solve the problem of "what to do with guards" by directly putting them in the graph. But does this make sense? Why is this not tested out of band?
My thinking is that Avik's surfacing of guards will output those more fine-grained constraints (e.g., `x.size(3) > 10`) directly to the user. We then suggest the user pass those constraints directly to the export call, and my pass will translate the new constraints into runtime assertions. A sketch of that user-facing workflow is below.
One thing we can explore is putting the constraints derived from guards directly into the graph after we finish exporting (not asking the user to pass in the constraint).
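Sketch of the workflow just described (the constraint API was in flux at the time; `dynamic_dim` and the `constraints=` kwarg are assumed to match that era's `torch._dynamo.export`):

```python
import torch
from torch._dynamo import export, dynamic_dim

def f(x):
    if x.size(0) > 10:  # export surfaces `x.size(0) <= 10` as a guard
        return x * 2
    return x + 1

x = torch.randn(4, 3)
# The user passes the surfaced constraint explicitly; the pass in this PR
# then lowers it to aten._assert_async.msg checks on the input dimensions.
gm, guards = export(f, x, constraints=[dynamic_dim(x, 0) <= 10])
```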
I guess my point is: it's inconsistent to only handle directly given constraints and not also the implicit constraints from guards.
Yeah, I agree. It should convert the implicit constraints as well. I can do it in the follow-up diff.
This is worth doing now, and is more important, and the primary reason for my requested changes.
Successfully rebased.
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Description updated: both input constraints and intermediate constraints are now converted into runtime assertions utilizing `aten._assert_async.msg`, and the future-work list now includes exploring a non-async version of assert.
Successfully rebased.
Looks good for a start. Follow-up work (among other things pointed out in the implementation itself) is to broaden the scope by adding assertions for other things, like specializations, equalities, etc.
@@ -405,6 +405,10 @@ void _assert_async_cpu(const Tensor& self) {
  TORCH_CHECK(native::is_nonzero(self), "Expected Tensor with single nonzero value, but got zero");
}

void _assert_async_msg_cpu(const Tensor& self, c10::string_view assert_msg) {
  TORCH_CHECK(native::is_nonzero(self), assert_msg != "" ? assert_msg : "Assertion is failed");
Use AT_ASSERT
This message is sus because you might violate semantics-preserving notions here: substituting a default string when the message is empty changes the error text users see. You also don't ever pass the tensor to the message. Which is fine, but kinda lame.
Sorry, can you elaborate on what you mean?
@torch._dynamo.config.patch(
    capture_scalar_outputs=True,
    capture_dynamic_output_shape_ops=True,
    dynamic_shapes=True,
)
Why is this constrained only to this now?
@@ -2475,15 +2480,15 @@ def f(x):
    args = (torch.Tensor([3, 4, 5]),)
    cnt = torch._dynamo.testing.CompileCounter()

-   opt_f = torch._dynamo.optimize(cnt, nopython=True)(f)
+   opt_f = torch._dynamo.optimize(cnt, nopython=True, dynamic=True)(f)
Redundant w/ patch.
This PR adds a ton of complexity, but I am not sure we want this functionality.
Needs more discussion.
@@ -642,6 +644,221 @@ def __bool__(self):
        )


ConstraintSpec = namedtuple("ConstraintSpec", ["constraint_dim", "min_val", "max_val"])
No constraint-spec notions in eval frame.
Hmm, the export API takes the Constraint notion as input, though.
if "val" in self.current_node.meta: | ||
r.node.meta["val"] = self.current_node.meta["val"] | ||
return r | ||
graph = _AddRuntimeAssertsInInputConstraint( |
I don't understand why we want this as a default? Like, this feels like something that should exist entirely outside of dynamo: anyone is free to add any passes or transforms they want, but this is far too assumptive of a single use case.
We currently don't have an agreed final export API yet, so there is no other place to put this pass at the moment. I can move it out of dynamo once we have that API. Are you OK with it if I hide it under a flag?
I'm OK hiding it under a flag. In the final export API we want a self-contained graph module that captures user constraints as assertions, so we want that flag to be turned on by default.
@voznesenskym can you specify what you'd like to see changed / what needs discussion?
What I would like to discuss is the direction of passes like this in dynamo. To me, this feels like a laborious and assumptive default, or inclusion. If someone needs this, they can invoke a util we provide.
W/r/t hiding it under a flag: export growing in flags and flag combinations is not a good direction.
The burden of proof for including a new feature is that it is generally usable, and I think promotion of guards to asserts does not cross that bar. Offering a util that does it, and putting the burden on the .export caller to use it, seems sufficient.
Thanks for the comment @voznesenskym. Tugsuu and I analyzed the different options, and we also feel that it is better to implement this as a util that can be invoked after the dynamo export call, so that dynamo can maintain the contract of "make assumptions; leave it to the caller how to enforce them". A sketch of that direction is below.
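Here is that sketch; the util name and the constraint tuple format are hypothetical, not a real API:

```python
import operator
import torch

def add_runtime_assertions(gm: torch.fx.GraphModule, dim_constraints):
    # dim_constraints: list of (input_index, dim, lo, hi) tuples.
    placeholders = [n for n in gm.graph.nodes if n.op == "placeholder"]
    anchor = placeholders[-1]  # keep inserted nodes after all inputs
    for idx, dim, lo, hi in dim_constraints:
        ph = placeholders[idx]
        msg = f"Input #{idx} dimension #{dim} must be in [{lo}, {hi}]"
        with gm.graph.inserting_after(anchor):
            size = gm.graph.call_function(torch.ops.aten.sym_size.int, (ph, dim))
        with gm.graph.inserting_after(size):
            ge = gm.graph.call_function(operator.ge, (size, lo))
        with gm.graph.inserting_after(ge):
            le = gm.graph.call_function(operator.le, (size, hi))
        with gm.graph.inserting_after(le):
            ok = gm.graph.call_function(operator.and_, (ge, le))
        with gm.graph.inserting_after(ok):
            cond = gm.graph.call_function(torch.tensor, (ok,))
        with gm.graph.inserting_after(cond):
            gm.graph.call_function(torch.ops.aten._assert_async.msg, (cond, msg))
    gm.recompile()
    return gm
```

Because it runs after export, dynamo itself stays assumption-only and the caller decides whether the assumptions get enforced in the graph.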
I feel like a jerk rejecting this, because this is fundamentally a good change. I think we just need to think a little bit more about (1) where it should live and (2) what it should encompass (if we do this, I would really like to see this for all guards, not just the user directives).
This PR introduces a new operator called `aten._assert_async.msg`, which allows passing a tensor value and assertion message as inputs. As part of TorchDynamo, we're replacing the use of `torch._assert` with this new operator so that `make_fx` also knows how to handle assertions. This is a subset of #98878; refer there for historic reviews.
Pull Request resolved: #100101. Approved by: https://github.com/jansel
@tugsbayasgalan May I know why a CUDA kernel was not added for `_assert_async.msg`?
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as `Stale`.
Stack from ghstack (oldest at bottom):
This PR introduces a new operator called `aten._assert_async.msg`, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of `torch._assert` with this new operator so that `make_fx` also knows how to handle assertions.

Originally, we planned to create a dependency chain to introduce a fake control dependency, but this new implementation seems to work with AOTAutograd and friends, which will be demonstrated in the next pull request.

In addition, we also make input constraints and intermediate constraints into runtime assertions utilizing `aten._assert_async.msg`.

Future work:
1. Assess whether we still need to introduce a fake control dependency
2. Explore adding a non-async version of assert
cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @soumith @desertfire