Add runtime assert #98878
Conversation
This PR introduces a new operator called `aten._assert_async.msg`, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of `torch._assert` with this new operator so that `make_fx` also knows how to handle assertions. Originally, we planned to create a dependency chain to introduce a fake control dependency, but this new implementation seems to work with AOTAutograd and friends, which will be demonstrated in the next pull request.

Future work:
1. Assess whether we still need to introduce a fake control dependency
2. Convert our inline constraints into runtime asserts
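For readers of the stack, a minimal sketch of what the rewrite amounts to (assuming a build where the op from this stack has landed; `f_rewritten` is an illustration, not the pass's actual output):

```python
import torch

def f(x):
    assert x.shape[0] > 2, "batch too small"
    return x * 2

# Roughly what the captured graph computes after Dynamo's rewrite: the
# condition is materialized as a boolean tensor and the message travels
# with the new op instead of a Python `assert`.
def f_rewritten(x):
    torch.ops.aten._assert_async.msg(torch.tensor(x.shape[0] > 2), "batch too small")
    return x * 2

f_rewritten(torch.randn(4))    # passes
# f_rewritten(torch.randn(2))  # would raise "batch too small"
```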
@@ -245,12 +245,25 @@ def inner(self: "InstructionTranslatorBase", inst: Instruction):
        self.jump(inst)
        return

-    # Manually insert torch._assert instead of python assert and jump over
+    # Manually insert torch._assert_async instead of python assert and jump over
One interesting question: even if we capture the assert, should we also transform it into an async one? If the original assert was `assert XXX`, the user might be expecting a DtoH sync to happen, and maybe we should respect that instead of silently turning it into async asserts that can trash the CUDA context.
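To make the concern concrete, a small sketch of the difference (using the pre-existing `aten._assert_async` op; not code from this PR):

```python
import torch

x = torch.randn(1024, device="cuda" if torch.cuda.is_available() else "cpu")
cond = (x == x).all()  # NaN check as a 0-dim bool tensor

# A plain Python assert forces a device-to-host sync: bool(cond) must copy
# the result back before Python can evaluate the assertion.
assert bool(cond), "found NaNs"

# The async variant only enqueues the check. On CUDA, a failure surfaces
# later as a device-side error rather than a Python AssertionError, which
# is the "trash the CUDA context" behavior discussed here.
torch.ops.aten._assert_async(cond)
```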
Yeah, that seems like a problem, but we don't have an assert that is blocking; maybe we should add one instead of using `assert_async`. A minimal sketch of what that could look like:
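```python
import torch

# Hypothetical helper, not an existing op: forcing the condition to a
# Python bool performs the DtoH sync, so a failure raises eagerly with a
# clean Python stack instead of corrupting the CUDA context.
def assert_blocking(cond: torch.Tensor, msg: str) -> None:
    if not bool(cond):
        raise AssertionError(msg)
```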
I see. I am not very familiar with CUDA, so it will take me some time to get it right :). I can do it in the follow-up diff.
torch/_dynamo/eval_frame.py (outdated)
super().__init__(m)
self.count = 0
self.constraints_id_to_constraint = defaultdict(list)
if constraints is not None:
Is this captured from above? If so, then why bother making `self.constraints_id_to_constraint` here; maybe compute it above too...
Yeah I can :)
Actually, I left it here because this map is only needed within this pass.
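For context, a sketch of the grouping being discussed (the field name `t_id` is an assumption about the `Constraint` struct):

```python
from collections import defaultdict

# Bucket constraints by the id of the tensor they target, so the pass can
# fetch every dimension constraint for a given input in one lookup.
def group_constraints(constraints):
    constraints_id_to_constraint = defaultdict(list)
    for constraint in constraints:
        constraints_id_to_constraint[constraint.t_id].append(constraint)
    return constraints_id_to_constraint
```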
torch/_dynamo/eval_frame.py (outdated)
min_int_val = _convert_to_int(constraint_range.vr.lower)
max_int_val = _convert_to_int(constraint_range.vr.upper)

if min_int_val is None and max_int_val is None:
I don't think these can be None? You should be able to trust upstream and assert in `_convert_to_int`, I think...
It will be None if the expression is not something we can convert to an integer (e.g., a complicated guard expression).
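A sketch matching that description (assuming sympy-valued bounds, as in ShapeEnv value ranges; not the PR's exact code):

```python
import sympy

def _convert_to_int(val):
    # Only plain integer bounds become runtime checks; infinities and
    # symbolic guard expressions yield None, and that bound is skipped.
    if val in (sympy.oo, -sympy.oo):
        return None
    if isinstance(val, sympy.Integer):
        return int(val)
    return None
```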
torch/_dynamo/eval_frame.py (outdated)
dim = self.tracer.create_proxy('call_function', torch.ops.aten.sym_size, (arg, constraint.dim), {})
assert_msg = f"Input #{self.count}'s dimension #{constraint.dim} size is outside of supported dynamic range"

if min_int_val:
Why this check?
If the min_val is somehow None, we want to skip asserting.
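Written out as eager code, the per-dimension check under discussion looks roughly like this (the real pass builds fx proxies; names are illustrative):

```python
import torch

def assert_dim_in_range(arg, count, dim, min_int_val, max_int_val):
    size = arg.size(dim)  # emitted as an aten.sym_size proxy in the pass
    assert_msg = (f"Input #{count}'s dimension #{dim} size is outside "
                  f"of supported dynamic range")
    if min_int_val is not None:  # skip bounds that could not be converted
        torch.ops.aten._assert_async.msg(torch.tensor(size >= min_int_val), assert_msg)
    if max_int_val is not None:
        torch.ops.aten._assert_async.msg(torch.tensor(size <= max_int_val), assert_msg)

assert_dim_in_range(torch.randn(4, 3), 0, 0, 2, 10)  # size 4 is in [2, 10]
```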
torch/_dynamo/eval_frame.py (outdated)
# TODO ignore expressions for now
return None

for constraint in constraints:
It doesn't seem like a good idea to do each of these transformations by hand... rather, can't we generate Python code and trace it in? In the future there may be more kinds of constraints...
Yep, I can explore this option in the follow-up diff.
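For the record, a hedged sketch of the generate-and-trace idea (the helper and tuple format are entirely illustrative):

```python
import torch

def build_checker(dim_bounds):
    # dim_bounds: list of (dim, lo, hi) tuples. Emit the checks as Python
    # source and compile it, so a tracer can inline ordinary code instead
    # of the pass hand-building fx nodes.
    lines = ["def _check(x):"]
    for i, (dim, lo, hi) in enumerate(dim_bounds):
        lines.append(
            f"    torch.ops.aten._assert_async.msg("
            f"torch.tensor({lo} <= x.size({dim}) <= {hi}), "
            f"'constraint {i} violated')"
        )
    ns = {"torch": torch}
    exec("\n".join(lines), ns)
    return ns["_check"]

check = build_checker([(0, 2, 10)])
check(torch.randn(4, 3))  # passes; a dim-0 size outside [2, 10] would raise
```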
Description updated: in addition, we now also turn input constraints into runtime assertions utilizing `aten._assert_async.msg`, one per constrained dimension.
@@ -643,6 +645,177 @@ def __bool__(self):
        )


class _AddRuntimeAssertsInInputConstraint(torch.fx.interpreter.Transformer):
Hmmph, I still am failing to get the big picture here.
Let's suppose that I trace and export a model that does `if x.size(3) > 10: do something; else: do something else`. Furthermore, let's suppose I didn't pass in any explicit constraints (which I assume means "try to make everything as dynamic as possible"). Then you will end up with a model which requires `x.size(3) > 10` to be sound. But you aren't going to have a constraint for it (because it's dynamic by default). Do you still want to put the tests into the graph?
More generally, it seems to me that you are trying to solve the problem of "what to do with guards" by directly putting them in the graph. But does this make sense? Why is this not tested out of band?
My thinking is that Avik's surfacing of guards will output those more fine-grained constraints (e.g., `x.size(3) > 10`) directly to the user. We then suggest the user pass those constraints directly to the export call, and my pass will translate the new constraints into runtime assertions. A sketch of that user-facing workflow is below.
One thing we can explore is putting the constraints derived from guards directly into the graph after we finish exporting (not asking the user to pass in the constraint).
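Sketch of the workflow just described (the constraint API was in flux at the time; `dynamic_dim` and the `constraints=` kwarg are assumed to match that era's `torch._dynamo.export`):

```python
import torch
from torch._dynamo import export, dynamic_dim

def f(x):
    if x.size(0) > 10:  # export surfaces `x.size(0) <= 10` as a guard
        return x * 2
    return x + 1

x = torch.randn(4, 3)
# The user passes the surfaced constraint explicitly; the pass in this PR
# then lowers it to aten._assert_async.msg checks on the input dimensions.
gm, guards = export(f, x, constraints=[dynamic_dim(x, 0) <= 10])
```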
I guess my point is: it's inconsistent to only handle directly given constraints and not also the implicit constraints from guards.
Yeah, I agree. It should convert the implicit constraints as well. I can do it in the follow-up diff.
This is worth doing now, and is more important, and the primary reason for my requested changes.
Successfully rebased.
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Description updated: both input constraints and intermediate constraints are now converted into runtime assertions utilizing `aten._assert_async.msg`, and the future-work list now includes exploring a non-async version of assert.
Successfully rebased.
Looks good for a start. Follow-up work (among other things pointed out in the implementation itself) is to broaden the scope by adding assertions for other things, like specializations, equalities, etc.
@@ -405,6 +405,10 @@ void _assert_async_cpu(const Tensor& self) {
  TORCH_CHECK(native::is_nonzero(self), "Expected Tensor with single nonzero value, but got zero");
}

void _assert_async_msg_cpu(const Tensor& self, c10::string_view assert_msg) {
  TORCH_CHECK(native::is_nonzero(self), assert_msg != "" ? assert_msg : "Assertion is failed");
Use AT_ASSERT
This message is sus because you might violate semantics-preserving notions here: substituting a default string when the message is empty changes the error text users see. You also don't ever pass the tensor to the message. Which is fine, but kinda lame.
Sorry, can you elaborate on what you mean?
@torch._dynamo.config.patch(
    capture_scalar_outputs=True,
    capture_dynamic_output_shape_ops=True,
    dynamic_shapes=True,
)
Why is this constrained only to this now?
@@ -2475,15 +2480,15 @@ def f(x):
    args = (torch.Tensor([3, 4, 5]),)
    cnt = torch._dynamo.testing.CompileCounter()

-   opt_f = torch._dynamo.optimize(cnt, nopython=True)(f)
+   opt_f = torch._dynamo.optimize(cnt, nopython=True, dynamic=True)(f)
Redundant w/ patch.
This PR adds a ton of complexity, but I am not sure we want this functionality.
Needs more discussion.
@@ -642,6 +644,221 @@ def __bool__(self):
        )


ConstraintSpec = namedtuple("ConstraintSpec", ["constraint_dim", "min_val", "max_val"])
No constraint-spec notions in eval frame.
Hmm, the export API takes the Constraint notion as input, though.
if "val" in self.current_node.meta: | ||
r.node.meta["val"] = self.current_node.meta["val"] | ||
return r | ||
graph = _AddRuntimeAssertsInInputConstraint( |
I don't understand why we want this as a default? Like, this feels like something that should exist entirely outside of dynamo: anyone is free to add any passes or transforms they want, but this is far too assumptive of a single use case.
We currently don't have an agreed final export API yet, so there is no other place to put this pass at the moment. I can move it out of dynamo once we have that API. Are you OK with it if I hide it under a flag?
I'm OK hiding it under a flag. In the final export API we want a self-contained graph module that captures user constraints as assertions, so we want that flag to be turned on by default.
@voznesenskym can you specify what you'd like to see changed / what needs discussion?
What I would like to discuss is the direction of passes like this in dynamo. To me, this feels like a laborious and assumptive default, or inclusion. If someone needs this, they can invoke a util we provide.
W/r/t hiding it under a flag: export growing in flags and flag combinations is not a good direction.
The burden of proof for including a new feature is that it is generally usable, and I think promotion of guards to asserts does not cross that bar. Offering a util that does it, and putting the burden on the .export caller to use it, seems sufficient.
Thanks for the comment @voznesenskym. Tugsuu and I analyzed the different options, and we also feel that it is better to implement this as a util that can be invoked after the dynamo export call, so that dynamo can maintain the contract of "make assumptions; leave it to the caller how to enforce them". A sketch of that direction is below.
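Here is that sketch; the util name and the constraint tuple format are hypothetical, not a real API:

```python
import operator
import torch

def add_runtime_assertions(gm: torch.fx.GraphModule, dim_constraints):
    # dim_constraints: list of (input_index, dim, lo, hi) tuples.
    placeholders = [n for n in gm.graph.nodes if n.op == "placeholder"]
    anchor = placeholders[-1]  # keep inserted nodes after all inputs
    for idx, dim, lo, hi in dim_constraints:
        ph = placeholders[idx]
        msg = f"Input #{idx} dimension #{dim} must be in [{lo}, {hi}]"
        with gm.graph.inserting_after(anchor):
            size = gm.graph.call_function(torch.ops.aten.sym_size.int, (ph, dim))
        with gm.graph.inserting_after(size):
            ge = gm.graph.call_function(operator.ge, (size, lo))
        with gm.graph.inserting_after(ge):
            le = gm.graph.call_function(operator.le, (size, hi))
        with gm.graph.inserting_after(le):
            ok = gm.graph.call_function(operator.and_, (ge, le))
        with gm.graph.inserting_after(ok):
            cond = gm.graph.call_function(torch.tensor, (ok,))
        with gm.graph.inserting_after(cond):
            gm.graph.call_function(torch.ops.aten._assert_async.msg, (cond, msg))
    gm.recompile()
    return gm
```

Because it runs after export, dynamo itself stays assumption-only and the caller decides whether the assumptions get enforced in the graph.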
I feel like a jerk rejecting this, because this is fundamentally a good change. I think we just need to think a little bit more about (1) where it should live and (2) what it should encompass (if we do this, I would really like to see this for all guards, not just the user directives).
This PR introduces a new operator called `aten._assert_async.msg`, which allows passing a tensor value and assertion message as inputs. As part of TorchDynamo, we're replacing the use of `torch._assert` with this new operator so that `make_fx` also knows how to handle assertions. This is a subset of #98878; refer there for historic reviews.
Pull Request resolved: #100101. Approved by: https://github.com/jansel
@tugsbayasgalan May I know why a CUDA kernel was not added for `_assert_async.msg`?
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as `Stale`.
Stack from ghstack (oldest at bottom):
This PR introduces a new operator called `aten._assert_async.msg`, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of `torch._assert` with this new operator so that `make_fx` also knows how to handle assertions.

Originally, we planned to create a dependency chain to introduce a fake control dependency, but this new implementation seems to work with AOTAutograd and friends, which will be demonstrated in the next pull request.

In addition, we also make input constraints and intermediate constraints into runtime assertions utilizing `aten._assert_async.msg`.

Future work:
1. Assess whether we still need to introduce a fake control dependency
2. Explore adding a non-async version of assert
cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @soumith @desertfire