[AutoDiff] Pullbacks w/ loops can segfault #68392
Comments
Why is there a double-free happening in this example?

We missed the issue in #67944 because a double-free only happens if non-trivial context is captured in a region before or inside the loop. From the autodiff runtime's perspective, this leads to the context being stored in non-top-level subcontexts. Then, because of the code we generate for loopy pullbacks, a double-free happens. The following example will clarify what I mean.

For loopy pullbacks capturing context after the end of the for loop we have -
For loopy pullbacks capturing context before or inside the for loop we have -
I have been working on fixing the issue, however it seems like I have reached an impasse and need some guidance/suggestions. In order to solve this issue I needed to establish 3 high-level rules. The rules weren't only necessitated by the problem at hand -- in my mind they also seem like the logical/right thing to do.
With the above rules laid out, a pullback which initially may have looked like this -
Will now look like this -
Notice that the trampoline block now simply forwards pullback arguments, and non-trivial pullback context is received as @guaranteed, dispensing with the need to do a "free".
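As a loose source-level analogy only (this is not the code the autodiff transform generates), Swift's `borrowing` and `consuming` parameter modifiers correspond to the @guaranteed and @owned conventions in SIL. The sketch below assumes Swift 5.9 or later; `PullbackContext`, `applyBorrowed`, `applyConsumed`, and `demo` are hypothetical names introduced for illustration.

```swift
// Hypothetical stand-in for non-trivial pullback context.
// Noncopyable, so ownership transfers are explicit and the deinit is observable.
struct PullbackContext: ~Copyable {
    let scale: Float
    deinit { print("context freed") }
}

// Borrowed (comparable to @guaranteed): the callee only reads the context,
// so the caller keeps ownership and nothing is freed here.
func applyBorrowed(_ ctx: borrowing PullbackContext, _ seed: Float) -> Float {
    return ctx.scale * seed
}

// Consumed (comparable to @owned): the callee takes ownership and the
// context is destroyed when this call finishes.
func applyConsumed(_ ctx: consuming PullbackContext, _ seed: Float) -> Float {
    return ctx.scale * seed
}

func demo() -> (Float, Float, Float) {
    let ctx = PullbackContext(scale: 2)
    let first = applyBorrowed(ctx, 1)   // ctx is still valid afterwards
    let second = applyBorrowed(ctx, 1)  // so a second call is fine
    let third = applyConsumed(ctx, 1)   // ownership ends here; ctx cannot be used again
    return (first, second, third)
}

let results = demo()
print(results)
```

A callee that only borrows its context can be invoked repeatedly without any release on its side, which is the property the @guaranteed convention gives the rewritten pullback.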
While this change solves most of the issues, it falls short for a function like this.
The compiler fails to compile this program; SIL verification fails with the following error -
Why?
The OSSA SIL for the problematic basic block looks like this -
So that's where I'm at with my attempt to fix this issue. I need guidance/help with the following questions to proceed.
@asl @rxwei @BradLarson The recent segfaults that we have seen in our internal test suite after #67944 was merged have proven trickier to fix than expected. I have tried to explain the problem, my approach to solving it, and the problems with my proposed solution. Could you take a look and provide any suggestions/guidance you might have? P.S. - Apologies for the long write-up!
I think one of the questions here is why a tuple containing an enum with a pullback payload is considered a trivial type (note %45 below):
// %43 // user: %44
bb1(%43 : $Builtin.RawPointer): // Preds: bb5
%44 = pointer_to_address %43 : $Builtin.RawPointer to [strict] $*(predecessor: _AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0) // user: %45
%45 = load [trivial] %44 : $*(predecessor: _AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0) // user: %46
br bb3(%64 : $Float, %65 : $Float, %45 : $(predecessor: _AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0)) // id: %46
// %51 // users: %72, %57, %63, %55
// %52 // users: %72, %63
// %53 // user: %54
bb3(%51 : $Float, %52 : $Float, %53 : $(predecessor: _AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0)): // Preds: bb1 bb2
%54 = destructure_tuple %53 : $(predecessor: _AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0) // user: %59
debug_value %51 : $Float, let, name "x", argno 1 // id: %55
copy_addr %14 to %10 : $*Float // id: %56
debug_value %51 : $Float, let, name "x", argno 1 // id: %57
copy_addr %14 to %6 : $*Float // id: %58
switch_enum %54 : $_AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0, case #_AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0.bb2!enumelt: bb4, case #_AD__$s2pb1B1xS2f_tF_bb1__Pred__src_0_wrt_0.bb0!enumelt: bb6, forwarding: @owned // id: %59
...
// %71 // user: %72
bb6(%71 : @owned $(_: @callee_guaranteed (Float) -> Float)): // Preds: bb3
br bb7(%51 : $Float, %52 : $Float, %71 : $(_: @callee_guaranteed (Float) -> Float)) // id: %72

Somewhere here it looks like an ownership gap to me...

Working on the fix
Description
Recently we fixed an issue in the Autodiff runtime that caused pullbacks with loops to leak memory. The fix was merged in #67944. While validating it on our internal test suite, we uncovered another memory-related bug (a double free) that can cause certain pullbacks with loops to segfault.
Below is a minimal reproducer for the issue.
Running the above program leads to a segfault.
Expected Behavior
The program should not segfault, and the pullback of B should be runnable multiple times.
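To make "runnable multiple times" concrete, here is a hypothetical sketch of a program with the shape described in this issue: a differentiable function named `B` containing a loop, whose pullback is invoked twice. The function body and the input value are assumptions and should not be read as the exact reproducer; whether this particular program triggers the crash on an affected toolchain is not established here. It uses `valueWithPullback(at:of:)` from the `_Differentiation` module.

```swift
import _Differentiation

// Assumed body: computes x^3 with a loop, so the pullback must replay
// context that was captured on each iteration.
@differentiable(reverse)
func B(x: Float) -> Float {
    var y = x
    for _ in 0..<2 {
        y = y * x
    }
    return y
}

let (value, pb) = valueWithPullback(at: 3, of: B(x:))

// Calling the same pullback more than once should be safe: each call
// should read the captured context without destroying it.
let first = pb(1)
let second = pb(1)
print(value, first, second)
```

Mathematically, B(x) = x^3, so at x = 3 both the value and the derivative 3x^2 equal 27, and the two pullback calls should agree.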