Skip to content

Fix ocamlcfg builds #163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 20, 2021
Merged

Fix ocamlcfg builds #163

merged 6 commits into from
Aug 20, 2021

Conversation

gretay-js
Copy link
Contributor

Compilation with -ocamlcfg flag with an assertion violation in Linear_to_cfg. For example:

let rec compare_list a b =
  match a, b with
  | [], [] -> 0
  | [], _ -> -1
  | _, [] -> 1
  | x :: xs, y :: ys ->
    let res = compare x y in
    if res <> 0 then res else compare_list xs ys

This PR fixes the failure.

This PR is currently on top of #159, to check ocamlcfg in the new CI workflow (but should probaby be merged before the changes in #159).

A change in #72 in linearize.ml:L141 breaks a nice invariant that was previously maintained by Linearize and checked by an assertion in Linear_to_cfg translation. The invariant is "no dead code after a tail call". After #72, we get code fragment like this for the for the tail call above example (jmp .L104 followed by dead jmp .L101):

.L103:
        movq    (%rsp), %rax
        movq    8(%rax), %rbx
        movq    8(%rsp), %rax
        movq    8(%rax), %rax
        jmp     .L104
        jmp     .L101
        .align  4
.L102:
        movl    $3, %eax
        addq    $24, %rsp
        ret
        .align  4

To restore the invariant, I added back a call to discard_deadcode. I am not sure it is the right fix. Note that it doesn't revert to the way Itailcalls were handled before #72. I do not understand why in #72 n was replaced with linear env i.Mach.next n in that line. Other cases in linear for an instruction i that does not return still discard the rest of i.next (for example, Ireturn). @lthls @mshinwell do you remember why the change was made?

@lthls
Copy link
Contributor

lthls commented Aug 17, 2021

I do not understand why in #72 n was replaced with linear env i.Mach.next n in that line. Other cases in linear for an instruction i that does not return still discard the rest of i.next (for example, Ireturn). @lthls @mshinwell do you remember why the change was made?

We need to keep all the instructions that can change the trap depth (you can see that discard_dead_code is careful to keep these instructions). Before #72 you could only change the trap depth around Itrywith blocks, but after that it can happen everywhere.

The other point is that most of the instructions that never fallthrough (like Ireturn or Itailcall) are expected to be followed by Iend, so discarding i.next and linearizing it are equivalent. But since this might not be true after CFG manipulations, it may need to be handled more coherently now. At least, an assertion that the next instruction is indeed Iend would catch any possible nasty surprise.

@gretay-js
Copy link
Contributor Author

I do not understand why in #72 n was replaced with linear env i.Mach.next n in that line. Other cases in linear for an instruction i that does not return still discard the rest of i.next (for example, Ireturn). @lthls @mshinwell do you remember why the change was made?

We need to keep all the instructions that can change the trap depth (you can see that discard_dead_code is careful to keep these instructions). Before #72 you could only change the trap depth around Itrywith blocks, but after that it can happen everywhere.

Thanks for the explanation. Just to double-check: we need linear env i.Mach.next n before discard_deadcode for Itailcall.. case, but not for Ireturn case in line 181, right?

@lthls
Copy link
Contributor

lthls commented Aug 17, 2021

Normally you wouldn't need the extra linear call for Itailcall. But if you can produce Mach terms where Itailcall is not followed by Iend (which is apparently the case), then you do need the extra call.
The same applies to Ireturn: if you're guaranteed that the next instruction is Iend, then no need to call linear. Otherwise, you need it.

@gretay-js
Copy link
Contributor Author

Ok, thanks, I'll leave Ireturn in linearize as is for now.

@gretay-js
Copy link
Contributor Author

The CI is green again, including the new cfg workflows!
There are 3 changes:

  1. discard dead code after Itailcalls
  2. don't propagate Ladjust_trap_depth past Llabel
  3. new terminator of cfg blocks: external calls that are known to never return (name of the terminator is open for debate)

The first 2 changes is for cmm traps in #72. The last change is for returns field of Iextcall in #63.

@gretay-js
Copy link
Contributor Author

I didn't make the new terminator into a separate PR because I wanted to see that it passes the new CI workflows.

@gretay-js
Copy link
Contributor Author

Change 2 above reverts a change made as part of #72 in Linearize.adjust_trap_depth. The reason we cannot propagate adjust_trap_depth past Llabel is the label may be reachable via another control flow path.

@xclerc
Copy link
Contributor

xclerc commented Aug 20, 2021

Change 2 above reverts a change made as part of #72 in Linearize.adjust_trap_depth. The reason we cannot propagate adjust_trap_depth past Llabel is the label may be reachable via another control flow path.

Doesn't this suggest the need for a kind of join operation?

@xclerc
Copy link
Contributor

xclerc commented Aug 20, 2021

The third change looks fine, but can you elaborate on why it is needed?

@lthls
Copy link
Contributor

lthls commented Aug 20, 2021

Change 2 above reverts a change made as part of #72 in Linearize.adjust_trap_depth. The reason we cannot propagate adjust_trap_depth past Llabel is the label may be reachable via another control flow path.

I thought a bit more on this, and I'm not convinced this change is needed. I think the Ladjust_trap_depth instructions are not normal instructions. They're "executed" linearly from the beginning of the linear program, regardless of actual control flow (during emission of assembly in emit.mlp, and the corresponding CFI instructions are also executed in the same way by the debugger).

In a way the Lbranch case is more problematic than the Llabel case: the debugger can stop on Lbranch instructions, and with the wrong stack offset the debugger might get confused (although this is slightly compensated by the fact that currently there isn't much that the debugger can do even with the correct stack offset). But I think it's fine to merge Ladjust_trap_depth instructions around Llabel instructions, as the debugger can only stop at the next real instruction after the label.

@gretay-js
Copy link
Contributor Author

Change 2 above reverts a change made as part of #72 in Linearize.adjust_trap_depth. The reason we cannot propagate adjust_trap_depth past Llabel is the label may be reachable via another control flow path.

I thought a bit more on this, and I'm not convinced this change is needed. I think the Ladjust_trap_depth instructions are not normal instructions. They're "executed" linearly from the beginning of the linear program, regardless of actual control flow (during emission of assembly in emit.mlp, and the corresponding CFI instructions are also executed in the same way by the debugger).

@lthls you are right, the generated assembly would be fine when Ladjust_trap_depth is propagated past a label, because the CFI directive can appear before or after the label and apply to the same assembly instruction.
I was trying to maintain an invariant about join points in the intermediate representation: all control flows reach a given label with the same trap depth, as interpreted by the various pseudo-operations. I think it still holds after cmm traps patch, without propagation of CFI directives.
(@xclerc There is a join operation for the trap stacks in linear_to_cfg but I don't think we need it for emitting CFI instructions as @lthls mentioned because the offset table is computed by a linear scan of the instructions, not by following control flow.)

The CFI directive would be correct with and without the propagation. What do you think about leaving the propagation/optimization of CFI directives for a separate PR that would also makes the corresponding changes in linear_to_cfg.

In a way the Lbranch case is more problematic than the Llabel case: the debugger can stop on Lbranch instructions, and with the wrong stack offset the debugger might get confused (although this is slightly compensated by the fact that currently there isn't much that the debugger can do even with the correct stack offset). But I think it's fine to merge Ladjust_trap_depth instructions around Llabel instructions, as the debugger can only stop at the next real instruction after the label.

I've deleted the Lbranch case from adjust_trap_depths.

@gretay-js
Copy link
Contributor Author

The third change looks fine, but can you elaborate on why it is needed?

Iextcall operation with return=false appears at the end of a block, not following by any other block terminator. We could use Never as terminator, but (a) it is use as a special value in linear_to_cfg construction with a bunch of invariant checks based on it, and (b) making it a terminator accurate describes the control flow in this case (similarly to Raise or Return terminators).

@lthls
Copy link
Contributor

lthls commented Aug 20, 2021

I was trying to maintain an invariant about join points in the intermediate representation: all control flows reach a given label with the same trap depth, as interpreted by the various pseudo-operations. I think it still holds after cmm traps patch, without propagation of CFI directives.

Ah, I see the context. If it's not too complicated, you might want to propagate the trap stacks computed on the Mach representation to Linear in some way. The original code by @mshinwell that I based my traps patch on actually had each Linear instruction annotated with its trap stack (or trap depth, I'm not sure), so maybe it would make sense to bring that feature back.

@xclerc
Copy link
Contributor

xclerc commented Aug 20, 2021

Sorry to insist, but #163 (comment) explains that we have
a more accurate information (which is, of course, positive),
but not why it is needed to fix the failures reported by the CI.
In other words, what is failing when we do not clearly mark
the ends of blocks?

@gretay-js
Copy link
Contributor Author

In other words, what is failing when we do not clearly mark the ends of blocks?

An assertion in linear_to_cfg translation fails only in Flambda2 build, because a function ends with Iextcall which is not a terminator. The input is a genuinely new one for Flamdba2 that generates Iextcall with return=false instead of Icheckbound.

Here is an example from `stdlib.ml`:
*** Linearized code
camlStdlib__loop_14_104_code: {stdlib.ml:277,15-132}
  prologue
  L175:
  i3644/29[%rdi] := R/0[%rax]
  s3646/31[%rax] := val[my_closure3645/30[%rbx] + 24]
  V/32[%rsi] := val[my_closure3645/30[%rbx] + 16]
  if i3644/29[%rdi] <s V/32[%rsi] goto L174
  I/43[%rbx] := "camlStdlib__immstring338"
  tailcall "camlStdlib__^_6_96_code" R/0[%rax] R/1[%rbx] {stdlib.ml:278,19-26}
  L174:
  I/33[%rsi] := int[s3646/31[%rax] + -8] {stdlib.ml:279,10-24}
  I/34[%rsi] := I/34[%rsi] >>u 10 {stdlib.ml:279,10-24}
  tmp/35[%rsi] := I/34[%rsi]  * 8 + -1 {stdlib.ml:279,10-24}
  I/36[%rdx] := unsigned int8[s3646/31[%rax] + tmp/35[%rsi]] {stdlib.ml:279,10-24}
  I/37[%rsi] := I/37[%rsi] - I/36[%rdx] {stdlib.ml:279,10-24}
  I/38[%rdx] := i3644/29[%rdi]
  I/38[%rdx] := I/38[%rdx] >>s 1 {stdlib.ml:279,10-24}
  if I/38[%rdx] >=u I/37[%rsi] goto L171
  I/40[%rsi] := unsigned int8[s3646/31[%rax] + I/39[%rdx]] {stdlib.ml:279,10-24}
  *match*3653/41[%rsi] := I/40[%rsi]  * 2 + 1 {stdlib.ml:279,10-24}
  if *match*3653/41[%rsi] <s 97 goto L173
  if *match*3653/41[%rsi] <s 117 goto L172
  reload retaddr
  return R/0[%rax]
  L173:
  if *match*3653/41[%rsi] ==s 91 goto L172
  reload retaddr
  return R/0[%rax]
  L172:
  I/42[%rax] := i3644/29[%rdi]
  I/42[%rax] := I/42[%rax] + 2 {stdlib.ml:280,31-38}
  tailcall "camlStdlib__loop_14_104_code" R/0[%rax] R/1[%rbx] {stdlib.ml:280,26-38}
  L171:
  {}
  extcall "caml_ml_array_bound_error"  (noalloc) {stdlib.ml:279,10-24}

The error message is

Fatal error: exception Misc.Fatal_error
Raised at Misc.fatal_errorf.(fun) in file "ocaml/utils/misc.ml", line 22, characters 14-31
Called from Linear_to_cfg.run in file "backend/cfg/linear_to_cfg.ml", line 649, characters 2-64
Called from Asmgen.compile_fundecl.(fun) in file "backend/asmgen.ml", line 146, characters 18-65
Called from Asmgen.(++) in file "backend/asmgen.ml" (inlined), line 119, characters 15-18
Called from Asmgen.compile_fundecl in file "backend/asmgen.ml", line 124, characters 2-1023
Called from Stdlib__List.iter in file "list.ml", line 114, characters 12-15
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Asmgen.(++) in file "backend/asmgen.ml" (inlined), line 119, characters 15-18
Called from Asmgen.end_gen_implementation0 in file "backend/asmgen.ml", line 208, characters 2-90
Called from Asmgen.compile_unit.(fun) in file "backend/asmgen.ml", line 189, characters 12-18
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Asmgen.compile_unit.(fun) in file "backend/asmgen.ml", line 187, characters 7-307
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Optcompile.flambda2.(fun) in file "driver/optcompile.ml", line 54, characters 6-388
Called from Optcompile.flambda_and_flambda2.(fun) in file "driver/optcompile.ml", line 46, characters 6-98
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Compile_common.implementation.(fun) in file "ocaml/driver/compile_common.ml", line 120, characters 71-113
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Misc.try_finally in file "ocaml/utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "ocaml/utils/misc.ml", line 45, characters 10-56
Called from Compenv.process_action.impl in file "ocaml/driver/compenv.ml", line 613, characters 4-69
Called from Stdlib__List.iter in file "list.ml", line 114, characters 12-15
Called from Compenv.process_deferred_actions in file "ocaml/driver/compenv.ml", line 697, characters 2-61
Called from Optmaindriver.main in file "driver/optmaindriver.ml", line 55, characters 6-198
Re-raised at Location.report_exception.loop in file "ocaml/parsing/location.ml", line 926, characters 14-25
Called from Optmaindriver.main in file "driver/optmaindriver.ml", line 138, characters 4-35
Called from Flambda_backend_main_native in file "flambda_backend_main_native.ml", line 8, characters 7-124
make: *** [stage2] Error 1

Copy link
Contributor

@xclerc xclerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving change 3, and would prefer to let @lthls approve 1-2.

@gretay-js
Copy link
Contributor Author

Ah, I see the context. If it's not too complicated, you might want to propagate the trap stacks computed on the Mach representation to Linear in some way. The original code by @mshinwell that I based my traps patch on actually had each Linear instruction annotated with its trap stack (or trap depth, I'm not sure), so maybe it would make sense to bring that feature back.

Yes, propagating Cmm.trap_action to Linear would give us a much better way to construct Cfgs than trying to reverse engineer the trap_stacks (trap_depth is easy to reconstruct linearly like the unwinder does). It's a big change and I am not sure we want to do it because @xclerc is working on a translation from Mach directly to Cfg.

Copy link
Contributor

@lthls lthls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change 2 is fine.
Change 1 is correct, but:

  • The change from discard_dead_code n to linear env i.Mach.next n was likely a mistake during rebase. In my upstream traps branch (which is based on an old trunk commit), the first case is used when spacetime is not in use, and the second one when it is. The traps branch only adapted the existing code to pass the extra env parameter. I assume that when rebasing it on flambda-backend, which does not have spacetime, the wrong branch was kept. (Note that I don't even know why there was a different path with spacetime; as far as I could tell even with spacetime a tailcall instruction was always followed by Iend, at least after running Selection).
  • I still don't see how we could end up with the problematic case. I'll try reproducing the error, as I suspect it likely hides another problem.

The changes proposed in this PR do not introduce any bugs that I could find though, so I'm fine with merging as it is.

@gretay-js
Copy link
Contributor Author

Thank you very much!
@lthls what about the last commit c49c98c0b510f641e092f08906dc44942f08c6cf that addresses the comment above regarding propagation of Ladjust_trap_stack past Lbranch instructions.

@lthls
Copy link
Contributor

lthls commented Aug 20, 2021

I was including it in my review of Change 2. I should probably have been more specific.

@mshinwell
Copy link
Collaborator

I think this discussion shows that these changes should have been separate PRs...

Blocks that end with Iextcall that does not return
can be generated by Flambda2 (e.g., bounds check)
and result in a block without terminator.
@gretay-js gretay-js force-pushed the fix_ocamlcfg_builds branch from 31d4904 to c130fe0 Compare August 20, 2021 14:40
@gretay-js gretay-js merged commit 8dbe9fd into oxcaml:main Aug 20, 2021
@mshinwell
Copy link
Collaborator

@gretay-js Please make a note somewhere to ensure we don't forget what Vincent is looking at, namely that there might be another hidden problem relating to Change 1.

@lthls
Copy link
Contributor

lthls commented Aug 20, 2021

I've actually finished my investigation. When compiling an if-then-else, unless it falls into one of the special cases Linearize adds a Lbranch instruction just after the then branch. This instruction is meant to be deleted by discard_dead_code if the then branch doesn't fall through.
The only mystery remaining is why dead code elimination after tail calls was turned off with Spacetime... as far as I can tell it shouldn't have made any difference.

basimkhajwal pushed a commit to basimkhajwal/flambda-backend that referenced this pull request Sep 10, 2021
* Discard dead code after Itailcalls and Iextcalls that don't return

Revert a line from flambda-backend#72

* Don't propagate Ladjust_trap_depth past Llabel

* Add terminator "Throw" for invoking a continuation

Blocks that end with Iextcall that does not return
can be generated by Flambda2 (e.g., bounds check)
and result in a block without terminator.

* Rename Cfg.terminator.Throw to Call_no_return

* Don't propagate Ladjust_trap_depth past Lbranch

* Revert to pre-Cmm-traps version of discard_dead_code after Itailcall
poechsel pushed a commit that referenced this pull request Sep 20, 2021
* Discard dead code after Itailcalls and Iextcalls that don't return

Revert a line from flambda-backend#72

* Don't propagate Ladjust_trap_depth past Llabel

* Add terminator "Throw" for invoking a continuation

Blocks that end with Iextcall that does not return
can be generated by Flambda2 (e.g., bounds check)
and result in a block without terminator.

* Rename Cfg.terminator.Throw to Call_no_return

* Don't propagate Ladjust_trap_depth past Lbranch

* Revert to pre-Cmm-traps version of discard_dead_code after Itailcall
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants