Skip to content

Ban Prim_poly from Lambda #2189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

mshinwell
Copy link
Collaborator

In #2180 there were really two fixes:

  • changes to Call_kind in Flambda 2 to deal with the accidental omission of the region variable for C calls during a recent improvement/refactoring PR (Better region handling for functions #1871).
  • a change in Lambda.primitive_may_allocate to correctly handle the case when prim_alloc is false but in fact the external function locally allocates. This appears to have been a pre-existing bug. (It is permitted to write such code iff you are certain the compiler used will always be configured with stack allocation.)

Unfortunately the second change has revealed another problem. Some primitives, for example one declared like this:

external float_of_bits : (int64[@local_opt]) -> (float[@local_opt])
  = "caml_int64_float_of_bits" "caml_int64_float_of_bits_unboxed"
  [@@unboxed] [@@noalloc]

end up appearing in Lambda as Prim_poly (the comment in primitive.mli is kind of misleading here). However this isn't ok from the point of view of Lambda.primitive_may_allocate and in particular its new helper function Lambda.alloc_mode_of_primitive_description (which is also used by Flambda 2). For example, the above primitive might get eta-expanded by the code in Translprim, in conjunction with a judgement that effectively turns Prim_poly (plus some extra information, the poly_mode) into Alloc_heap. This is then used to avoid inserting a region, correctly, but is not propagated further (there is nowhere to put it on the Lambda terms). We thereby end up in a situation where the Lambda code says Prim_poly, so alloc_mode_of_primitive_description says "might allocate locally" -- but the Translprim code had not inserted a region. This then causes a fatal error in Flambda 2, because if a external call might allocate locally, its applications will contain region variables. These will be unbound if the frontend failed to insert the appropriate regions (or marked the enclosing function as allowing local allocations to escape, in which case the implicit my_region function parameter would be used).

The idea of this PR is to ban Prim_poly from appearing in Lambda terms at all. It does this by making the argument of the Pccall constructor private, which necessitates only minor changes, and insisting that all constructions of it are done by supplying the return mode. This then rewrites Prim_poly into either Prim_global or Prim_local as appropriate, avoiding the above problem. It seems better than an alternative solution I considered, namely causing regions to be inserted for all Prim_poly primitive calls.

@mshinwell mshinwell added the bug Something isn't working label Dec 20, 2023
@mshinwell
Copy link
Collaborator Author

@goldfirere is looking at this one.

@mshinwell mshinwell requested a review from goldfirere December 20, 2023 20:05
@goldfirere
Copy link
Collaborator

I think this fixes the bug (thanks for the test @ncik-roberts), and looks reasonable. I made a number of (what I believe are) improvements, but my commits need further review.

@lpw25
Copy link
Collaborator

lpw25 commented Dec 21, 2023

I think this probably isn't the best approach. The primitive type is intended to describe the information from an external declaration. The problem identified here is that the information from that declaration is not sufficient to compile calls to such a primitive -- you also need the return mode. The solution then is to add the other required information to calls to primitives (i.e. Pccall) in addition to the existing information coming from the primitive declaration. Rather than what is done here, which is to edit the declaration to a different one that will be compiled the right way.

In other words, leave the Primitive.t alone and add another field to Pccall. Then you don't need any invariants or private types and the meaning of a Primitive.t is the same in all contexts.

@lpw25
Copy link
Collaborator

lpw25 commented Dec 21, 2023

Actually, thinking about it more, I don't think this statement is true for C calls:

the information from that declaration is not sufficient to compile calls to such a primitive

which means there is no problem with the existing Pccall and the bug fix is just to treat Prim_poly as heap allocating not stack allocating. The reason for this is that any C function that has a Prim_poly type cannot safely allocate on the stack because it does not know when that would be safe.

So, I think you just need a one line fix in alloc_mode_of_primitive_description

@mshinwell
Copy link
Collaborator Author

Superceded by #2190

@mshinwell mshinwell closed this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants