Ub vs tbd #10

Merged: 12 commits, Oct 17, 2019
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@ support unwinding that crosses FFI boundaries.
- [acfoltzer (Adam)](https://github.com/acfoltzer)
- [batmanaod (Kyle)](https://github.com/batmanaod)
- Rust lang team contacts:
- [nikmoatsakis (Niko)](https://github.com/nikmoatsakis)
- [nikomatsakis (Niko)](https://github.com/nikmoatsakis)
- [joshtriplett (Josh)](https://github.com/joshtriplett)
- [Our chat room][zulip-room]
- [Our charter](charter.md)
16 changes: 10 additions & 6 deletions faq.md
@@ -17,19 +17,22 @@

### How does cross-language unwinding differ from cross-language `setjmp`/`longjmp`?

- `setjmp`/`longjmp` across Rust frames is currently guaranteed to have
well defined behavior as long as those frames do not contain destructors.
When crossing frames that do contain destructors, the behavior of `longjmp`
is undefined; conversely, a primary goal of defining cross-language unwinding
behavior is to support crossing frames with destructors.
- `setjmp`/`longjmp` across Rust frames is currently intended to have
well defined behavior as long as those frames do not contain
destructors, although we don't have any documentation to that
effect.
- When crossing frames that do contain destructors, the behavior of
`longjmp` is [Undefined Behavior]; conversely, a primary goal of
defining cross-language unwinding behavior is to support crossing
frames with destructors.
- Rust does not have a concept of `Copy` for stack-frames, which would permit
the compiler to check that `longjmp` may safely traverse those frames. Such a
language feature [may be added in the future][centril-effects], but although
it would be useful for `longjmp`, it would not be useful for unwinding.
- It should never be assumed that `drop` will be called for objects in
intermediate frames traversed by a `longjmp`, but this may occur on certain
platforms. Rust provides no guarantee either way (which is why this is
considered undefined behavior). Cross-language unwind, however, will be
considered [Undefined Behavior]). Cross-language unwind, however, will be
defined such that `Drop` objects whose frames are unwound are guaranteed
to be `drop`ped.
- Unwinding across Rust frames when `panic = abort` is currently undefined
@@ -49,3 +52,4 @@
roadmap][roadmap-panic-abort] for details.
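
The guarantee described in this list, that `Drop` objects in unwound frames are dropped, can be observed today with an ordinary Rust panic. The following is a minimal sketch, assuming the default `panic = unwind` setting; the names `Guard`, `DROPPED`, and `intermediate_frame` are hypothetical and exist only for illustration. It shows nothing about `longjmp`, for which no such guarantee exists.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether the destructor ran (hypothetical helper for this sketch).
static DROPPED: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

// A frame with an in-scope destructor that is unwound by a panic.
fn intermediate_frame() {
    let _guard = Guard;
    panic!("unwinding through intermediate_frame");
}

fn main() {
    // Assumes `panic = unwind`; with `panic = abort` the process would abort here.
    let result = panic::catch_unwind(intermediate_frame);
    assert!(result.is_err());
    // The unwind ran `Guard::drop` while leaving `intermediate_frame`.
    assert!(DROPPED.load(Ordering::SeqCst));
}
```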

[roadmap-panic-abort]: roadmap/c-unwind-abi.md#panic--abort
[Undefined Behavior]: /spec-terminology.md#UB
48 changes: 23 additions & 25 deletions roadmap/c-unwind-abi.md
@@ -3,7 +3,7 @@
## Summary

Functions that use the plain `"C"` ABI are **not permitted to unwind**.
Doing so is "undefined behavior", which means that the compiler is free
Doing so is [Undefined Behavior], which means that the compiler is free
to assume it cannot happen. The results are therefore unpredictable.
It is always a bug.
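
One common way to respect this rule is to stop a panic before it reaches the `"C"` boundary. The sketch below uses `std::panic::catch_unwind` for that purpose; it is a general Rust pattern rather than something this roadmap prescribes. The function name `checked_double` and the `-1` error sentinel are assumptions made for illustration, and the default `panic = unwind` setting is assumed.

```rust
use std::panic;

// A function exposed with the plain "C" ABI. It must not unwind, so any
// panic raised inside is caught and converted into an error value.
extern "C" fn checked_double(x: i32) -> i32 {
    let result = panic::catch_unwind(|| {
        if x < 0 {
            panic!("negative input");
        }
        x * 2
    });
    // -1 is a hypothetical sentinel meaning "a panic was caught".
    result.unwrap_or(-1)
}

fn main() {
    assert_eq!(checked_double(21), 42);
    assert_eq!(checked_double(-5), -1);
}
```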

@@ -16,30 +16,24 @@ compilers use.

**Warning:** We are still in the process of fully specifying when and
how "unwinding interop" works between native functions and Rust. This
roadmap item alone **only** adds the ABI -- it does not define what
should happen if unwinding actually occurs, and hence **even if you
use this ABI**, unwinding across a `"C unwind"` ABI barrier is **still
undefined behavior**. However, it is our **intent** to define that
behavior in the future (and this is what the other roadmap items
are all about). Additionally, we consider unwinding across an `extern "C"`
boundary to be [LLVM-undefined behavior][LLVM-UB], whereas for
`extern "C unwind"` it is not; i.e., `rustc` is not permitted to generate
code that would _intentionally_ be undefined at the LLVM level in this case.

In practical terms, the effect of using `"C unwind"` right now is that
we will tell LLVM that the function "may unwind". We will also not add
intentional shims that abort the program if unwinding occurs. (With a
`"C"` ABI, we sometimes do both of those things.)

Further, in practical terms, you can use the `"C unwind"` ABI today to
enable a Rust panic to propagate across native frames if you like --
but your program is relying on unspecified and undefined behavior
which **likely will change** across Rust stable releases. Therefore,
you will have to keep up. Effectively, you're on a nightly release,
even though you're using only stable syntax. (The same is true for
many aspects of unsafe code.)

[LLVM-UB]: ../spec-terminology.md#LLVM-undefined-behavior-or-LLVM-UB
roadmap item alone **only** adds the ABI string to Rust. All details
of how that ABI string is implemented for various targets are still
considered [To Be Defined]. As a result, programs that rely on those
details are not truly stable; you may find that the details change as
Rust evolves. Eventually, though, we do intend to define many (but not
all) aspects of how Rust panics and native unwinding interoperate.
Moreover, we guarantee that unwinding will **not** result in
[Undefined Behavior] and in particular not [LLVM-UB].
Review comment (Member):

This is more of an idea for the future, but maybe we should use a term like "optimizer-UB" or "intermediate-representation-UB" instead of LLVM-UB. The core idea of the term, I think, is that we are avoiding the form of UB that lets compilers make "adversarial" optimizations (in particular, in our case, eliminating landing pads).


## The goal

In practice, there are a number of crates that rely on unwinding
across a "C" ABI today -- even though that is [Undefined
Behavior]. Creating the "C unwind" ABI means that those crates can
migrate to this ABI and better express their intent. It does not (in
and of itself) make the behavior of those crates defined -- they are
still relying on [To Be Defined] details. But it *does* mean that they
will no longer trigger [Undefined Behavior].
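
As a rough illustration of what such a migration might look like, the sketch below declares a function with an unwinding-capable ABI and lets a panic propagate out of it. It rests on an assumption: this document writes the ABI string as `"C unwind"` and treats its details as [To Be Defined], whereas the sketch uses the hyphenated spelling `"C-unwind"` accepted by current Rust toolchains. The function name `may_unwind` is hypothetical, and `panic = unwind` is assumed.

```rust
use std::panic;

// Declared with an unwinding-capable ABI instead of plain "C".
// The spelling "C-unwind" is an assumption; see the note above.
extern "C-unwind" fn may_unwind(fail: bool) -> i32 {
    if fail {
        panic!("propagating out of may_unwind");
    }
    7
}

fn main() {
    assert_eq!(may_unwind(false), 7);
    // The panic crosses the ABI boundary and is caught by the Rust caller.
    let result = panic::catch_unwind(|| may_unwind(true));
    assert!(result.is_err());
}
```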

## Panic = abort

@@ -50,3 +44,7 @@ generated, which would make the behavior of `"C unwind"`
require landing-pad generation for any function calling a `"C unwind"`
function, even when compiling with `panic = abort`. These landing pads would of
course `abort` the application rather than propagate the unwind.

[Undefined Behavior]: /spec-terminology.md#UB
[LLVM-UB]: /spec-terminology.md#LLVM-UB
[To Be Defined]: /spec-terminology.md#TBD
39 changes: 23 additions & 16 deletions roadmap/propagate-native-exception-through-Rust-frame.md
@@ -24,8 +24,19 @@ In this case, the stack would look like this:
+--- native function throws -+
```

## Current status

Currently, all details of how a "native exception" interacts with Rust
frames are [To Be Defined] behavior. The goal of these roadmap items
is to define some aspects of this behavior. Note that these
definitions will be on a "per target" basis, as the details vary with
the ABI.

## Possible properties of the Rust frames

Even within a given target, it is worth separating out some different
cases:

* The Rust frame has **no in-scope destructors**.
* We do not currently have a language or tooling mechanism for guaranteeing
that a Rust function has no destructors. There is an
@@ -36,14 +47,6 @@ In this case, the stack would look like this:
* The Rust frame **has destructors** it would like to execute.
* We need to define how they interact with a native exception.

Currently, having a native exception "unwind" a Rust frame is
**undefined behavior** in both of the above cases. However, we plan to
specify the first case (no destructors) before we think about the more
complex case (contains destructors). We will also have to specify this
on a per-target basis, as the details will vary depending on what
exception mechanism is in use, and what other non-exception features
(such as `longjmp`) may use the same mechanism.

[centril-effects]: https://github.com/Centril/rfc-effects/issues/11
[FAQ-longjmp]: faq.md#how-does-cross-language-unwinding-differ-from-cross-language-setjmplongjmp

@@ -55,16 +58,20 @@ Rust frames:
* Native frames - the native unwind re-enters native frames
* The native code runtime should be able to treat this as a normal exception
at this point, as though Rust had never been involved.
* Thread boundary - the exception propagates all the way to a Rust frame that
was invoked from another thread
* App entry point - the unwind is not caught and does not cross a thread
boundary
* Thread boundary - the exception propagates all the way to a Rust frame that
was invoked from another thread

Again, these are all currently undefined.
The first two possibilities (re-entering native frames or reaching the
app entry point) are [To Be Defined] behavior. Their behavior should
be a natural consequence of how the `"C unwind"` ABI is defined.

We **do** expect to make the first and last possibilities (re-entering native
frames or reaching the app entry point) well-defined; their behavior should be
a natural consequence of how the `"C unwind"` ABI is defined.
The final possibility, wherein a native unwind propagates through Rust
frames all the way to a thread boundary, is [Unspecified Behavior],
meaning that we **do not** intend to define it.

We **do not** currently have plans to define the middle case, wherein a native
unwind propagates through Rust frames all the way to a thread boundary.
[Undefined Behavior]: /spec-terminology.md#UB
[LLVM-UB]: /spec-terminology.md#LLVM-UB
[To Be Defined]: /spec-terminology.md#TBD
[Unspecified Behavior]: /spec-terminology.md#unspecified
87 changes: 66 additions & 21 deletions spec-terminology.md
@@ -1,30 +1,75 @@
# Terminology about specifications

Language and platform specifications have several different terms used to
describe how well-defined a language feature is, i.e., how well constrained the
runtime behavior is.
Language and platform specifications have several different terms used
to describe how well-defined a language feature is, i.e., how well
constrained the runtime behavior is. In cases where our terminology
overlaps with that from other communities, we try to remain generally
compatible.

The ISO C and C++ standards distinguish several degrees of specification by
assigning precise definitions to the following terms:
<a name="UB"></a>

* Well-defined behavior
* Implementation-defined behavior
* Unspecified behavior
* Undefined behavior
## Undefined Behavior

Of these, only "undefined behavior" is used consistently within the Rust
project; there are _no_ guarantees about the runtime behavior of such code.
As is typical within the Rust community, we use the phrase **undefined
behavior** to refer to illegal program actions that can produce
arbitrary results. In short, "undefined behavior" is always a bug and
never something you should do. See the [Rust
reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html)
for more details.

It is currently out of scope for this project to define the other terms in that
list or their relationship to the corresponding terms in other languages.
However, this project does distinguish a particular _category_ of undefined
behavior:
Our usage of the term is generally the same as the [standard
usage](https://en.wikipedia.org/wiki/Undefined_behavior) from other
languages.

#### LLVM-undefined behavior (or LLVM-UB)
<a name="LLVM-UB"></a>

Cases of known LLVM-UB are a specific subset of undefined behavior in general.
Rust code with LLVM-UB will cause `rustc` to generate LLVM IR exhibiting
undefined behavior.
### LLVM-undefined behavior (LLVM-UB)

This is distinct from the general case of Rust undefined behavior, in which
it is unknown whether `rustc` will generate well-behaved LLVM IR.
As a special case of undefined behavior, we use the phrase **LLVM
undefined behavior** to indicate things that are considered undefined
behavior by LLVM itself (as opposed to by the Rust compiler). There is
no theoretical difference between UB and LLVM-UB -- both can cause
arbitrary things to happen in your code. However, as a practical
measure, LLVM-UB is worth distinguishing because it is much more
*likely to* do so in practice.

<a name="unspecified"></a>

## Unspecified behavior

We use the term "unspecified behavior" to refer to behavior that may
vary across Rust releases, depending on what options are given to the
compiler, or even -- in extreme cases -- across executions of the Rust
compiler. However, unlike undefined behavior, the resulting execution

Review comment (Member):

It may be worth noting variation across platforms, too.

is not completely undefined, and it must typically fall within some
range of possibilities. Often, we will not specify precisely *how*
something is implemented, but rather the patterns that must work.

An example of "unspecified behavior" is the [layout for structs with
no declared `#[repr]` attribute][ucg-struct]. This layout can and
does change across Rust releases -- but of course within a given
compilation, a struct must have *some* layout. Moreover, we guarantee
that programs can (for example) use `size_of` to determine the size of
that layout, or access fields using Rust syntax like `foo.bar`. This
requires the layout to be communicated in some fashion but doesn't
specify how that is done.

[ucg-struct]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md
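
A minimal sketch of the struct-layout example: the code below relies only on the guaranteed patterns (querying the size and accessing fields by name) and never on a particular field order or padding. The struct `Foo` and its fields are hypothetical.

```rust
use std::mem::size_of;

// No `#[repr]` attribute, so the field order and padding are unspecified.
struct Foo {
    bar: u8,
    baz: u32,
}

fn main() {
    let foo = Foo { bar: 1, baz: 2 };

    // Field access by name works whatever layout the compiler chose.
    assert_eq!(foo.bar, 1);
    assert_eq!(foo.baz, 2);

    // The exact size is unspecified, but the struct has *some* size, and
    // it must be large enough to hold every field without overlap.
    assert!(size_of::<Foo>() >= size_of::<u8>() + size_of::<u32>());
}
```

Code that instead hard-codes an offset for `baz`, or assumes a specific total size, would be relying on details that can change across Rust releases.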

Our usage of the term is generally the same as the [standard
usage](https://en.wikipedia.org/wiki/Unspecified_behavior) from other
languages.

<a name="TBD"></a>

## To Be Defined Behavior (TBD)

We refer to some behavior as **to be defined** to indicate that --
while it is currently unspecified -- we *intend* to define that
behavior at some point as part of this project group (though plans can
change). This helps to define the scope of the group, but it also
indicates behavior that you would be able to rely upon in the future
in stable code. Note that TBD behavior is **still unspecified** until
a formal decision is made, though, so if you rely on it today, your
code cannot be considered stable Rust (even if it compiles on the
stable compiler).