Skip to content

How to discover what kind of bad function pointer casts occur? #16126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
juj opened this issue Jan 27, 2022 · 9 comments
Open

How to discover what kind of bad function pointer casts occur? #16126

juj opened this issue Jan 27, 2022 · 9 comments

Comments

@juj
Copy link
Collaborator

juj commented Jan 27, 2022

Consider the following C++ code that is undefined behavior:

a.cpp
typedef int (*foo)(void);
typedef void (*bar)(void);

void func()
{
}

int main()
{
	foo ptr = (foo)&func;
	ptr(); // Undefined behavior in both C & C++, relaxed in x86 native compilers
}

Building this code natively with GCC or Clang succeeds and runs fine. However building it with Emscripten with

em++ a.cpp -o a.js

and running it in node, crashes with

:\code\emsdk\emscripten\main\a.js:147
      throw ex;
      ^

RuntimeError: function signature mismatch
    at <anonymous>:wasm-function[2]:0x1a2
    at main (<anonymous>:wasm-function[3]:0x1c1)
    at E:\code\emsdk\emscripten\main\a.js:1564:22
    at callMain (E:\code\emsdk\emscripten\main\a.js:2188:15)
    at doRun (E:\code\emsdk\emscripten\main\a.js:2245:23)
    at run (E:\code\emsdk\emscripten\main\a.js:2260:5)
    at runCaller (E:\code\emsdk\emscripten\main\a.js:2166:19)
    at removeRunDependency (E:\code\emsdk\emscripten\main\a.js:1473:7)
    at receiveInstance (E:\code\emsdk\emscripten\main\a.js:1653:5)
    at receiveInstantiationResult (E:\code\emsdk\emscripten\main\a.js:1670:5)

which is all as expected, given that Wasm is strict with function pointer signatures, and cannot relax the calling conventions.

Building with -s EMULATE_FUNCTION_POINTER_CASTS=1 linker flag makes the code also work on Emscripten. However, imagine a user is dealing with a large codebase, and they would want to avoid using the linker flag -s EMULATE_FUNCTION_POINTER_CASTS=1 due to its runtime overhead.

Unfortunately there is no machinery that would help the user to diagnose where these bad casts occur, since -s EMULATE_FUNCTION_POINTER_CASTS=1 silently passes such code.

When building without the flag, one gets a crash callstack to the call site where the first bad function pointer cast call is attempted, but this has two issues:

  1. the crash does not specify what the mismatch was: what was the signature of the call that was attempted, and what was the actual signature of the function pointer that was being called?
  2. if there are multiple such crashes in the application, the program aborts on the first one, leaving the developer to do a tedious fix first crash-rebuild-fix next crash-rebuild-fix next crash... cycle.

It would be nice to have a mode -s EMULATE_FUNCTION_POINTER_CASTS=2 or similar, which would for each mismatching function pointer signature call, log the callstack and the signatures in question (via e.g. a warnOnce style logger), but still keep on executing the program. Would this be doable?

@kleisauke
Copy link
Collaborator

kleisauke commented Jan 27, 2022

For C code, you can detect such issues at compile-time by enabling the -Wbad-function-cast -Wcast-function-type warnings. For example:

$ emcc -Wbad-function-cast -Wcast-function-type a.c -o a.js
a.c:10:12: warning: cast from 'void (*)()' to 'foo' (aka 'int (*)(void)') converts to incompatible function type [-Wcast-function-type]
        foo ptr = (foo)&func;
                  ^~~~~~~~~~
1 warning generated.

For C++ code, it's not possible to detect that at compile-time (AFAIK). Such conversions were not allowed in C++03, but most compilers didn't issue a diagnostic. In C++11 these are "conditionally supported", no diagnostics are printed.

An option that prints more details when the generic function signature mismatch error occurs would be nice to have. Note that the old deprecated fastcomp did issue a lot of more details of which symbol is causing issues, see for example:

Invalid function pointer 69 called with signature 'vii'. Perhaps this is an invalid value (e.g. caused by calling a virtual method on a NULL pointer)? Or calling a function with an incorrect type, which will fail? (it is worth building your source files with -Werror (warnings are errors), as warnings can indicate undefined behavior which can cause this). This pointer might make sense in another type signature: as sig "vi" pointing to function _g_zlib_decompressor_iface_init

(i.e. in the above example it's calling a vi signature with vii)

@juj
Copy link
Collaborator Author

juj commented Jan 27, 2022

Thanks @kleisauke , did not know about those -W levels. Agree that old fastcomp did have a more actionable error message, Alon did a good work on that error print.

@kripken
Copy link
Member

kripken commented Jan 27, 2022

It would be nice to have a mode -s EMULATE_FUNCTION_POINTER_CASTS=2 or similar, which would for each mismatching function pointer signature call, log the callstack and the signatures in question (via e.g. a warnOnce style logger), but still keep on executing the program. Would this be doable?

Yes, but it would take some work. EMULATE_FUNCTION_POINTER_CASTS adds a wrapper to each function in the wasm table, basically fixing up the params and result. That wrapper would need to receive some kind of encoding of the expected signature and compare that to the encoding of the actual signature, and then call an import with the two signatures (and then the import would need to decode that somehow into human-readable text for logging).

Cool about that C warning! I was not aware of that either.

@seanm
Copy link

seanm commented Oct 19, 2022

Building this code natively with GCC or Clang succeeds and runs fine.

FYI you can use UBSan to get a runtime warning:

% clang++ -fsanitize=undefined a.cpp
% ./a.out
a.cpp:11:2: runtime error: call to function func() through pointer to incorrect function type 'int (*)()'
(a.out:x86_64+0x100003eb0): note: func() defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior a.cpp:11:2

@sbc100
Copy link
Collaborator

sbc100 commented Oct 19, 2022

Building this code natively with GCC or Clang succeeds and runs fine.

FYI you can use UBSan to get a runtime warning:

% clang++ -fsanitize=undefined a.cpp
% ./a.out
a.cpp:11:2: runtime error: call to function func() through pointer to incorrect function type 'int (*)()'
(a.out:x86_64+0x100003eb0): note: func() defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior a.cpp:11:2

Nice, does that work for C for as well as C++ code?

@seanm
Copy link

seanm commented Oct 19, 2022

Nice, does that work for C for as well as C++ code?

Yes. C, C++, Objective-C.

@kleisauke
Copy link
Collaborator

FWIW, the forthcoming Clang 16 will introduce a new stricter warning for casts of function types (see llvm/llvm-project@1aad641), which is included as part of the existing error diagnostic setting of -Wcast-function-type.

kripken pushed a commit that referenced this issue Oct 31, 2022
Corresponds to the fix introduced in commit harfbuzz/harfbuzz@60c6b77.

Context: #16126 (comment).
@curiousdannii
Copy link
Contributor

I tried compiling with UBSan, but now it says "Compiling function ... failed: local count too large". (It also required a huge amount of RAM to compile.) So helpful in some situations, but not all unfortunately.

@juj
Copy link
Collaborator Author

juj commented Aug 8, 2023

Recently we have run into yet another bad function pointer cast issue. This time it looks like the root cause is from google/draco library, but we only have a precompiled program to debug, so we are unable to find if the pointer cast is an actual source code issue, or if it is a memory overrun issue - and we are unable to use Chrome or Firefox debugger or DevTools console to see which one it is.

I started implementing a pass to Binaryen to let one diagnose these types of errors, at WebAssembly/binaryen#5864 .

Additionally, I've opened a ticket on LLVM repository side regarding adding a more precise diagnostic flag to LLVM: llvm/llvm-project#64526 so that these issues would be traceable at compile-time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants
@curiousdannii @seanm @kripken @juj @sbc100 @kleisauke and others