[CXX11ABI] torch 2.6.0-cu126 and cu124 have different exported symbols #152790
Comments
I agree this shouldn't be the case...
Maybe it's worth adding some smoke tests on what symbols actually get exported, and comparing them between versions - especially if symbols should now not be auto-exported, to reduce symbol pollution... I also wonder how/why flash_attention depends on this symbol.

Also, xformers hid all 2.6.0-compatible installation recipes, despite the fact that vllm hasn't released a 2.7.0-compatible version yet and flash_attention seems broken with 2.7.0...
We do have smoke tests for those in https://github.com/pytorch/pytorch/blob/main/.ci/pytorch/smoke_test/check_binary_symbols.py but they were mostly focused on checking that the CXX11 ABI was not used, back before it was enabled by default.
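For illustration, a rough sketch (not the actual logic of check_binary_symbols.py; the library path and string heuristics are assumptions) of what such a comparison boils down to:

```python
# Rough sketch: count CXX11-ABI vs pre-CXX11-ABI std::string symbols in a libtorch
# shared object, e.g. to diff the cu124 and cu126 builds of the same release.
import subprocess

def abi_symbol_counts(lib_path: str) -> tuple[int, int]:
    out = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        capture_output=True, text=True, check=True,
    ).stdout
    names = [line.split()[-1] for line in out.splitlines() if line.strip()]
    cxx11 = sum("__cxx11" in n for n in names)   # std::__cxx11::basic_string mangling
    pre_cxx11 = sum("Ss" in n for n in names)    # crude heuristic for the old std::string
    return cxx11, pre_cxx11

# Hypothetical path into an unpacked wheel; compare cu124 vs cu126 side by side.
print(abi_symbol_counts("torch/lib/libtorch_cpu.so"))
```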
I also wonder if this symbol can be brought back in 2.7.1 - just to make sure that flash_attention's pip-distributed pre-compiled binaries work with torch 2.7.1 out-of-the-box?.. Maybe this symbol is needed for building PyTorch C++ extensions using...
Ok, I sometimes fail to read issues: this is expected behavior for 2.6.0 and is indeed associated with the CXX11 ABI migration: 12.4 was built with the pre-CXX11 ABI and 12.6 with it, which should have been reflected in the release notes.
I see - just a bit strange that this happened without a version bump within the same 2.6.0 release. And if it happened during the 2.6.0 release - flash_attention's own releases seem to have been falling behind for quite a few months now...
I.e. the symbol names in 2.6 (and all of 2.7.0) are...
So it is expected that most C++ extensions need to rebuild and push new binaries, right?
@vadimkantorov just checking: are you saying you can't build flash attention from source, or that their binaries are incompatible?
Until 2.7.0 PyTorch had no stable ABI, so building extensions from source against the installed torch was always the recommendation. But I think some packages might have hardcoded the ABI flags in their build scripts instead of querying the torch build system (I forgot the exact API name, but it should be something like...)
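For reference, a small sketch of querying the installed torch's ABI setting from a build script instead of hardcoding it; `torch.compiled_with_cxx11_abi()` is one such Python-level query, and the extension name and source file below are hypothetical:

```python
# setup.py-style sketch: derive the _GLIBCXX_USE_CXX11_ABI flag from the installed
# torch wheel rather than hardcoding it in the extension's build script.
import torch
from torch.utils.cpp_extension import CppExtension

# True if the installed torch was built with the CXX11 std::string ABI.
uses_cxx11_abi = torch.compiled_with_cxx11_abi()
abi_flag = f"-D_GLIBCXX_USE_CXX11_ABI={int(uses_cxx11_abi)}"

ext = CppExtension(
    name="simple_extension",            # matches the example extension later in the thread
    sources=["simple_extension.cpp"],   # hypothetical source file
    extra_compile_args=[abi_flag],      # normally redundant: CppExtension already adds this flag
)
```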
I haven't tried compiling flash_attention from scratch - since I discovered that I need to downgrade to cu124 to get it working for now. But I expect more people will send issues to flash_attention about this soon, since 2.7.0 is a new release (and maybe much fewer people upgraded 2.6.0 from cu124 to cu126):
Maybe for extensions like flash_attention (which do not use other functions from libtorch to do computations with tensors or to benefit from autograd), simply providing functions via a C FFI and accepting tensors via raw pointers / DLPack would be sufficient - and would not require upgrading when PyTorch upgrades its ABI (I hope the...
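A minimal sketch of that idea, assuming a hypothetical prebuilt libmyext.so that exposes a plain C function (so no libtorch C++ symbols are referenced at all); CPU-only and float32-only for brevity:

```python
# Sketch of the "C FFI + raw pointers" approach: only a data pointer and element
# count cross the boundary, so the extension binary does not depend on the
# std::string ABI of the torch wheel it happens to run against.
import ctypes
import torch

lib = ctypes.CDLL("libmyext.so")  # hypothetical prebuilt extension library
lib.multiply_by_two_inplace.argtypes = [ctypes.c_void_p, ctypes.c_int64]
lib.multiply_by_two_inplace.restype = None

def multiply_by_two(t: torch.Tensor) -> torch.Tensor:
    # dtype/shape/device checks stay on the Python side; the C side sees raw memory
    assert t.is_contiguous() and t.dtype == torch.float32 and t.device.type == "cpu"
    lib.multiply_by_two_inplace(ctypes.c_void_p(t.data_ptr()), t.numel())
    return t
```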
@malfet maybe a good addition to the "smoketests" would be a test suite in PyTorch which tries building the most popular C++ pytorch extensions from source: to check if there are any errors and to raise the issue with the authors before the PyTorch release, or even to have a Known Issues section in the release notes about other tools: flash_attention's binaries are broken and need a new release, vllm same, etc.

I guess the thing with...
Hi @vadimkantorov. Please go ahead and create an issue in https://github.com/pytorch/test-infra/issues with a list of extensions you think could be useful for us to test against. We do have smoke tests running on a nightly basis: https://github.com/pytorch/test-infra/blob/main/.github/workflows/validate-nightly-binaries.yml, hence we can look into extending the validation framework and adding extra tests there.
Some followups: providing a pre-CXX11-ABI definition of `c10::Error::Error` is simple enough:

```cpp
#define _GLIBCXX_USE_CXX11_ABI 0
#include <c10/util/Exception.h>
#include <iostream>

c10::Error::Error(SourceLocation source_location, std::string msg) {
  std::cout << "Ha" << std::endl;
}
```

But it is much harder to redispatch strings from one ABI to the other, and even if one achieves something like that with C shims, it results in segfaults when an inlined destructor is called, even for extensions as simple as:

```cpp
#define _GLIBCXX_USE_CXX11_ABI 0
#include <torch/extension.h>

// A simple function that multiplies each element by 2
torch::Tensor multiply_by_two(torch::Tensor input) {
  TORCH_CHECK_VALUE(input.numel() < 10, "Too many elements");
  return input * 2;
}

PYBIND11_MODULE(simple_extension, m) {
  m.def("multiply_by_two", &multiply_by_two, "Multiply each element by 2");
}
```

which works with the abovementioned example, but crashes if the wrapper redispatches to a C-like constructor.
@vadimkantorov can you explain in a bit more detail how you ended up in that situation? I ran:

and it seems to pass smoke tests like:
I probably first installed unsloth, which brought in PyTorch 2.6.0+cu124 and a flash_attn build with the old ABI, and then I somehow updated CUDA on the machine and tried to upgrade PyTorch manually. Does pip then maybe always fetch the old-ABI binary of flash_attn from the cache? Or maybe the new-ABI binary of flash_attn is not discovered by pip install and must be installed manually via the whl link?
So the full installation line for pytorch 2.7.0 and flash_attn for cu126 seems to be:
although, xformers has now announced they dropped precompiled-binary support for 2.6.0 :( seems a bit too fast :( - e.g. vllm's support for 2.7.0 is not released yet (expected in 0.9.0, and the latest vllm, which depends on 2.6.0, still requires xformers, which itself now depends on 2.7.0)
@malfet I think pip sometimes would also ignore... (found it with vllm), but likely it affects pytorch itself as well :(
🐛 Describe the bug
The symbol `_ZN3c105ErrorC2ENS_14SourceLocationESs` is exported in the cu124 version, but missing in cu126: some `nm` outputs are in Dao-AILab/flash-attention#1644.

I understand that because of the missing symbols, flash_attention has stopped working with torch 2.7. But it was a bit surprising that the exported symbols differ between the cu124 and cu126 versions of the same release...

Also, a question is why torch exported `_ZN3c105ErrorC2ENS_14SourceLocationESs` and why flash_attention depends on it... @malfet
Versions
torch 2.6.0-cu126 and cu124
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @ptrblck @eqy @jerryzh168