Skip to content

clang++ incompatibility with libstdc++ gcc4-compatible ABI on Windows #135910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jeremyd2019 opened this issue Apr 16, 2025 · 81 comments · May be fixed by #140145
Open

clang++ incompatibility with libstdc++ gcc4-compatible ABI on Windows #135910

jeremyd2019 opened this issue Apr 16, 2025 · 81 comments · May be fixed by #140145
Labels
ABI Application Binary Interface clang Clang issues not falling into any other category platform:windows

Comments

@jeremyd2019
Copy link
Contributor

This was first seen on Cygwin (x86_64-pc-windows-cygnus), where GCC is built with --with-default-libstdcxx-abi=gcc4-compatible, but I reproduced it on x86_64-pc-windows-gnu as well, so I decided to report here.

Start with a mingw-w64 gcc built without --enable-fully-dynamic-string configure argument, and possibly with --with-default-libstdcxx-abi=gcc4-compatible. It should be possible to switch to the gcc4-compatible ABI with -D_GLIBCXX_USE_CXX11_ABI=0 if not built with that option.

The following test program aborts when built with -O2, unless built with -std=c++20 or newer:

#include <stdio.h>
#include <string>

class foo
{
public:
  foo() { m_str.assign("hello"); }

  std::string m_str;

} bar;

int main()
{
  printf("%ld - %s\n", __cplusplus, bar.m_str.c_str());
  return 0;
}

Output of clang++ -O2 -S -o test.clang.s test.cpp
test.clang.s.txt

output of g++ -O2 -S -o test.gcc.s test.cpp
test.gcc.s.txt

relevant difference:

	leaq	_ZNSs4_Rep20_S_empty_rep_storageE(%rip), %rdx
	cmpq	%rdx, %rcx
...
	.section	.bss$_ZNSs4_Rep20_S_empty_rep_storageE,"bw",discard,_ZNSs4_Rep20_S_empty_rep_storageE
	.globl	_ZNSs4_Rep20_S_empty_rep_storageE # @_ZNSs4_Rep20_S_empty_rep_storageE
	.p2align	4, 0x0
_ZNSs4_Rep20_S_empty_rep_storageE:
	.zero	32

vs

	cmpq	.refptr._ZNSs4_Rep20_S_empty_rep_storageE(%rip), %rcx
...
	.section	.rdata$.refptr._ZNSs4_Rep20_S_empty_rep_storageE, "dr"
	.globl	.refptr._ZNSs4_Rep20_S_empty_rep_storageE
	.linkonce	discard
.refptr._ZNSs4_Rep20_S_empty_rep_storageE:
	.quad	_ZNSs4_Rep20_S_empty_rep_storageE

relevant bit of libstdc++ header:

  // Inhibit implicit instantiations for required instantiations,
  // which are defined via explicit instantiations elsewhere.
#if _GLIBCXX_EXTERN_TEMPLATE
  // The explicit instantiation definitions in src/c++11/string-inst.cc and
  // src/c++17/string-inst.cc only instantiate the members required for C++17
  // and earlier standards (so not C++20's starts_with and ends_with).
  // Suppress the explicit instantiation declarations for C++20, so C++20
  // code will implicitly instantiate std::string and std::wstring as needed.
# if __cplusplus <= 201703L && _GLIBCXX_EXTERN_TEMPLATE > 0
  extern template class basic_string<char>;
# elif ! _GLIBCXX_USE_CXX11_ABI
  // Still need to prevent implicit instantiation of the COW empty rep,
  // to ensure the definition in libstdc++.so is unique (PR 86138).
  extern template basic_string<char>::size_type
    basic_string<char>::_Rep::_S_empty_rep_storage[];
# elif _GLIBCXX_EXTERN_TEMPLATE > 0
  // Export _M_replace_cold even for C++20.
  extern template void
    basic_string<char>::_M_replace_cold(char *, size_type, const char*,
					const size_type, const size_type);
# endif

_GLIBCXX_EXTERN_TEMPLATE is 1, and the issue manifests when _GLIBCXX_USE_CXX11_ABI is 0 and __cplusplus is <= 201703.

see msys2/MSYS2-packages#5329 for history of the investigation.

/cc @mstorsjo

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 16, 2025
@jeremyd2019
Copy link
Contributor Author

I don't have rights to set labels, assuming there are more relevant ones.

@EugeneZelenko EugeneZelenko added ABI Application Binary Interface platform:windows labels Apr 16, 2025
@jeremyd2019
Copy link
Contributor Author

MSYS2's mingw-w64-*-gcc packages are built with --enable-fully-dynamic-string so I had to build a custom GCC with the options necessary to test this there: https://github.com/jeremyd2019/MINGW-packages/actions/runs/14479493510

jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue Apr 16, 2025
jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue Apr 16, 2025
jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue Apr 16, 2025
jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue Apr 16, 2025
@jeremyd2019
Copy link
Contributor Author

I'm also seeing a crash with the following simple "hello world" program. I haven't been able to see a relevant difference in the -S output yet (I see something fishy around std::endl but it crashes before getting to that and still crashes with the endl removed).

#include <iostream>

int main() {
    std::cout << "Hello" << std::endl;
    return 0;
}
Thread 1 "hello" received signal SIGSEGV, Segmentation fault.
0x00000003fd9f9cd6 in cygstdc++-6!_ZNSo6sentryC1ERSo ()
   from /usr/bin/cygstdc++-6.dll
(gdb) bt
#0  0x00000003fd9f9cd6 in cygstdc++-6!_ZNSo6sentryC1ERSo ()
   from /usr/bin/cygstdc++-6.dll
#1  0x00000003fda7cc19 in cygstdc++-6!_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l () from /usr/bin/cygstdc++-6.dll
#2  0x00000003fda8baa7 in cygstdc++-6!_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc () from /usr/bin/cygstdc++-6.dll
#3  0x00000001004010a6 in main ()

This crashes on Cygwin but not on MSYS2, so it seems likely to be due to the C++11 ABI option.

jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue Apr 21, 2025
@mstorsjo
Copy link
Member

I haven't really studied the libstdc++ code closely here to understand what's going on... Is this a case where the Cygwin GCC build somehow sets different ABI defines somewhere, that we should try to match? From the fix attempts, it seems so? It's a bit annoying if this is based on a configure option, which sets custom hardcoded values for that GCC build, but which may change at some point in the future...

For reference, see 4e6c8f1, where I added a similar libstdc++ define for a different case. That one was attempting to fix a define that GCC itself sets in gcc/config/i386/cygming.h, where it always is enabled for such targets, which makes it a bit easier to deal with.

That particular commit adds it in clang/lib/Frontend/InitPreprocessor.cpp though; adding it in a toolchain specific file is probably neater. Adding code for checking whether an explicit define already is set feels a bit unusual though - I don't think much other cases does that? If we predefined one value in the compiler, the user specified one should probably apply later anyway - but it would possibly cause warnings about being redefined. I think Clang maintainers may have better input on that case though...

@jeremyd2019
Copy link
Contributor Author

jeremyd2019 commented Apr 22, 2025

I haven't really studied the libstdc++ code closely here to understand what's going on... Is this a case where the Cygwin GCC build somehow sets different ABI defines somewhere, that we should try to match? From the fix attempts, it seems so?

No, the "fix attempts" are really more "workaround attempts" that actually switch to the other of the "dual abi" because that one is not broken. The default ABI on Cygwin (due to gcc configure option, kept to keep compatibility with old libaries) is broken with Clang. It apparently doesn't treat the static member of an extern explicit template class instantiation as extern, while GCC does.

@mstorsjo
Copy link
Member

mstorsjo commented Apr 22, 2025

Thanks, I see! Ok that makes it clearer what to look for here. Is it possible to reduce a minimal testcase with the problematic code without any includes at all? Does Clang do the right thing for a mingw target, for such a testcase, or is this a case which we just haven't hit there either?

@jeremyd2019
Copy link
Contributor Author

jeremyd2019 commented Apr 22, 2025

Clang does not do the "right thing" (doesn't do the same thing as GCC, up to someone else to decide which is "right") for mingw target either, which convinced me to open this issue. I might be able to put something simple together, I don't know if it requires a DLL in the mix or if it will reproduce with just two TUs linked together.

@jeremyd2019
Copy link
Contributor Author

Slightly more complicated than I expected, but not too bad for a simple reproducer:

#include <stdio.h>

template<typename T>
class foo
{
public:
	class inner
	{
	public:
		static char buffer[];
	} i;
};

template<typename T>
char foo<T>::inner::buffer[10];

extern template class foo<char>;

int main(void)
{
	foo<char> bar;
	printf("%p\n", bar.i.buffer);
	return 0;
}

it should give an undefined reference if you try to link it, like

$ g++ -o a.exe a.cpp
D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: D:\msys64\tmp\ccgQgIpo.o:a.cpp:(.rdata$.refptr._ZN3fooIcE5inner6bufferE[.refptr._ZN3fooIcE5inner6bufferE]+0x0): undefined reference to `foo<char>::inner::buffer'
collect2.exe: error: ld returned 1 exit status

but clang++ succeeds, because it doesn't treat buffer as extern like GCC does.

@mstorsjo
Copy link
Member

it should give an undefined reference if you try to link it, like

$ g++ -o a.exe a.cpp
D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: D:\msys64\tmp\ccgQgIpo.o:a.cpp:(.rdata$.refptr._ZN3fooIcE5inner6bufferE[.refptr._ZN3fooIcE5inner6bufferE]+0x0): undefined reference to `foo<char>::inner::buffer'
collect2.exe: error: ld returned 1 exit status

but clang++ succeeds, because it doesn't treat buffer as extern like GCC does.

Thanks, I see! I also can observe that Clang does behave like GCC here, for a Linux target. So there's some target specific logic that ends up going differently between the mingw/cygwin targets and Linux here.

@mstorsjo
Copy link
Member

I've reduced the case down a bit further:

template<typename T>
class foo
{
public:
        class inner
        {
        public:
                static char buffer[];
        } i;
};

template<typename T>
char foo<T>::inner::buffer[10];

extern template class foo<char>;

void other(void *ptr);

void func(void)
{
        foo<char> bar;
        other(bar.i.buffer);
}

(Removed the include and printf, so it should compile as such anywhere without any SDK available.)

If compiled with clang++ -target x86_64-linux-gnu -S -emit-llvm, I get this:

@_ZN3fooIcE5inner6bufferE = external global [0 x i8], align 1

While for a x86_64-windows-gnu target, I get this instead:

$_ZN3fooIcE5inner6bufferE = comdat any

@_ZN3fooIcE5inner6bufferE = linkonce_odr dso_local global [10 x i8] zeroinitializer, comdat, align 1

So something in Clang on the codegen level is producing different things for these two targets.

@jeremyd2019
Copy link
Contributor Author

I tend to use printf and main for my reproducers, but I can see how including stdio could add unnecessary overhead.

@mstorsjo
Copy link
Member

I tend to use printf and main for my reproducers, but I can see how including stdio could add unnecessary overhead.

Yes, using that is clearer when one wants to link and execute it. But for comparing code generation, it’s simpler to omit them, so one can generate code for any of the near-infinite number of arch/OS targets that clang supports :-) Plus there’s less things involved in the translation unit.

jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue May 2, 2025
jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue May 2, 2025
jeremyd2019 added a commit to jeremyd2019/llvm-project that referenced this issue May 3, 2025
@tyan0
Copy link

tyan0 commented May 4, 2025

I found the problem disappears with the patch:

--- origsrc/clang-20.1.3.src/lib/Sema/SemaTemplateInstantiate.cpp	2025-04-16 09:23:49.000000000 +0900
+++ src/clang-20.1.3.src/lib/Sema/SemaTemplateInstantiate.cpp	2025-05-04 22:24:26.010847400 +0900
@@ -4360,6 +4360,7 @@
                                                 == TSK_ExplicitSpecialization)
         continue;
 
+#if 0
       if (Context.getTargetInfo().getTriple().isOSWindows() &&
           TSK == TSK_ExplicitInstantiationDeclaration) {
         // On Windows, explicit instantiation decl of the outer class doesn't
@@ -4370,6 +4371,7 @@
         // that users don't end up with undefined symbols during linking.
         continue;
       }
+#endif
 
       if (CheckSpecializationInstantiationRedecl(PointOfInstantiation, TSK,
                                                  Record,

But I don't understand enough what the comment means.

      if (Context.getTargetInfo().getTriple().isOSWindows() &&
          TSK == TSK_ExplicitInstantiationDeclaration) {
        // On Windows, explicit instantiation decl of the outer class doesn't
        // affect the inner class. Typically extern template declarations are
        // used in combination with dll import/export annotations, but those
        // are not propagated from the outer class templates to inner classes.
        // Therefore, do not instantiate inner classes on this platform, so
        // that users don't end up with undefined symbols during linking.
        continue;
      }

@jeremyd2019
Copy link
Contributor Author

jeremyd2019 commented May 4, 2025

Great! I think

  1. check what gcc does with dllexport/dllimport - does it propagate to inner classes or not?
  2. change isOSWindows to isWindowsMSVCEnvironment and possibly change the comment from "On Windows" to "With MSVC". (or else mention that GCC does propagate extern to inner classes and requires this for its gcc4 abi)
  3. open a PR and see if that breaks anything (I know I don't have a setup where I can run all the tests that run in CI)

@jeremyd2019
Copy link
Contributor Author

  1. change isOSWindows to isWindowsMSVCEnvironment

My first thought was adding && !...getTriple().isOSCygMing() but it feels like an MSVC thing being generalized to "Windows" here.

@tyan0
Copy link

tyan0 commented May 5, 2025

foo.h:

template<typename T>
class foo
{
public:
	class inner
	{
	public:
		static char buffer[];
	} i;
};

bar.cc:

#include "foo.h"

template<typename T>
char foo<T>::inner::buffer[10];

__declspec(dllexport) foo<char> bar;

main.cc:

#include <stdio.h>
#include "foo.h"

template<typename T>
char foo<T>::inner::buffer[10];

extern foo<char> bar;

int main(void)
{
	printf("%p\n", bar.i.buffer);
	return 0;
}

With these files,

$ g++ -O2 -shared bar.cc -o bar.dll
$ g++ -O2 main.cc bar.dll -o main.exe
$ ./main
0x100402060
$ 

and

H:\>cl /LD bar.cc
Microsoft(R) C/C++ Optimizing Compiler Version 19.38.33135 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

bar.cc
Microsoft (R) Incremental Linker Version 14.38.33135.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:bar.dll
/dll
/implib:bar.lib
bar.obj
   ライブラリ bar.lib とオブジェクト bar.exp を作成中

H:\>cl main.cc bar.lib
Microsoft(R) C/C++ Optimizing Compiler Version 19.38.33135 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.cc
Microsoft (R) Incremental Linker Version 14.38.33135.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
bar.lib

H:\>main
00007FF64BA92B18

H:\>

Both of above successfully linked. I'm not sure how to think about this outcome.

@kikairoya
Copy link
Contributor

I'm just working for similar problem.

My first thought was adding && !...getTriple().isOSCygMing()

According to test CodeGenCXX/windows-itanium-dllexport.cpp, this behavior required by not only WindowsMSVC but also Itanium, PS4, PS5. So I think cannot use isWindowsMSVCEnvironment here, && !isOSCygMing is correct.

main...kikairoya:llvm-project:cygwin_abi
Here is my working patch but not complete, taking a long time because make all tests passed on Cygwin is too hard...

Notable behavior:
https://github.com/kikairoya/llvm-project/blob/e35140840ccf153fac6d3e53ef3fe8b67e3d07a2/clang/test/CodeGenCXX/mingw-template-dllexport.cpp#L39
GCC does not export inner class in dllexport-ed template. So,

$ g++ -O2 -shared bar.cc -o bar.dll
$ g++ -O2 main.cc bar.dll -o main.exe
$ ./main
0x100402060
$

shows address in .exe module, not in .dll.
I can't find any document describes why does this strange behavior.

@jeremyd2019
Copy link
Contributor Author

jeremyd2019 commented May 5, 2025

Cool. Some of those look like what @mati865 did already (https://github.com/llvm/llvm-project/pull/134458/files) but you split them up in a way that might actually be acceptable to merge.

And personally, I wouldn't wait for all the tests to pass - I'm not anybody on this project, but I think incremental improvements are nice too! They also seem to appreciate smaller pull requests here.

@mati865
Copy link
Contributor

mati865 commented May 6, 2025

I started working on creating unit tests but more important things got my attention.
My idea was to split all the changes into separate commits, each one with own unit test. And then create series of PRs with 1-2 commits each.
My patches were sufficient to let Clang rebuild itself and some simple programs with different code models.

I'd welcome if somebody takes over.

@jeremyd2019
Copy link
Contributor Author

jeremyd2019 commented May 16, 2025

woah, I just noticed

class _LIBCPP_TEMPLATE_VIS sentry;
and
class _LIBCPP_TEMPLATE_VIS basic_ostream<_CharT, _Traits>::sentry {

So in c++03 sentry already has template vis. It was removed from the normal header in #134885. But template vis isn't defined to anything on coff... 😦

@kikairoya
Copy link
Contributor

I was trying with patch like this (thank you for make CI run on llvm-mingw)

#  if defined(__MINGW32__) || defined(__CYGWIN__)
extern template _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS basic_ostream<char>::sentry::sentry(basic_ostream<char>& __os);
extern template _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS basic_ostream<char>::sentry::~sentry();
#  endif

but I think scattering # if defined(...) blocks is not a good style. Introducing a new annotation keyword seems preferable.
Names like _LIBCPP_INNER_CLASS_IN_TEMPLATE_VIS or _LIBCPP_MEMBER_OF_INNER_CLASS_IN_TEMPLATE_VIS are candidates, but there might be more better names.

I'll test if this approach fine but my local environment has broken, building libc++.dll says cannot export _ZNSt3__122__libcpp_verbose_abortEPKcz: symbol wrong type (2 vs 3). I'm seeking for what I wrong...

@kikairoya
Copy link
Contributor

I'll test if this approach fine but my local environment has broken, building libc++.dll says cannot export _ZNSt3__122__libcpp_verbose_abortEPKcz: symbol wrong type (2 vs 3). I'm seeking for what I wrong...

Found it. BFD ld (COFF) cannot export weak symbol. Need to use LLD.

@kikairoya
Copy link
Contributor

To keep me clear myself, list once more how compilers treat std::ostream::sentry.

  • mingw-gcc and patched-clang
    in DLL: Instantiated explicitly but not exported. dllexport at outer template has no effect.
    client: Does not instantiated by effect of extern template decl. of outer class and may be imported.
  • old clang
    in DLL: Instantiated explicitly but not exported. dllexport at outer template has no effect.
    client: Causes implicitly instantiation regardless existence of extern template decl. of outer class.
  • what patched-clang should do to keep compatibility
    in DLL: Instantiated explicitly or implicitly but not to export. Outer template must be instantiated and exported.
    client: Suppress effect of extern template outer to cause implicitly instantiation.

Clang has __attribute__((__exclude_from_explicit_instantiation__)) so attaching this attribute if and only if included from client code using clang can keep binary and source compatibility.

@kikairoya
Copy link
Contributor

@jeremyd2019 's test https://github.com/jeremyd2019/llvm-mingw/actions/runs/15080594305 has completed so we can consider affected entities in libc++ are std::ostream::sentry and std::istream::sentry only.
There are also __cxx03/ headers but they would not be touched (and, I think nobody makes a DLL from __cxx03 headers.)

I have a patch introduces new keyword 92a3b90 . It can be merged independently prior to patch clang. If this approach is acceptable, it's better to open a new PR for this new keyword, right?

@jeremyd2019
Copy link
Contributor Author

It can be merged independently prior to patch clang. If this approach is acceptable, it's better to open a new PR for this new keyword, right?

Yes.

To keep me clear myself, list once more how compilers treat std::ostream::sentry.

  • mingw-gcc and patched-clang
    in DLL: Instantiated explicitly but not exported. dllexport at outer template has no effect.
    client: Does not instantiated by effect of extern template decl. of outer class and may be imported.

Clang has __attribute__((__exclude_from_explicit_instantiation__)) so attaching this attribute if and only if included from client code using clang can keep binary and source compatibility.

Is there anything to be done for compatibility with mingw-gcc? I mean, it already doesn't work, but it'd be cool if what we do here could help that as well.

@kikairoya
Copy link
Contributor

Is there anything to be done for compatibility with mingw-gcc? I mean, it already doesn't work, but it'd be cool if what we do here could help that as well.

May this change need a new option like -f{a-short-nice-description-of-this-problem}-compatibility? If not, I have no other concern about mingw-gcc compatibility, both related or not to this issue.

@jeremyd2019
Copy link
Contributor Author

Is there anything to be done for compatibility with mingw-gcc? I mean, it already doesn't work, but it'd be cool if what we do here could help that as well.

May this change need a new option like -f{a-short-nice-description-of-this-problem}-compatibility? If not, I have no other concern about mingw-gcc compatibility, both related or not to this issue.

I'd have to defer to others on that, but I think no.

By "compatibility with mingw-gcc" I mean libc++ with gcc.

@kikairoya
Copy link
Contributor

By "compatibility with mingw-gcc" I mean libc++ with gcc.

OK, I understand.
Because building libc++ DLL on any Windows environment requires LLD, I had no idea to use libc++ with MinGW-GCC.
But using libc++ built by Clang from GCC might be able, so to support such situation, _LIBCPP_INNER_CLASS_IN_TEMPLATE_VIS must have another effect on GCC rather than no-op.

@jeremyd2019
Copy link
Contributor Author

OK, so here's what I'm currently thinking (and @mstorsjo feel free to correct me):

  1. it seems like a PR for the libc++ change should go first, since it would not break with existing clang, but a clang patched with a fix for this issue would break with a libc++ without this change.
  2. be sure to give details of how the extern qualifier is inherited by inner classes in GCC, and that dllexport (and dllimport) is not, and that clang needing to replicate this behavior necessitates changing these inner classes. Expect to explain the various other options considered and their drawbacks. It sounds like libc++ has a different review regime from the rest of llvm-project, and they are reluctant to add platform-specific tweaks.
  3. if they are unconvinced, the only other viable alternative would probably be to convince GCC and clang to change so that dllexport (and presumably dllimport) are inherited by inner classes, in addition to the fix for this issue making extern inherited by inner classes under clang mingw/cygwin. This seems like it would have a much greater probability of breaking existing code though.

@kikairoya
Copy link
Contributor

Thank you for your kind explanation and I apologize for my bad reading. This time I think I understand what you are talking about.

for 1. and 2.:
I totally agree with you and I found that my point of view was insufficient.

main...kikairoya:llvm-project:libcxx-new-visibility-keyword-for-compat-mingw
I'm in the process of preparing a patch of libc++ including description of what it does but It's not enough yet about both of commit msg and DesignDocs. I wrote about what happens by this patch but didn't the detail of why need to modify clang nor why didn't take another option.

for 3.:
I think modifying GCC is much harder ( @mstorsjo's report has left for over 5 years ), so we need to describe precisely and carefully to make patch accepted.
The last resort is, providing a -f{...}-compatibility and enable by default for Cygwin target only.

If I may ask you to help me, could you polish descriptions I wrote?

@jeremyd2019
Copy link
Contributor Author

jeremyd2019 commented May 19, 2025

To keep binary compatibility purpose only of a class not marked as dllexport
that is defined in a class template. Not to add this for new class, and this
can be replaced to _LIBCPP_EXPORTED_FROM_ABI at a timing of ABI change occurs.

Actually, I don't think putting _LIBCPP_EXPORTED_FROM_ABI where you currently put _LIBCPP_INNER_CLASS_IN_TEMPLATE_VIS would work. In MSVC mode that would expand to dllimport and would mean that MSVC would try to dllimport any instantiation of that class, not just the ones that are explicitly instantiated in the DLL and declared extern. I believe the libc++ tests cover that, if they could be convinced to get past stages 1 and 2 to actually run the Windows tests 😉 (you could modify the workflow in a fork to run them, as @mstorsjo suggested above somewhere).

I believe the best ABI-breaking solution would be for all members to be _LIBCPP_HIDE_FROM_ABI - actually it seems ABI v2 is "unstable" so probably add _LIBCPP_HIDE_FROM_ABI_AFTER_V1 to 'em now? Not sure how that would play out on Linux with the explicit extern template instantiations, probably have to run the tests to be sure.

#      if !defined(__clang__) || !__has_attribute(exclude_from_explicit_instantiation) \
          defined(_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS)

This doesn't seem right, not least because there's an operator missing between the __has_attribute and the defined (probably ||). But I don't see why you need !defined(__clang__) in there - if GCC added support for the exclude_from_explicit_instantiation attribute, that would be great too.

@kikairoya
Copy link
Contributor

kikairoya commented May 19, 2025

In MSVC mode that would expand to dllimport and would mean that MSVC would try to dllimport any instantiation of that class, not just the ones that are explicitly instantiated in the DLL and declared extern.

I had forgotten that about MSVC does. Probably, with this solution, it's not able to remove this new keyword simply even in the future?

actually it seems ABI v2 is "unstable" so probably add _LIBCPP_HIDE_FROM_ABI_AFTER_V1 to 'em now?

This seems to be a viable option. I think now, it might be better to apply _LIBCPP_HIDE_FROM_ABI if and only if __MINGW32__.

This doesn't seem right, not least because there's an operator missing between the __has_attribute and the defined (probably ||).

It's unveiled I didn't test after format manually... ( clang-format is disabled here )

But I don't see why you need !defined(__clang__) in there

I added it to minimize of side effect but it was better to did not as you say to believe GCC will support in the future.

I have to review myself what my patch does and should do once again.

@kikairoya
Copy link
Contributor

I think now, it might be better to apply _LIBCPP_HIDE_FROM_ABI if and only if __MINGW32__.

This will keep ABI (I'm saying without any testing), and according to https://releases.llvm.org/20.1.0/projects/libcxx/docs/DesignDocs/ABIVersioning.html , keeping ABI strictly on Windows is not required ( it says about MSVC but seems to be same for MinGW. )

@jeremyd2019
Copy link
Contributor Author

actually it seems ABI v2 is "unstable" so probably add _LIBCPP_HIDE_FROM_ABI_AFTER_V1 to 'em now?

This seems to be a viable option. I think now, it might be better to apply _LIBCPP_HIDE_FROM_ABI if and only if __MINGW32__.

Or __CYGWIN__. Regardless there's going to be a platform ifdef somewhere.

But I don't see why you need !defined(__clang__) in there

I added it to minimize of side effect but it was better to did not as you say to believe GCC will support in the future.

I don't have any idea whether or not GCC might add support for that, but it's generally better to test for features than versions (or compiler vendors).

@jeremyd2019
Copy link
Contributor Author

actually it seems ABI v2 is "unstable" so probably add _LIBCPP_HIDE_FROM_ABI_AFTER_V1 to 'em now?

This seems to be a viable option. I think now, it might be better to apply _LIBCPP_HIDE_FROM_ABI if and only if __MINGW32__.

Or __CYGWIN__. Regardless there's going to be a platform ifdef somewhere.

It's really going to need feedback from the libc++ reviewers to know for sure, but I'd guess _LIBCPP_HIDE_FROM_ABI on __MINGW32__ || __CYGWIN__ and _LIBCPP_HIDE_FROM_ABI_AFTER_V1 for everyone else?

#if defined (__MINGW32__) || defined (__CYGWIN__)
... _LIBCPP_HIDE_FROM_ABI
#else
... _LIBCPP_HIDE_FROM_ABI_AFTER_V1
#endif

The big question is whether this is a new define in __config or something in basic_ostream.h and istream... Probably up to the libc++ reviewers what's less objectionable.

I've found that the 'gcc' job in 'stage1' fails unless the function bodies are inside the class definition when _LIBCPP_HIDE_FROM_ABI is on them.

@mstorsjo
Copy link
Member

I think now, it might be better to apply _LIBCPP_HIDE_FROM_ABI if and only if __MINGW32__.

This will keep ABI (I'm saying without any testing), and according to https://releases.llvm.org/20.1.0/projects/libcxx/docs/DesignDocs/ABIVersioning.html , keeping ABI strictly on Windows is not required ( it says about MSVC but seems to be same for MinGW. )

For MSVC we can break ABI in libc++ if we need to. For mingw there’s no formal guarantee, but we would like to avoid breaking it a lot. A small low impact break may be fine, but one that breaks backwards compat for all existing binaries built against libc++.dll works be very painful for msys2, and for llvm-mingw I wouldn’t like to ship that either.

@kikairoya
Copy link
Contributor

kikairoya commented May 20, 2025

I've found that the 'gcc' job in 'stage1' fails unless the function bodies are inside the class definition when _LIBCPP_HIDE_FROM_ABI is on them.

Adding keyword inline with _LIBCPP_HIDE_FROM_ABI is enough to make gcc happy, as seen in basic_ostream::basic_ostream etc.

@kikairoya
Copy link
Contributor

but I'd guess _LIBCPP_HIDE_FROM_ABI on __MINGW32__ || __CYGWIN__ and _LIBCPP_HIDE_FROM_ABI_AFTER_V1 for everyone else?

Yes that is. The new patch is here but still incomplete to ready to be reviewed.
main...kikairoya:llvm-project:libcxx-new-visibility-keyword-for-compat-mingw-2

I have tried to run CI tests on github-provided worker but aborted due to disk full. I may need to tweak scripts to save storage or to prepare a self-hosted worker.

@jeremyd2019
Copy link
Contributor Author

// If time to a new class in a class template comes, all non-inline member functions of that new inner class must
// be declared with _LIBCPP_HIDE_FROM_ABI_AFTER_V1 otherwise they build to DLL will be inaccessible by MinGW-GCC.

I would say new member functions of inner classes (new or not) should be _LIBCPP_HIDE_FROM_ABI.

#  if defined(__MINGW32__) || defined(__CYGWIN__)
#    define _LIBCPP_HIDE_FROM_ABI_MINGW_OR_AFTER_V1 inline _LIBCPP_HIDE_FROM_ABI
#  else
#    define _LIBCPP_HIDE_FROM_ABI_MINGW_OR_AFTER_V1 _LIBCPP_HIDE_FROM_ABI_AFTER_V1
#  endif

shouldn't the inline be in both cases? (or on the methods themselves, apparently since it's not already included in the existing 'hide' macros)?

@kikairoya
Copy link
Contributor

I would say new member functions of inner classes (new or not) should be _LIBCPP_HIDE_FROM_ABI.

You're right. Required for new member functions regardless classes are new or not and here must be _LIBCPP_HIDE_FROM_ABI.

shouldn't the inline be in both cases? (or on the methods themselves, apparently since it's not already included in the existing 'hide' macros)?

Adding inline changes semantics and possibly causes ABI break especially due to -fvisibility-inlines-hidden.

@jeremyd2019
Copy link
Contributor Author

adding inline seems to be OK, the ABI test on linux seems to be passing. And I think it is required for the combo of gcc + abi-unstable. which I tested in my #140169 draft pr. we'll see if the rest of the ci will pass, or if it'll die for no apparent reason later.

@kikairoya
Copy link
Contributor

And I think it is required for the combo of gcc + abi-unstable.

Oh, that is. inline is needed.

@kikairoya
Copy link
Contributor

It's painful to run 'gcc-14', that requires massive amount of RAM...

1177680 docker    20   0 4309832   3.9g   3812 R   1.3  25.2   2:18.70 cc1plus
1177642 docker    20   0 3592476   1.0g    104 D   0.8   6.3   2:02.94 cc1plus
1177753 docker    20   0 2714800   1.9g    496 D   0.6  12.1   1:12.76 cc1plus
1177818 docker    20   0 2046964   1.8g    456 D   0.6  11.8   1:01.84 cc1plus
1177629 docker    20   0 3612096   1.1g    464 D   0.4   7.0   2:21.94 cc1plus
1177711 docker    20   0 3569908   2.7g    328 D   0.3  17.1   2:15.73 cc1plus

Perhaps instability of CI tests are caused by OOM killer destroying docker process.

@mstorsjo
Copy link
Member

Perhaps instability of CI tests are caused by OOM killer destroying docker process.

I guess it could be a contributing factor, but the custom runners also run on some sort of cluster where the runner can be interrupted for some paying customer job, so it’s somewhat by design that they will be interrupted, to some extent. There’s some workflow that should try to restart them automatically as needed, but it’s of course not the most convenient thing. (And having privileges to restart the jobs manually does help.)

@jeremyd2019
Copy link
Contributor Author

I think we're about at a point that a PR would be helpful. I know there's some wordsmithing left to do, but it's much more convenient to review and make suggestions against a PR than a bare branch.

@kikairoya
Copy link
Contributor

kikairoya commented May 22, 2025

I've posted the PR. Though CI is still unstable as ever, may I ask you to check it before set as "ready for review"?

And, I'm trying to customized CI build to verify ABI is kept on DLL built by mingw-clang and clang-cl. It would be nice to be able to verify ABI stability in CI like Linux, at least for MinGW target.

@mstorsjo
Copy link
Member

And, I'm trying to customized CI build to verify ABI is kept on DLL built by mingw-clang and clang-cl. It would be nice to be able to verify ABI stability in CI like Linux, at least for MinGW target.

Indeed it would. This was actually discussed quite recently in #140507 (comment) - I haven't had time to look into it myself yet. But patches that add abilist support for mingw (and msvc, even if the ABI is unstable) would probably be very welcome (and very much appreciated by me); that makes it clearer if there are changes to the set of symbols exported.

If working on that, it may be relevant to use llvm-readobj --coff-exports or something similar, on a DLL, rather than just listing symbols with nm, as those symbols on a DLL aren't the same thing (and DLLs can be entirely without a symbol table).

@kikairoya
Copy link
Contributor

Symbols in export table and import table of DLLs from clang-cl-dll, mingw-dll, mingw-dll-i686 are identical (and size of libc++.dll from mingw-dll and mingw-dll-i686 are different) between patched and not:
https://github.com/kikairoya/llvm-project/actions/runs/15219636933
https://github.com/kikairoya/llvm-project/actions/runs/15173215012

To extract:

> llvm-readobj --coff-imports --coff-exports libc++.dll | awk -F" *: *" -e '/\{/{m=substr($1,1,1);o=0;n="";r=0;};/Ordinal *:/{o=$2;}/Name *:/{n=$2;}/Symbol *:/{printf("%s,%s,%s\n",m,n,$2);}/\}/{if(m=="E"){printf("%s,%s,%d\n", m,n,o);}}' | sort

llvm-readobj has a option named --pretty-print but has no effect for --coff-imports or --coff-exports.

This includes export ordinals but they aren't used as I know (and, they are uncontrollable) so should be omitted when integrating such test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface clang Clang issues not falling into any other category platform:windows
Projects
None yet
7 participants