Skip to content
This repository was archived by the owner on Nov 1, 2023. It is now read-only.

Sendable::InitSendable: force builder to be a reference #40

Merged
merged 1 commit into from
Dec 30, 2022

Conversation

virtuald
Copy link
Member

@virtuald virtuald commented Apr 20, 2022

Unfortunately, this isn't the whole story here. Actually using the builder causes ntcore to abort() on robot exit (which... that probably already happens when you add a listener? Need to test that) because the listener gets deregistered on python shutdown, which causes the std::function wrappers for get/set to be destroyed, which tries to take the GIL, which aborts because you can't take the GIL on another python thread on shutdown.

Related bugs:

The Right Fix is to force ntcore to use our thread instead of its own thread, but even if we added a ntcore poller thread that wouldn't resolve the issue (a la Java) because we can't intercept the builder's callback registration easily.

@virtuald
Copy link
Member Author

virtuald commented Apr 20, 2022

Stacktrace for posterity:

#!/usr/bin/env python3

import wpilib
import wpiutil


class Test(wpiutil.Sendable):
    def __init__(self):
        super().__init__()
        wpiutil.SendableRegistry.add(self, "Test", 1)

    def initSendable(self, builder):
        print("hi", builder)
        builder.addDoubleProperty("key", self._get, self._set)
    
    def _set(self, value):
        print("set", value)
    def _get(self):
        print("get")
        return 1


test = Test()
wpilib.SmartDashboard.putData("test", test)
(gdb) info threads
  Id   Target Id                                              Frame 
  1    Thread 0x7ffff78fa740 (LWP 1070510) "python"           0x00007ffff7d5de74 in property_traverse (self=<property at remote 0x7fffe76da720>, visit=0x7ffff7d46de0 <visit_reachable>, arg=0x55555555ce80)
    at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Objects/descrobject.c:1772
* 2    Thread 0x7fffe7635640 (LWP 1070561) "python" (Exiting) 0x00007ffff7aad2a2 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7aad2a2 in raise () from /lib64/libc.so.6
#1  0x00007ffff7a968a4 in abort () from /lib64/libc.so.6
#2  0x00007fffe9b43a06 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#3  0x00007fffe9b4f20c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#4  0x00007fffe9b4f277 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fffe9b4ebcc in __gxx_personality_v0 () from /lib64/libstdc++.so.6
#6  0x00007fffe9a98604 in _Unwind_ForcedUnwind_Phase2 () from /lib64/libgcc_s.so.1
#7  0x00007fffe9a98cf2 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#8  0x00007ffff7a61526 in __pthread_unwind () from /lib64/libpthread.so.0
#9  0x00007ffff7a59572 in pthread_exit () from /lib64/libpthread.so.0
#10 0x00007ffff7af5bfa in pthread_exit () from /lib64/libc.so.6
#11 0x00007ffff7d08acf in PyThread_exit_thread () at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Python/thread_pthread.h:370
#12 0x00007ffff7ca047e in take_gil (tstate=<optimized out>) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Python/ceval_gil.h:224
#13 0x00007ffff7e16b36 in PyEval_AcquireThread (tstate=0x7fffe0000bb0) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Python/ceval.c:385
#14 0x00007fffe7dd0915 in pybind11::gil_scoped_acquire::gil_scoped_acquire (this=0x7fffe7634c20) at /robotpy-build/robotpy_build/pybind11/include/pybind11/detail/../gil.h:78
#15 0x00007fffe80494a3 in pybind11::detail::type_caster<std::function<void (double)>, void>::load(pybind11::handle, bool)::func_handle::~func_handle() (this=0x7fffe0000cc0, __in_chrg=<optimized out>)
    at /robotpy-build/robotpy_build/pybind11/include/pybind11/functional.h:84
#16 pybind11::detail::type_caster<std::function<void (double)>, void>::load(pybind11::handle, bool)::func_wrapper::~func_wrapper() (this=0x7fffe0000cc0, __in_chrg=<optimized out>) at /robotpy-build/robotpy_build/pybind11/include/pybind11/functional.h:90
#17 std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (double)>, void>::load(pybind11::handle, bool)::func_wrapper>::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) (__victim=...) at /usr/include/c++/11/bits/std_function.h:175
#18 std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (double)>, void>::load(pybind11::handle, bool)::func_wrapper>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) (__op=<optimized out>, __source=..., __dest=...)
    at /usr/include/c++/11/bits/std_function.h:203
#19 std::_Function_handler<void (double), pybind11::detail::type_caster<std::function<void (double)>, void>::load(pybind11::handle, bool)::func_wrapper>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
    at /usr/include/c++/11/bits/std_function.h:282
#20 0x00007fffe8743d38 in std::_Function_base::~_Function_base (this=0x7fffe7634c90, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#21 std::function<void (double)>::~function() (this=0x7fffe7634c90, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#22 frc::SendableBuilderImpl::<lambda(nt::NetworkTableEntry)>::<lambda(const nt::EntryNotification&)>::<lambda()>::~<lambda> (this=0x7fffe7634c90, __in_chrg=<optimized out>) at /__w/allwpilib/allwpilib/wpilibc/src/main/native/cpp/smartdashboard/SendableBuilderImpl.cpp:145
#23 frc::SendableBuilderImpl::<lambda(nt::NetworkTableEntry)>::<lambda(const nt::EntryNotification&)>::operator() (event=..., __closure=<optimized out>) at /__w/allwpilib/allwpilib/wpilibc/src/main/native/cpp/smartdashboard/SendableBuilderImpl.cpp:144
#24 std::_Function_handler<void(const nt::EntryNotification&), frc::SendableBuilderImpl::AddDoubleProperty(std::string_view, std::function<double()>, std::function<void(double)>)::<lambda(nt::NetworkTableEntry)>::<lambda(const nt::EntryNotification&)> >::_M_invoke(const std::_Any_data &, const nt::EntryNotification &) (__functor=..., __args#0=...) at /usr/include/c++/8/bits/std_function.h:297
#25 0x00007fffe917200c in std::function<void (nt::EntryNotification const&)>::operator()(nt::EntryNotification const&) const (__args#0=..., this=0x7fffe7634d90) at /usr/include/c++/8/bits/std_function.h:687
#26 nt::impl::EntryNotifierThread::DoCallback(std::function<void (nt::EntryNotification const&)>, nt::EntryNotification const&) (this=0x555555b52130, data=..., callback=...) at /__w/allwpilib/allwpilib/ntcore/src/main/native/cpp/EntryNotifier.h:70
#27 wpi::CallbackThread<nt::impl::EntryNotifierThread, nt::EntryNotification, nt::impl::EntryListenerData, nt::EntryNotification>::Main (this=0x555555b52130) at /__w/allwpilib/allwpilib/wpiutil/src/main/native/include/wpi/CallbackManager.h:142
#28 0x00007fffe9b7bcd4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#29 0x00007ffff7a582a5 in start_thread () from /lib64/libpthread.so.0
#30 0x00007ffff7b70323 in clone () from /lib64/libc.so.6

(gdb) t 1
[Switching to thread 1 (Thread 0x7ffff78fa740 (LWP 1070510))]
#0  0x00007ffff7d5de74 in property_traverse (self=<property at remote 0x7fffe76da720>, visit=0x7ffff7d46de0 <visit_reachable>, arg=0x55555555ce80) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Objects/descrobject.c:1772
1772        Py_VISIT(pp->prop_del);
(gdb) bt
#0  0x00007ffff7d5de74 in property_traverse (self=<property at remote 0x7fffe76da720>, visit=0x7ffff7d46de0 <visit_reachable>, arg=0x55555555ce80) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Objects/descrobject.c:1772
#1  0x00007ffff7d4629c in move_unreachable (unreachable=0x7fffffffc1b0, young=0x55555555ce80) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Modules/gcmodule.c:580
#2  deduce_unreachable (base=base@entry=0x55555555ce80, unreachable=unreachable@entry=0x7fffffffc1b0) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Modules/gcmodule.c:1130
#3  0x00007ffff7d45f85 in collect (tstate=0x55555555e1f0, generation=2, n_collected=0x0, n_uncollectable=0x0, nofail=1) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Modules/gcmodule.c:1212
#4  0x00007ffff7dfa1d2 in _PyGC_CollectNoFail () at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Modules/gcmodule.c:2109
#5  0x00007ffff7df9e50 in _PyImport_Cleanup (tstate=0x55555555e1f0) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Python/import.c:575
#6  0x00007ffff7df92df in Py_FinalizeEx () at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Python/pylifecycle.c:1426
#7  0x00007ffff7deabbd in Py_RunMain () at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Modules/main.c:679
#8  0x00007ffff7dbb73d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/python3.9-3.9.12-1.fc34.x86_64/Modules/main.c:731
#9  0x00007ffff7a97b75 in __libc_start_main () from /lib64/libc.so.6
#10 0x000055555555509e in _start ()

@auscompgeek
Copy link
Member

Hmm. This would also be a problem for moved objects in C++, right?

@auscompgeek
Copy link
Member

Ah, C++ has SendableHelper which calls SendableRegistry::Move and Remove. We'll probably want a SendableBase or something to handle that for Python subclasses?

@auscompgeek
Copy link
Member

I tried binding to

#pragma once

#include "wpi/sendable/Sendable.h"
#include "wpi/sendable/SendableHelper.h"

namespace rpy {

class SendableBase : public wpi::Sendable,
                     public wpi::SendableHelper<SendableBase> {};

}

with

classes:
  SendableBase:
    ignored_bases:
      - wpi::SendableHelper<SendableBase>

and trying the test example, subclassing SendableBase instead, deadlocks when exiting.

@auscompgeek
Copy link
Member

Heh, if I put it in a proper robot and run tests, it doesn't deadlock. sim still deadlocks though. Full code I ran:

#!/usr/bin/env python3

import wpilib
import wpiutil


class Test(wpiutil.SendableBase):
    def __init__(self):
        super().__init__()
        wpiutil.SendableRegistry.add(self, "Test", 1)

    def initSendable(self, builder):
        print("hi", builder)
        builder.addDoubleProperty("key", self._get, self._set)
    
    def _set(self, value):
        print("set", value)

    def _get(self):
        print("get")
        return 1


class Robot(wpilib.TimedRobot):
    def robotInit(self):
        self.test = Test()
        wpilib.SmartDashboard.putData("test", self.test)


if __name__ == "__main__":
    wpilib.run(Robot)

@auscompgeek
Copy link
Member

auscompgeek commented Apr 21, 2022

Here's the deadlock backtrace:

(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff2090bcce libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff2093ee49 libsystem_pthread.dylib`_pthread_cond_wait + 1298
    frame #2: 0x000000010026bd6d Python`take_gil + 588
    frame #3: 0x000000010026c2b7 Python`PyEval_AcquireThread + 19
    frame #4: 0x0000000106b0b8b3 _wpilib.cpython-39-darwin.so`pybind11::gil_scoped_acquire::gil_scoped_acquire() + 83
    frame #5: 0x0000000106cb57d7 _wpilib.cpython-39-darwin.so`pybind11::detail::type_caster<std::__1::function<void (double)>, void>::load(pybind11::handle, bool)::func_handle::~func_handle() + 23
    frame #6: 0x0000000106cb5562 _wpilib.cpython-39-darwin.so`std::__1::__function::__func<pybind11::detail::type_caster<std::__1::function<void (double)>, void>::load(pybind11::handle, bool)::func_wrapper, std::__1::allocator<pybind11::detail::type_caster<std::__1::function<void (double)>, void>::load(pybind11::handle, bool)::func_wrapper>, void (double)>::destroy_deallocate() + 18
    frame #7: 0x0000000106183961 libwpilibc.dylib`std::__1::__function::__func<frc::SendableBuilderImpl::AddDoubleProperty(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::function<double ()>, std::__1::function<void (double)>)::$_3::operator()(nt::NetworkTableEntry) const::'lambda'(nt::EntryNotification const&), std::__1::allocator<frc::SendableBuilderImpl::AddDoubleProperty(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::function<double ()>, std::__1::function<void (double)>)::$_3::operator()(nt::NetworkTableEntry) const::'lambda'(nt::EntryNotification const&)>, void (nt::EntryNotification const&)>::destroy_deallocate() + 33
    frame #8: 0x0000000103e7eb6f libntcore.dylib`wpi::CallbackManager<nt::EntryNotifier, nt::impl::EntryNotifierThread>::Remove(unsigned int) + 111
    frame #9: 0x0000000106181b2b libwpilibc.dylib`frc::SendableBuilderImpl::Property::~Property() + 27
    frame #10: 0x0000000106181adc libwpilibc.dylib`frc::SendableBuilderImpl::~SendableBuilderImpl() + 220
    frame #11: 0x000000010618182e libwpilibc.dylib`frc::SendableBuilderImpl::~SendableBuilderImpl() + 14
    frame #12: 0x0000000103862912 libwpiutil.dylib`std::__1::unique_ptr<(anonymous namespace)::Component, std::__1::default_delete<(anonymous namespace)::Component> >::reset((anonymous namespace)::Component*) + 194
    frame #13: 0x000000010385f95e libwpiutil.dylib`std::__1::unique_ptr<(anonymous namespace)::SendableRegistryInst, std::__1::default_delete<(anonymous namespace)::SendableRegistryInst> >::~unique_ptr() + 110
    frame #14: 0x00007fff2086dd25 libsystem_c.dylib`__cxa_finalize_ranges + 316
    frame #15: 0x00007fff2086e010 libsystem_c.dylib`exit + 53
    frame #16: 0x00000001002c28f9 Python`Py_Exit + 30

@virtuald
Copy link
Member Author

Last night I tried adding a destructor to the trampoline class, and it.. sorta worked... in that I got a different error. I think the builder/NT needs to be forced to have weak references to the python functions somehow. Or the NT thread needs to be a python thread.

@virtuald virtuald marked this pull request as ready for review December 30, 2022 05:31
@virtuald
Copy link
Member Author

Clearing the SendableRegistry fixes the deadlock and the crashes. Not sure why I didn't think of that last year.

- initSendable needs to be passed a reference
- The SendableRegistry needs to be cleared on shutdown to ensure that ntcore doesn't call into python
- Add tests to make sure it continues to work
@virtuald virtuald merged commit 867c648 into main Dec 30, 2022
@virtuald virtuald deleted the fix-sendable branch December 30, 2022 06:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SmartDashboard.putData fails with custom Sendable subclasses
2 participants