Network flip flop #429
Comments
I can confirm this issue on my device running the Arch Linux ARM distribution.
dmesg returns the following output at boot up:
|
Insufficient power supply would cause this behaviour: |
I've checked the voltage between TP1 and TP2 during boot. It stays between 4.75V and 4.77V (with and without HDMI/keyboard connected). My power supply could be the cause (my current power cable/supply is only a workaround). I wondered about this at first because the issue seems to occur only at startup and the link stabilizes later (as if it waits until all capacitors are fully charged). I'll try a more suitable power supply. |
Hi there, it looks like in my case it was a faulty network connection, not a Raspberry Pi or kernel problem. So, if ruuns agrees, we can close this issue. Thanks for your work. |
During the recent conversion of cgroup to kernfs, cgroup_tree_mutex which nests above both the kernfs s_active protection and cgroup_mutex is added to synchronize cgroup file type operations as cgroup_mutex needed to be grabbed from some file operations and thus can't be put above s_active protection.

While this arrangement mostly worked for cgroup, this triggered the following lockdep warning.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 Tainted: G W
  -------------------------------------------------------
  trinity-c173/9024 is trying to acquire lock:
   (blkcg_pol_mutex){+.+.+.}, at: blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)

  but task is already holding lock:
   (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (s_active#89){++++.+}:
   lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
   __kernfs_remove (arch/x86/include/asm/atomic.h:27 fs/kernfs/dir.c:352 fs/kernfs/dir.c:1024)
   kernfs_remove_by_name_ns (fs/kernfs/dir.c:1219)
   cgroup_addrm_files (include/linux/kernfs.h:427 kernel/cgroup.c:1074 kernel/cgroup.c:2899)
   cgroup_clear_dir (kernel/cgroup.c:1092 (discriminator 2))
   rebind_subsystems (kernel/cgroup.c:1144)
   cgroup_setup_root (kernel/cgroup.c:1568)
   cgroup_mount (kernel/cgroup.c:1716)
   mount_fs (fs/super.c:1094)
   vfs_kern_mount (fs/namespace.c:899)
   do_mount (fs/namespace.c:2238 fs/namespace.c:2561)
   SyS_mount (fs/namespace.c:2758 fs/namespace.c:2729)
   tracesys (arch/x86/kernel/entry_64.S:746)

  -> #1 (cgroup_tree_mutex){+.+.+.}:
   lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
   mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
   cgroup_add_cftypes (include/linux/list.h:76 kernel/cgroup.c:3040)
   blkcg_policy_register (block/blk-cgroup.c:1106)
   throtl_init (block/blk-throttle.c:1694)
   do_one_initcall (init/main.c:789)
   kernel_init_freeable (init/main.c:854 init/main.c:863 init/main.c:882 init/main.c:1003)
   kernel_init (init/main.c:935)
   ret_from_fork (arch/x86/kernel/entry_64.S:552)

  -> #0 (blkcg_pol_mutex){+.+.+.}:
   __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
   lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
   mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
   blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
   cgroup_file_write (kernel/cgroup.c:2714)
   kernfs_fop_write (fs/kernfs/file.c:295)
   vfs_write (fs/read_write.c:532)
   SyS_write (fs/read_write.c:584 fs/read_write.c:576)
   tracesys (arch/x86/kernel/entry_64.S:746)

  other info that might help us debug this:

  Chain exists of:
    blkcg_pol_mutex --> cgroup_tree_mutex --> s_active#89

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(s_active#89);
                                 lock(cgroup_tree_mutex);
                                 lock(s_active#89);
    lock(blkcg_pol_mutex);

   *** DEADLOCK ***

  4 locks held by trinity-c173/9024:
   #0: (&f->f_pos_lock){+.+.+.}, at: __fdget_pos (fs/file.c:714)
   #1: (sb_writers#18){.+.+.+}, at: vfs_write (include/linux/fs.h:2255 fs/read_write.c:530)
   #2: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283)
   #3: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)

  stack backtrace:
  CPU: 3 PID: 9024 Comm: trinity-c173 Tainted: G W 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429
   ffffffff919687b0 ffff8805f6373bb8 ffffffff8e52cdbb 0000000000000002
   ffffffff919d8400 ffff8805f6373c08 ffffffff8e51fb88 0000000000000004
   ffff8805f6373c98 ffff8805f6373c08 ffff88061be70d98 ffff88061be70dd0
  Call Trace:
   dump_stack (lib/dump_stack.c:52)
   print_circular_bug (kernel/locking/lockdep.c:1216)
   __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
   lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
   mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
   blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
   cgroup_file_write (kernel/cgroup.c:2714)
   kernfs_fop_write (fs/kernfs/file.c:295)
   vfs_write (fs/read_write.c:532)
   SyS_write (fs/read_write.c:584 fs/read_write.c:576)

This is a highly unlikely but valid circular dependency between "echo 1 > blkcg.reset_stats" and cfq module [un]loading. cgroup is going through further locking update which will remove this complication but for now let's use trylock on blkcg_pol_mutex and retry the file operation if the trylock fails.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Sasha Levin <[email protected]>
References: http://lkml.kernel.org/g/[email protected]
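The "trylock and retry" idea from the commit message above can be illustrated outside the kernel. The following is only a user-space sketch in Python, not the actual patch: the names `outer`, `inner`, `try_handler` and `write_with_retry` are hypothetical, with `outer` standing in for the s_active protection and `inner` for blkcg_pol_mutex. The point it shows is that the handler never blocks on the inner lock while the outer one is held, so the circular wait cannot form.

```python
import threading
import time


def try_handler(inner, action):
    """Run action under `inner` without blocking.

    Returns False if `inner` is busy, so the caller can drop its own
    lock and retry the whole operation later.
    """
    if not inner.acquire(blocking=False):  # the "trylock"
        return False
    try:
        action()
        return True
    finally:
        inner.release()


def write_with_retry(outer, inner, action, max_attempts=50, delay=0.01):
    """Retry the whole operation until the trylock succeeds.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        with outer:  # hypothetical stand-in for the outer protection
            if try_handler(inner, action):
                return attempt
        # `outer` is released here, so whoever holds `inner` and is
        # waiting for `outer` can make progress before we try again.
        time.sleep(delay)
    raise TimeoutError("inner lock stayed busy")


if __name__ == "__main__":
    outer, inner = threading.Lock(), threading.Lock()
    attempts = write_with_retry(outer, inner, lambda: print("stats reset"))
    print("attempts:", attempts)
```

The trade-off is that a contended operation may be retried several times; in exchange, no thread ever sleeps while holding one lock and waiting for the other.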
Andrey reported the following while fuzzing the kernel with syzkaller:

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN
  Modules linked in:
  CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: ffff8800666d4200 task.stack: ffff880067348000
  RIP: 0010:[<ffffffff833617ec>] [<ffffffff833617ec>] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
  RSP: 0018:ffff88006734f2c0 EFLAGS: 00010206
  RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
  RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
  R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
  R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
  FS: 00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
  Stack:
   ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
   ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
   ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
  Call Trace:
   [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
   [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
   [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
   [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
   [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
   ...

icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both cases the dst->dev should be preferred for determining the L3 domain if the dst has been set on the skb. Fallback to the skb->dev if it has not. This covers the case reported here where icmp6_send is invoked on Rx before the route lookup.

Fixes: 5d41ce2 ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Hello, I have the same problem: smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup |
Same problem since maybe January (RPi 3, Raspbian): smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup Ideas??? |
If that's the only error message then there isn't a problem - it's just a statement of fact. Is your Pi3 malfunctioning in some way? |
It dies every few days. It's remote so "dying" might not be "dying" but that I can't get into it without someone hard rebooting, then.... /var/log/messages shows this as the last message before I couldn't get in. I just installed ifplugd then ran sudo update-rc.d ifplugd enable. I don't know, maybe a bandaid, but we'll see. |
I can't check right now, but I think that message will be displayed every time the interface is brought up, so seeing it repeatedly in the log could be a symptom of some sort of network flakiness. |
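To check whether the message really does repeat, one could count how often it appears per day in the log. The snippet below is a rough sketch, assuming the classic syslog line layout seen elsewhere in this thread ("Jun 4 17:09:35 host kernel: ..."); the function name `count_by_day` and the sample lines with their timestamps are made up for illustration, and only the message text itself comes from this thread.

```python
from collections import Counter

# Message text as quoted in this thread.
NEEDLE = "hardware isn't capable of remote wakeup"


def count_by_day(lines, needle=NEEDLE):
    """Count lines containing `needle`, grouped by the leading 'Mon D' fields."""
    counts = Counter()
    for line in lines:
        if needle in line:
            month, day = line.split()[:2]  # assumes syslog-style timestamps
            counts[f"{month} {day}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; on a real system read /var/log/messages instead.
    sample = [
        "Jun 4 17:10:02 mana31 kernel: [ 50.1] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup",
        "Jun 4 19:42:11 mana31 kernel: [ 9179.2] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup",
        "Jun 5 03:00:40 mana31 kernel: [35528.9] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup",
    ]
    for day, n in count_by_day(sample).items():
        print(day, n)
```

A count of one per boot would match "just a statement of fact"; many per day would point towards the interface being brought up repeatedly.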
Seems slow to login...... latest messages in /var/log/messages: Jun 4 17:09:35 mana31 kernel: [ 23.726264] random: 7 urandom warning(s) missed due to ratelimiting |
If you remove the DVB-T adapter, does it work correctly? |
There is no HDMI plugged in or any video if that's what you're asking. It runs text-only. |
The log you posted has a reference to a 'Realtek RTL2832U reference design:1-1.4'. That's a DVB-T adaptor. Is it plugged in? Or is it an artefact of a particular kernel build? I'm not sure why it's appearing in the log. |
Woops I read that quickly as the Ethernet adapter. I have two RF (radio frequency) USB sticks plugged in. But they've been plugged in for over a year, no problems at all, and used for service. I don't know why it keeps crashing. |
It would be worth removing them and seeing if the problem goes away. It could be an interaction between them. We've had something similar before between wireless and onboard Ethernet, which should have been independent. It turned out they were not under some very specific circumstances. |
Would sort of defeat the purpose of the Rpi there then. And as I mentioned no problems in over a year. Unless it's hardware failure. Would it help to install a USB hub so all those signals weren't "right next to" the ethernet port? |
I also uninstalled a bunch of packages recently I read in forums or wherever that were unnecessary and were just eating memory. I don't have a list of those, but is there any chance I uninstalled something that I needed? Is there a message that could help us understand if this were the case? |
When diagnosing a problem, you reduce the problem space until the problem goes away. We are simply trying to narrow down what the issue might be, and removing possible causes of the problem helps with that. You might have uninstalled something important; it's impossible to tell. You need to start afresh with a new SD card, and also try with unusual peripherals removed. Then add things back until the problem reappears. |
This is a true statement. As I noted though it's a remote machine we're using for something. I can try that if nothing else works next time I'm there. It hasn't crashed yet after I installed the ifplugd I mentioned above but it's only been a day. Still even if that's the case it sounds like a bandaid. As you initially noted it could also be the ethernet switch it's plugged into. I can't test any now remotely. I was just wondering if anything glaring jumped out at you, software related. Thank you for your kind help. |
It seems you can safely ignore (or even suppress) the "action 17" messages: https://raspberrypi.stackexchange.com/questions/47781/what-is-action-17 |
Thanks. |
After upgrading to "testing" I got a serious problem where the RPi would obtain an IP through DHCP, but then bring Ethernet down, sometimes multiple times, and then it would not respond, even to disconnecting and reconnecting the LAN cable. The logs also showed a bunch of "hardware isn't capable of remote wakeup" messages. After numerous tries I solved the problem by removing "avahi-daemon". I suspect that the above message is related to the LAN driver not being able to handle multicast properly and doing something wrong with the LAN card. So, if you have a similar problem, avoid everything that might be using multicast. Just to be clear, multicast is when a network uses group addresses in the IP range 224.0.0.0 to 239.255.255.255. Other than "avahi", it could be used by "upnp" and even "ntp" stuff. |
After upgrading my Raspberry Pi 3B from Jessie to Buster, I started experiencing "wlan0: carrier lost" problems once a day. |
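The address range quoted above is the IPv4 block 224.0.0.0/4. As a small illustration, here is a sketch using Python's standard `ipaddress` module to test whether an address falls in that range; the helper name `is_ipv4_multicast` is hypothetical, and the example addresses are the well-known mDNS group (224.0.0.251, used by Avahi) and the SSDP group (239.255.255.250, used by UPnP).

```python
import ipaddress

# 224.0.0.0/4 spans 224.0.0.0 through 239.255.255.255.
MULTICAST_V4 = ipaddress.ip_network("224.0.0.0/4")


def is_ipv4_multicast(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside 224.0.0.0/4."""
    return ipaddress.ip_address(addr) in MULTICAST_V4


if __name__ == "__main__":
    for addr in ("224.0.0.251", "239.255.255.250", "192.168.1.10", "240.0.0.0"):
        print(addr, is_ipv4_multicast(addr))
```

This can help when reading a packet capture to decide which traffic is multicast, though it says nothing about whether multicast is actually what upsets the driver.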
Hi,
Since upgrading from the venerable 3.6.11+, I have noticed a kind of network flip-flop at boot on all newer kernels. Something like this:
The network goes up and down many times and then it just stabilizes: not a major issue... just curiosity on my side. Has anyone seen something similar? If I remember correctly, on 3.6.11+ the network did not switch back and forth so many times...
Thanks
Bye
Piero