Network flip flop #429

Closed
ottuzzi opened this issue Nov 11, 2013 · 23 comments

@ottuzzi

ottuzzi commented Nov 11, 2013

Hi,

Since upgrading from the venerable 3.6.11+ kernel, I have noticed a kind of network flip-flop at boot on all newer kernels. Something like this:

[   22.772034] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   24.537515] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   25.279144] bcm2835-cpufreq: switching to governor ondemand
[   25.279174] bcm2835-cpufreq: switching to governor ondemand
[   25.408311] smsc95xx 1-1.1:1.0 eth0: link down
[   27.001218] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   27.833748] smsc95xx 1-1.1:1.0 eth0: link down
[   29.450200] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   29.647750] Adding 102396k swap on /var/swap.  Priority:-1 extents:1 across:102396k SS
[   30.266133] smsc95xx 1-1.1:1.0 eth0: link down
[   32.122655] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[   32.954562] smsc95xx 1-1.1:1.0 eth0: link down
[   34.571877] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[   35.386973] smsc95xx 1-1.1:1.0 eth0: link down
[   37.003071] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[   37.819097] smsc95xx 1-1.1:1.0 eth0: link down
[   39.453573] smsc95xx 1-1.1:1.0 eth0: link up, 10Mbps, full-duplex, lpa 0x4C61

The network goes up and down many times and then stabilizes. It is not a major issue, just curiosity on my side. Has anyone seen something similar? If I remember correctly, on 3.6.11+ the network did not switch back and forth so many times.

Thanks
Bye
Piero
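
For anyone wanting to quantify the flapping rather than eyeball it, here is a minimal Python sketch that extracts the link transitions from dmesg-style lines like those above and counts the drops. The function names (`link_events`, `count_flaps`) and the regular expression are illustrative assumptions based only on the log format shown in this report, not part of any existing tool.

```python
import re

# Assumed line format, taken from the log excerpt in this report:
# [   24.537515] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+smsc95xx\s+\S+\s+(?P<iface>\w+): link (?P<state>up|down)"
)


def link_events(lines):
    """Return (timestamp, interface, state) tuples for each link transition."""
    events = []
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("iface"), match.group("state"))
            )
    return events


def count_flaps(events):
    """Count 'link down' events, i.e. how often an established link dropped."""
    return sum(1 for _, _, state in events if state == "down")


# A few lines copied from the report above.
sample = """\
[   22.772034] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   24.537515] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   25.408311] smsc95xx 1-1.1:1.0 eth0: link down
[   27.001218] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   27.833748] smsc95xx 1-1.1:1.0 eth0: link down
""".splitlines()

events = link_events(sample)
print(f"{count_flaps(events)} link drops in {len(events)} transitions")
```

The same functions should work on the full output of `dmesg` saved to a file, as long as the lines keep the timestamped format shown here.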

@ghost ghost assigned P33M Nov 11, 2013
@ruuns

ruuns commented Dec 15, 2013

I can confirm this issue on my device running the Arch Linux ARM distribution.

[root@alarmpi ~]# uname -a
Linux alarmpi 3.10.24-1-ARCH #1 PREEMPT Fri Dec 13 01:21:41 CST 2013 armv6l GNU/Linux

dmesg returns the following output at boot up:

[    8.834205] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   10.401782] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[   46.150249] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   46.254166] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   47.812140] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

@popcornmix
Collaborator

Insufficient power supply would cause this behaviour:
http://elinux.org/R-Pi_Troubleshooting#Troubleshooting_power_problems

@ruuns

ruuns commented Dec 15, 2013

I've checked the voltage between TP1 and TP2 during the boot process. It stays between 4.75V and 4.77V (with and without HDMI/keyboard connected). The low reading could be caused by my power supply (my current power cable/supply is only a workaround).

I was also puzzled at first, because the issue seems to occur only at startup and then stabilizes (as if it were waiting for all capacitors to be fully charged). I'll try a more suitable power supply.

@ottuzzi
Author

ottuzzi commented Dec 28, 2013

Hi there, it looks like in my case it was a faulty network connection, not a Raspberry Pi or kernel problem. So, if ruuns agrees, we can close this issue.

Thanks for your work
Bye
Piero

@P33M P33M closed this as completed Dec 31, 2013
popcornmix pushed a commit that referenced this issue Jun 8, 2014
During the recent conversion of cgroup to kernfs, cgroup_tree_mutex
which nests above both the kernfs s_active protection and cgroup_mutex
is added to synchronize cgroup file type operations as cgroup_mutex
needed to be grabbed from some file operations and thus can't be put
above s_active protection.

While this arrangement mostly worked for cgroup, this triggered the
following lockdep warning.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 Tainted: G        W
  -------------------------------------------------------
  trinity-c173/9024 is trying to acquire lock:
  (blkcg_pol_mutex){+.+.+.}, at: blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)

  but task is already holding lock:
  (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (s_active#89){++++.+}:
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  __kernfs_remove (arch/x86/include/asm/atomic.h:27 fs/kernfs/dir.c:352 fs/kernfs/dir.c:1024)
  kernfs_remove_by_name_ns (fs/kernfs/dir.c:1219)
  cgroup_addrm_files (include/linux/kernfs.h:427 kernel/cgroup.c:1074 kernel/cgroup.c:2899)
  cgroup_clear_dir (kernel/cgroup.c:1092 (discriminator 2))
  rebind_subsystems (kernel/cgroup.c:1144)
  cgroup_setup_root (kernel/cgroup.c:1568)
  cgroup_mount (kernel/cgroup.c:1716)
  mount_fs (fs/super.c:1094)
  vfs_kern_mount (fs/namespace.c:899)
  do_mount (fs/namespace.c:2238 fs/namespace.c:2561)
  SyS_mount (fs/namespace.c:2758 fs/namespace.c:2729)
  tracesys (arch/x86/kernel/entry_64.S:746)

  -> #1 (cgroup_tree_mutex){+.+.+.}:
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
  cgroup_add_cftypes (include/linux/list.h:76 kernel/cgroup.c:3040)
  blkcg_policy_register (block/blk-cgroup.c:1106)
  throtl_init (block/blk-throttle.c:1694)
  do_one_initcall (init/main.c:789)
  kernel_init_freeable (init/main.c:854 init/main.c:863 init/main.c:882 init/main.c:1003)
  kernel_init (init/main.c:935)
  ret_from_fork (arch/x86/kernel/entry_64.S:552)

  -> #0 (blkcg_pol_mutex){+.+.+.}:
  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
  blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
  cgroup_file_write (kernel/cgroup.c:2714)
  kernfs_fop_write (fs/kernfs/file.c:295)
  vfs_write (fs/read_write.c:532)
  SyS_write (fs/read_write.c:584 fs/read_write.c:576)
  tracesys (arch/x86/kernel/entry_64.S:746)

  other info that might help us debug this:

  Chain exists of:
  blkcg_pol_mutex --> cgroup_tree_mutex --> s_active#89

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(s_active#89);
				 lock(cgroup_tree_mutex);
				 lock(s_active#89);
    lock(blkcg_pol_mutex);

   *** DEADLOCK ***

  4 locks held by trinity-c173/9024:
  #0: (&f->f_pos_lock){+.+.+.}, at: __fdget_pos (fs/file.c:714)
  #1: (sb_writers#18){.+.+.+}, at: vfs_write (include/linux/fs.h:2255 fs/read_write.c:530)
  #2: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283)
  #3: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)

  stack backtrace:
  CPU: 3 PID: 9024 Comm: trinity-c173 Tainted: G        W     3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429
   ffffffff919687b0 ffff8805f6373bb8 ffffffff8e52cdbb 0000000000000002
   ffffffff919d8400 ffff8805f6373c08 ffffffff8e51fb88 0000000000000004
   ffff8805f6373c98 ffff8805f6373c08 ffff88061be70d98 ffff88061be70dd0
  Call Trace:
  dump_stack (lib/dump_stack.c:52)
  print_circular_bug (kernel/locking/lockdep.c:1216)
  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
  blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
  cgroup_file_write (kernel/cgroup.c:2714)
  kernfs_fop_write (fs/kernfs/file.c:295)
  vfs_write (fs/read_write.c:532)
  SyS_write (fs/read_write.c:584 fs/read_write.c:576)

This is a highly unlikely but valid circular dependency between "echo
1 > blkcg.reset_stats" and cfq module [un]loading.  cgroup is going
through further locking update which will remove this complication but
for now let's use trylock on blkcg_pol_mutex and retry the file
operation if the trylock fails.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Sasha Levin <[email protected]>
References: http://lkml.kernel.org/g/[email protected]
popcornmix pushed a commit that referenced this issue Dec 5, 2016
Andrey reported the following while fuzzing the kernel with syzkaller:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
 ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
 [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 ...

icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
cases the dst->dev should be preferred for determining the L3 domain
if the dst has been set on the skb. Fallback to the skb->dev if it has
not. This covers the case reported here where icmp6_send is invoked on
Rx before the route lookup.

Fixes: 5d41ce2 ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
@onzulinapps

Hello, I have the same problem:
smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
IPv6: eth0 link is not ready
How can I resolve it? Regards

@shelteroperations

Same problem since maybe January (RPi 3, Raspbian):

smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup

Ideas???

@pelwell
Contributor

pelwell commented Jun 4, 2018

If that's the only error message then there isn't a problem - it's just a statement of fact. Is your Pi3 malfunctioning in some way?

@shelteroperations

shelteroperations commented Jun 4, 2018

It dies every few days. It's remote, so "dying" may just mean that I can't get into it until someone hard-reboots it. After the reboot, /var/log/messages shows this as the last message before I lost access. I just installed ifplugd and then ran sudo update-rc.d ifplugd enable. It may only be a band-aid, but we'll see.

@pelwell
Contributor

pelwell commented Jun 4, 2018

I can't check right now, but I think that message will be displayed every time the interface is brought up, so seeing it repeatedly in the log could be a symptom of some sort of network flakiness.
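
If the message really is printed on every interface bring-up, counting its occurrences gives a rough proxy for how often the interface was restarted. Below is a minimal Python sketch under that assumption; the helper name `wakeup_timestamps` is hypothetical, and the sample lines are copied from the dmesg output posted earlier in this thread.

```python
import re

MESSAGE = "hardware isn't capable of remote wakeup"

# Kernel timestamp in square brackets, e.g. "[   46.150249]" or "[ 23.726264]".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def wakeup_timestamps(lines, message=MESSAGE):
    """Return the kernel timestamps (seconds since boot) of lines containing the message."""
    stamps = []
    for line in lines:
        if message in line:
            match = TS_RE.search(line)
            if match:
                stamps.append(float(match.group(1)))
    return stamps


# Lines copied from the dmesg output posted earlier in this thread.
sample = [
    "[    8.834205] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup",
    "[   10.401782] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1",
    "[   46.150249] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup",
    "[   46.254166] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup",
]

stamps = wakeup_timestamps(sample)
print(len(stamps), stamps)
```

Because the pattern only looks for the bracketed timestamp, the same function should also accept syslog-prefixed lines from /var/log/messages. Note that the sample shows the message twice within about 0.1 s, so the count is only an upper bound on the number of distinct bring-ups.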

@shelteroperations

It seems slow to log in. Latest messages in /var/log/messages:

Jun 4 17:09:35 mana31 kernel: [ 23.726264] random: 7 urandom warning(s) missed due to ratelimiting
Jun 4 17:09:37 mana31 kernel: [ 25.914791] r820t 4-001a: destroying instance
Jun 4 17:09:37 mana31 kernel: [ 25.916363] dvb_usb_v2: 'Realtek RTL2832U reference design:1-1.4' successfully deinitialized and disconnected
Jun 4 17:09:47 mana31 kernel: [ 35.813546] r820t 6-001a: destroying instance
Jun 4 17:09:47 mana31 kernel: [ 35.814084] dvb_usb_v2: 'Realtek RTL2832U reference design:1-1.5' successfully deinitialized and disconnected
Jun 4 17:16:08 mana31 kernel: [ 416.443273] device eth0 entered promiscuous mode
Jun 4 17:21:25 mana31 kernel: [ 733.407713] device eth0 left promiscuous mode
Jun 4 18:00:10 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:00:40 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:01:18 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:01:48 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:05:15 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:05:45 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:07:03 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:07:33 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:09:01 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:09:31 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:10:15 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:10:45 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:10:45 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:11:15 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:11:17 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:11:47 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:15:25 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:15:55 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:16:01 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:16:31 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:17:01 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:18:01 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:18:04 mana31 rsyslogd-2007: action 'action 17' suspended, next retry is Mon Jun 4 18:19:04 2018 [try http://www.rsyslog.com/e/2007 ]
Jun 4 18:18:12 mana31 kernel: [ 4140.721808] device eth0 entered promiscuous mode
Jun 4 18:18:22 mana31 kernel: [ 4150.665603] device eth0 left promiscuous mode

@JamesH65
Contributor

JamesH65 commented Jun 5, 2018

If you remove the DVB-T adapter, does it work correctly?

@shelteroperations

There is no HDMI plugged in or any video if that's what you're asking. It runs text-only.

@JamesH65
Contributor

JamesH65 commented Jun 5, 2018

The log you posted has a reference to a 'Realtek RTL2832U reference design:1-1.4'. That's a DVB-T adaptor. Is it plugged in? Or is it an artefact of a particular kernel build? I'm not sure why it's appearing in the log.

@shelteroperations

Whoops, I read that too quickly as the Ethernet adapter. I have two RF (radio frequency) USB sticks plugged in. But they've been plugged in and in service for over a year with no problems at all. I don't know why it keeps crashing.

@JamesH65
Contributor

JamesH65 commented Jun 5, 2018

It would be worth removing them and seeing if the problem goes away. It could be an interaction between them. We've seen something similar before between wireless and onboard Ethernet, which should have been independent. It turned out they were not under some very specific circumstances.

@shelteroperations

That would sort of defeat the purpose of the RPi there. And as I mentioned, there have been no problems in over a year, unless it's a hardware failure. Would it help to install a USB hub so all those signals weren't "right next to" the Ethernet port?

@shelteroperations

I also recently uninstalled a bunch of packages that, according to forums and the like, were unnecessary and just eating memory. I don't have a list of them, but is there any chance I uninstalled something I needed? Is there a message that could help us tell whether that is the case?

@JamesH65
Contributor

JamesH65 commented Jun 5, 2018

When diagnosing a problem, you reduce the problem space until the problem goes away. We are simply trying to narrow down what the issue might be, and removing possible causes of the problem helps with that.

You might have uninstalled something important; it's impossible to tell. You need to start afresh with a new SD card, and also try with unusual peripherals removed. Then add things back until the problem reappears.

@shelteroperations

That is true. As I noted, though, it's a remote machine that we're using for something. I can try that next time I'm there if nothing else works. It hasn't crashed since I installed the ifplugd I mentioned above, but it's only been a day. Even if that fixes it, it still sounds like a band-aid. As you initially noted, it could also be the Ethernet switch it's plugged into. I can't test any of that remotely right now. I was just wondering if anything glaring jumped out at you, software-related. Thank you for your kind help.

@pelwell
Contributor

pelwell commented Jun 6, 2018

It seems you can safely ignore (or even suppress) the "action 17" messages: https://raspberrypi.stackexchange.com/questions/47781/what-is-action-17

@shelteroperations

Thanks.

@iiv3
Contributor

iiv3 commented Dec 15, 2018

After upgrading to "testing" I got a serious problem where the RPi would obtain an IP address through DHCP, but then bring Ethernet down, sometimes multiple times, after which it would not respond, even to disconnecting and reconnecting the LAN cable.

The logs also showed a bunch of "hardware isn't capable of remote wakeup".

After numerous tries I solved the problem by removing "avahi-daemon".

I suspect that the above message is related to the LAN driver not being able to handle multicast properly and doing something wrong with the LAN card.

So, if you have a similar problem, avoid everything that might be using multicast.

Just to be clear, multicast is when the network uses group addresses in the range 224.0.0.0 to 239.255.255.255. Other than "avahi", it could be used by "upnp" and even "ntp" stuff.
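
To make the address range concrete, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `is_ipv4_multicast` is hypothetical; the range 224.0.0.0 to 239.255.255.255 corresponds to the network 224.0.0.0/4, and 224.0.0.251 is the IPv4 address used by mDNS, which is what Avahi implements.

```python
import ipaddress

# The IPv4 multicast range 224.0.0.0 - 239.255.255.255 as a single network.
MULTICAST_NET = ipaddress.ip_network("224.0.0.0/4")


def is_ipv4_multicast(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside the multicast range."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip in MULTICAST_NET


for candidate in ("224.0.0.251", "239.255.255.255", "192.168.1.10", "240.0.0.0"):
    print(candidate, is_ipv4_multicast(candidate))
```

A check like this could be used, for example, to filter destination addresses from a packet capture when trying to spot which services on the Pi generate multicast traffic.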

@lamazze

lamazze commented Sep 21, 2021

After upgrading my Raspberry Pi 3B from Jessie to Buster, I started experiencing "wlan0: carrier lost" problems once a day.
I tried different solutions, but stopping the avahi service apparently solved the problem. Thanks for the suggestion!
Edit: in the end, disabling wicd, avahi-daemon and networking solved all the Wi-Fi connection problems.
