ethernet dropped: smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #552

davebiffuk · 2014-03-27T10:17:53Z

I've recently been using BRANCH=next firmware to successfully avoid "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0", see #470

There are still occasional uvcvideo log messages due to the USB webcam, but they don't seem to affect either the webcam or the ethernet.

However I'm still seeing the Pi spontaneously lose its (wired) ethernet connection after some time - 5 days uptime in this case. It's running headless so needs powercycling to get connected again.

kern.log says:

Mar 25 18:55:02 davepi kernel: [414059.802294] uvcvideo: Non-zero status (-5) in video completion handler.
Mar 26 03:02:01 davepi kernel: [443279.877454] uvcvideo: Failed to resubmit video URB (-1).
Mar 26 20:22:07 davepi kernel: [505687.338164] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:07 davepi kernel: [505687.338197] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:07 davepi kernel: [505687.338213] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:12 davepi kernel: [505692.338323] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:12 davepi kernel: [505692.338355] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:12 davepi kernel: [505692.338371] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:18 davepi kernel: [505698.438523] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:18 davepi kernel: [505698.438573] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:18 davepi kernel: [505698.438593] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:23 davepi kernel: [505703.438701] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:23 davepi kernel: [505703.438732] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:23 davepi kernel: [505703.438749] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:23 davepi kernel: [505703.609811] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Mar 26 20:22:28 davepi kernel: [505708.678878] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:34 davepi kernel: [505714.679099] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:40 davepi kernel: [505720.679296] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:46 davepi kernel: [505726.679523] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:52 davepi kernel: [505732.679731] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:58 davepi kernel: [505738.679948] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:23:04 davepi kernel: [505744.680144] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110

Config: the Pi is a Model B Rev 2, 512MB RAM. There's a USB webcam attached via a powered hub. There are 3 one-wire temperature sensors and a 16x2 LCD attached by GPIO.

cmdline.txt is:

dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

The text was updated successfully, but these errors were encountered:

P33M · 2014-03-27T10:38:57Z

Thanks for the report. I have had a Pi do this on my desk this morning when just left overnight with nothing plugged in (compiling a kernel).

P33M · 2014-04-04T18:38:14Z

I note that you're using 0x3 as the mask: 0x7 would add FIQ acceleration for your webcam which should get rid of the error messages.

This problem is proving difficult to track down. I've been running an IRQ latency fuzz tester for all of today with multiple devices plugged in - no results.

These messages are usually only generated if there are no available host channels to do transfers with (or at least, the driver thinks it doesn't have any free) or the HCD no longer has any interrupts to drive things forward.

I think at this point I need to build a kernel that prints out necessary debug as and when things fall over, then send it to you.

P33M · 2014-04-06T11:50:55Z

40 hours later, the fuzz tester seems to have done the trick (or not, as the case may be).

Now to find out what is to blame... there's 8k interrupts per second which means SOF is still getting through.

P33M · 2014-04-06T20:42:12Z

@davebiffuk

Can you try this set of kernel+modules?

It's built from my fiq_fsm_test branch which has a hack in place to make the dwc_otg driver become much noisier if things break.

https://www.dropbox.com/s/0jmtfic91sw834l/fuzz-modules.tgz
https://www.dropbox.com/s/cr8mvsajxzyw3kw/kernel.img

kernel.img goes in /boot and overwrites the existing kernel.img (you may want to back it up first). To extract the modules do sudo tar xvf fuzz-modules.tgz -C/. I recommend leaving settings the way they are: the bug appears to be independent of the fiq_fsm_mask parameter.

davebiffuk · 2014-04-06T22:25:09Z

Thanks @P33M - your perseverance is much appreciated!

I've got that kernel+modules in place now. I had switched to mask=0x7 but it hung earlier today with that (fits what you said) - so I've gone back to mask=0x3.

I'll let you know as soon as there's anything to report... thanks again.

P33M · 2014-04-07T20:31:05Z

Woot. One of my Pis just fell off the radar.

Best part is, I have the whole log.

There's a clue here: I have multiple device reset messages before it falls over.

[14077.002225] fuzz_irqs: probe
[34002.541113] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[35260.616502] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[70276.888230] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[76605.088348] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[79126.190249] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[86617.483156] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[96625.214760] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
[96625.214819] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[96625.214842] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[96625.214907] dwc_otg: AHC=0 gintmsk_saved=0xffffffff haintmsk_saved=0x0000ffff kick_np=1 next_sched_frame=-1
[96625.214907] cur_frame=13804 haintmsk=0x00000004 gintmsk=0xf200080e haint=0x000000fb

And a smoking gun: HAINT has stale entries.

P33M · 2014-04-08T19:39:50Z

The root cause is bit errors.

It took 48 hours for me to replicate the problem with continuous bus use because that's how long it takes to accumulate 7 bit errors. Interesting to know that my Ebay special hub and Ebay special USB pendrive have a bit error rate of 9x10^-13.

The HCD will unmask various channel interrupts if the previous transaction to the same endpoint failed for some reason. My FSM FIQ relies on a single interrupt firing (channel halt) so that it can pass only one interrupt to the HCD.

Setting the error flag to 1 for all transfers results in USB breaking before the Pi even reaches login, so this is the root cause. Patch inbound.

davebiffuk · 2014-04-08T20:00:05Z

Great find! Thanks for the explanation. Can I buy you a drink? 🍺

Does that also cover the situation when this occurs with nothing plugged in? (no endpoints => no interrupts? I know little about USB internals)

P33M · 2014-04-15T20:24:37Z

BRANCH=next rpi-update firmware should contain a fix for this issue.

Bit errors can potentially happen during any traffic on the bus - as a function of power supply quality or ambient electrical noise. There is a mechanism for retrying failed transfers - the retry mechanism in this case is what caused host channels to never get acknowledged when the transfer completed.

davebiffuk · 2014-04-15T21:58:48Z

Thanks, my Pi is running with the updated version now.

Electrical noise sounds likely - the Pi is monitoring my tortoise enclosure, which is lit and heated with a mercury vapour lamp.

P33M · 2014-04-16T21:56:51Z

As an aside, I spent much of this morning reading about the beneficial properties of mercury arc vapour lamps when used in vivariums.

Fascinating.

davebiffuk · 2014-04-22T08:09:09Z

I'm not sure I'll go with mercury vapour next time around, this sort of bulb makes it hard to control the heat and UV levels separately. UV strip + incandescent for heat seems to be the way to go.

Anyway; bad news I'm afraid, it happened again. Uptime was about 5 days. The messages look the same as with previous occurrences (pasted below).

Can I just check that this is the kernel/firmware with the fix, to be sure I'm running a good one?

3.12.17+ #670

root@davepi:~# vcgencmd version
Apr 15 2014 18:34:07 
Copyright (c) 2012 Broadcom
version c1dc12512d5fe9340b6a2ca9badf9441fe6675a8 (tainted) (release)


Apr 21 07:35:02 davepi kernel: [466729.069993] uvcvideo: Non-zero status (-5) in video completion handler.
Apr 21 11:15:07 davepi kernel: [479934.470096] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:07 davepi kernel: [479934.470127] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:07 davepi kernel: [479934.470164] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:12 davepi kernel: [479939.470156] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:12 davepi kernel: [479939.470206] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:12 davepi kernel: [479939.470224] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:18 davepi kernel: [479945.570406] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:18 davepi kernel: [479945.570460] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:18 davepi kernel: [479945.570481] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:23 davepi kernel: [479950.570539] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:23 davepi kernel: [479950.570592] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:23 davepi kernel: [479950.570614] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:23 davepi kernel: [479950.743220] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 21 11:15:28 davepi kernel: [479955.810777] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:34 davepi kernel: [479961.810969] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:40 davepi kernel: [479967.811151] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:46 davepi kernel: [479973.811351] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:52 davepi kernel: [479979.811558] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:58 davepi kernel: [479985.811752] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:16:04 davepi kernel: [479991.811985] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:16:07 davepi kernel: [479994.482053] uvcvideo: Failed to set UVC probe control : -110 (exp. 26).

P33M · 2014-04-22T12:49:43Z

Can you post the output of vmstat 2 on the Pi in the bad state?

davebiffuk · 2014-04-22T13:16:40Z

It's headless, so I've set it to capture the output of vmstat 2 5 once a minute, which will be retrievable after a recurrence. Let me know if that's not good enough.

P33M · 2014-04-22T13:27:45Z

That should suffice.

P33M · 2014-05-05T13:14:12Z

There was one other case that could cause this, which should now be fixed. Update the issue if it's still broken.

cjgriscom · 2019-02-23T18:51:09Z

@P33M I realize it's been a long time, but I have been discussing on the linux-usb mailing list a similar problem with the Chipidea USB driver. Would you be able to let me know which commits on this tree fixed the dwc_otg issue?

popcornmix · 2019-02-25T15:38:43Z

@cjgriscom Looking through the git commits, I believe 281627f is the fix referenced here.

Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>

commit 4bcdd83 upstream. Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

P33M closed this as completed May 5, 2014

davidjb mentioned this issue Dec 25, 2014

Ethernet drops: smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -71 #746

Closed

GeorgeSapkin mentioned this issue Jul 19, 2019

smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #3088

Open

ethernet dropped: smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #552

ethernet dropped: smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #552

Comments

davebiffuk commented Mar 27, 2014

P33M commented Mar 27, 2014

Uh oh!

P33M commented Apr 4, 2014

Uh oh!

P33M commented Apr 6, 2014

Uh oh!

P33M commented Apr 6, 2014

Uh oh!

davebiffuk commented Apr 6, 2014

Uh oh!

P33M commented Apr 7, 2014

Uh oh!

P33M commented Apr 8, 2014

Uh oh!

davebiffuk commented Apr 8, 2014

Uh oh!

P33M commented Apr 15, 2014

Uh oh!

davebiffuk commented Apr 15, 2014

Uh oh!

P33M commented Apr 16, 2014

Uh oh!

davebiffuk commented Apr 22, 2014

Uh oh!

P33M commented Apr 22, 2014

Uh oh!

davebiffuk commented Apr 22, 2014

Uh oh!

P33M commented Apr 22, 2014

Uh oh!

P33M commented May 5, 2014

Uh oh!

cjgriscom commented Feb 23, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

popcornmix commented Feb 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cjgriscom commented Feb 23, 2019 •

edited

Loading

popcornmix commented Feb 25, 2019 •

edited

Loading