Skip to content

ethernet dropped: smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
davebiffuk opened this issue Mar 27, 2014 · 18 comments

Comments

@davebiffuk
Copy link

I've recently been using BRANCH=next firmware to successfully avoid "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0", see #470

There are still occasional uvcvideo log messages due to the USB webcam, but they don't seem to affect either the webcam or the ethernet.

However I'm still seeing the Pi spontaneously lose its (wired) ethernet connection after some time - 5 days uptime in this case. It's running headless so needs powercycling to get connected again.

kern.log says:

Mar 25 18:55:02 davepi kernel: [414059.802294] uvcvideo: Non-zero status (-5) in video completion handler.
Mar 26 03:02:01 davepi kernel: [443279.877454] uvcvideo: Failed to resubmit video URB (-1).
Mar 26 20:22:07 davepi kernel: [505687.338164] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:07 davepi kernel: [505687.338197] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:07 davepi kernel: [505687.338213] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:12 davepi kernel: [505692.338323] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:12 davepi kernel: [505692.338355] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:12 davepi kernel: [505692.338371] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:18 davepi kernel: [505698.438523] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:18 davepi kernel: [505698.438573] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:18 davepi kernel: [505698.438593] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:23 davepi kernel: [505703.438701] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Mar 26 20:22:23 davepi kernel: [505703.438732] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Mar 26 20:22:23 davepi kernel: [505703.438749] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Mar 26 20:22:23 davepi kernel: [505703.609811] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Mar 26 20:22:28 davepi kernel: [505708.678878] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:34 davepi kernel: [505714.679099] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:40 davepi kernel: [505720.679296] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:46 davepi kernel: [505726.679523] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:52 davepi kernel: [505732.679731] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:22:58 davepi kernel: [505738.679948] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Mar 26 20:23:04 davepi kernel: [505744.680144] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110

Config: the Pi is a Model B Rev 2, 512MB RAM. There's a USB webcam attached via a powered hub. There are 3 one-wire temperature sensors and a 16x2 LCD attached by GPIO.

cmdline.txt is:

dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait
@P33M
Copy link
Contributor

P33M commented Mar 27, 2014

Thanks for the report. I have had a Pi do this on my desk this morning when just left overnight with nothing plugged in (compiling a kernel).

@P33M
Copy link
Contributor

P33M commented Apr 4, 2014

I note that you're using 0x3 as the mask: 0x7 would add FIQ acceleration for your webcam which should get rid of the error messages.

This problem is proving difficult to track down. I've been running an IRQ latency fuzz tester for all of today with multiple devices plugged in - no results.

These messages are usually only generated if there are no available host channels to do transfers with (or at least, the driver thinks it doesn't have any free) or the HCD no longer has any interrupts to drive things forward.

I think at this point I need to build a kernel that prints out necessary debug as and when things fall over, then send it to you.

@P33M
Copy link
Contributor

P33M commented Apr 6, 2014

40 hours later, the fuzz tester seems to have done the trick (or not, as the case may be).

Now to find out what is to blame... there's 8k interrupts per second which means SOF is still getting through.

@P33M
Copy link
Contributor

P33M commented Apr 6, 2014

@davebiffuk

Can you try this set of kernel+modules?

It's built from my fiq_fsm_test branch which has a hack in place to make the dwc_otg driver become much noisier if things break.

https://www.dropbox.com/s/0jmtfic91sw834l/fuzz-modules.tgz
https://www.dropbox.com/s/cr8mvsajxzyw3kw/kernel.img

kernel.img goes in /boot and overwrites the existing kernel.img (you may want to back it up first). To extract the modules do sudo tar xvf fuzz-modules.tgz -C/. I recommend leaving settings the way they are: the bug appears to be independent of the fiq_fsm_mask parameter.

@davebiffuk
Copy link
Author

Thanks @P33M - your perseverance is much appreciated!

I've got that kernel+modules in place now. I had switched to mask=0x7 but it hung earlier today with that (fits what you said) - so I've gone back to mask=0x3.

I'll let you know as soon as there's anything to report... thanks again.

@P33M
Copy link
Contributor

P33M commented Apr 7, 2014

Woot. One of my Pis just fell off the radar.

Best part is, I have the whole log.

There's a clue here: I have multiple device reset messages before it falls over.

[14077.002225] fuzz_irqs: probe
[34002.541113] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[35260.616502] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[70276.888230] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[76605.088348] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[79126.190249] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[86617.483156] usb 1-1.3.6: reset high-speed USB device number 7 using dwc_otg
[96625.214760] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
[96625.214819] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[96625.214842] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[96625.214907] dwc_otg: AHC=0 gintmsk_saved=0xffffffff haintmsk_saved=0x0000ffff kick_np=1 next_sched_frame=-1
[96625.214907] cur_frame=13804 haintmsk=0x00000004 gintmsk=0xf200080e haint=0x000000fb

And a smoking gun: HAINT has stale entries.

@P33M
Copy link
Contributor

P33M commented Apr 8, 2014

The root cause is bit errors.

It took 48 hours for me to replicate the problem with continuous bus use because that's how long it takes to accumulate 7 bit errors. Interesting to know that my Ebay special hub and Ebay special USB pendrive have a bit error rate of 9x10^-13.

The HCD will unmask various channel interrupts if the previous transaction to the same endpoint failed for some reason. My FSM FIQ relies on a single interrupt firing (channel halt) so that it can pass only one interrupt to the HCD.

Setting the error flag to 1 for all transfers results in USB breaking before the Pi even reaches login, so this is the root cause. Patch inbound.

@davebiffuk
Copy link
Author

Great find! Thanks for the explanation. Can I buy you a drink? 🍺

Does that also cover the situation when this occurs with nothing plugged in? (no endpoints => no interrupts? I know little about USB internals)

@P33M
Copy link
Contributor

P33M commented Apr 15, 2014

BRANCH=next rpi-update firmware should contain a fix for this issue.

Bit errors can potentially happen during any traffic on the bus - as a function of power supply quality or ambient electrical noise. There is a mechanism for retrying failed transfers - the retry mechanism in this case is what caused host channels to never get acknowledged when the transfer completed.

@davebiffuk
Copy link
Author

Thanks, my Pi is running with the updated version now.

Electrical noise sounds likely - the Pi is monitoring my tortoise enclosure, which is lit and heated with a mercury vapour lamp.

@P33M
Copy link
Contributor

P33M commented Apr 16, 2014

As an aside, I spent much of this morning reading about the beneficial properties of mercury arc vapour lamps when used in vivariums.
spock _2267
Fascinating.

@davebiffuk
Copy link
Author

I'm not sure I'll go with mercury vapour next time around, this sort of bulb makes it hard to control the heat and UV levels separately. UV strip + incandescent for heat seems to be the way to go.

Anyway; bad news I'm afraid, it happened again. Uptime was about 5 days. The messages look the same as with previous occurrences (pasted below).

Can I just check that this is the kernel/firmware with the fix, to be sure I'm running a good one?

3.12.17+ #670

root@davepi:~# vcgencmd version
Apr 15 2014 18:34:07 
Copyright (c) 2012 Broadcom
version c1dc12512d5fe9340b6a2ca9badf9441fe6675a8 (tainted) (release)


Apr 21 07:35:02 davepi kernel: [466729.069993] uvcvideo: Non-zero status (-5) in video completion handler.
Apr 21 11:15:07 davepi kernel: [479934.470096] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:07 davepi kernel: [479934.470127] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:07 davepi kernel: [479934.470164] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:12 davepi kernel: [479939.470156] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:12 davepi kernel: [479939.470206] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:12 davepi kernel: [479939.470224] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:18 davepi kernel: [479945.570406] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:18 davepi kernel: [479945.570460] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:18 davepi kernel: [479945.570481] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:23 davepi kernel: [479950.570539] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Apr 21 11:15:23 davepi kernel: [479950.570592] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Apr 21 11:15:23 davepi kernel: [479950.570614] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Apr 21 11:15:23 davepi kernel: [479950.743220] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 21 11:15:28 davepi kernel: [479955.810777] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:34 davepi kernel: [479961.810969] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:40 davepi kernel: [479967.811151] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:46 davepi kernel: [479973.811351] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:52 davepi kernel: [479979.811558] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:15:58 davepi kernel: [479985.811752] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:16:04 davepi kernel: [479991.811985] smsc95xx 1-1.1:1.0 eth0: Failed to write reg index 0x00000014: -110
Apr 21 11:16:07 davepi kernel: [479994.482053] uvcvideo: Failed to set UVC probe control : -110 (exp. 26).

@P33M
Copy link
Contributor

P33M commented Apr 22, 2014

Can you post the output of vmstat 2 on the Pi in the bad state?

@davebiffuk
Copy link
Author

It's headless, so I've set it to capture the output of vmstat 2 5 once a minute, which will be retrievable after a recurrence. Let me know if that's not good enough.

@P33M
Copy link
Contributor

P33M commented Apr 22, 2014

That should suffice.

@P33M
Copy link
Contributor

P33M commented May 5, 2014

There was one other case that could cause this, which should now be fixed. Update the issue if it's still broken.

@cjgriscom
Copy link

cjgriscom commented Feb 23, 2019

@P33M I realize it's been a long time, but I have been discussing on the linux-usb mailing list a similar problem with the Chipidea USB driver. Would you be able to let me know which commits on this tree fixed the dwc_otg issue?

@popcornmix
Copy link
Collaborator

popcornmix commented Feb 25, 2019

@cjgriscom Looking through the git commits, I believe 281627f is the fix referenced here.

popcornmix pushed a commit that referenced this issue Sep 10, 2024
Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly
leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX
reads guest memory.

Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN
via sync_regs(), which already holds SRCU.  I.e. trying to precisely use
kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause
problems.  Acquiring SRCU isn't all that expensive, so for simplicity,
grab it unconditionally for KVM_SET_VCPU_EVENTS.

 =============================
 WARNING: suspicious RCU usage
 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted
 -----------------------------
 include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by repro/1071:
  #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm]

 stack backtrace:
 CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x7f/0x90
  lockdep_rcu_suspicious+0x13f/0x1a0
  kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm]
  kvm_vcpu_read_guest+0x3e/0x90 [kvm]
  nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel]
  load_vmcs12_host_state+0x432/0xb40 [kvm_intel]
  vmx_leave_nested+0x30/0x40 [kvm_intel]
  kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm]
  kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm]
  ? mark_held_locks+0x49/0x70
  ? kvm_vcpu_ioctl+0x7d/0x970 [kvm]
  ? kvm_vcpu_ioctl+0x497/0x970 [kvm]
  kvm_vcpu_ioctl+0x497/0x970 [kvm]
  ? lock_acquire+0xba/0x2d0
  ? find_held_lock+0x2b/0x80
  ? do_user_addr_fault+0x40c/0x6f0
  ? lock_release+0xb7/0x270
  __x64_sys_ioctl+0x82/0xb0
  do_syscall_64+0x6c/0x170
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7ff11eb1b539
  </TASK>

Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
popcornmix pushed a commit that referenced this issue Sep 12, 2024
commit 4bcdd83 upstream.

Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly
leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX
reads guest memory.

Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN
via sync_regs(), which already holds SRCU.  I.e. trying to precisely use
kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause
problems.  Acquiring SRCU isn't all that expensive, so for simplicity,
grab it unconditionally for KVM_SET_VCPU_EVENTS.

 =============================
 WARNING: suspicious RCU usage
 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted
 -----------------------------
 include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by repro/1071:
  #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm]

 stack backtrace:
 CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x7f/0x90
  lockdep_rcu_suspicious+0x13f/0x1a0
  kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm]
  kvm_vcpu_read_guest+0x3e/0x90 [kvm]
  nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel]
  load_vmcs12_host_state+0x432/0xb40 [kvm_intel]
  vmx_leave_nested+0x30/0x40 [kvm_intel]
  kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm]
  kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm]
  ? mark_held_locks+0x49/0x70
  ? kvm_vcpu_ioctl+0x7d/0x970 [kvm]
  ? kvm_vcpu_ioctl+0x497/0x970 [kvm]
  kvm_vcpu_ioctl+0x497/0x970 [kvm]
  ? lock_acquire+0xba/0x2d0
  ? find_held_lock+0x2b/0x80
  ? do_user_addr_fault+0x40c/0x6f0
  ? lock_release+0xb7/0x270
  __x64_sys_ioctl+0x82/0xb0
  do_syscall_64+0x6c/0x170
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7ff11eb1b539
  </TASK>

Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Sep 13, 2024
commit 4bcdd83 upstream.

Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly
leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX
reads guest memory.

Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN
via sync_regs(), which already holds SRCU.  I.e. trying to precisely use
kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause
problems.  Acquiring SRCU isn't all that expensive, so for simplicity,
grab it unconditionally for KVM_SET_VCPU_EVENTS.

 =============================
 WARNING: suspicious RCU usage
 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted
 -----------------------------
 include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by repro/1071:
  #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm]

 stack backtrace:
 CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x7f/0x90
  lockdep_rcu_suspicious+0x13f/0x1a0
  kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm]
  kvm_vcpu_read_guest+0x3e/0x90 [kvm]
  nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel]
  load_vmcs12_host_state+0x432/0xb40 [kvm_intel]
  vmx_leave_nested+0x30/0x40 [kvm_intel]
  kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm]
  kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm]
  ? mark_held_locks+0x49/0x70
  ? kvm_vcpu_ioctl+0x7d/0x970 [kvm]
  ? kvm_vcpu_ioctl+0x497/0x970 [kvm]
  kvm_vcpu_ioctl+0x497/0x970 [kvm]
  ? lock_acquire+0xba/0x2d0
  ? find_held_lock+0x2b/0x80
  ? do_user_addr_fault+0x40c/0x6f0
  ? lock_release+0xb7/0x270
  __x64_sys_ioctl+0x82/0xb0
  do_syscall_64+0x6c/0x170
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7ff11eb1b539
  </TASK>

Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants