-
Notifications
You must be signed in to change notification settings - Fork 5.2k
ethernet dropped: smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the report. I have had a Pi do this on my desk this morning when just left overnight with nothing plugged in (compiling a kernel). |
I note that you're using 0x3 as the mask: 0x7 would add FIQ acceleration for your webcam which should get rid of the error messages. This problem is proving difficult to track down. I've been running an IRQ latency fuzz tester for all of today with multiple devices plugged in - no results. These messages are usually only generated if there are no available host channels to do transfers with (or at least, the driver thinks it doesn't have any free) or the HCD no longer has any interrupts to drive things forward. I think at this point I need to build a kernel that prints out necessary debug as and when things fall over, then send it to you. |
40 hours later, the fuzz tester seems to have done the trick (or not, as the case may be). Now to find out what is to blame... there's 8k interrupts per second which means SOF is still getting through. |
Can you try this set of kernel+modules? It's built from my fiq_fsm_test branch which has a hack in place to make the dwc_otg driver become much noisier if things break. https://www.dropbox.com/s/0jmtfic91sw834l/fuzz-modules.tgz kernel.img goes in /boot and overwrites the existing kernel.img (you may want to back it up first). To extract the modules do |
Thanks @P33M - your perseverance is much appreciated! I've got that kernel+modules in place now. I had switched to mask=0x7 but it hung earlier today with that (fits what you said) - so I've gone back to mask=0x3. I'll let you know as soon as there's anything to report... thanks again. |
Woot. One of my Pis just fell off the radar. Best part is, I have the whole log. There's a clue here: I have multiple device reset messages before it falls over.
And a smoking gun: HAINT has stale entries. |
The root cause is bit errors. It took 48 hours for me to replicate the problem with continuous bus use because that's how long it takes to accumulate 7 bit errors. Interesting to know that my Ebay special hub and Ebay special USB pendrive have a bit error rate of 9x10^-13. The HCD will unmask various channel interrupts if the previous transaction to the same endpoint failed for some reason. My FSM FIQ relies on a single interrupt firing (channel halt) so that it can pass only one interrupt to the HCD. Setting the error flag to 1 for all transfers results in USB breaking before the Pi even reaches login, so this is the root cause. Patch inbound. |
Great find! Thanks for the explanation. Can I buy you a drink? 🍺 Does that also cover the situation when this occurs with nothing plugged in? (no endpoints => no interrupts? I know little about USB internals) |
Bit errors can potentially happen during any traffic on the bus - as a function of power supply quality or ambient electrical noise. There is a mechanism for retrying failed transfers - the retry mechanism in this case is what caused host channels to never get acknowledged when the transfer completed. |
Thanks, my Pi is running with the updated version now. Electrical noise sounds likely - the Pi is monitoring my tortoise enclosure, which is lit and heated with a mercury vapour lamp. |
I'm not sure I'll go with mercury vapour next time around, this sort of bulb makes it hard to control the heat and UV levels separately. UV strip + incandescent for heat seems to be the way to go. Anyway; bad news I'm afraid, it happened again. Uptime was about 5 days. The messages look the same as with previous occurrences (pasted below). Can I just check that this is the kernel/firmware with the fix, to be sure I'm running a good one?
|
Can you post the output of |
It's headless, so I've set it to capture the output of |
That should suffice. |
There was one other case that could cause this, which should now be fixed. Update the issue if it's still broken. |
@P33M I realize it's been a long time, but I have been discussing on the linux-usb mailing list a similar problem with the Chipidea USB driver. Would you be able to let me know which commits on this tree fixed the dwc_otg issue? |
@cjgriscom Looking through the git commits, I believe 281627f is the fix referenced here. |
Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
commit 4bcdd83 upstream. Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4bcdd83 upstream. Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e5707 ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
I've recently been using BRANCH=next firmware to successfully avoid "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0", see #470
There are still occasional uvcvideo log messages due to the USB webcam, but they don't seem to affect either the webcam or the ethernet.
However I'm still seeing the Pi spontaneously lose its (wired) ethernet connection after some time - 5 days uptime in this case. It's running headless so needs powercycling to get connected again.
kern.log says:
Config: the Pi is a Model B Rev 2, 512MB RAM. There's a USB webcam attached via a powered hub. There are 3 one-wire temperature sensors and a 16x2 LCD attached by GPIO.
cmdline.txt is:
The text was updated successfully, but these errors were encountered: