-
Notifications
You must be signed in to change notification settings - Fork 13.3k
OTA of large files results in device hangs due to bootloader compare error #7458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Oh poop, I think I see what you're saying. You have a firmware of 100K and a 1MB flash chip. You OTA a 200K app placed at 100K-300K and copied down to 0-200K...overwriting the 1st 100K of the original OTA space. Is that the general gist of the issue? |
From what I see here:
it first does a copy_raw loop moving the downloaded data to it's destination. It then calls the copy_raw loop again comparing the data. |
Yes. That's what I meant |
I suggest to get rid of the verify anyway. If verify fails it will stall as it does now. Without verify it may run but terminate with exceptions allowing to re-ota a correct binary. It provides additional functionality which a simple verify doesn't. BTW until now, three years in the process, I never observed any bootloader binary failure so I think there is no need for this functionality at all. As an additional advantage it also reboots faster without the verify ;-) If you still want verify why not make a hash of the source binary while copying it to it's final destination and then re-read the copied locations and compare to the hash. |
If you still want verify why not make a hash of the source binary while copying it to it's final destination and then re-read the copied locations and compare to the hash.
It feels like this also could be immediate read of the written data? By changing ‘verify’ option to select between ‘just write the data’ and ‘write the data, read it back, compare’
|
Have you encountered flash chips like Puya or XMC? Paranoia like this wasn't needed until write failures were reported, and after lots of pain we figured out it had to do with the chip brand.
Because the hash algorithm would need to be in the eboot code, and there's almost no more space (4KB size max). In theory we could go above 4KB, but that would jump to 8KB of bootloader, and most of it would be empty. And we're not sure it will work. |
It could but then it might be you're reading the ram part (cache) of the flash (if it exists) instead of the real flash area. The hash is a better verify process I think. It shouldn't be that tricky something like:
I have met Puya but they never failed me on restart. They missed the binary OTA upload indeed but then they still rebooted as the current implementation will halt waiting for a power cycle. |
The point is that the verify is supposed to detect other potential cases like that which are currently unknown. New flash brands are showing up every so often, they can have quirks like this, and we've already bled twice as a result, so removing all verification is not an option. About what @earlephilhower said:
The received image is end-aligned in the empty space area, so what would happen is something like this: So, in your case:
Then:
So for your next OTA you're constrained to 440KB size, down from 560KB. Can you handle that? ln general, hitting this case means that future updates must be done with a 2-step process by first flashing a small stub app, in which case you won't have this verify problem anyways. |
@devyte , FWIW there are some hashing functions in ROM (SHA1, MD5) which could be used. Might still be issues with fitting in the code on our end, but it's not clear it's going to cross the 4k boundary. |
@devyte Yes I can handle that. Tasmota uses OTAMagic. This detects that the requested image won't fit in the space allocated and switches to an intermediate binary called This mechanism worked fine for over a year until the bootloader verify option came in play; with the minimal binary active all requested OTA binaries would be larger and were bound to fail. |
Just to be on the same page, the idea is for eboot to continuously update hash based on write buffer? Doesn't this still hit the caching problem when re-reading the data to calculate the hash for a 2nd time? Speed is another question though, as right now it is only a single memcmp. As a sidenote, I was wondering about caching mechanisms, it looks like SDK rom .ld has something that looks like related to the flash operations - Cache_Read_{Enable,Disable}. There are also same functions with _2 suffix, without arguments, but those are not listed in headers, but are included in the spi_flash.o of the SDK. Cache_Read_Enable(u8, u8, u8) needs some specific arguments which does not seem to be documented anywhere, and _2-suffixed also don't really work with bootleg version I c/p from the RTOS sources Even then, is caching even enabled by default (i.e. is it a hw property) or does sdk 'enable' it only after the app is loaded? |
@mcspr, As I understand it, ICACHE is out of the picture at boot time. You have 64K of IRAM with eboot sitting at the end. At least until After this, to do flash read/writes you need to use SDK system APIs to handle flash read/writes. Direct ROM flash API calls will fail. The SDK system calls use calls to My memory is a little fuzzy, If you want to see some examples using it, I have 3 active PRs that turn on One note, In summary, while eboot is running there should not be any cache interference while accessing the flash with ROM API calls. |
Yes, problems in normal use of the flash. NEVER when writting with the bootloader routines. |
Is it sufficient to revert #7317 from 2.7.2 for a clean 2.7.3 ? |
Fine with me. |
As discussed internally, reverting #7317 entirely is not an option. Only the verify step added in that merge must be disabled. |
Meant for #7458 , but still requires a recompiled eboot.elf.
2.7.3 branch has been force-updated. Edit: link to the changes (including recompiled eboot.elf) |
No issues so far, OTA or otherwise. Both xmc d1mini clone and esp8285 board |
@mcspr did you try the new 2.7.3 branch (not the link provided above which I just |
Last one posted, where 59bbfc7 is the last commit.
|
@d-a-v the branch 2.7.3 need to set build flags for the compiler version.
|
The branch generates warnings which are not in core 2.7.2.
|
Sorry guys. Forget my posts above. I messed up something in my fork with git commands... |
With your two confirmations we are going to release it. |
A few thoughts - yes just commenting the verify call is the way to go at least for a quick fix. |
Alternately - what about simply flagging that data was written beyond the start address of the compressed image, and skipping the verify when that happens? |
Sorry, I've been offline a few days. I'll test it this evening, as I have a bunch of modules with XMC flash. Haven't had OTA issues since the last XMC fix, but I'm not up to current build. |
Earle rebuilt the eboot.elf with current git + #7468 and I've tested it with two different D1 Minis with XMC, both at 80 & 160 and going back and forth. No errors with my test code, which is nowhere near as complex as Tasmota is. Mine's only 326032 bytes, 31% of 1meg program storage space on 4meg XMC flash. I also tested with d-a-v's eboot.elf, same results: no problem with 2 boards at any compile speed. |
* Comment out verify step in eboot.c Meant for #7458 , but still requires a recompiled eboot.elf. * Rebuild eboot.elf from changed source Co-authored-by: Earle F. Philhower, III <[email protected]>
2.7.3 is now released with the patch in #7468 (it's also in master), so I'm closing this. If there still is any issue, please let us know. |
Thx. |
After a recent core change Tasmota fails to load binaries over 560k due to new bootloader functionality.
After the file is loaded this is reported:
It seems the bootloader fails to compare the copied binary with the uploaded binary (for obvious reasons as part of the uploaded file are being overwritten by the bootloader copy routine).
A recent commit 51daecc introduces this cmp functionality not present in previous bootloader versions and leads to the device hangs.
Pls remove the compare functionality as it breaks OTA functionality
EDIT: Using core 2.7.1 the OTA uploads work fine (as it has no compare functionality)
The text was updated successfully, but these errors were encountered: