Skip to content

rp2040 core does not use rom floating point functions #614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
WestfW opened this issue Jan 30, 2023 · 7 comments
Open

rp2040 core does not use rom floating point functions #614

WestfW opened this issue Jan 30, 2023 · 7 comments
Labels
enhancement New feature or request RP2040

Comments

@WestfW
Copy link

WestfW commented Jan 30, 2023

The rp2040 includes optimized floating point functions in ROM, using linker magic to replace the standard gcc floating point functions when building applications from the pico SDK or Philhower core.

But the Arduino mbed core for rp2040 does not do the linker magic, causing applications that use floating point to be much slower (and larger) than on other rp2040 platforms.

the linker magic is included here: https://github.com/earlephilhower/arduino-pico/blob/master/lib/platform_wrap.txt

Arduino's (much smaller) link options: https://github.com/arduino/ArduinoCore-mbed/blob/master/variants/RASPBERRY_PI_PICO/ldflags.txt

@facchinm facchinm added enhancement New feature or request RP2040 labels Jan 30, 2023
@facchinm
Copy link
Member

Hi @WestfW ,
at the time we first ported RP2040 on mbed-os I benchmarked the two solutions, and soft fp were not THAT bad (sometimes even faster than the ROM version) while I agree that the flash occupation increases a lot (which was not considered a problem due to the large flash space available).
Anyway, we would gladly accept a PR that implements all the wraps (while taking care not to break the other mcus supported by this core) 🙂

@WestfW
Copy link
Author

WestfW commented Feb 1, 2023

Here's a quick benchmark of tangent (which was one of the more computationally intensive of the library functions.) Nearly 10x improvement using the ROM functions!

Philhower core, variant rpipico (using ROM float functions)
tanf Elapsed time: 983
tan (double) Elapsed time: 1982
tanf Elapsed time: 876
tan (double) Elapsed time: 1962

Arduinocore (using gcc float functions.)
tanf Elapsed time: 8023
tan (double) Elapsed time: 12048
tanf Elapsed time: 7360
tan (double) Elapsed time: 11701

void setup() {
  Serial.begin(115200);
  while (!Serial) ;
#ifdef ARDUINO_VARIANT
  Serial.println("\nPhilhower core, variant " ARDUINO_VARIANT);
#else
  Serial.println("\nArduinocore");
#endif
}

uint32_t startmicros, endmicros;
void loop() {
  startmicros = micros();
  for (float f = 0.0; f < 2 * (float)PI; f += PI / (float)60.0) {
    digitalWrite(5, ((int)tanf(f)) & 1);
  }
  endmicros = micros();
  Serial.print("tanf Elapsed time: ");
  Serial.println(endmicros - startmicros);
  delay(5000);
  startmicros = micros();
  for (double f = 0.0; f < 2 * PI; f += PI / 60.0) {
    digitalWrite(5, ((int)tan(f)) & 1);
  }
  endmicros = micros();
  Serial.print("tan (double) Elapsed time: ");
  Serial.println(endmicros - startmicros);
  delay(5000);
}

@daveythacher
Copy link

daveythacher commented Feb 16, 2023

What happens if this is RAM only?

Edit:
Does Arduino MBED recommend precomputation as a general practice for RP2040?

@WestfW
Copy link
Author

WestfW commented Feb 19, 2023

What? RAM-only executables can still call ROM. What does precomputation have to do with anything?
The Pico SDK does floats by "wrapping" calls to the standard FP functions in short bits of code that call the ROM functions, but
the way they've done it requires a bunch of -Wl,--wrap=__aeabi_fadd options in the link step.

@daveythacher
Copy link

daveythacher commented Feb 19, 2023

What? RAM-only executables can still call ROM.

RAM only is to see if the access time plays a role in the performance or not. RAM and ROM should be about the same.

What does precomputation have to do with anything?

Nothing, it was just a guess for why.

The Pico SDK does floats by "wrapping" calls to the standard FP functions in short bits of code that call the ROM functions, but
the way they've done it requires a bunch of -Wl,--wrap=__aeabi_fadd options in the link step.

Can you degrade performance using SDK to recreate the numbers you see with MBED? If so, you can test RAM only with copy to RAM to prove if flash is a problem.

Flash is likely a problem no matter what. Unless there is a defect in the ROM version, I would guess the ROM version will always be better.

Edit:
My guess being this is your point?

@rivimey
Copy link

rivimey commented Jan 17, 2024

Can't comment on the linker opts, but I would remind you that the rom functions are faster partly because they are intentionally less accurate (and in particular the rom code is not IEEE754 conformant). I would therefore argue that giving users a choice between smaller-faster-but-less-accurate and larger-slower-but-more-accurate was useful, rather than forcing one selection on all users.

@WestfW
Copy link
Author

WestfW commented Jan 18, 2024

Not true, at least in principle...

Qfplib-M0-full is a library of IEEE 754 single- and double-precision floating-point arithmetic routines for microcontrollers based on the ARM Cortex-M0 core (ARMv6-M architecture).

It provides correctly rounded (to nearest, even-on-tie) addition, subtraction, multiplication, division and square root operations, and sine, cosine, tangent, arctangent, logarithm and exponential functions that give a high degree of accuracy.

Limitations and deviations from the IEEE 754 standard
On input and output NaNs are converted to infinities and denormals are flushed to zero.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request RP2040
Projects
None yet
Development

No branches or pull requests

4 participants