SPI slow performance #2624

Closed
mkeyno opened this issue Oct 20, 2016 · 22 comments
@mkeyno

mkeyno commented Oct 20, 2016

Hardware

Hardware: ESP-12F
Core Version: 2.2.0-rc2

Description

The following is a simple sketch to test hardware SPI speed. In loop(), 470 bytes are written to SPI on each pass, and a timestamp is taken every 200th pass. With the preset SPI frequency I expected the 200 passes to take around 200*(470/16)=6ms, but I got an unexpected result (about 1000 ms) for all the data written.
Is there any setting missing?
Thanks

Settings in IDE

Module: Generic ESP8266 Module
Flash Size: 2MB/1MB
CPU Frequency: 80MHz
Flash Mode: qio
Flash Frequency: 40MHz
Upload Using: SERIAL
Reset Method: ck

Sketch

#include "SPI.h"
#include <Arduino.h>
#include <ESP8266WiFi.h>

IPAddress apIP(192, 168, 4, 1);
static int coun=0;
static uint32_t last_time=0;

void setup() {
Serial.begin(115200);
delay(500);
Serial.setDebugOutput(true);

SPI.setFrequency(16000000L);
SPI.setBitOrder(MSBFIRST);
SPI.begin();

WiFi.mode(WIFI_AP);
WiFi.softAPConfig(apIP, apIP, IPAddress(255, 255, 255, 0));
WiFi.softAP("ESP_SPI","12345678");                                
delay(500);  
Serial.print("AP IP address: ");Serial.println(WiFi.softAPIP());

}
void loop() 
{

 for (int i = 0; i < 470; i++)SPI.transfer(0xE0);
 coun++; 
  if(coun>=200)
  {
    Serial.printf("Write took %u us \n",  micros() - last_time);
    coun=0;
    last_time=micros();  
  }     
}

Write took 1095536 us
Write took 1095480 us
Write took 1095398 us
Write took 1095614 us
Write took 1095517 us
Write took 1095447 us
Write took 1095493 us
Write took 1095544 us
Write took 1095542 us
Write took 1095443 us
Write took 1095496 us



@WereCatf
Contributor

You're trying to set up SPI-frequency and such before you have initialized the SPI-peripheral in the first place. Call SPI.begin() first, then set the frequency.

@mkeyno
Author

mkeyno commented Oct 20, 2016

@WereCatf I did, but it is still slow and far from what I expected:

Write took 255323 us 
Write took 255418 us 
Write took 255335 us 
Write took 255561 us 
Write took 255325 us 
Write took 255376 us 
Write took 255370 us 
Write took 255371 us 

@igrr
Member

igrr commented Oct 20, 2016

In addition to the time taken by SPI to transfer data, your measurements include the delay between loop() iterations and the time it takes to run Serial.printf. Furthermore, you may only be able to achieve maximum SPI throughput if you use the transfer32 function to transfer one word at a time.
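
To illustrate the word-at-a-time idea, here is a minimal host-side sketch that packs a byte buffer into 32-bit words, four bytes per word. The helper name `packWords` and the most-significant-byte-first layout are assumptions for illustration; which byte actually leaves on the wire first depends on the SPI library and the bit-order setting, so check that before relying on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs bytes into 32-bit words, first byte in the
// most significant position. A trailing partial word is zero-padded.
std::vector<uint32_t> packWords(const uint8_t* data, std::size_t len) {
    std::vector<uint32_t> words((len + 3) / 4, 0);
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned shift = 24 - 8 * (i % 4);  // 24, 16, 8, 0
        words[i / 4] |= static_cast<uint32_t>(data[i]) << shift;
    }
    return words;
}
```

Each resulting word could then be handed to a 32-bit write call, so the per-call cost is paid once per four bytes rather than once per byte.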

@WereCatf
Contributor

Indeed, like igrr says, you're not counting the time it takes to transfer a byte over SPI, you're counting the time it takes to run the whole loop()! Of course you'll get entirely different values.

You could try something like e.g. the following to get a better look at how long 200 transfers took:

void loop() 
{
 uint32_t timerStart=micros();
 for (int i = 0; i < 200; i++) SPI.transfer(0xE0);
 uint32_t timerEnd=micros();
 Serial.printf("200 SPI-transfers took %u us \n",  timerEnd - timerStart);
 for (int i = 0; i < 270; i++) SPI.transfer(0xE0);
}

@mkeyno
Author

mkeyno commented Oct 21, 2016

Thanks to both of you, @igrr and @WereCatf, but I still don't understand how to get the maximum possible speed.

you may only be able to achieve maximum SPI throughput if you use transfer32 function to transfer one word at a time

Let me explain the project behind this request. I am trying to push a byte array of size data[start_byte + number_of_LED_pixel + end_byte] into one LED strip 200 times in less than 40 ms.
Which instruction should I use to be sure that the time to run the whole loop() stays within my requirement?

byte data[370];
for (byte *p = data; p < data + sizeof(data); p++) SPI.transfer(*p);

If you know of an example that does this the correct way, I would really appreciate it.

@WereCatf
Contributor

How long it takes to run through your whole loop()-function obviously depends on what else you do in there. You have to remember that e.g. Serial.printf() also takes some time to execute -- it doesn't just magically spit out bytes without taking any time at all.

Just transferring 470 bytes at 16MHz over SPI, with totally non-optimized code, took 845uS on my ESP, I just tested it. With optimized code it could be shaved down some more, though the theoretical minimum would be 235uS. So, the SPI-speed is not the problem.

Also, if I modify your code a bit and remove that Serial.printf() from there each iteration of loop() takes about ~900uS -- with those 470 bytes transferred over the SPI-bus -- and thus way below your 40mS requirement.
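
One way to keep the measurement honest is to time only the SPI section and report once after many iterations. The sketch below is plain C++ so it can be tried off-target; `SectionTimer` is a hypothetical helper, and on the ESP8266 the timestamps would come from `micros()`.

```cpp
#include <cstdint>

// Accumulates time spent inside a measured section only, so that work done
// elsewhere in loop() (printing, WiFi servicing) is excluded.
struct SectionTimer {
    uint32_t total_us = 0;  // sum of all measured sections
    uint32_t start_us = 0;  // timestamp at the start of the current section
    uint32_t count = 0;     // number of completed sections

    void begin(uint32_t now_us) { start_us = now_us; }

    // Unsigned subtraction stays correct across a single counter wrap.
    void end(uint32_t now_us) {
        total_us += now_us - start_us;
        ++count;
    }

    uint32_t averageUs() const { return count ? total_us / count : 0; }
};
```

In a sketch, `begin(micros())` and `end(micros())` would wrap just the transfer, and `averageUs()` would be printed once every 200 iterations instead of on every pass.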

@mkeyno
Author

mkeyno commented Oct 21, 2016

@WereCatf do you mean I should just remove the Serial.printf() from my code? Can you share your optimized code? I should add that the following is my main code. When I comment out other parts of the loop function, such as webSocket.loop(); or server.handleClient();, the serial print shows only a minor change of 1 or 2 ms in my timing. Even the direct flash read takes around 12 ms, which is fine, but the SPI write time is huge. I would really appreciate your advice on how to make the SPI write as fast as possible inside code like this:

void loop()
{
  if (!SHOW) server.handleClient();
  webSocket.loop();
  if (SHOW)
  {
    if ((millis() - OpenlastTime) > DURATION[image_index] * 1000) // just to change the image number
    {
      image_index++;
      if (image_index >= IMAGE_NUM) image_index = 0;
      _memory_pointer = start_address_of_imagefile[image_index];
      Current_imageLine = 0;
      OpenlastTime = millis();
    }

    ESP.flashRead(_memory_pointer, (uint32_t *) LED_BUFFER, NUM_LEDS * 3);
    Spi.write(); // optimized SPI writing method
    _memory_pointer += (NUM_LEDS * 3);
    Current_imageLine++;

    if (Current_imageLine >= IMAGES_LINES) // just to repeat the image frame
    {
      // Serial.printf("\nFrame took %u ms line=%d  RPM=%d\n", micros() - frame_time, Current_imageLine, RPM * IMAGES_LINES);
      Current_imageLine = 0;
      _memory_pointer = start_address_of_imagefile[image_index - 1];
      frame_time = micros();
    }
  }
}

@WereCatf
Contributor

No, I already told you SPI-speed is not the problem. The problem was your Serial.printf()-call and the way you calculated the time it took to execute whole loop(). Writing 470 bytes over SPI at 16MHz didn't even take a whole 1ms, even when just using a simple for()-clause.

Don't do something as stupid as writing things over Serial on every loop()-iteration and it will run a lot faster.

@mkeyno
Author

mkeyno commented Oct 26, 2016

Hi again @WereCatf, @igrr. I followed your suggestions, but I am surprised that nothing much changed. Even using SPI.write32(), as Ivan suggested, made no big difference. So I prepared the following example. If I comment out the AP WiFi settings it takes about 265 ms (258 ms without delay(0)), but when I enable the WiFi settings it throws an exception and resets endlessly. Would you mind uploading this simple sketch and checking it yourself? I am really anxious about this problem.

#include "SPI.h"
#include <Arduino.h>
#include <ESP8266WiFi.h>

IPAddress apIP(192, 168, 4, 1);

void setup() {
Serial.begin(115200);
delay(500);
SPI.begin();
SPI.setFrequency(16000000L);
SPI.setBitOrder(MSBFIRST);
delay(500);
// WiFi.mode(WIFI_AP);
// WiFi.softAPConfig(apIP, apIP, IPAddress(255, 255, 255, 0));
// WiFi.softAP("ESP_SPI","12345678");                                
// delay(500);  
// Serial.print("AP IP address: ");Serial.println(WiFi.softAPIP());
}
void loop() 
{
  uint32_t timerStart=micros();
 for (int i = 0; i < 200; i++) {for (int j = 0; j < 470; j++) SPI.transfer(0xE0); delay(0);}
 uint32_t timerEnd=micros();
 Serial.printf("200 SPI-transfers took %u us \n",  timerEnd - timerStart);
}

With the WiFi settings enabled:
scandone
del if0
usl
mode : softAP(1a:fe:34:da:4d:f0)
add if1
dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
bcn 100

ets Jan 8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v00000000
~ld

With WiFi disabled:

200 SPI-transfers took 265329 us
200 SPI-transfers took 263643 us
200 SPI-transfers took 265958 us
200 SPI-transfers took 261463 us

@me-no-dev
Collaborator

so... you delay every 470 bytes... Obviously we can not explain it in a way that you will understand, but for reference I want to tell you that I can push 220x240x2 bytes through SPI in 27ms. You do the math ;)

@me-no-dev
Collaborator

If your code ran as it should, in a perfect world it would take 47ms to transfer the bytes.
You have set the frequency to 16MHz, which means that at most you can push 2 bytes every 1 us.
200 * 470 = 94000 => 94000 / 2 = 47000us (in a perfect world)
But then you delay... so it takes more.
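
The arithmetic above can be written as a small host-side helper. This is only a sketch of the ideal case: it assumes 8 clock cycles per byte and no gaps between bytes, and the function name `idealTransferMicros` is made up for illustration.

```cpp
#include <cstdint>

// Ideal wire time in microseconds for `bytes` bytes at `clock_hz`,
// assuming 8 clock cycles per byte and no idle time between bytes.
uint64_t idealTransferMicros(uint64_t bytes, uint64_t clock_hz) {
    const uint64_t bits = bytes * 8;
    return bits * 1000000ULL / clock_hz;
}
```

At 16MHz this gives 235 us for one 470-byte line and 47000 us for 200 lines, matching the figures quoted in this thread.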

@WereCatf
Contributor

@mkeyno Take a look at this example:

#include <ESP8266WiFi.h>
#include <SPI.h>

void setup() {
  WiFi.mode(WIFI_STA);
  WiFi.begin("SSID", "password");
  Serial.begin(115200);
  delay(500);
  Serial.println("Begin...");
  Serial.print("Connecting to wifi");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("Connected.");
  SPI.begin();
  SPI.setFrequency(16000000L);
  SPI.setBitOrder(MSBFIRST);
}

uint32_t loopStart;
uint32_t loopEnd;
uint8_t spiBuffer[470];

void loop() {
    loopStart = micros();
    for (int i = 0; i < 200; i++) {for (int j = 0; j < 470; j++) SPI.transfer(0xE0); delay(0);}
    loopEnd = micros();
    Serial.printf("Run 1 took %duS.\n", loopEnd - loopStart);

    loopStart = micros();
    for (int i = 0; i < 200; i++) {for (int j = 0; j < 470; j++) SPI.transfer(0xE0); }
    delay(1);
    loopEnd = micros();
    Serial.printf("Run 2 took %duS.\n", loopEnd - loopStart);

    loopStart = micros();
    for(int i=0; i<470; i++) spiBuffer[i]=0xE0; //Fill the buffer with whatever you want
    for (int i = 0; i < 200; i++) SPI.writeBytes(spiBuffer, 470);
    delay(1);
    loopEnd = micros();
    Serial.printf("Run 3 took %duS.\n", loopEnd - loopStart);

    loopStart = micros();
    spiBuffer[0]=0xE0;
    SPI.writePattern(spiBuffer, 1, 200 * 470); //Write a pattern of 1 byte 200*470 times
    //Only good for writing patterns up to 64 bytes. DOES NOT WORK FOR BIGGER PATTERNS
    delay(1);
    loopEnd = micros();
    Serial.printf("Run 4 took %duS.\n\n", loopEnd - loopStart);

    delay(2000); //Delay for 2 seconds, just so it's easier to follow Serial-output. Can be removed.
}

It gives me the results:
Run 1 took 273723uS.
Run 2 took 260755uS.
Run 3 took 55961uS.
Run 4 took 48973uS.

As you can see, using writeBytes() is much faster. writePattern() may or may not work for your needs, I do not know. I am only including it so you can decide for yourself if it works for you or not. Now, there is another way of speeding things up, too, and that is by changing the IDE-settings:

Results with 80MHz Flash, QIO-mode, 160MHz CPU:
Run 1 took 175881uS.
Run 2 took 171417uS.
Run 3 took 51970uS.
Run 4 took 48590uS.
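
One plausible reason a block write beats per-byte calls is that the fixed per-call cost is paid once per chunk instead of once per byte. The sketch below splits a buffer into chunks and hands each one to a sink. The 64-byte chunk size is an assumption borrowed from the 64-byte pattern limit noted in the example, and `sendChunked` with its callback is hypothetical, not the library's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical sink type: receives one chunk of bytes per call.
using ChunkSink = std::function<void(const uint8_t*, std::size_t)>;

// Splits `data` into chunks of at most `chunk_size` bytes and passes each
// to `sink`. Returns the number of sink calls made.
std::size_t sendChunked(const uint8_t* data, std::size_t len,
                        std::size_t chunk_size, const ChunkSink& sink) {
    if (chunk_size == 0) return 0;  // nothing sensible to do
    std::size_t calls = 0;
    while (len > 0) {
        const std::size_t n = std::min(len, chunk_size);
        sink(data, n);
        data += n;
        len -= n;
        ++calls;
    }
    return calls;
}
```

For a 470-byte line and 64-byte chunks this makes 8 sink calls, compared with 470 calls when sending one byte at a time.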

@WereCatf
Contributor

WereCatf commented Oct 26, 2016

Also, I do have to ask: have you simply tried to use faster SPI-frequency? How high SPI-frequency your LED-strip can handle depends on the LED-strip, obviously, but the ESP8266 can drive an SPI-bus up to 80MHz.

@mkeyno
Author

mkeyno commented Oct 26, 2016

No, dear @me-no-dev, please check my notes again. I mentioned that if I remove delay(0); the time drops from 265 to 258 ms, which is no big deal. I would really appreciate it if everyone here uploaded my sketch and checked it themselves. And if anyone can share a simple, fast SPI example, I would be really thankful to use it in my sketch.

@me-no-dev
Collaborator

uhmmm... did you read the above from @WereCatf ? Do you notice how his fourth run gives about the time from my calculations?
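
To put a number on that comparison, a measured time can be expressed as a fraction of the ideal wire time. This is a rough host-side sketch; `busUtilization` is a made-up name, and it assumes 8 clock cycles per byte with nothing else on the bus.

```cpp
#include <cstdint>

// Fraction of the ideal bus throughput actually achieved (1.0 = perfect),
// assuming 8 clock cycles per byte.
double busUtilization(uint64_t bytes, uint64_t clock_hz, uint64_t measured_us) {
    const double ideal_us = static_cast<double>(bytes) * 8.0 * 1e6 /
                            static_cast<double>(clock_hz);
    return measured_us == 0 ? 0.0 : ideal_us / static_cast<double>(measured_us);
}
```

Fed with the first set of results above (94000 bytes at 16MHz), this gives roughly 96% for Run 4, 84% for Run 3 and 17% for Run 1. Runs 2 to 4 also include a delay(1), so their real utilization is slightly higher.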

@mkeyno
Author

mkeyno commented Oct 26, 2016

I know that, @WereCatf, and I tested up to 80MHz. I expected at least 50% higher speed when doubling the SPI frequency, but in practice the gain was below 25%, and it dwindled to less than 10% at the maximum SPI speed. However, I can't go higher than 16MHz because my LED strip gets noisy at higher SPI frequencies.
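
A simple model may explain why doubling the clock gives far less than double the speed: if every byte carries a fixed software cost on top of its wire time, only the wire part shrinks with frequency. The sketch below derives that per-byte cost from the 845 us for 470 bytes at 16MHz reported earlier in this thread. The model itself and the function names are assumptions, not measurements of the library.

```cpp
// Wire time for one byte in microseconds at the given SPI clock.
double wireMicrosPerByte(double clock_hz) { return 8.0 * 1e6 / clock_hz; }

// Fixed per-byte software cost implied by one measurement.
double overheadMicrosPerByte(double measured_us, double bytes, double clock_hz) {
    return measured_us / bytes - wireMicrosPerByte(clock_hz);
}

// Predicted total time at another clock, keeping the overhead constant.
double predictMicros(double bytes, double overhead_us, double clock_hz) {
    return bytes * (overhead_us + wireMicrosPerByte(clock_hz));
}
```

Under this model, going from 16MHz to 32MHz only cuts the 845 us figure to about 727 us, a gain of roughly 14%, which is in line with a per-byte loop improving by less than 25%.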

@me-no-dev
Collaborator

oh man.... you have loops, jumps and so many things behind the scenes calling SPI-transfer... really pointless to try to explain further. I know that you want us to write the function for you so you can write faster to the SPI, but I do not want to at all ;) we are not here to write you the code or fix your own coding problems. We are here to ensure that everything works as it should, so you can be sure that the problem is in your code and not ESP-Arduino itself.
Why don't you try to run it on AVR or ARM and see the results there?

@WereCatf
Contributor

@mkeyno Of course it's going to be noisy with your original code, it's just way too slow.

Look at my example, use SPI.writeBytes() like I did in it, and set the IDE-settings to 80MHz Flash, QIO-mode, 160MHz CPU. Then test with higher SPI-frequency, like e.g. 24MHz.

There isn't really much else to be said about this, you have to look at the examples I've given and try to understand.

@mkeyno
Author

mkeyno commented Oct 26, 2016

Thanks @WereCatf. I was terrified that I had done something stupid, but as you said, I should use the faster function, which was unknown to me and probably to the ordinary user. In my project I use a pointer to a byte array instead of a for loop, and as @igrr suggested I tried converting it to a word pointer, but nobody believed me. Even the @Makuna lib, which uses the normal write, was slow, and he did not believe my results.

@mkeyno
Author

mkeyno commented Oct 26, 2016

@me-no-dev I really appreciate all the time you have spent in the gitter room and here replying to my unprofessional questions. Believe me, I have no intention of asking anyone to write the code for me, although you have done me that favor at times and helped me out. I think the issue is tied to my particular application: I'm fairly sure nobody has yet used ESP-Arduino for a POV project, which needs high-speed SPI, so everybody thought I was misusing the SPI. But @WereCatf's results show it is kind of naturally slow due to something behind the scenes. I ran the same program on an Arduino Uno and a Teensy 3.1 and, surprisingly, both were much faster than the ESP. For example, I could easily reach 152 ms at 8MHz on the Arduino Uno.
Thanks to all of you.

@me-no-dev
Collaborator

@mkeyno AVR is 8 bit and you can only write bytes to it (SPI.transfer) with two lines.
ESP as you should know already is 32bit controller and as such requires more than 2 lines and register fiddling to make sure that it sends 1 byte and not 4. There are many things that can be done to optimize SPI on any level, but you need to know and understand the BUS on ESP and how it all works for that to happen.
Also neither teensy nor AVR have WiFi and other stuff to deal with. They are plain controllers.
So on and so on.
If you want to succeed, you need to read the SPI library, understand how it works and write your own method to set the LEDs. That is the only way you will get there faster. The Arduino API is made simple and for convenience, while the real speed comes from bare-metal programming.

@themindfactory
Contributor

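This takes about 55000 us. The theoretical figure is 47000 us, so that is much closer:

uint8_t buf[470];
static int coun = 0;           // same globals as in the original sketch
static uint32_t last_time = 0;

void loop()
{
  SPI.writeBytes(buf, 470);    // send the whole line in one call
  if (++coun >= 200) {
    uint32_t t = micros();
    Serial.printf("Write took %u us \n", t - last_time);  // time for 200 writes
    coun = 0;
    last_time = t;
  }
}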
