ESP stops answering network requests #2267

Closed
slompf18 opened this issue Jan 3, 2019 · 32 comments
Labels
Status: Stale Issue is stale stage (outdated/stuck)

Comments

@slompf18

slompf18 commented Jan 3, 2019

Hardware:

Board: esp32doit-devkit-v1
IDE name: Platform.io
Flash Frequency: 80 MHz
Upload Speed: 115200
Computer OS: Mac OSX

Description:

I'm trying to serve a website for configuring the SSID/password. The site makes several JS/CSS/AJAX requests. Everything works flawlessly in Chrome. In Safari (iOS or macOS), the board stops answering after the site is reloaded a second or third time; the behavior is not deterministic.

To serve the site I was trying ESPAsyncWebServer and my own little webserver (essentially a pool of WiFiClients). Every solution shows the same results.

So my question is: how can I debug this problem?

The sketch I'm providing shows the symptoms. I know I could scan the networks async and then it would work in this little sketch. But doing it sync was the only way I found to demonstrate the problem. In the real application the scan is done async, but the symptoms remain.

Sketch:

#include <Arduino.h>
#include <ESPAsyncWebServer.h>
#include <SPIFFS.h>

AsyncWebServer server(80);

const char* ssid = "YOUR_SSID";
const char* password = "YOUR_PASSWORD";

void setup() {
    Serial.begin(115200);
    WiFi.begin(ssid, password);
    SPIFFS.begin();
    if (WiFi.waitForConnectResult() != WL_CONNECTED) {
        Serial.printf("WiFi Failed!\n");
        return;
    }

    Serial.print("IP Address: "); Serial.println(WiFi.localIP());

    server.serveStatic("/", SPIFFS, "/").setDefaultFile("index.htm");

    server.on("/getNetworkDetails", HTTP_GET, [](AsyncWebServerRequest *request){
        String result = "Network details: ";
        result += WiFi.SSID() + "/" + WiFi.getHostname();
        request->send(200, "plain/text", result);
    });

    server.on("/getNetworks", HTTP_GET, [](AsyncWebServerRequest *request){
        String result = "Found networks: ";
        result += WiFi.scanNetworks();
        request->send(200, "plain/text", result);
    });

    server.begin();
}

void loop() {
}

index.htm:

<html>
<body>
    <div>
        ESP TEST
    </div>
    <div id="networkDetails">loading network details ...</div>
    <div id="networks">loading number of networks ...</div>

    <script>
        document.addEventListener("DOMContentLoaded", function(event) { 
            var xhr1 = new XMLHttpRequest();
            xhr1.open('GET', 'getNetworkDetails');
            xhr1.onload = function() {
                document.getElementById("networkDetails").innerHTML = xhr1.responseText;
            };
            xhr1.send();

            var xhr2 = new XMLHttpRequest();
            xhr2.open('GET', 'getNetworks');
            xhr2.onload = function() {
                document.getElementById("networks").innerHTML = xhr2.responseText;
            };
            xhr2.send();
        });
    </script>
</body>
</html>

Debug Messages:

There are no error messages, even when setting debug level on verbose.

@me-no-dev
Member

I have caught browsers opening extra connections when requesting a site. I can't guarantee that this is what is going on here, but it could be a case of a connection hanging. Generally Async can deal with that and time out, but who knows what exactly happens... :) Maybe with more clues we can come to a conclusion and a fix.

@slompf18
Author

slompf18 commented Jan 3, 2019

No extra connections show up in the browsers' network tab, and I did not see any when debugging my own test web server. The sample is also simple enough that it should not prompt the browser to open any.

Do you have any idea how to find more clues? ;)

@me-no-dev
Member

Ahh, you even have debug enabled... hmmm... ESP8266 users have also complained about this since they switched to the newer LwIP (same as on ESP32)... The question is how to trace down what is causing it.

@slompf18
Author

slompf18 commented Jan 4, 2019

Because the logs I see are not that verbose, I want to make sure we mean the same thing by "enabling logs". I was setting the following defines at build time:

  • LOG_LOCAL_LEVEL ESP_LOG_VERBOSE
  • CORE_DEBUG_LEVEL ARDUHAL_LOG_LEVEL_VERBOSE

and calling the following line in setup: esp_log_level_set("*", ESP_LOG_VERBOSE);

Is there something else?

Here are the logs I get, even when the board stops answering:

I (199) wifi: mode : sta (30:ae:a4:20:51:44)
[D][WiFiGeneric.cpp:345] _eventCallback(): Event: 2 - STA_START
[D][WiFiGeneric.cpp:345] _eventCallback(): Event: 0 - WIFI_READY
I (334) wifi: n:10 0, o:1 0, ap:255 255, sta:10 0, prof:1
I (1068) wifi: state: init -> auth (b0)
I (1075) wifi: state: auth -> assoc (0)
I (1080) wifi: state: assoc -> run (10)
I (1106) wifi: connected with Ways, channel 10
I (1111) wifi: pm start, type: 1

[D][WiFiGeneric.cpp:345] _eventCallback(): Event: 4 - STA_CONNECTED
[D][WiFiGeneric.cpp:345] _eventCallback(): Event: 7 - STA_GOT_IP
[D][WiFiGeneric.cpp:389] _eventCallback(): STA IP: 192.168.1.168, MASK: 255.255.255.0, GW: 192.168.1.1
IP Address: 192.168.1.168
[D][WiFiGeneric.cpp:345] _eventCallback(): Event: 1 - SCAN_DONE

@luc-github
Contributor

@me-no-dev this looks similar to the behavior I reported, where a telnet connection stops answering after a few exchanges, no?

@plewka

plewka commented Jan 18, 2019

Hello, my first post...I'm in trouble with this problem, too.

Hardware:

Board: Olimex ESP32-EVB
CPU: ESP32D0WDQ6 (revision 1)
Core Installation/update date: 12/jan/2019
IDE name: Arduino IDE 1.8.8
Cores Frequency: 80..240 MHz
Flash Frequency: 80 MHz
PSRAM enabled: yes/no
Upload Speed: 460800
Partitioning: Standard
Core DebugLevel: Verbose
Computer OS: Ubuntu18.04-AMD64
Wired Ethernet

Description:

ESPAsyncWebServer hangs all network connections.
Even pings to the ESP stop being answered, permanently. Events such as disconnect (unplugging the cable) are dead, too.
No log output at core debug level VERBOSE.
It is more difficult to trigger via a WLAN client (I will try to cause the failure over WLAN next).
The probability rises with the time needed to process requests.
I noticed the stack halting after minutes to hours, but it can be forced to happen immediately.
A higher CPU clock, smaller files, and non-simultaneous requests decrease the probability.

What I did:

I isolated the problem to a simple webserver using ESPAsyncWebServer and SPIFFS over wired Ethernet.
These few lines are enough to make it fail. I started with a browser, but a recursive wget works, too.

I even tried putting the files in RAM. This only decreases the probability. If there is only one (big) transfer
at a time it seems to be stable forever; the same goes for many small ones.

If I access it from a WLAN-based client, the probability is much lower than with a direct 1:1 cable connection without a switch etc., but it still fails sometimes. I now doubt that it is caused by the wired Ethernet.

A simple HTML page that makes the browser load a few pictures with simultaneous requests is enough: one bigger file of 300 kB and some small pictures. The connection is lost immediately.

No success so far: finding a way to detect the failure from within the system itself in order to trigger a reboot...

Sketch:

removed it here, see next post....


@plewka

plewka commented Jan 19, 2019

WLAN == Ethernet == Hanging Network, while loop() is fine
I just tried via WLAN-Client and ESP via WLAN, too:

Force the bug to happen immediately:

It is more difficult to force the bug, but it is there. Some base traffic and some reloads with browser cache deactivated and it hangs. I used

watch -n 2 wget <url>

to load a "big" file of 300 kB, with Firefox reloading in parallel.
In Firefox (Web Developer -> Network analysis) the requests turn out to be less simultaneous
than with wired Ethernet. To be sure, I deactivated the cache in the web developer tools to force the browser
to fetch the small files on every reload.

SKETCH:

//#include <Arduino.h>
#include <SPIFFS.h>
#include <ESPAsyncWebServer.h>
#include <ETH.h>
#define WLAN

AsyncWebServer server(80);
void setup()
{

  Serial.begin(230400);   

#ifdef WLAN
  WiFi.begin("***", "***");
  WiFi.mode(WIFI_STA);

  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi..");
  }
  Serial.println(WiFi.localIP());

#else
  ETH.begin();
  ETH.config(0xc805a8c0, 0x0105a8c0, 0x00ffffff); // 192.168.5.200 / 192.168.5.1 / 255.255.255.0
#endif 

  if (!SPIFFS.begin(true)) {
    Serial.println("An Error has occurred while mounting SPIFFS");
    return;
  }
   server.serveStatic("/", SPIFFS, "/");
  server.begin();
}
void loop()
{
  delay(1000);
  Serial.println(".");
}
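
Editor's note: the three hex constants passed to ETH.config() in the sketch above are IPv4 addresses packed into 32-bit integers with the first octet in the lowest byte, as the inline comment spells out. A minimal host-side sketch of that packing follows; `packIPv4` is a hypothetical helper name used only for illustration and is not part of the Arduino core.

```cpp
#include <cstdint>

// Hypothetical helper (not part of arduino-esp32): packs a dotted-quad
// IPv4 address a.b.c.d into a 32-bit value with the first octet in the
// least significant byte, which matches the constants in the sketch.
constexpr std::uint32_t packIPv4(std::uint8_t a, std::uint8_t b,
                                 std::uint8_t c, std::uint8_t d) {
    return static_cast<std::uint32_t>(a)
         | (static_cast<std::uint32_t>(b) << 8)
         | (static_cast<std::uint32_t>(c) << 16)
         | (static_cast<std::uint32_t>(d) << 24);
}

// Compile-time checks against the values used in the sketch.
static_assert(packIPv4(192, 168, 5, 200) == 0xc805a8c0u, "local IP");
static_assert(packIPv4(192, 168, 5, 1)   == 0x0105a8c0u, "gateway");
static_assert(packIPv4(255, 255, 255, 0) == 0x00ffffffu, "subnet mask");
```

On the device, passing IPAddress(192, 168, 5, 200) and friends to ETH.config() should express the same values more readably than raw hex.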

@slompf18
Author

I did not get this bug fixed, because nobody was able to tell me how to analyse it. I am now looking into RTOS, which seems to run more stably.

@plewka

plewka commented Jan 20, 2019

If FreeRTOS is stable doing something equivalent, this sounds like an issue at the interface between AsyncTCP and lwIP. AsyncTCP is full of great features you wouldn't implement on your own, though. But something completely freezes the lwIP stack. I tried putting load on my Sonoffs running Tasmota; they limit traffic/connections quite soon but continue to answer, even if it takes a minute or more.

Maybe it would be a good idea to disable one core to rule out SMP effects? I don't need the speed. Does anybody have access to one of the single-core ESP32s, or know how to change the FreeRTOS options?!

@slompf18
Author

When working with RTOS I experienced random panics that lead to a reboot. The error looks like the one I experienced when working with Arduino.

The problem had its root in the watchdogs running in the background (described here for example). The default timeout in RTOS is 20 seconds, but that doesn't seem to matter: if the system is idle for 19 seconds and then starts a job that takes 3 seconds, the system seems to reboot. I made this work by calling esp_task_wdt_reset() before certain operations.

Maybe we do have the same problem here?

@plewka

plewka commented Jan 25, 2019

No, this is completely different. The system is basically fine and fully responsive, with no reboot and no (verbose) message!
It simply doesn't respond to network requests anymore, including ping.
It doesn't even detect a cable disconnect when used over cable (though the problem is not related to the cable).
To my limited knowledge, I would say the stack hangs completely.
I noticed that a delay(250) in the main loop() strongly increases the probability, too.
The main loop contained only that delay and an if that never triggers.
Anything that takes some time anywhere seems to be harmful.

Is Arduino really using both cores?!

@allex1978
Contributor

The same issue occurs on the latest (30/01/19) core and AsyncWebServer. The ESP32 randomly stops answering on the network, including pings, but the CPU works fine; I can see that on the display.

@malbrook

malbrook commented Mar 27, 2019

I have a similar problem using AsyncWebServer on a couple of different ESP32 projects. A web browser on a PC is connected to the ESP32 via a WiFi network and displays HTML pages that make regular JavaScript AJAX calls to the server to update sections of the screen without reloading the whole page. These calls happen every 2 seconds in one project and roughly every 30 seconds in the second. In both projects the web browser loses the connection after an indeterminate period, after which the ESP32 cannot be detected on the network with a network scanner, ping, etc.

Both projects also run an access point on the ESP32, and this disappears as well. Using verbose debug I see "rx timeout" and "ack timeout" messages just before the WiFi packs up. I can see the ESP32 is still running, as IO signals still work and LEDs change in response to inputs. I also run a second process on the second core and I can see that it is still running, even producing output on a serial port, so the ESP32 itself is still operating.

Looking for the source of the messages, I found they are generated in AsyncTCP, and I noticed that when they are generated there is a call to _close(), which I suspect is closing down the WiFi. As a test I added a global variable in AsyncTCP that I can monitor from the rest of the program; if the message is triggered, it is used to stop and then restart the WiFi. This seems to have reduced the problems by up to 50%. However, I am now getting system crashes after some of the restarts, mainly related to heap poisoning, so it clearly needs more work to solve the problem.

The problem can be triggered reliably as follows: open a web page that contains configuration information, make a change, and click the update button. This makes the page send a POST to the server, which in turn writes to the preferences, which are stored in SPIFFS. Then immediately select a new page in the browser, so that the browser requests a new page from the server, which is served from the SPIFFS area. Does this imply that writing to the preferences area of the flash while reading from a different area of the flash causes an issue with the WiFi?

System is coded using Arduino IDE 1.8.9 with ESP32 1.0.2rc1 and SDK V3.3, AsyncWebServer and AsyncTCP are latest versions from github.
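
Editor's note: the flag-based workaround described in this comment (a global variable set where the timeout messages are logged, polled elsewhere to restart the WiFi) could be sketched roughly as below. All names here (`g_tcpTimeoutSeen`, `noteTcpTimeout`, `restartWiFiIfFlagged`) are hypothetical and are not taken from AsyncTCP or from the commenter's code; the restart action is passed in as a callback so the logic can be exercised off-device.

```cpp
#include <atomic>
#include <functional>

// Hypothetical flag, set from the place where the timeout message is
// logged. std::atomic because the setter and the poller may run on
// different tasks or cores.
std::atomic<bool> g_tcpTimeoutSeen{false};

// Call this next to the rx/ack timeout log line (assumed hook point).
inline void noteTcpTimeout() {
    g_tcpTimeoutSeen.store(true);
}

// Call this from loop(). If a timeout was flagged since the last call,
// clear the flag and invoke the supplied restart action exactly once.
// Returns true when a restart was triggered.
inline bool restartWiFiIfFlagged(const std::function<void()>& restartWiFi) {
    if (!g_tcpTimeoutSeen.exchange(false)) {
        return false;  // nothing flagged since the last poll
    }
    restartWiFi();
    return true;
}
```

On the device the callback might call WiFi.disconnect() followed by WiFi.begin(...). As the comment itself reports, restarting this way reduced the freezes but also led to crashes, so this is a diagnostic aid rather than a fix.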

@BlackBird77

BlackBird77 commented May 24, 2019

Me too...
Arduino 1.8.7 with ESP32 1.0.2 freezes the "network stack" completely. The CPU is fine! Ping: no chance!
Same with PlatformIO!

Arduino 1.8.7 with ESP32 1.0.1 is better, but some pings are lost. The latency climbs higher and higher, up
to 600 ms, and then falls back to 1 ms.

Here is my minimal code to reproduce it. Take the IP from Serial and ping the ESP!

#include "WiFi.h"

void setup()
{
  Serial.begin(115200);
  Serial.setDebugOutput(true);

  Serial.println("Start Wifi..");
  WiFi.begin("***", "***");

  Serial.println("Started.. Wait for IP...");

  while (WiFi.status() != WL_CONNECTED) {
     delay(500);
     Serial.print(".");
  }  
  Serial.println();
  Serial.print("Got Ip: ");
  Serial.println(WiFi.localIP());
}

void loop()
{
  
}


@BlackBird77

OK, it looks to me like the newest platform with ESP-IDF 3.2.0 has the freeze problem!

The slow pings come from WiFi power saving, which can be turned off. Then the pings are okay with ESP-IDF 3.1.3.

@stale

stale bot commented Aug 2, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale Issue is stale stage (outdated/stuck) label Aug 2, 2019
@stale

stale bot commented Aug 16, 2019

This stale issue has been automatically closed. Thank you for your contributions.

@stale stale bot closed this as completed Aug 16, 2019
@DhrubojyotiDey

So can someone please give guidance on how to resolve this network freezing issue? It really does pose a problem. And is there any MicroPython code for this?

@zekageri

Still an issue.

@aleuarore

Still?

@phil31

phil31 commented Jun 23, 2021

I can confirm this problem too... any updates/news?
Thanks

@chegewara
Contributor

chegewara commented Jun 23, 2021

I tested with example code from this issue and here is result:

64 bytes from 192.168.0.103: icmp_seq=153 ttl=255 time=94.0 ms
64 bytes from 192.168.0.103: icmp_seq=154 ttl=255 time=15.1 ms
64 bytes from 192.168.0.103: icmp_seq=155 ttl=255 time=38.0 ms
^C
--- 192.168.0.103 ping statistics ---
155 packets transmitted, 155 received, 0% packet loss, time 154227ms
rtt min/avg/max/mdev = 12.645/63.517/133.061/30.096 ms

How many pings do I have to send to confirm the issue or to confirm it works fine?

EDIT: the test was performed with a commit that was once v4.2:

commit beedeea4541116106b38fc5c3a03821cdf6fe288 (HEAD, origin/idf-release/v4.2, idf-release/v4.2)

@phil31

phil31 commented Jun 25, 2021

You're right, my problem is maybe not the same. Ping continues to work, as it does for you, but the webserver hangs from time to time for 10 to 20 seconds and then starts working again!

It's not LAN requests that stop; HTTP requests stop for 10 or 20 seconds.

@chegewara
Contributor

What I mean is that maybe 150 ping requests are not enough to reproduce the issue and I should have waited a bit longer.

@phil31 if your problem is different, please open a new issue with minimal code to reproduce it and information about version/branch etc.

@Darktemp

Hi, it seems that I have a similar issue. I have tried several approaches to avoid it, but it always ends with me still being able to ping the ESP (with increasing latency, as mentioned) while I cannot connect to it (TCP/UDP) and it cannot connect to MQTT.
Is there any hint as to what I could do to narrow down the cause?
Annoyingly, it only happens at random after around 5-7 days; the longest it took to freeze was 41 days.

@Darktemp

If someone finds this via google, I found two reasons which should improve the situation:
#5487 which should be part of the next release (hopefully) and
issue #4736 .
Still takes ~ 41 days until I know if these were the last causes of the problem 😆

@Darktemp

OK, I can confirm that it has not frozen since 16 August!

@szerwi

szerwi commented May 22, 2023

Any updates regarding this issue? Is there any fix planned to be implemented?

@VojtechBartoska
Contributor

@szerwi Do you still face this on latest Arduino Core version 2.0.9?

@szerwi

szerwi commented May 23, 2023

@VojtechBartoska I have a similar issue on arduino-esp32 2.0.7.
Sometimes my ESP32 loses its WiFi connection (I cannot reach the web server or ping it), but WiFi.status() is probably still returning WL_CONNECTED, as the ESP does not try to reconnect (I have my own mechanism that disconnects and reconnects to the network when it detects that it is not connected).
This issue happens only when a client is connected to the web server. Sometimes it also causes the ESP32 to crash.

I've heard that there are many bugs in ESPAsyncWebServer and AsyncTCP libraries and there are some forks of those libraries that are more stable, but I'm not sure which fork is the best available at this time.
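
Editor's note: since the comment above suspects that WiFi.status() keeps reporting WL_CONNECTED while the network is unreachable, a reconnect mechanism keyed only on the status would never fire. One possible variation, sketched here under that assumption, also counts failed reachability probes. `LinkWatchdog` and its parameters are hypothetical names, and what counts as a probe (for example a TCP connect to the gateway) is left to the caller.

```cpp
#include <cstdint>

// Hypothetical helper: decides when to force a reconnect based on
// application-level probes rather than the reported link status alone.
class LinkWatchdog {
public:
    explicit LinkWatchdog(std::uint8_t maxFailures)
        : maxFailures_(maxFailures) {}

    // statusConnected: what the WiFi status call reports.
    // probeOk: whether an independent reachability probe succeeded.
    // Returns true when the caller should disconnect and reconnect.
    bool update(bool statusConnected, bool probeOk) {
        if (!statusConnected) {   // the stack itself reports a lost link
            failures_ = 0;
            return true;
        }
        if (probeOk) {            // link is up and actually usable
            failures_ = 0;
            return false;
        }
        if (++failures_ < maxFailures_) {
            return false;         // tolerate a few isolated probe failures
        }
        failures_ = 0;            // threshold reached: request a reconnect
        return true;
    }

private:
    std::uint8_t maxFailures_;
    std::uint8_t failures_ = 0;
};
```

A caller would invoke update() periodically from loop() and run its existing disconnect/reconnect code whenever it returns true. The threshold is a tuning choice, not a value taken from this thread.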

@zekageri

@szerwi Everyone has problems with ESPAsyncWebServer. Unfortunately, it is buggy.

My best results were with these forks:

https://github.com/yubox-node-org/AsyncTCPSock and https://github.com/yubox-node-org/ESPAsyncWebServer

These are really stable for me but they inherit the same buggy design from the original library.

WebSocket clients can get stuck in there, since it is possible that a client does not close the socket cleanly.

@Edzelf

Edzelf commented May 23, 2023

Check the result of heap_caps_get_largest_free_block ( MALLOC_CAP_8BIT ). It should be well over 20k for a stable WiFi connection at any time.
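
Editor's note: the check suggested above could be wrapped roughly as follows. The threshold constant interprets "20k" as 20 * 1024 bytes, which is an assumption, and the helper names are hypothetical. The pure comparison is kept separate so it can be compiled and tested off-device, while the ESP-only wrapper is guarded by ESP_PLATFORM.

```cpp
#include <cstddef>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

// Rule-of-thumb threshold from the comment above ("well over 20k").
// Reading "20k" as 20 * 1024 bytes is an assumption, not a documented limit.
constexpr std::size_t kMinLargestFreeBlock = 20 * 1024;

// Pure check so the rule can be exercised off-device.
constexpr bool heapLooksHealthy(std::size_t largestFreeBlock) {
    return largestFreeBlock > kMinLargestFreeBlock;
}

#ifdef ESP_PLATFORM
// On the ESP32, feed the check with the real measurement.
inline bool heapLooksHealthyNow() {
    return heapLooksHealthy(
        heap_caps_get_largest_free_block(MALLOC_CAP_8BIT));
}
#endif
```

Logging the measured value periodically, rather than only the boolean, would probably be more useful when hunting a freeze that takes days to appear.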
