Skip to content

Fix mem leak in SSL server, allow for concurrent client and server connections w/o interference #4305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 8, 2018

Conversation

earlephilhower
Copy link
Collaborator

The refcnt setup for the WiFiClientSecure's SSLContext and ClientContext
had issues in certain conditions, causing a massive memory leak on each
SSL server connection. Depending on the state of the machine, after two or
three connections it would OOM and crash.

This patch replaces most of the refcnt operations with C++11 shared_ptr
operations, cleaning up the code substantially and removing the leakage.

Also fixes a race condition where ClientContext was free'd before the SSLContext
was stopped/shutdown. When the SSLContext tried to do ssl_free, axtls would
attempt to send out the real SSL disconnect bits over the wire, however by
this time the ClientContext is invalid and it would fault.

Fixes esp8266#4302

The refcnt setup for the WiFiClientSecure's SSLContext and ClientContext
had issues in certain conditions, causing a massive memory leak on each
SSL server connection.  Depending on the state of the machine, after two or
three connections it would OOM and crash.

This patch replaces most of the refcnt operations with C++11 shared_ptr
operations, cleaning up the code substantially and removing the leakage.

Also fixes a race condition where ClientContext was free'd before the SSLContext
was stopped/shutdown.  When the SSLContext tried to do ssl_free, axtls would
attempt to send out the real SSL disconnect bits over the wire, however by
this time the ClientContext is invalid and it would fault.
@earlephilhower
Copy link
Collaborator Author

Just to be explicit, the issues were seen only with the WiFiServerSecure + WebServerSecure's access pattern and not any of the pre 2.4.1 WiFiClientSecure bits.

@earlephilhower
Copy link
Collaborator Author

earlephilhower commented Feb 6, 2018

Testing using the HTTPSRequest example (i.e. WiFiClientSecure) worked as well, but found a problem with the example (expired certificate). Ugh! Pull #4306 submitted to fix that. No problem with the actual WiFiClient/ServerSecure seen.

@earlephilhower
Copy link
Collaborator Author

Platform.io builds are all now failing. Will resubmit once the platform infrastructure is working again.

@earlephilhower
Copy link
Collaborator Author

Tested this also using MQTT-ESP8266 sample with client certificates to verify it doesn't break the client side.

Unfortunately, by doing everything "properly" we can no longer share the server and client context if they have different client and server keys. It was an abuse of axtls, but did work before simply because we were unilaterally killing the context on server creation.

So as of now with this fix you can have a SSL client or a SSL server, but not both at the same time.

Refactor to use a separate client SSL_CTX and server SSL_CTX.  This
allows for separate certificates to be installed on each, and means
that you can now have both a *single* client and a *single* server
running in parallel at the same time, as they'll have separate memory
areas.

Tested using mqtt_esp8266 SSL client with a client certificate and a
WebServerSecure with its own custom certificate and key in parallel.
@earlephilhower
Copy link
Collaborator Author

Latest commit keeps track of separate client and server contexts. If you only use one or the other in your app, no change in memory or usage. If you want to do both, then they can both now run, in parallel, each with its own set of certificates (client/server), without any interference. You'll need 2x the memory, of course, but in my test app which includes both a mqtt_esp8266 w/client cert SSL + WebServerSecure I still have ~11KB free when both are connected and running. When the server connection is closed (i.e. after a webpage is sent) the memory is freed back up.

This also means for folks who were having trouble w/client certs disappearing after a server connection, it's fixed as well and you need NO code changes from your original, client-only setup.

Fixes #4302 and updates #3001 significantly.

I've had it going for about an hour with mqtt publishing every 5 seconds and a https get loop on my PC pulling a webpage, no issues seen. So I think it's ready for review and testing, folks.

@earlephilhower earlephilhower changed the title Fix leak on multiple SSL server connections, fix #4302 Fix mem leak in SSL server, allow for concurrent client and server connections w/o interference Feb 6, 2018
Copy link
Collaborator

@d-a-v d-a-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for the amazing work on SSL, for both old and new implementation.

}

static ClientContext* getIOContext(int fd)
{
return reinterpret_cast<SSLContext*>(fd)->io_ctx;
if (!fd) return nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent return to new line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I did a 'grep if | grep ";"' and didn't see any other spots. You've got good eyes to have found them!

if (_ssl_ctx_refcnt == 0) {
_ssl_ctx = ssl_ctx_new(SSL_SERVER_VERIFY_LATER | SSL_DEBUG_OPTS | SSL_CONNECT_IN_PARTS | SSL_READ_BLOCKING | SSL_NO_DEFAULT_KEY, 0);
_isServer = isServer;
if (!_isServer) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These if(server) else client statements hint at separating SSLContext into SSLServerContext and SSLClientContext. However, I suspect that doing that would require even more changes, and right now we need to improve stability. So let's handle that later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be straightforward to make the SSLServerContext from the unmodified SSLContext, because there's only 3 or spots where you'll see this check or a different codepath dependent on this. That said, I'm not sure it helps understanding by breaking it into another subclass...frankly most of my time was spent going through the inherited classes to see WTH was going on instead of just being able to peek at one file/class and see the inner workings. We can revisit if axtls sticks around. BearSSL doesn't need this kind of abstraction at all, so the code there is simpler.

As for stability, this code spent 5 hours being beaten by a while(true) mosquiit_pub sending it data, it mqtt publishing to a SSL mosquitto server using a Client cert every 5 seconds, and a while(true) wget https://esp8266 loop to make it serve web pages. Not even a hiccup noted...

Wish I could package this setup as a test, but it needs a mosquitto server, fixed IPs (for the certs), and a couple monitors to make sure the mqtt mesages and the wget continue working. I wouldn't know where to start...

io_ctx = nullptr;
}

bool connected()
{
if (_isServer) return _ssl != nullptr;
else return _ssl != nullptr && ssl_handshake_status(_ssl) == SSL_OK;
else return _ssl != nullptr && ssl_handshake_status(_ssl.get()) == SSL_OK;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent return line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@devyte
Copy link
Collaborator

devyte commented Feb 7, 2018

Just minor indents from me.

@devyte devyte merged commit bf5a0f2 into esp8266:master Feb 8, 2018
@earlephilhower earlephilhower deleted the sslserverfixes branch March 9, 2018 05:39
@bhaskar-anil429
Copy link

Hi,
we are trying to send a https request to a server using wifiClientsecure. It works fine for few few hours but then it crashes. seems like a memory leak issue. This detailed info can be found over here as well @ #4569 . any help will be much appreciated.
esp board package is 2.4.1
Thanks

@earlephilhower
Copy link
Collaborator Author

@bhaskar-anil429 Make sure you are running git head, not 2.4.1 release. There were some TCP-level fixes after the release as well related to memory leaks on connections.

Other than that, only thing I've seen otherwise is that the ESP is running out of memory due to app usage or the heap's being fragmented so that even if you have enough total RAM there's not a large enough contiguous block to fulfil an axtls request. Not much can be done there...

If you can open a new bug with a small working example that shows the mem after each loop and a way of repeatable testing we could see if there's anything else there. The SSL server uses a lot of memory, I'd be surprised to see a small leak...more likely a large one which would cause OOM after a few connections (like the original bug fix).

@comino
Copy link
Contributor

comino commented Apr 12, 2018

As reported in #4569 I can confirm this, I also observe a leakage (latest git version). It's unlikely a heap fragmentation.

@earlephilhower
Copy link
Collaborator Author

I'm not seeing any leak per connection using a modified version of the script posted at the top here (to point to my public web server, and do GET instead of POST, but nothing materially different). I'm running with debug level "None", not even the larger mem free option of NoAssert+NDEBUG///

#include <WiFiServerSecure.h>
#include <WiFiClientSecure.h>
#include <ESP8266WiFi.h>
#include <ESP8266WiFiMulti.h>
#include <WiFiUdp.h>
#include <ESP8266WiFiType.h>
#include <ESP8266WiFiAP.h>
#include <WiFiClient.h>
#include <WiFiServer.h>
#include <ESP8266WiFiScan.h>
#include <ESP8266WiFiGeneric.h>
#include <ESP8266WiFiSTA.h>

unsigned long wifiConnectTimeout = 5000;
WiFiClientSecure client;

const char* host = "ziplabel.com";
int httpsPort = 443;
const char* fingerprint = "37 00 53 F1 10 64 F2 41 99 E6 84 07 C5 71 3F 3D 53 25 91 1E";

char* httpRequestContent = "GET /index.html HTTP/1.0\r\n"
    "User-Agent: MatrixSSL/ 3.7.2-OPEN \r\n"
    "Host: ziplabel.com\r\n"
    "Accept: */*\r\n"
    "\r\n";

int requests = 0;

void setup() {
  // put your setup code here, to run once:
  WiFi.begin("*", "*");
  Serial.begin(115200);
  Serial.print("Connecting");

}

void loop() {
  Serial.printf("Free: %d\n", ESP.getFreeHeap());
  while(WiFi.status() != WL_CONNECTED){
    Serial.print('.');
    delay(500);
  }
  Serial.print("Requests: ");
  Serial.println(requests);

  // put your main code here, to run repeatedly:
  if(WiFi.status() == WL_CONNECTED){
    Serial.println("Sending request");
    if(!client.connect(host, httpsPort)){
      Serial.println("server connection failed");
      delay(1000);
      return;
    }
    if(!client.verify(fingerprint, host)){
      Serial.println("Cannot verify Server");
      delay(1000);
      return;
    }
    client.print(httpRequestContent);
    while(client.connected()){
      if(client.available()){
        String line = client.readStringUntil('\n');
        Serial.println(line);
      }
    }
    requests++;
  }
}

Mem free stabilized at 31056 bytes after 2 or 3 cycles and is now at 200 loops without any change.

</BODY>
</HTML>
Free: 31056
Requests: 164
Sending request
HTTP/1.1 200 OK
Date: Thu, 12 Apr 2018 15:25:04 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.4.16
Last-Modified: Tue, 01 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants