-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Fix mem leak in SSL server, allow for concurrent client and server connections w/o interference #4305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes esp8266#4302 The refcnt setup for the WiFiClientSecure's SSLContext and ClientContext had issues in certain conditions, causing a massive memory leak on each SSL server connection. Depending on the state of the machine, after two or three connections it would OOM and crash. This patch replaces most of the refcnt operations with C++11 shared_ptr operations, cleaning up the code substantially and removing the leakage. Also fixes a race condition where ClientContext was free'd before the SSLContext was stopped/shutdown. When the SSLContext tried to do ssl_free, axtls would attempt to send out the real SSL disconnect bits over the wire, however by this time the ClientContext is invalid and it would fault.
Just to be explicit, the issues were seen only with the WiFiServerSecure + WebServerSecure's access pattern and not any of the pre 2.4.1 WiFiClientSecure bits. |
Testing using the HTTPSRequest example (i.e. WiFiClientSecure) worked as well, but found a problem with the example (expired certificate). Ugh! Pull #4306 submitted to fix that. No problem with the actual WiFiClient/ServerSecure seen. |
Platform.io builds are all now failing. Will resubmit once the platform infrastructure is working again. |
Tested this also using MQTT-ESP8266 sample with client certificates to verify it doesn't break the client side. Unfortunately, by doing everything "properly" we can no longer share the server and client context if they have different client and server keys. It was an abuse of axtls, but did work before simply because we were unilaterally killing the context on server creation. So as of now with this fix you can have a SSL client or a SSL server, but not both at the same time. |
Refactor to use a separate client SSL_CTX and server SSL_CTX. This allows for separate certificates to be installed on each, and means that you can now have both a *single* client and a *single* server running in parallel at the same time, as they'll have separate memory areas. Tested using mqtt_esp8266 SSL client with a client certificate and a WebServerSecure with its own custom certificate and key in parallel.
Latest commit keeps track of separate client and server contexts. If you only use one or the other in your app, no change in memory or usage. If you want to do both, then they can both now run, in parallel, each with its own set of certificates (client/server), without any interference. You'll need 2x the memory, of course, but in my test app which includes both a mqtt_esp8266 w/client cert SSL + WebServerSecure I still have ~11KB free when both are connected and running. When the server connection is closed (i.e. after a webpage is sent) the memory is freed back up. This also means for folks who were having trouble w/client certs disappearing after a server connection, it's fixed as well and you need NO code changes from your original, client-only setup. Fixes #4302 and updates #3001 significantly. I've had it going for about an hour with mqtt publishing every 5 seconds and a https get loop on my PC pulling a webpage, no issues seen. So I think it's ready for review and testing, folks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for the amazing work on SSL, for both old and new implementation.
} | ||
|
||
static ClientContext* getIOContext(int fd) | ||
{ | ||
return reinterpret_cast<SSLContext*>(fd)->io_ctx; | ||
if (!fd) return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent return to new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I did a 'grep if | grep ";"' and didn't see any other spots. You've got good eyes to have found them!
if (_ssl_ctx_refcnt == 0) { | ||
_ssl_ctx = ssl_ctx_new(SSL_SERVER_VERIFY_LATER | SSL_DEBUG_OPTS | SSL_CONNECT_IN_PARTS | SSL_READ_BLOCKING | SSL_NO_DEFAULT_KEY, 0); | ||
_isServer = isServer; | ||
if (!_isServer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These if(server) else client statements hint at separating SSLContext into SSLServerContext and SSLClientContext. However, I suspect that doing that would require even more changes, and right now we need to improve stability. So let's handle that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be straightforward to make the SSLServerContext from the unmodified SSLContext, because there's only 3 or spots where you'll see this check or a different codepath dependent on this. That said, I'm not sure it helps understanding by breaking it into another subclass...frankly most of my time was spent going through the inherited classes to see WTH was going on instead of just being able to peek at one file/class and see the inner workings. We can revisit if axtls sticks around. BearSSL doesn't need this kind of abstraction at all, so the code there is simpler.
As for stability, this code spent 5 hours being beaten by a while(true) mosquiit_pub sending it data, it mqtt publishing to a SSL mosquitto server using a Client cert every 5 seconds, and a while(true) wget https://esp8266 loop to make it serve web pages. Not even a hiccup noted...
Wish I could package this setup as a test, but it needs a mosquitto server, fixed IPs (for the certs), and a couple monitors to make sure the mqtt mesages and the wget continue working. I wouldn't know where to start...
io_ctx = nullptr; | ||
} | ||
|
||
bool connected() | ||
{ | ||
if (_isServer) return _ssl != nullptr; | ||
else return _ssl != nullptr && ssl_handshake_status(_ssl) == SSL_OK; | ||
else return _ssl != nullptr && ssl_handshake_status(_ssl.get()) == SSL_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent return line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Just minor indents from me. |
Hi, |
@bhaskar-anil429 Make sure you are running git head, not 2.4.1 release. There were some TCP-level fixes after the release as well related to memory leaks on connections. Other than that, only thing I've seen otherwise is that the ESP is running out of memory due to app usage or the heap's being fragmented so that even if you have enough total RAM there's not a large enough contiguous block to fulfil an axtls request. Not much can be done there... If you can open a new bug with a small working example that shows the mem after each loop and a way of repeatable testing we could see if there's anything else there. The SSL server uses a lot of memory, I'd be surprised to see a small leak...more likely a large one which would cause OOM after a few connections (like the original bug fix). |
As reported in #4569 I can confirm this, I also observe a leakage (latest git version). It's unlikely a heap fragmentation. |
I'm not seeing any leak per connection using a modified version of the script posted at the top here (to point to my public web server, and do GET instead of POST, but nothing materially different). I'm running with debug level "None", not even the larger mem free option of NoAssert+NDEBUG///
Mem free stabilized at 31056 bytes after 2 or 3 cycles and is now at 200 loops without any change.
|
The refcnt setup for the WiFiClientSecure's SSLContext and ClientContext
had issues in certain conditions, causing a massive memory leak on each
SSL server connection. Depending on the state of the machine, after two or
three connections it would OOM and crash.
This patch replaces most of the refcnt operations with C++11 shared_ptr
operations, cleaning up the code substantially and removing the leakage.
Also fixes a race condition where ClientContext was free'd before the SSLContext
was stopped/shutdown. When the SSLContext tried to do ssl_free, axtls would
attempt to send out the real SSL disconnect bits over the wire, however by
this time the ClientContext is invalid and it would fault.