Skip to content

Refactor into separate files, additional features, addition of missing typing #25

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Nov 22, 2022
Merged

Refactor into separate files, additional features, addition of missing typing #25

merged 25 commits into from
Nov 22, 2022

Conversation

michalpokusa
Copy link
Contributor

@michalpokusa michalpokusa commented Nov 13, 2022

This PR, other that separating adafruit_httpserver into multiple files #8 adds following functionality:

  • Access to additional attributes of HTTPRequest:
    • http_version extracted from Start line, e.g. "HTTP/1.1"
    • query_params extracted from path, e.g. /path?foo=bar is parsed to {"foo": "bar"}
    • headers as a dict, e.g. {'user-agent': '...', 'host': 'esp32-s2-tft.local', 'connection': 'close'}
    • body as bytes, so both text content and images can be processed Add support for reading contents of POST requests #16
  • Added new parameter to HTTPResponse which allows to set it's headers Allow setting of HTTP headers in HTTP response #24

Other than that, this PR also includes:

Despite major refactor, the usage of library stays nearly unchanged, so it won't be neccessary to rewrite existing projects from scratch, only minor changes might be required.
It surely is not perfect, but I belive it is a step in the right direction.

Closes #2
Closes #6
Closes #8
Closes #16
Closes #17
Closes #21
Closes #24

@michalpokusa michalpokusa marked this pull request as ready for review November 13, 2022 23:40
@dhalbert
Copy link
Contributor

Take a look at using pre-commit if you want to test the black and pylint errors locally: https://learn.adafruit.com/creating-and-sharing-a-circuitpython-library/check-your-code

Solved problem when data is sent in chunks and is not received in full.
Bypassed ESP32 TCP buffer limit of 2880 bytes.
@anecdata
Copy link
Member

anecdata commented Nov 19, 2022

I'm not able to get the revised examples/httpserver_simplepolling.py to work from a browser. I put some debug prints in and it looks like the socket accepts the connection, but the received_data is sometimes empty, but even when it looks complete the server just hangs and never returns the response. The desktop browser just keeps waiting, I think browsers tend to have infinite timeout. I'll keep at it.

Copy link
Member

@anecdata anecdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions. The only way I could complete a request from a browser was to use an arbitrary timeout, incurring a performance hit and assuming that whatever was received in that time was complete.

while "Receiving data":
try:
length = conn.recv_into(self._buffer)
received_data += self._buffer[:length]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the blocking case, we need a way to know when the request is complete?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized there is no blocking case currently allowed (timeout must be >=0, not None), but perhaps there should be?

Copy link
Contributor Author

@michalpokusa michalpokusa Nov 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that giving an option to set timeout to None (blocking) should be possible, thus not liming if one wants to do so.

It might also be logical to increase default timeout from 0 to 0.5 or 1 that would allow receiving request without explicitly setting timeout using server.socket_timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I do not really see any other way of determining whether it is the end of mesage other than timeout, I do not like that it unnecessarily waits after last part of message is received but it is the only way that seems to work no matter which client wants to connect to server. But, compared to current main branch, each request takes at least 2x longer.

One positive here is that it makes it possible to send data longer that max buffer size of 2880 bytes, which makes it more universal.

It is hard to find a middle point here, and we have to remember that we are talking about microcontrollers, not load-balanced web servers, so maybe a longer delay is not as much of a problem as the buffer size limit.

try:
length = conn.recv_into(self._buffer)
received_data += self._buffer[:length]
except OSError as ex:
Copy link
Member

@anecdata anecdata Nov 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-blocking (no timeout) case, we need to handle EAGAIN here? And again I think we'll hit the question of how to determine when a request is complete.

And perhaps defensively add a raise ex to catch-all any other OSErrors.

@michalpokusa
Copy link
Contributor Author

michalpokusa commented Nov 20, 2022

After more testing, it seems to me that the most reliable way of solving the socket blocking problem is making it possible only to set it to positive number, no 0 and None, this way eliminates infinitely blocked socket and enables request exceeding 2880 bytes.

I know it is not perfect and has an impact on loading speed, but I don't see any other way around it. Even if library api enabled setting socket_timeout to 0 or None, then it would simple hang on forever on first request. Preventing that from happening I believe is more user friendly.

Maybe there is other solution that I am missing, I do not completely understand all the quirks of TCP sockets and blocking behaviour. But from testing this way seems most reasonable to me and the impact itself is not that noticeable.

I would leave it at only socket_timeout > 0, even though is is not most optimal solution, is enables more functionality that current main build. Maybe in future someone will have a better solution to this problem. 🤔

@dhalbert
Copy link
Contributor

What is different about the way HTTPServer is operating compared to, say, the simple CPython HTTP server, or Apache, nginx, etc. Do we need to fix something underlying?

@michalpokusa
Copy link
Contributor Author

michalpokusa commented Nov 20, 2022

As I mentioned above, I am not very familiar with TCP communication on low-level, but I found that a client should send a shutdown(SHUT_WR) to socket.recv() (which is not implemented yet in CircuitPython) to indicate the end of message [Detect client is done with sending (shutdown)].

Also when analysing CPython http.server and going deeper in call stack i found similar condition. [CPython/Lib/socketserver.py:216]

Maybe that is the problem, but it is hard for me to say, as it is very low-level.

@anecdata
Copy link
Member

anecdata commented Nov 20, 2022

The issue with a timeout, aside from just the dead time, is that it can trip up async by extending the minimum overall async loop time. It's also a guess how long to make the timeout to fully capture any given request, so the chosen timeout will have to err on the long side.

Someone do please correct me if I'm wrong, but my understanding is that HTTP/1.1 adds some protocol assumptions over simple TCP: if a request has a message body it will either have a transfer-encoding (e.g., chunked) OR a content-length header, OR possibly (to maintain compatibility with HTTP/1.0 clients) will handle if the client sends neither header but closes the socket - but that's strongly discouraged in HTTP/1.1. I think the relevant specs are here and here.

Does the client closing not trigger a ECONNRESET? Maybe not as there is a Requests issue that could be related.

If we can't detect the client closing, we can still have a timeout applied on top of handling content-length, and return NotImplemented for now for transfer-encoding? (There is chunk handling code in requests that could be useful in future.) Or, we could raise an exception if neither content-length nor transfer-encoding is present.

addendum 1: Apache seems to handle the blocking + no-content-length + no-transfer-encoding case (full code here):

s.settimeout(None)
s.connect((HOST, PORT))
size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

gets an immediate response from the server with the headers and content. Interestingly, POST also works immediately despite having no message body where one would normally be expected.

adendum 2: python3 -m http.server (HTTP/1.0) also returns immediately for GET (but POST gets a 501).

I know we are looking for some happy medium here, I'm not sure what it is... performance, buffer size, protocol conformance, etc. Hard to know what aspects will be most expected by the community. In general, this PR is a big step up in functionality and we could push some of this to future exploration.

@michalpokusa
Copy link
Contributor Author

Finally I was able to get the expected result. As @anecdata suggested I used Content-Length to determine when there is no more data expected. Now timeout is not the only thing that can end the connection and it works with curl, browsers, adafruit_requests and other clients I tested, and is is fast, much faster than before.

I adjusted default socket_timeout and prevented from setting it to None or 0.

I also changed how HTTPRequest.headers are parsed, as they, unlike query params, are case-insensitive. I used the same approach as implemented in adafruit_wsgi/request.py:103, which means that the are all lower case.

Right now there is no support for Transfer-Encoding, but I believe it is better to make it a separate PR later.

To be honest I am quite happy with current state of this PR, and I don't really see much that can be changed now.

Copy link
Contributor

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with the reactle game from https://learn.adafruit.com/wordle-personal-esp32-s2-web-server, on Firefox and Chrome. It worked fine.

@michalpokusa We really appreciate your work on this!

@anecdata Do you want to test further? I will merge after hearing if you if you think this is good to go.

I will release this as 1.0.0 since there are API changes.

Copy link
Member

@anecdata anecdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. Tested with and without content-length client header.

Thanks, @michalpokusa!

@dhalbert dhalbert merged commit cf5b2aa into adafruit:main Nov 22, 2022
@todbot
Copy link

todbot commented Nov 22, 2022

Thank you @michalpokusa !

@nicolasff
Copy link

Great job @michalpokusa! Glad to see it was merged.

@michalpokusa
Copy link
Contributor Author

Thank you! 😊

adafruit-adabot added a commit to adafruit/Adafruit_CircuitPython_Bundle that referenced this pull request Nov 23, 2022
@paul-1 paul-1 mentioned this pull request Dec 2, 2022
@michalpokusa michalpokusa deleted the refactor-and-additional-features branch December 19, 2022 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment