-
Notifications
You must be signed in to change notification settings - Fork 88
Swap http-parse to llhttp #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hey @elprans ! All tests are now passing, I was able to understand the problem to The thing here that I don't think it is ok is the upgrade behavior and tests, so would love the feedback! Most specifically, I don't think that I should be calling
|
I think the comment doesn't mean |
Yeah, the order is irrelevant as long as you free both. |
Hey @elprans ! Were you able to take a look here? |
@@ -1,3 +1,7 @@ | |||
[submodule "vendor/http-parser"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we no longer need the http-parser
submodule, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Actually it is used to parse urls, as llhttp don't have this feature!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uparser.pxd
is used only to extern url parsing methods from http_parser.h
, and url_parser..pyx
contains previous code related to url parsing. I couldn't extern both llhttp.h
and http_parser.h
on the same file as they have naming conflicts
@tomchristie @sethmlarson Do you guys still use httptools in your projects? Could you please check if this branch works with them? I'd really like to merge this, and overall it looks like it should be a drop in replacement, but I still worry a bit because our test suite could be better. |
I don't use httptools for anything, looks like uvicorn pins |
If httptools is installed, uvicorn will use it, but is an optional dependency! |
Sure, but we still need to make sure it will work when uvicorn pins a new version. E.g. |
@victoraugustolls fwiw if you could verify that starlette and uvicorn work fine and their test suites pass with the updated http-tools it would be great! |
Oh, this I totally agree! |
Will do! |
@1st1 regarding
|
@1st1 regarding Starlette, it seems like they don't force you to use Also, can you give me your 2cent on if self._cparser.upgrade == 1 and nb == cparser.HPE_PAUSED_UPGRADE:
read_bytes = cparser.llhttp_get_error_pos(self._cparser)
cparser.llhttp_resume_after_upgrade(self._cparser)
raise HttpParserUpgrade(read_bytes) I don't think I should call |
I also just ran a quick load test against an api of mine (locally and on Azure) using a forked version of |
One other point that concerns me, besides upgrade, is this none on
Should we be calling UPDATE: I just added a test to add this behavior against this pr and against master, and this I modified this test: def test_parser_request_3(self):
m = mock.Mock()
p = httptools.HttpRequestParser(m)
with self.assertRaises(httptools.HttpParserInvalidURLError):
p.feed_data(b'POST HTTP/1.2')
p.feed_data(CHUNKED_REQUEST1_3)
self.assertEqual(p.get_method(), b'POST')
m.on_url.assert_called_once_with(b'/test.php?a=b+c')
self.assertEqual(p.get_http_version(), '1.2')
m.on_header.assert_called_with(b'Transfer-Encoding', b'chunked')
m.on_chunk_header.assert_called_with()
m.on_chunk_complete.assert_called_with()
self.assertTrue(m.on_message_complete.called) And in both cases |
Yeah, similar requirement exists for http_parser too. After receiving a fatal error out of it you have to reinitialize it before feeding a new request into it. |
I plan to soon (in a few days) work on a project that will involve httptools. I plan to merge this PR then. Sorry for the delay, guys. |
Great to hear @1st1 ! Just let me know if you need anything from me here! |
Hey @1st1 ! Hope you're doing well! Was just wondering if you had the chance to take a look at this PR! |
Hi @victoraugustolls , sorry for the delay and appreciate your help on this! I'll be helping with getting this PR merged. Looks like llhttp fixed a CVE in a recent 4.0.0 release (or the 2.2.1 patch), I think it makes sense to include that fix.
We should pin the submodule to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll include comments for the pyx files in the next review.
.gitignore
Outdated
vendor/llhttp/node_modules | ||
vendor/llhttp/build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we pinned to llhttp release/*
tags, this is not needed any more. (llhttp should have its own .gitignore
on the vx.x.x
tags and in the master
branch, right?)
.gitmodules
Outdated
[submodule "vendor/llhttp"] | ||
path = vendor/llhttp | ||
url = https://github.com/nodejs/llhttp.git | ||
branch = release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned earlier, we should remove the confusing branch = release
.
httptools/parser/cparser.pxd
Outdated
|
||
|
||
cdef extern from "../../vendor/http-parser/http_parser.h": | ||
ctypedef int (*http_data_cb) (http_parser*, | ||
cdef extern from "../../vendor/llhttp/include/llhttp.h": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cdef extern from "../../vendor/llhttp/include/llhttp.h": | |
cdef extern from "llhttp.h": |
This was a leftover issue - we should let the compiler find the right header files instead of hard-coding here - as we do support --use-system-http-parser
.
httptools/parser/uparser.pxd
Outdated
from libc.stdint cimport uint16_t | ||
|
||
|
||
cdef extern from "../../vendor/http-parser/http_parser.h": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cdef extern from "../../vendor/http-parser/http_parser.h": | |
cdef extern from "http_parser.h": |
Ditto, leftover issue.
httptools/parser/url_parser.pyx
Outdated
from .errors import HttpParserInvalidURLError | ||
|
||
cimport cython | ||
from . cimport uparser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name uparser
feels weird to me comparing to cparser.pxd
standing for "the Cython header file to the parser written in C". Perhaps url_cparser
is more aligned, if we don't want to name the pxd files after llhttp or http-parser for good reasons like using future llurl for url_cparser
. So let's have:
parser.pyx
->cparser.pxd
for HTTP parsingurl_parser.pyx
->url_cparser.pxd
for URL parsing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing here that I don't think it is ok is the upgrade behavior and tests, so would love the feedback! Most specifically, I don't think that I should be calling
cparser.llhttp_resume_after_upgrade(self._cparser)
, but expose this as a method likedef resume_after_upgrade(self)
, but left it there to hear your opinion.
Looks like what llhttp_resume_after_upgrade()
does is quite simple - it reset the parser's internal error state if it was paused for upgrade. So according to another doc you quoted:
* NOTE: if this function ever returns a non-pause type error, it will continue * to return the same error upon each successive call up until `llhttp_init()` * is called. llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);
It's all about how the parser behaves after detecting an upgrade. In short, I think you're right to call llhttp_resume_after_upgrade()
in feed_data()
(the only missing step in your PR is to calculate the pointer's difference, see below), or successive calls to feed_data()
will raise HttpParserUpgrade
errors, which is inconsistent with the current released httptools. What's more, according to the docs of llhttp_get_error_pos()
:
Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to thedata
argument ofllhttp_execute()
.Note: this method might be useful for counting the number of parsed bytes.
There's no guarantee that the 2nd call to feed_data()
is using the same data
parameter - in other words, the return value of llhttp_get_error_pos()
is considered an invalid pointer without the context of the first call to feed_data()
, so we should calculate the number of parsed bytes while we can.
For the behavior of calling feed_data()
again after handling HttpParserUpgrade
errors, we should add a test for it, even though this is a rare use case.
TL;DR: Please see suggested patch below:
diff --git a/httptools/parser/parser.pyx b/httptools/parser/parser.pyx
index 6656bef..3158d0e 100644
--- a/httptools/parser/parser.pyx
+++ b/httptools/parser/parser.pyx
@@ -161,13 +161,15 @@ cdef class HttpParser:
def feed_data(self, data):
cdef:
size_t data_len
- cparser.llhttp_errno_t nb
+ cparser.llhttp_errno_t err
Py_buffer *buf
+ bint owning_buf = False
+ char* err_pos
if PyMemoryView_Check(data):
buf = PyMemoryView_GET_BUFFER(data)
data_len = <size_t>buf.len
- nb = cparser.llhttp_execute(
+ err = cparser.llhttp_execute(
self._cparser,
<char*>buf.buf,
data_len)
@@ -175,21 +177,34 @@ cdef class HttpParser:
else:
buf = &self.py_buf
PyObject_GetBuffer(data, buf, PyBUF_SIMPLE)
+ owning_buf = True
data_len = <size_t>buf.len
- nb = cparser.llhttp_execute(
+ err = cparser.llhttp_execute(
self._cparser,
<char*>buf.buf,
data_len)
- PyBuffer_Release(buf)
-
- if self._cparser.upgrade == 1 and nb == cparser.HPE_PAUSED_UPGRADE:
- read_bytes = cparser.llhttp_get_error_pos(self._cparser)
- cparser.llhttp_resume_after_upgrade(self._cparser)
- raise HttpParserUpgrade(read_bytes)
-
- if nb != cparser.HPE_OK:
+ try:
+ if self._cparser.upgrade == 1 and err == cparser.HPE_PAUSED_UPGRADE:
+ err_pos = cparser.llhttp_get_error_pos(self._cparser)
+
+ # Immediately free the parser from "error" state, simulating
+ # http-parser behavior here because 1) we never had the API to
+ # allow users manually "resume after upgrade", and 2) the use
+ # case for resuming parsing is very rare.
+ cparser.llhttp_resume_after_upgrade(self._cparser)
+
+ # The err_pos here is specific for the input buf. So if we ever
+ # switch to the llhttp behavior (re-raise HttpParserUpgrade for
+ # successive calls to feed_data() until resume_after_upgrade is
+ # called), we have to store the result and keep our own state.
+ raise HttpParserUpgrade(err_pos - <char*>buf.buf)
+ finally:
+ if owning_buf:
+ PyBuffer_Release(buf)
+
+ if err != cparser.HPE_OK:
ex = parser_error_from_errno(
self._cparser,
<cparser.llhttp_errno_t> self._cparser.error)
diff --git a/tests/test_parser.py b/tests/test_parser.py
index 101da22..cd6f59c 100644
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -332,11 +332,11 @@ class TestRequestParser(unittest.TestCase):
try:
p.feed_data(UPGRADE_REQUEST1)
except httptools.HttpParserUpgrade as ex:
- offset = len(ex.args[0])
+ offset = ex.args[0]
else:
self.fail('HttpParserUpgrade was not raised')
- self.assertEqual(UPGRADE_REQUEST1[-offset:], b'Hot diggity dogg')
+ self.assertEqual(UPGRADE_REQUEST1[offset:], b'Hot diggity dogg')
self.assertEqual(headers, {
b'Sec-WebSocket-Key2': b'12998 5 Y3 1 .P00',
@@ -347,6 +347,11 @@ class TestRequestParser(unittest.TestCase):
b'Host': b'example.com',
b'Upgrade': b'WebSocket'})
+ # The parser can be used again for further parsing - this is a legacy
+ # behavior from the time we were still using http-parser.
+ p.feed_data(CHUNKED_REQUEST1_1)
+ self.assertEqual(p.get_method(), b'POST')
+
def test_parser_request_upgrade_flag(self):
class Protocol:
httptools/parser/parser.pyx
Outdated
def feed_data(self, data): | ||
cdef: | ||
size_t data_len | ||
size_t nb | ||
cparser.llhttp_errno_t nb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is no longer nb
(number of bytes parsed).
cparser.llhttp_errno_t nb | |
cparser.llhttp_errno_t err |
tests/test_parser.py
Outdated
@@ -200,11 +200,11 @@ def test_parser_upgrade_response_1(self): | |||
try: | |||
p.feed_data(UPGRADE_RESPONSE1) | |||
except httptools.HttpParserUpgrade as ex: | |||
offset = ex.args[0] | |||
offset = len(ex.args[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like mentioned earlier - we shouldn't change the HttpParserUpgrade API here.
I will address the 3.9 issue in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I've added a few fixes and let's get this merged first. Please let me know if you have concerns about my fixes, we could address them in a new PR.
Hi!
First of all, I'm pretty new to Cython, so I'm sorry if I made any dumb mistakes here (pretty sure I might have messed up
setup.py
).Some observations:
The tests are failing due to some import error, and I'm pretty sure it is because of the changes I made toTests are now passingsetup.py
offset
that I couldn't reproduce withllhttp
so I might have missed something@elprans I would love some feedback here to get this going!
Also tagging @indutny in case you have any 2cents or some warnings about something we might miss on this migration, or that we need extra care :D