Support for servers with broken line endings #7
(NB I'm probably not going to work on this any time soon)
CC: @durin42 since it's his fault I know about this
From RFC 7230 Section 3.5 (Message Parsing Robustness):
This suggests that allowing simply
Sadly, I’ve seen \r without \n in the wild. :(
Oh, so have I. But I am a strong advocate of telling implementations that haven't read RFCs to go take a long walk off a short pier.
I'm not sure it makes much difference in terms of implementation simplicity, because h11's trick for simple fast HTTP header parsing is use
BTW, I assume that servers that screw this up for headers also screw it up for chunked encoding?
@njsmith No need to duplicate the check: better to swap to a fairly simple regex. Either way you desperately need to avoid having too much cleverness here in Python, because each time you come back into Python code to process the bytestream you slow down a load on CPython (which for better or worse is what users will use to judge your perf).
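For illustration, the "fairly simple regex" idea might look roughly like the sketch below. This is not h11's actual code: `split_header_block` and both patterns are hypothetical names made up for this example, and it only tolerates a bare `\n`, not a bare `\r`. The search and the split both run inside the `re` module, so Python-level work per header block stays small.

```python
import re

# Hypothetical patterns, not taken from h11.
# A blank line ends the header block; each \n may or may not be preceded by \r.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")
_LINE_END = re.compile(rb"\r?\n")


def split_header_block(buf: bytes):
    """Return (header_lines, rest), or None if no blank line has arrived yet."""
    match = _BLANK_LINE.search(buf)
    if match is None:
        return None
    # Everything before the blank line is the start-line plus header lines.
    lines = _LINE_END.split(buf[:match.start()])
    # Everything after the blank line is body data (or the next message).
    return lines, buf[match.end():]
```

Called on `b"HTTP/1.1 200 OK\r\nHost: a\n\nbody"`, this splits a block with mixed line endings into two header lines and leaves `b"body"` as the remainder.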
@Lukasa: yeah, agreed that
More notes on this for future reference:
It looks like most extant parsers accept
(Searching the web for that .NET error message -- "The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF" -- is a good way to find other examples of broken servers. Apparently .NET is the one HTTP client that really enforces this. The error message is emitted any time a .NET client sees a CR without an LF, or an LF without a CR, unless it's been configured in this horrible global be-sloppy-and-insecure mode.)
I'm not 100% convinced we need to accept mixed headers, but it would probably be good to at least error out early if we see lone
OTOH it looks like for chunked encoding, at least
Hello @njsmith
Problem description
I'm sorry for bumping such an old ticket, but not long ago I got stuck on a problem in
As the server is somewhat obsolete and is used in embedded development, it may be impossible to update it. Perhaps, when working with such systems, we could be a bit more tolerant.
Some extra details:
The server sends a response like this:
hexdump of the response
More details can be found in encode/httpx#1378 and Kane610/axis#55
As you can see, the headers section ends with:
The question
Do you think it's possible to make
It's definitely possible and we should do it. Just, no one has written the code yet :-) It'll be a bit hacky because of how our buffering works. Right now we search for
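To make the buffering concern concrete, here is a minimal sketch of an incremental search that accepts either delimiter. It is an assumption-laden illustration, not h11's real buffer: `HeaderBuffer`, `feed`, and `extract_headers` are hypothetical names. The pattern `\n\r?\n` matches the tail of `\r\n\r\n` as well as a bare `\n\n`, and the search position is remembered so that already-scanned bytes are not rescanned on every call.

```python
import re

# Hypothetical sketch; names are illustrative and not part of h11's API.
# Matches the end of "\r\n\r\n" (its "\n\r\n" tail) as well as a bare "\n\n".
_END_OF_HEADERS = re.compile(rb"\n\r?\n")


class HeaderBuffer:
    def __init__(self):
        self._data = bytearray()
        self._searched = 0  # number of bytes already scanned without a match

    def feed(self, chunk: bytes) -> None:
        self._data += chunk

    def extract_headers(self):
        """Return the header block including its terminator, or None if incomplete."""
        # The delimiter is at most 3 bytes long, so back up 2 bytes in case
        # it was split across two chunks.
        start = max(0, self._searched - 2)
        match = _END_OF_HEADERS.search(self._data, start)
        if match is None:
            self._searched = len(self._data)
            return None
        block = bytes(self._data[:match.end()])
        del self._data[:match.end()]
        self._searched = 0
        return block
```

A caller would `feed()` each chunk as it arrives and call `extract_headers()` until it returns something other than `None`. Splitting the returned block into individual lines is left out of this sketch.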
v0.12.0 (2021-01-01)
--------------------

Features
~~~~~~~~

- Added support for servers with broken line endings.
  After this change h11 accepts both ``\r\n`` and ``\n`` as a headers delimiter. (`#7 <https://github.com/python-hyper/h11/issues/7>`__)
- Add early detection of invalid http data when request line starts with binary (`#122 <https://github.com/python-hyper/h11/issues/122>`__)

(NEWS truncated at 15 lines)
I'm informed that some allegedly-HTTP servers use `\r` or `\n` for line endings, and that robust clients need to support this too.
Sigh.
Maybe the thing to do is to watch for `\r\r` or `\n\n` in the headers block and then go into a special case mode if we see it?