Support for servers with broken line endings #7

njsmith · 2016-06-04T18:26:38Z

I'm informed that some allegedly-HTTP servers use \r or \n for line endings, and that robust clients need to support this too.

Sigh.

Maybe the thing to do is to watch for \r\r or \n\n in the headers block and then go into a special case mode if we see it?

The text was updated successfully, but these errors were encountered:

njsmith · 2016-06-04T18:27:11Z

(NB I'm probably not going to work on this any time soon)

njsmith · 2016-06-04T18:40:58Z

CC: @durin42 since it's his fault I know about this

Lukasa · 2016-06-04T20:06:13Z

From RFC 7230 Section 3.5 (Message Parsing Robustness):

Although the line terminator for the start-line and header fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.

This suggests that allowing simply \n and not \r is acceptable, which I think is probably the way to go. While technically just '\r might happen, I really don't think it's worth it in terms of compat cost.

durin42 · 2016-06-04T20:18:04Z

On Jun 4, 2016, at 4:06 PM, Cory Benfield [email protected] wrote:

This suggests that allowing simply \n and not \r is acceptable, which I think is probably the way to go. While technically just '\r might happen, I really don't think it's worth it in terms of compat cost.

Sadly, I’ve seen \r without \n in the wild. :(

Lukasa · 2016-06-04T20:19:00Z

Oh, so have I. But I am a strong advocate of telling implementations that haven't read RFCs to go take a long walk off a short pier.

njsmith · 2016-06-04T20:51:25Z

I'm not sure it makes much difference in terms of implementation simplicity, because h11's trick for simple fast HTTP header parsing is use bytearray.find("\r\n\r\n"), so we'd still have to duplicate this check to look for \n\n at least.

BTW, I assume that servers that screw this up for headers also screw it up for chunked encoding?

Lukasa · 2016-06-04T20:56:41Z

@njsmith No need to duplicate the check: better to swap to a fairly simple regex. Either way you desperately need to avoid having too much cleverness here in Python, because each time you come back into Python code to process the bytestream you slow down a load on CPython (which for better or worse is what users will use to judge your perf).

njsmith · 2016-06-04T22:19:11Z

@Lukasa: yeah, agreed that re.search is probably the way to go, and that's why we're already doing header framing handling in a single call into the C runtime... but my point is that if we were doing line-by-line scanning then searching for \n would be just as easy as searching for \r\n (and would handle \r\n implicitly)... but since we're scanning for the whole headers at once then we need to do some sort of explicit handling of both \r\n\r\n and \n\n, and once you have two cases then adding a third probably isn't that hard.

njsmith · 2016-12-08T21:54:39Z

More notes on this for future reference:

It looks like most extant parsers accept \n or \r\n on a line-by-line basis (i.e., even mixed headers are accepted). Here's an example of a broken API that returned a mix of both: http://mehdi.me/a-tale-of-debugging-the-linkedin-api-net-and-http-protocol-violations/

(Searching the web for that .NET error message -- "The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF" -- is a good way to find other examples of broken servers. Apparently .NET is the one HTTP client that really enforces this. The error message is emitted any time a .NET client sees a CR without an LF, or an LF without a CR, unless its been configured in this horrible global be-sloppy-and-insecure more.)

I'm not 100% convinced we need to accept mixed headers, but it would probably be good to at least error out early if we see lone \r or \n -- right now if we'll just block forever waiting for \r\n\r\n, which might never arrive.

OTOH it looks like for chunked encoding, at least http.client and go both require \r\n only, no exceptions. So we probably don't need to worry about that. I guess server implementors who are too lazy to implement headers properly are also too lazy to implement chunked encoding.

cdeler · 2020-11-05T15:51:50Z

Hello @njsmith

Problem description

I'm sorry for bumping up so old ticket, but not so far ago I stuck with some problem in httpx (encode/httpx#1378) which is caused by an old webserver (there is an assumption that it's Boa web server), which splits Headers and Body with "\n".

As the server is a bit obsolete, and used in embedded development, it might be impossible to update it. May be, operating with such systems, we might be a bit more tolerant.

Some extra details:

Server sends a Response like that:

hexdump of the response

More details you can find in encode/httpx#1378 and Kane610/axis#55

00000000  48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d   HTTP/1.1  200 OK.
00000010  0a 58 2d 46 72 61 6d 65  2d 4f 70 74 69 6f 6e 73   .X-Frame -Options
00000020  3a 20 73 61 6d 65 6f 72  69 67 69 6e 0d 0a 44 61   : sameor igin..Da
00000030  74 65 3a 20 54 68 75 2c  20 32 39 20 4f 63 74 20   te: Thu,  29 Oct 
00000040  32 30 32 30 20 31 31 3a  33 37 3a 34 30 20 47 4d   2020 11: 37:40 GM
00000050  54 0d 0a 41 63 63 65 70  74 2d 52 61 6e 67 65 73   T..Accep t-Ranges
00000060  3a 20 62 79 74 65 73 0d  0a 43 6f 6e 6e 65 63 74   : bytes. .Connect
00000070  69 6f 6e 3a 20 63 6c 6f  73 65 0d 0a 41 75 74 68   ion: clo se..Auth
00000080  65 6e 74 69 63 61 74 69  6f 6e 2d 49 6e 66 6f 3a   enticati on-Info:
00000090  20 71 6f 70 3d 61 75 74  68 2c 20 72 73 70 61 75    qop=aut h, rspau
000000A0  74 68 3d 22 37 39 65 62  61 37 39 63 62 63 30 30   th="79eb a79cbc00
000000B0  64 38 61 35 32 37 62 31  63 31 39 64 34 62 34 33   d8a527b1 c19d4b43
000000C0  65 35 31 31 22 2c 20 63  6e 6f 6e 63 65 3d 22 38   e511", c nonce="8
000000D0  61 64 32 37 34 39 33 35  38 62 61 39 39 38 39 22   ad274935 8ba9989"
000000E0  2c 20 6e 63 3d 30 30 30  30 30 30 30 31 0d 0a 43   , nc=000 00001..C
000000F0  6f 6e 74 65 6e 74 2d 74  79 70 65 3a 20 74 65 78   ontent-t ype: tex
00000100  74 2f 70 6c 61 69 6e 0a  0a                        t/plain. .
00000109  72 6f 6f 74 3d 22 72 6f  6f 74 22 0a 62 69 6e 3d   root="ro ot".bin=
00000119  22 72 6f 6f 74 2c 62 69  6e 2c 64 61 65 6d 6f 6e   "root,bi n,daemon
00000129  22 0a 64 61 65 6d 6f 6e  3d 22 72 6f 6f 74 2c 62   ".daemon ="root,b
00000139  69 6e 2c 64 61 65 6d 6f  6e 22 0a 73 79 73 3d 22   in,daemo n".sys="
00000149  72 6f 6f 74 2c 62 69 6e  22 0a 74 74 79 3d 22 73   root,bin ".tty="s
...

As you can see, the headers section ends with:
00000100 74 2f 70 6c 61 69 6e 0a 0a t/plain. . i.e. with \n\n

The question

Do you think, if it's possible to make h11 less strict and to allow h11 accepting the LF as a response header/body delimiter?

njsmith · 2020-11-05T20:58:30Z

It's definitely possible and we should do it. Just, no one has written the code yet :-)

It'll be a bit hacky because of how our buffering works. Right now we search for \r\n\r\n and track how much of the string we've searched. For this I think the fastest approach will be to use a regex to search for \n\r?\n, and also track how much we've searched.

cdeler · 2020-11-10T09:54:13Z

Hello @njsmith ,

Is it possible to ask you to review the PR associated with this ticket (#115)?

v0.12.0 (2021-01-01) -------------------- Features ~~~~~~~~ - Added support for servers with broken line endings. After this change h11 accepts both ``\r\n`` and ``\n`` as a headers delimiter. (`#7 <https://github.com/python-hyper/h11/issues/7>`__) - Add early detection of invalid http data when request line starts with binary (`#122 <https://github.com/python-hyper/h11/issues/122>`__) (NEWS truncated at 15 lines)

njsmith mentioned this issue Dec 8, 2016

[WIP] Remove httplib. urllib3/urllib3#1067

Closed

njsmith mentioned this issue Jun 21, 2017

discussion: what does mitmproxy need? #47

Closed

njsmith mentioned this issue Mar 20, 2018

Fails to decode a header that requests can handle #57

Closed

njsmith mentioned this issue Dec 3, 2018

Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks python-trio/trio#796

Open

cdeler mentioned this issue Nov 5, 2020

httpx cannot process http/1.1 responses with headers and body delimited by \n\n or \n\n\n\n (but requests can) encode/httpx#1378

Closed

2 tasks

This was referenced Nov 5, 2020

REST sensors show as "Unavailable" state after upgrading to Home Assistant 0.117.0 home-assistant/core#42608

Closed

Rest sensor KO since 0.117 home-assistant/core#42589

Closed

cdeler mentioned this issue Nov 6, 2020

Added ability to use LF, not only CRLF delimiter for response Headers and Body #115

Merged

pgjones closed this as completed in #115 Dec 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for servers with broken line endings #7

Support for servers with broken line endings #7

njsmith commented Jun 4, 2016

njsmith commented Jun 4, 2016

njsmith commented Jun 4, 2016

Lukasa commented Jun 4, 2016

durin42 commented Jun 4, 2016

Lukasa commented Jun 4, 2016

njsmith commented Jun 4, 2016

Lukasa commented Jun 4, 2016

njsmith commented Jun 4, 2016

njsmith commented Dec 8, 2016

cdeler commented Nov 5, 2020 •

edited

Loading

njsmith commented Nov 5, 2020

cdeler commented Nov 10, 2020

Support for servers with broken line endings #7

Support for servers with broken line endings #7

Comments

njsmith commented Jun 4, 2016

njsmith commented Jun 4, 2016

njsmith commented Jun 4, 2016

Lukasa commented Jun 4, 2016

durin42 commented Jun 4, 2016

Lukasa commented Jun 4, 2016

njsmith commented Jun 4, 2016

Lukasa commented Jun 4, 2016

njsmith commented Jun 4, 2016

njsmith commented Dec 8, 2016

cdeler commented Nov 5, 2020 • edited Loading

Problem description

Some extra details:

The question

njsmith commented Nov 5, 2020

cdeler commented Nov 10, 2020

cdeler commented Nov 5, 2020 •

edited

Loading