Skip to content

Support for servers with broken line endings #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
njsmith opened this issue Jun 4, 2016 · 12 comments · Fixed by #115
Closed

Support for servers with broken line endings #7

njsmith opened this issue Jun 4, 2016 · 12 comments · Fixed by #115

Comments

@njsmith
Copy link
Member

njsmith commented Jun 4, 2016

I'm informed that some allegedly-HTTP servers use \r or \n for line endings, and that robust clients need to support this too.

Sigh.

Maybe the thing to do is to watch for \r\r or \n\n in the headers block and then go into a special case mode if we see it?

@njsmith
Copy link
Member Author

njsmith commented Jun 4, 2016

(NB I'm probably not going to work on this any time soon)

@njsmith
Copy link
Member Author

njsmith commented Jun 4, 2016

CC: @durin42 since it's his fault I know about this

@Lukasa
Copy link
Member

Lukasa commented Jun 4, 2016

From RFC 7230 Section 3.5 (Message Parsing Robustness):

Although the line terminator for the start-line and header fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.

This suggests that allowing simply \n and not \r is acceptable, which I think is probably the way to go. While technically just '\r might happen, I really don't think it's worth it in terms of compat cost.

@durin42
Copy link

durin42 commented Jun 4, 2016

On Jun 4, 2016, at 4:06 PM, Cory Benfield [email protected] wrote:

This suggests that allowing simply \n and not \r is acceptable, which I think is probably the way to go. While technically just '\r might happen, I really don't think it's worth it in terms of compat cost.

Sadly, I’ve seen \r without \n in the wild. :(

@Lukasa
Copy link
Member

Lukasa commented Jun 4, 2016

Oh, so have I. But I am a strong advocate of telling implementations that haven't read RFCs to go take a long walk off a short pier.

@njsmith
Copy link
Member Author

njsmith commented Jun 4, 2016

I'm not sure it makes much difference in terms of implementation simplicity, because h11's trick for simple fast HTTP header parsing is use bytearray.find("\r\n\r\n"), so we'd still have to duplicate this check to look for \n\n at least.

BTW, I assume that servers that screw this up for headers also screw it up for chunked encoding?

@Lukasa
Copy link
Member

Lukasa commented Jun 4, 2016

@njsmith No need to duplicate the check: better to swap to a fairly simple regex. Either way you desperately need to avoid having too much cleverness here in Python, because each time you come back into Python code to process the bytestream you slow down a load on CPython (which for better or worse is what users will use to judge your perf).

@njsmith
Copy link
Member Author

njsmith commented Jun 4, 2016

@Lukasa: yeah, agreed that re.search is probably the way to go, and that's why we're already doing header framing handling in a single call into the C runtime... but my point is that if we were doing line-by-line scanning then searching for \n would be just as easy as searching for \r\n (and would handle \r\n implicitly)... but since we're scanning for the whole headers at once then we need to do some sort of explicit handling of both \r\n\r\n and \n\n, and once you have two cases then adding a third probably isn't that hard.

@njsmith
Copy link
Member Author

njsmith commented Dec 8, 2016

More notes on this for future reference:

It looks like most extant parsers accept \n or \r\n on a line-by-line basis (i.e., even mixed headers are accepted). Here's an example of a broken API that returned a mix of both: http://mehdi.me/a-tale-of-debugging-the-linkedin-api-net-and-http-protocol-violations/

(Searching the web for that .NET error message -- "The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF" -- is a good way to find other examples of broken servers. Apparently .NET is the one HTTP client that really enforces this. The error message is emitted any time a .NET client sees a CR without an LF, or an LF without a CR, unless its been configured in this horrible global be-sloppy-and-insecure more.)

I'm not 100% convinced we need to accept mixed headers, but it would probably be good to at least error out early if we see lone \r or \n -- right now if we'll just block forever waiting for \r\n\r\n, which might never arrive.

OTOH it looks like for chunked encoding, at least http.client and go both require \r\n only, no exceptions. So we probably don't need to worry about that. I guess server implementors who are too lazy to implement headers properly are also too lazy to implement chunked encoding.

@cdeler
Copy link
Contributor

cdeler commented Nov 5, 2020

Hello @njsmith

Problem description

I'm sorry for bumping up so old ticket, but not so far ago I stuck with some problem in httpx (encode/httpx#1378) which is caused by an old webserver (there is an assumption that it's Boa web server), which splits Headers and Body with "\n".

As the server is a bit obsolete, and used in embedded development, it might be impossible to update it. May be, operating with such systems, we might be a bit more tolerant.

Some extra details:

Server sends a Response like that:

hexdump of the response

More details you can find in encode/httpx#1378 and Kane610/axis#55

00000000  48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d   HTTP/1.1  200 OK.
00000010  0a 58 2d 46 72 61 6d 65  2d 4f 70 74 69 6f 6e 73   .X-Frame -Options
00000020  3a 20 73 61 6d 65 6f 72  69 67 69 6e 0d 0a 44 61   : sameor igin..Da
00000030  74 65 3a 20 54 68 75 2c  20 32 39 20 4f 63 74 20   te: Thu,  29 Oct 
00000040  32 30 32 30 20 31 31 3a  33 37 3a 34 30 20 47 4d   2020 11: 37:40 GM
00000050  54 0d 0a 41 63 63 65 70  74 2d 52 61 6e 67 65 73   T..Accep t-Ranges
00000060  3a 20 62 79 74 65 73 0d  0a 43 6f 6e 6e 65 63 74   : bytes. .Connect
00000070  69 6f 6e 3a 20 63 6c 6f  73 65 0d 0a 41 75 74 68   ion: clo se..Auth
00000080  65 6e 74 69 63 61 74 69  6f 6e 2d 49 6e 66 6f 3a   enticati on-Info:
00000090  20 71 6f 70 3d 61 75 74  68 2c 20 72 73 70 61 75    qop=aut h, rspau
000000A0  74 68 3d 22 37 39 65 62  61 37 39 63 62 63 30 30   th="79eb a79cbc00
000000B0  64 38 61 35 32 37 62 31  63 31 39 64 34 62 34 33   d8a527b1 c19d4b43
000000C0  65 35 31 31 22 2c 20 63  6e 6f 6e 63 65 3d 22 38   e511", c nonce="8
000000D0  61 64 32 37 34 39 33 35  38 62 61 39 39 38 39 22   ad274935 8ba9989"
000000E0  2c 20 6e 63 3d 30 30 30  30 30 30 30 31 0d 0a 43   , nc=000 00001..C
000000F0  6f 6e 74 65 6e 74 2d 74  79 70 65 3a 20 74 65 78   ontent-t ype: tex
00000100  74 2f 70 6c 61 69 6e 0a  0a                        t/plain. .
00000109  72 6f 6f 74 3d 22 72 6f  6f 74 22 0a 62 69 6e 3d   root="ro ot".bin=
00000119  22 72 6f 6f 74 2c 62 69  6e 2c 64 61 65 6d 6f 6e   "root,bi n,daemon
00000129  22 0a 64 61 65 6d 6f 6e  3d 22 72 6f 6f 74 2c 62   ".daemon ="root,b
00000139  69 6e 2c 64 61 65 6d 6f  6e 22 0a 73 79 73 3d 22   in,daemo n".sys="
00000149  72 6f 6f 74 2c 62 69 6e  22 0a 74 74 79 3d 22 73   root,bin ".tty="s
...

As you can see, the headers section ends with:
00000100 74 2f 70 6c 61 69 6e 0a 0a t/plain. . i.e. with \n\n

The question

Do you think, if it's possible to make h11 less strict and to allow h11 accepting the LF as a response header/body delimiter?

@njsmith
Copy link
Member Author

njsmith commented Nov 5, 2020

It's definitely possible and we should do it. Just, no one has written the code yet :-)

It'll be a bit hacky because of how our buffering works. Right now we search for \r\n\r\n and track how much of the string we've searched. For this I think the fastest approach will be to use a regex to search for \n\r?\n, and also track how much we've searched.

@cdeler
Copy link
Contributor

cdeler commented Nov 10, 2020

Hello @njsmith ,

Is it possible to ask you to review the PR associated with this ticket (#115)?

clrpackages referenced this issue in clearlinux-pkgs/h11 Jan 5, 2021
v0.12.0 (2021-01-01)
--------------------

Features
~~~~~~~~

- Added support for servers with broken line endings.

  After this change h11 accepts both ``\r\n`` and ``\n`` as a headers
  delimiter. (`#7 <https://github.com/python-hyper/h11/issues/7>`__)
- Add early detection of invalid http data when request line starts
  with binary (`#122
  <https://github.com/python-hyper/h11/issues/122>`__)

(NEWS truncated at 15 lines)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants