Make email/message.py read headers more robustly #123742

zsolt-sb · 2024-09-05T19:34:13Z

Bug report

Bug description:

if variable 'attachment' is of type email.message.Message

attachment.get_payload(decode=True)

Will yield the base64 content rather than the unecoded content if the sender provides the header 'base64 '

IE

attachment['Content-Transfer-Encoding']
'base64 '  <-notice the extra space

If we adjusted this code
https://github.com/python/cpython/blob/3.12/Lib/email/message.py#L290
to also strip(), this would resolve those edge cases where email clients erroneously add spaces.

This was an actual problem I have run into with a vendor.

CPython versions tested on:

3.9, 3.12

Operating systems tested on:

macOS

Linked PRs

gh-123742: Fixed get_payload not being able to parse headers with spaces #123761

The text was updated successfully, but these errors were encountered:

picnixz · 2024-09-05T20:08:57Z

CPython versions tested on: 3.9

Does the behaviour change in 3.10, 3.11, 3.12, 3.13 or 3.14? (I don't know by heart every feature that has been added and whether this behaviour has been changed). In addition, I'm not sure whether this should be considered as a security issue or not (if this is not the case, 3.8 to 3.11 won't get any updates since they are security-only).

zsolt-sb · 2024-09-05T22:43:06Z

Ah yes, I was using 3.9, but the code example I am pointing to is 3.12. It has not changed since [3.9](https://github.com/python/cpython/blob/3.9/Lib/email/message.py#L258)

rruuaanng · 2024-09-06T00:37:08Z

Bug report

Bug description:

if variable 'attachment' is of type email.message.Message
attachment.get_payload(decode=True) 
Will yield the base64 content rather than the unecoded content if the sender provides the header 'base64 '

IE
attachment['Content-Transfer-Encoding']
'base64 '  <-notice the extra space
If we adjusted this code https://github.com/python/cpython/blob/3.12/Lib/email/message.py#L290 to also strip(), this would resolve those edge cases where email clients erroneously add spaces.

This was an actual problem I have run into with a vendor.

CPython versions tested on:

3.9, 3.12

Operating systems tested on:

macOS

Could you show me the code that's causing the error?¹

Excessive verbose quoting hidden by @erlend-aasland ↩

zsolt-sb · 2024-09-06T03:07:28Z

Could you show me the code that's causing the error？

Minimal python code

import email
import mimetypes
import os

with open('example.eml') as fp:
    msg = email.message_from_string(fp.read())
    counter = 1
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        filename = part.get_filename()
        if not filename:
            ext = mimetypes.guess_extension(part.get_content_type())
            filename = f'part-{counter:03d}{ext}'
        counter += 1
        with open(os.path.join('.', filename), 'wb') as fp:
            fp.write(part.get_payload(decode=True))

This code will write out the the attachment as well as the text. What is important here though is the actual data
Attached is a bare bones email file. Please remove the .txt once downloaded.

If you open with something like the thunderbird email client, you can successfully download report named TheReport.csv. If you run the above sample code you will see TheReport.csv will come back still base64 encoded. If you look into the file with vi, you can see the space after the base64 declaration.

example.eml.txt

erlend-aasland · 2024-11-12T08:49:33Z

Instead of silently accepting and normalising ill-formatted fields, another option could be to raise an exception. What do you think, @picnixz?

picnixz · 2024-11-12T10:04:00Z

Since the current behaviour is anyway buggy, whether we fix it or raise an exception or do something else shouldn't really break any existing code (existing code is already "broken" in some sense). So here are a few suggestions:

Let's have a look at whether the RFC for email and headers (if any) specify how to handle those cases. If the grammar does not allow for extra whitespaces or if one shouldn't sanitize the input, then I think we should not implicitly fix the input.
Note that we already do some normalization by lowercasing the name so it's not entirely inconsistent nor incoherent to also strip leading and trailing whitespaces.
More generally, sanitizing ill-formed fields may probably be what users expect. However, the boundary between what we clean up and what we don't should be well drawn and probably documented (just in case and because it could help debugging if someone looks at their input and wonder why this or that happened).

So, I have the following proposal: we either raise an exception saying that base64 is not recognized or we emit a warning saying that it's being ignored.

Now, I'm always pondering whether warnings are even used in practice. I feel that there are two categories of people: those for which warnings are ignored and those for which no warning should be emitted. So, emitting a warning for the first category is useless while for the other, an exception would be better.

One way to be flexible is to add a strict keyword which makes the method raise an exception if the CTE is not recognized and a normalize keyword which would normalize the headers as much as possible (in addition to lowercasing which will always be done for comparison purposes). But those additional parameters may not necessarily be changed in practice so we should take strict=False and normalize=True by default.

For instance:

def get_payload(self, i=None, decode=False, *, sanitize=True, strict=False):
	... 
	raw_cte = self.get('content-transfer-encoding')
    cte = str(raw_cte or '').lower()
	if sanitize:
		cte = cte.strip()
		# do other stuff in the future maybe
	...
	if cte == ...: ...
	elif cte in (...): ...
	elif cte in (...): ...
	elif raw_cte is not None:
		if strict:
			raise ValueError(f"unknown content-transfer-encoding: {raw_cte!r}")
		else:
			import warnings
			# maybe introduce a specific warning class instead of using UserWarning
			warnings.warn(f"unknown content-transfer-encoding: {raw_cte!r}")
	...

Emitting a warning may not be needed and we could just silently ignore the CTE. What do you think of this proposal @erlend-aasland?

rruuaanng · 2024-11-26T04:52:12Z

That is to say, we should default that the message body must be correct, otherwise it will not conform to rfc, right?
(Instead of implicitly processing it.)

picnixz · 2024-11-26T15:29:25Z

I don't know if the corresponding RFC (if any) specifies the allowed header formats. For instance, if the RFC allows for trailing whitespaces to exist and if it also says that they are to be ignored, then we can just call strip(). If it explicitly states that the header must follow some specific pattern, then training whitespaces should be warned (and stripped) or an exception should be raised.

From a practical PoV it makes sense to automatically strip them. It's the easiest and probably the most natural way to parse such header. But if we want to be RFC-compliant, we may need to do something else. I don't have time for checking the RFC so any decision should be made after the RFC is consulted (again, this becomes a non-issue if the RFC is vague enough, and we would likely choose auto-normalization for practical reasons).

I don't think we'll have an issue in the future because I don't see how a trailing whitespace should be considered (namely, what would base64 (with trailing whitespace) mean if not base64?). In addition, we already use a case-insensitive comparison.

Btw, it may be interesting to see how Perl deals with that (I usually use Perl for mail-related stuff).

rruuaanng · 2024-11-26T15:46:47Z

As far as I remember, RFC822 describes everything about MIME. However, it's quite lenient regarding the handling of spaces, meaning the behavior is determined by the parser. I believe it's
essential to remove field spaces, as it can affect the parsing behavior.

rruuaanng · 2025-01-08T01:58:57Z

We can close this issue as it is a duplicate of issue #98188, and the upstream has already resolved it. Thank you all for your attention to this matter :). @picnixz @erlend-aasland

zsolt-sb added the type-bug An unexpected behavior, bug, or error label Sep 5, 2024

picnixz added stdlib Python modules in the Lib dir topic-email labels Sep 5, 2024

bedevere-app bot mentioned this issue Sep 6, 2024

gh-123742: Fixed get_payload not being able to parse headers with spaces #123761

Closed

This comment was marked as outdated.

Sign in to view

erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Make email/message.py read headers more robustly #123742

Make email/message.py read headers more robustly #123742

zsolt-sb commented Sep 5, 2024 •

edited by bedevere-app bot

Loading

picnixz commented Sep 5, 2024

Uh oh!

zsolt-sb commented Sep 5, 2024

Uh oh!

rruuaanng commented Sep 6, 2024 •

edited by erlend-aasland

Loading

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Uh oh!

zsolt-sb commented Sep 6, 2024 •

edited by hugovk

Loading

Uh oh!

This comment was marked as outdated.

erlend-aasland commented Nov 12, 2024

Uh oh!

picnixz commented Nov 12, 2024 •

edited

Loading

Uh oh!

rruuaanng commented Nov 26, 2024

Uh oh!

picnixz commented Nov 26, 2024 •

edited

Loading

Uh oh!

rruuaanng commented Nov 26, 2024

Uh oh!

rruuaanng commented Jan 8, 2025

Uh oh!

Uh oh!

Make email/message.py read headers more robustly #123742

Make email/message.py read headers more robustly #123742

Comments

zsolt-sb commented Sep 5, 2024 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

picnixz commented Sep 5, 2024

Uh oh!

zsolt-sb commented Sep 5, 2024

Uh oh!

rruuaanng commented Sep 6, 2024 • edited by erlend-aasland Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Footnotes

Uh oh!

zsolt-sb commented Sep 6, 2024 • edited by hugovk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

erlend-aasland commented Nov 12, 2024

Uh oh!

picnixz commented Nov 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rruuaanng commented Nov 26, 2024

Uh oh!

picnixz commented Nov 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rruuaanng commented Nov 26, 2024

Uh oh!

rruuaanng commented Jan 8, 2025

Uh oh!

zsolt-sb commented Sep 5, 2024 •

edited by bedevere-app bot

Loading

rruuaanng commented Sep 6, 2024 •

edited by erlend-aasland

Loading

zsolt-sb commented Sep 6, 2024 •

edited by hugovk

Loading

picnixz commented Nov 12, 2024 •

edited

Loading

picnixz commented Nov 26, 2024 •

edited

Loading