Make email/message.py read headers more robustly #123742
Comments
Does the behaviour change in 3.10, 3.11, 3.12, 3.13 or 3.14? (I don't know by heart every feature that has been added or whether this behaviour has changed.) In addition, I'm not sure whether this should be considered a security issue (if it is not, 3.8 to 3.11 won't get any updates since they are security-only).
Ah yes, I was using 3.9, but the code example I am pointing to is 3.12. It has not changed since [3.9](https://github.com/python/cpython/blob/3.9/Lib/email/message.py#L258).
Could you show me the code that's causing the error?
Minimal Python code:

    import email
    import mimetypes
    import os

    with open('example.eml') as fp:
        msg = email.message_from_string(fp.read())

    counter = 1
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        filename = part.get_filename()
        if not filename:
            ext = mimetypes.guess_extension(part.get_content_type())
            filename = f'part-{counter:03d}{ext}'
        counter += 1
        with open(os.path.join('.', filename), 'wb') as fp:
            fp.write(part.get_payload(decode=True))

This code will write out the attachment as well as the text. What is important here, though, is the actual data. If you open the message with something like the Thunderbird email client, you can successfully download the report named TheReport.csv. If you run the above sample code, you will see that TheReport.csv comes back still base64 encoded. If you look into the file with vi, you can see the space after the base64 declaration.
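Until the library itself tolerates the stray space, one possible stopgap in user code is to normalise the header before decoding. The following is only a minimal sketch: `normalize_cte` is a hypothetical helper name (not part of the `email` package), and the tiny CSV message is made up to stand in for the vendor's email.

```python
import base64
import email


def normalize_cte(part):
    """Strip stray whitespace from Content-Transfer-Encoding, in place.

    Hypothetical helper for illustration; not part of the email package.
    """
    raw = part.get('Content-Transfer-Encoding')
    if raw is not None:
        cleaned = str(raw).strip()
        if cleaned != raw:
            # replace_header keeps the header's position in the message.
            part.replace_header('Content-Transfer-Encoding', cleaned)
    return part


# Made-up message whose CTE header carries a trailing space, as in the report.
body = base64.b64encode(b"a,b\n1,2\n").decode('ascii')
raw_message = (
    "Content-Type: text/csv\n"
    "Content-Transfer-Encoding: base64 \n"
    "\n"
    f"{body}\n"
)

msg = email.message_from_string(raw_message)
decoded = normalize_cte(msg).get_payload(decode=True)
print(decoded)
```

In the loop above, this would amount to calling `normalize_cte(part)` just before `part.get_payload(decode=True)`.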
Instead of silently accepting and normalising ill-formatted fields, another option could be to raise an exception. What do you think, @picnixz?
Since the current behaviour is anyway buggy, whether we fix it or raise an exception or do something else shouldn't really break any existing code (existing code is already "broken" in some sense). So here are a few suggestions:
So, I have the following proposal: we either raise an exception saying that [...]

Now, I'm always pondering whether warnings are even used in practice. I feel that there are two categories of people: those for whom warnings are ignored and those for whom no warning should be emitted. So, emitting a warning for the first category is useless while for the other, an exception would be better. One way to be flexible is to add a [...]

For instance:

    def get_payload(self, i=None, decode=False, *, sanitize=True, strict=False):
        ...
        raw_cte = self.get('content-transfer-encoding')
        cte = str(raw_cte or '').lower()
        if sanitize:
            cte = cte.strip()
            # do other stuff in the future maybe
        ...
        if cte == ...: ...
        elif cte in (...): ...
        elif cte in (...): ...
        elif raw_cte is not None:
            if strict:
                raise ValueError(f"unknown content-transfer-encoding: {raw_cte!r}")
            else:
                import warnings
                # maybe introduce a specific warning class instead of using UserWarning
                warnings.warn(f"unknown content-transfer-encoding: {raw_cte!r}")
        ...

Emitting a warning may not be needed and we could just silently ignore the CTE. What do you think of this proposal @erlend-aasland?
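To make the `sanitize`/`strict` idea concrete, here is a rough, runnable sketch written as a standalone function rather than as a change to `Message.get_payload`. Everything in it is an assumption for illustration: the name `decode_payload`, the set of encodings handled, and the simplification that the part is non-multipart with an ASCII payload. It is not what CPython does or did.

```python
import base64
import email
import quopri
import warnings


def decode_payload(part, *, sanitize=True, strict=False):
    """Decode a non-multipart part's payload to bytes.

    Hypothetical helper mirroring the proposal; not an email-package API.
    """
    raw_cte = part.get('content-transfer-encoding')
    cte = str(raw_cte or '').lower()
    if sanitize:
        cte = cte.strip()

    # Simplification: assume an ASCII (or surrogate-escaped) string payload.
    data = part.get_payload(decode=False).encode('ascii', 'surrogateescape')

    if cte == 'base64':
        return base64.b64decode(data)
    if cte == 'quoted-printable':
        return quopri.decodestring(data)
    if cte in ('', '7bit', '8bit', 'binary'):
        return data

    # Unknown encoding: fail loudly or warn, depending on the caller's choice.
    if strict:
        raise ValueError(f"unknown content-transfer-encoding: {raw_cte!r}")
    warnings.warn(f"unknown content-transfer-encoding: {raw_cte!r}")
    return data


# Made-up message with a trailing space after "base64".
encoded = base64.b64encode(b"a,b\n1,2\n").decode('ascii')
sample = email.message_from_string(
    f"Content-Transfer-Encoding: base64 \n\n{encoded}\n"
)

print(decode_payload(sample))  # sanitised: the payload is decoded
try:
    decode_payload(sample, sanitize=False, strict=True)
except ValueError as exc:
    print(exc)  # unsanitised and strict: the stray space is reported
```

With `sanitize=False` and `strict=False`, the same input would instead emit a `UserWarning` and return the still-encoded bytes, which is the "silent" failure mode the thread is trying to avoid.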
That is to say, we should assume by default that the message must be well-formed, since otherwise it does not conform to the RFC, right?
I don't know if the corresponding RFC (if any) specifies the allowed header formats. For instance, if the RFC allows trailing whitespace to exist and also says that it is to be ignored, then we can just call [...]

From a practical PoV it makes sense to automatically strip it. It's the easiest and probably the most natural way to parse such a header. But if we want to be RFC-compliant, we may need to do something else. I don't have time to check the RFC, so any decision should be made after the RFC is consulted (again, this becomes a non-issue if the RFC is vague enough, and we would likely choose auto-normalization for practical reasons). I don't think we'll have an issue in the future because I don't see how trailing whitespace should be considered (namely, what would [...]).

Btw, it may be interesting to see how Perl deals with that (I usually use Perl for mail-related stuff).
As far as I remember, |
We can close this issue as it is a duplicate of issue #98188, and the upstream has already resolved it. Thank you all for your attention to this matter :). @picnixz @erlend-aasland
Bug report
Bug description:
if variable 'attachment' is of type email.message.Message
Will yield the base64 content rather than the unencoded content if the sender provides the header 'base64 ' (note the trailing space)
IE
If we adjusted this code
https://github.com/python/cpython/blob/3.12/Lib/email/message.py#L290
to also strip(), this would resolve those edge cases where email clients erroneously add spaces.
This was an actual problem I have run into with a vendor.
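A small check of why the extra `strip()` matters, assuming the default (`compat32`) policy used by `email.message_from_string`: the parser keeps the trailing space in the header value, so a plain lowercase comparison against `'base64'` does not match. The one-byte message below is made up for illustration.

```python
import email

# "YQ==" is the base64 encoding of b"a"; note the space after "base64".
msg = email.message_from_string(
    "Content-Transfer-Encoding: base64 \n\nYQ==\n"
)

raw_cte = msg.get('content-transfer-encoding')
matches_without_strip = str(raw_cte).lower() == 'base64'
matches_with_strip = str(raw_cte).strip().lower() == 'base64'

print(repr(raw_cte), matches_without_strip, matches_with_strip)
```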
CPython versions tested on:
3.9, 3.12
Operating systems tested on:
macOS
Linked PRs
get_payload not being able to parse headers with spaces #123761