email.message.EmailMessage accepts invalid header field names without error, which raise an error when parsed #127794

Closed
TauPan opened this issue Dec 10, 2024 · 6 comments
Labels
stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error

Comments

@TauPan
TauPan commented Dec 10, 2024

Bug report

Bug description:

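email.message.EmailMessage accepts invalid header field names without raising an error. When the serialized message is parsed again, those names produce a defect (raised as an exception under the strict policy) regardless of the policy used, and the resulting emails are corrupt.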

Case in point (with Python 3.13.1 installed via pyenv; this occurs in 3.11 and earlier as well):

delgado@tuxedo-e101776:~> python3.13
Python 3.13.1 (main, Dec 10 2024, 15:13:47) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.message
>>> message = email.message.EmailMessage()
>>> message.add_header('From', '[email protected]')
None
>>> message.add_header('To', '[email protected]')
None
>>> message.add_header('Subject', 'Example Subject')
None
>>> message.add_header('Invalid Header', 'Contains a space, which is illegal')
None
>>> message.add_header('X-Valid Header', 'Custom header as recommended')
None
>>> message.set_content('Hello, this is an example!')
None
>>> message.defects
[]
>>> message._headers
[('From', '[email protected]'),
 ('To', '[email protected]'),
 ('Subject', 'Example Subject'),
 ('Invalid Header', 'Contains a space, which is illegal'),
 ('X-Valid Header', 'Custom header as recommended'),
 ('Content-Type', 'text/plain; charset="utf-8"'),
 ('Content-Transfer-Encoding', '7bit'),
 ('MIME-Version', '1.0')]
>>> message.as_string()
('From: [email protected]\n'
 'To: [email protected]\n'
 'Subject: Example Subject\n'
 'Invalid Header: Contains a space, which is illegal\n'
 'X-Valid Header: Custom header as recommended\n'
 'Content-Type: text/plain; charset="utf-8"\n'
 'Content-Transfer-Encoding: 7bit\n'
 'MIME-Version: 1.0\n'
 '\n'
 'Hello, this is an example!\n')
>>> message.policy
EmailPolicy()
>>> msg_string = message.as_string()
>>> msg_string
('From: [email protected]\n'
 'To: [email protected]\n'
 'Subject: Example Subject\n'
 'Invalid Header: Contains a space, which is illegal\n'
 'X-Valid Header: Custom header as recommended\n'
 'Content-Type: text/plain; charset="utf-8"\n'
 'Content-Transfer-Encoding: 7bit\n'
 'MIME-Version: 1.0\n'
 '\n'
 'Hello, this is an example!\n')
>>> import email.parser
>>> parsed_message = email.parser.Parser().parsestr(msg_string)
>>> parsed_message._headers
[('From', '[email protected]'),
 ('To', '[email protected]'),
 ('Subject', 'Example Subject')]
>>> parsed_message.as_string()
('From: [email protected]\n'
 'To: [email protected]\n'
 'Subject: Example Subject\n'
 '\n'
 'Invalid Header: Contains a space, which is illegal\n'
 'X-Valid Header: Custom header as recommended\n'
 'Content-Type: text/plain; charset="utf-8"\n'
 'Content-Transfer-Encoding: 7bit\n'
 'MIME-Version: 1.0\n'
 '\n'
 'Hello, this is an example!\n')
>>> parsed_message.policy
Compat32()
>>> parsed_message.defects
[MissingHeaderBodySeparatorDefect()]
>>> import email.policy
>>> parsed_message_strict = email.parser.Parser(policy=email.policy.strict).parsestr(msg_string)
Traceback (most recent call last):
  File "<python-input-19>", line 1, in <module>
    parsed_message_strict = email.parser.Parser(policy=email.policy.strict).parsestr(msg_string)
  File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/parser.py", line 64, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/parser.py", line 53, in parse
    feedparser.feed(data)
    ~~~~~~~~~~~~~~~^^^^^^
  File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/feedparser.py", line 176, in feed
    self._call_parse()
    ~~~~~~~~~~~~~~~~^^
  File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/feedparser.py", line 180, in _call_parse
    self._parse()
    ~~~~~~~~~~~^^
  File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/feedparser.py", line 234, in _parsegen
    self.policy.handle_defect(self._cur, defect)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/_policybase.py", line 193, in handle_defect
    raise defect
email.errors.MissingHeaderBodySeparatorDefect
>>> parsed_message_nonstrict = email.parser.Parser(policy=email.policy.default).parsestr(msg_string)
>>> parsed_message_nonstrict.as_string()
('From: [email protected]\n'
 'To: [email protected]\n'
 'Subject: Example Subject\n'
 '\n'
 'Invalid Header: Contains a space, which is illegal\n'
 'X-Valid Header: Custom header as recommended\n'
 'Content-Type: text/plain; charset="utf-8"\n'
 'Content-Transfer-Encoding: 7bit\n'
 'MIME-Version: 1.0\n'
 '\n'
 'Hello, this is an example!\n')
>>> parsed_message_nonstrict.defects
[MissingHeaderBodySeparatorDefect()]

EmailMessage accepts the illegal header field name without registering a defect. But when the resulting message is parsed, regardless of policy, header parsing appears to stop at that point: the line with the defective header is treated as the first line of the body, which leads to the MissingHeaderBodySeparatorDefect.
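The parsing behavior described above can be reproduced without going through EmailMessage at all. The following is a minimal sketch that feeds hand-written message text to the parser; the header values and the `X-Valid-Header` name are placeholders chosen for illustration, not taken from the session above.

```python
import email.errors
import email.parser
import email.policy

# Raw message text in which the second field name contains a space.
# All header values here are illustrative placeholders.
raw = (
    "Subject: Example Subject\n"
    "Invalid Header: Contains a space, which is illegal\n"
    "X-Valid-Header: Never reached by the header parser\n"
    "\n"
    "Hello, this is an example!\n"
)

# Non-strict policy: the defect is recorded on the message object.
parsed = email.parser.Parser(policy=email.policy.default).parsestr(raw)
header_names = parsed.keys()
defect_names = [type(d).__name__ for d in parsed.defects]
body = parsed.get_payload()

# Strict policy: the same defect is raised as an exception instead.
try:
    email.parser.Parser(policy=email.policy.strict).parsestr(raw)
    strict_error = None
except email.errors.MissingHeaderBodySeparatorDefect as exc:
    strict_error = exc

print(header_names)          # headers parsed before the invalid line
print(defect_names)          # defects registered during parsing
print(body.splitlines()[0])  # first line of what the parser treats as body
```

Everything from the invalid line onward ends up in the payload, which is why later headers such as Content-Type are lost to the receiving program.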

Interestingly, email.header contains the following:

# Field name regexp, including trailing colon, but not separating whitespace,
# according to RFC 2822.  Character range is from tilde to exclamation mark.
# For use with .match()
fcre = re.compile(r'[\041-\176]+:$')

This is the correct regex according to the RFC, including the final colon, but it apparently isn't used anywhere in the code.
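To see how that pattern treats the field names from the session above, here is a small sketch. It compiles a local copy of the quoted pattern rather than importing it, and `looks_like_field_name` is a hypothetical helper introduced only for this illustration.

```python
import re

# Local copy of the pattern quoted above (an assumption: compiled here
# rather than imported, so the sketch does not depend on stdlib internals).
field_name_with_colon = re.compile(r'[\041-\176]+:$')


def looks_like_field_name(name):
    """Return True if `name` followed by a colon matches the pattern."""
    return field_name_with_colon.match(name + ':') is not None


candidates = ('Subject', 'X-Valid-Header', 'Invalid Header', 'X-Valid Header')
results = {name: looks_like_field_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'accepted' if ok else 'rejected'}")
```

Any name containing a space is rejected, including 'X-Valid Header' from the example. Note that the character class spans codes 33 to 126 and therefore also admits a colon inside the name, whereas RFC 5322 §2.2 excludes the colon from field names.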

An MUA (such as Claws Mail or mutt) will display the resulting email with the remaining headers as part of the body, breaking any MIME multipart rendering.

CPython versions tested on:

3.11, 3.13

Operating systems tested on:

Linux

Linked PRs

@TauPan TauPan added the type-bug An unexpected behavior, bug, or error label Dec 10, 2024
@ZeroIntensity ZeroIntensity added topic-email stdlib Python modules in the Lib dir labels Dec 10, 2024
@srinivasreddy
Contributor

@TauPan Are you working on it? Shall I work on it?

@TauPan
Author

TauPan commented Dec 11, 2024

@TauPan Are you working on it? Shall I work on it?

I'd appreciate it if you could work on it. Thanks!

@srinivasreddy
Contributor

I am on it. Thanks.

srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Dec 11, 2024
@bitdancer
Member

As you observed, the missing header separator error is Python's equivalent of other email programs displaying the broken headers in the body: we register the defect and then begin the body at the broken header, which is the correct behavior per the RFCs. feedparser uses its own regex, which incorporates the one from the RFC but extends it for internal code reasons. The one in email.header (old API) is probably a refactoring leftover.

srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Dec 13, 2024
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Dec 17, 2024
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Dec 18, 2024
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Dec 23, 2024
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Dec 31, 2024
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Jan 2, 2025
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Jan 8, 2025
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Feb 19, 2025
srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue Mar 5, 2025
bitdancer added a commit to srinivasreddy/cpython that referenced this issue Mar 29, 2025
picnixz added a commit that referenced this issue Mar 30, 2025
`email.message.Message` objects now validate header names specified via `__setitem__`
or `add_header` according to RFC 5322, §2.2 [1].

In particular, callers should expect a ValueError to be raised for invalid header names.

[1]: https://datatracker.ietf.org/doc/html/rfc5322#section-2.2

---------

Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: R. David Murray <[email protected]>
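For code that must also run on Python versions without this change, a caller can apply a similar check before setting a header. The sketch below is not the merged implementation: `add_header_checked` and `_FIELD_NAME` are hypothetical names, and the pattern is an assumption based on the RFC 5322 §2.2 rule that a field name consists of printable US-ASCII characters (codes 33 to 126) other than the colon.

```python
import re
from email.message import EmailMessage

# Assumed field-name rule from RFC 5322 section 2.2:
# printable US-ASCII (octal 041-176) excluding the colon (octal 072).
_FIELD_NAME = re.compile(r'[\041-\071\073-\176]+')


def add_header_checked(message, name, value):
    """Set a header on `message`, raising ValueError for an invalid name.

    Hypothetical helper for illustration; not part of the email package.
    """
    if not _FIELD_NAME.fullmatch(name):
        raise ValueError(f"invalid header field name: {name!r}")
    message[name] = value


msg = EmailMessage()
add_header_checked(msg, 'Subject', 'Example Subject')

try:
    add_header_checked(msg, 'Invalid Header', 'Contains a space')
except ValueError as exc:
    print(f"rejected: {exc}")
```

Raising ValueError in the helper mirrors the exception type named in the commit message, so calling code can handle both cases with the same `except` clause.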
@srinivasreddy
Contributor

srinivasreddy commented Mar 31, 2025

🚀 🚀 @TauPan We have merged the PR into main. Thanks a lot to @bitdancer and @picnixz. Much appreciated. We can close this issue.

@picnixz picnixz closed this as completed Mar 31, 2025
@picnixz picnixz removed the type-bug An unexpected behavior, bug, or error label Mar 31, 2025
@picnixz picnixz added type-feature A feature request or enhancement type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Mar 31, 2025
@picnixz
Member

picnixz commented Mar 31, 2025

Rationale for the non-backports: #127820 (comment).

seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
…ython#127820)
