Skip to content

Commit 0976339

Browse files
encukoubasbloemsaatserhiy-storchaka
authored
gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)
## Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. ## Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
1 parent 5912487 commit 0976339

File tree

10 files changed

+160
-4
lines changed

10 files changed

+160
-4
lines changed

Doc/library/email.errors.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,13 @@ The following exception classes are defined in the :mod:`email.errors` module:
5858
:class:`~email.mime.nonmultipart.MIMENonMultipart` (e.g.
5959
:class:`~email.mime.image.MIMEImage`).
6060

61+
62+
.. exception:: HeaderWriteError()
63+
64+
Raised when an error occurs when the :mod:`~email.generator` outputs
65+
headers.
66+
67+
6168
.. exception:: MessageDefect()
6269

6370
This is the base class for all defects found when parsing email messages.

Doc/library/email.policy.rst

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -229,6 +229,24 @@ added matters. To illustrate::
229229

230230
.. versionadded:: 3.6
231231

232+
233+
.. attribute:: verify_generated_headers
234+
235+
If ``True`` (the default), the generator will raise
236+
:exc:`~email.errors.HeaderWriteError` instead of writing a header
237+
that is improperly folded or delimited, such that it would
238+
be parsed as multiple headers or joined with adjacent data.
239+
Such headers can be generated by custom header classes or bugs
240+
in the ``email`` module.
241+
242+
As it's a security feature, this defaults to ``True`` even in the
243+
:class:`~email.policy.Compat32` policy.
244+
For backwards compatible, but unsafe, behavior, it must be set to
245+
``False`` explicitly.
246+
247+
.. versionadded:: 3.13
248+
249+
232250
The following :class:`Policy` method is intended to be called by code using
233251
the email library to create policy instances with custom settings:
234252

Doc/whatsnew/3.13.rst

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -736,6 +736,15 @@ doctest
736736
email
737737
-----
738738

739+
* Headers with embedded newlines are now quoted on output.
740+
741+
The :mod:`~email.generator` will now refuse to serialize (write) headers
742+
that are improperly folded or delimited, such that they would be parsed as
743+
multiple headers or joined with adjacent data.
744+
If you need to turn this safety feature off,
745+
set :attr:`~email.policy.Policy.verify_generated_headers`.
746+
(Contributed by Bas Bloemsaat and Petr Viktorin in :gh:`121650`.)
747+
739748
* :func:`email.utils.getaddresses` and :func:`email.utils.parseaddr` now return
740749
``('', '')`` 2-tuples in more situations where invalid email addresses are
741750
encountered instead of potentially inaccurate values. Add optional *strict*

Lib/email/_header_value_parser.py

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,8 @@
9292
ASPECIALS = TSPECIALS | set("*'%")
9393
ATTRIBUTE_ENDS = ASPECIALS | WSP
9494
EXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%')
95+
NLSET = {'\n', '\r'}
96+
SPECIALSNL = SPECIALS | NLSET
9597

9698
def quote_string(value):
9799
return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"'
@@ -2802,9 +2804,13 @@ def _refold_parse_tree(parse_tree, *, policy):
28022804
wrap_as_ew_blocked -= 1
28032805
continue
28042806
tstr = str(part)
2805-
if part.token_type == 'ptext' and set(tstr) & SPECIALS:
2806-
# Encode if tstr contains special characters.
2807-
want_encoding = True
2807+
if not want_encoding:
2808+
if part.token_type == 'ptext':
2809+
# Encode if tstr contains special characters.
2810+
want_encoding = not SPECIALSNL.isdisjoint(tstr)
2811+
else:
2812+
# Encode if tstr contains newlines.
2813+
want_encoding = not NLSET.isdisjoint(tstr)
28082814
try:
28092815
tstr.encode(encoding)
28102816
charset = encoding

Lib/email/_policybase.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -157,6 +157,13 @@ class Policy(_PolicyBase, metaclass=abc.ABCMeta):
157157
message_factory -- the class to use to create new message objects.
158158
If the value is None, the default is Message.
159159
160+
verify_generated_headers
161+
-- if true, the generator verifies that each header
162+
they are properly folded, so that a parser won't
163+
treat it as multiple headers, start-of-body, or
164+
part of another header.
165+
This is a check against custom Header & fold()
166+
implementations.
160167
"""
161168

162169
raise_on_defect = False
@@ -165,6 +172,7 @@ class Policy(_PolicyBase, metaclass=abc.ABCMeta):
165172
max_line_length = 78
166173
mangle_from_ = False
167174
message_factory = None
175+
verify_generated_headers = True
168176

169177
def handle_defect(self, obj, defect):
170178
"""Based on policy, either raise defect or call register_defect.

Lib/email/errors.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,10 @@ class CharsetError(MessageError):
2929
"""An illegal charset was given."""
3030

3131

32+
class HeaderWriteError(MessageError):
33+
"""Error while writing headers."""
34+
35+
3236
# These are parsing defects which the parser was able to work around.
3337
class MessageDefect(ValueError):
3438
"""Base class for a message defect."""

Lib/email/generator.py

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,12 +14,14 @@
1414
from copy import deepcopy
1515
from io import StringIO, BytesIO
1616
from email.utils import _has_surrogates
17+
from email.errors import HeaderWriteError
1718

1819
UNDERSCORE = '_'
1920
NL = '\n' # XXX: no longer used by the code below.
2021

2122
NLCRE = re.compile(r'\r\n|\r|\n')
2223
fcre = re.compile(r'^From ', re.MULTILINE)
24+
NEWLINE_WITHOUT_FWSP = re.compile(r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]')
2325

2426

2527
class Generator:
@@ -222,7 +224,16 @@ def _dispatch(self, msg):
222224

223225
def _write_headers(self, msg):
224226
for h, v in msg.raw_items():
225-
self.write(self.policy.fold(h, v))
227+
folded = self.policy.fold(h, v)
228+
if self.policy.verify_generated_headers:
229+
linesep = self.policy.linesep
230+
if not folded.endswith(self.policy.linesep):
231+
raise HeaderWriteError(
232+
f'folded header does not end with {linesep!r}: {folded!r}')
233+
if NEWLINE_WITHOUT_FWSP.search(folded.removesuffix(linesep)):
234+
raise HeaderWriteError(
235+
f'folded header contains newline: {folded!r}')
236+
self.write(folded)
226237
# A blank line always separates headers from body
227238
self.write(self._NL)
228239

Lib/test/test_email/test_generator.py

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
from email.generator import Generator, BytesGenerator
77
from email.headerregistry import Address
88
from email import policy
9+
import email.errors
910
from test.test_email import TestEmailBase, parameterize
1011

1112

@@ -249,6 +250,44 @@ def test_rfc2231_wrapping_switches_to_default_len_if_too_narrow(self):
249250
g.flatten(msg)
250251
self.assertEqual(s.getvalue(), self.typ(expected))
251252

253+
def test_keep_encoded_newlines(self):
254+
msg = self.msgmaker(self.typ(textwrap.dedent("""\
255+
To: nobody
256+
Subject: Bad subject=?UTF-8?Q?=0A?=Bcc: [email protected]
257+
258+
None
259+
""")))
260+
expected = textwrap.dedent("""\
261+
To: nobody
262+
Subject: Bad subject=?UTF-8?Q?=0A?=Bcc: [email protected]
263+
264+
None
265+
""")
266+
s = self.ioclass()
267+
g = self.genclass(s, policy=self.policy.clone(max_line_length=80))
268+
g.flatten(msg)
269+
self.assertEqual(s.getvalue(), self.typ(expected))
270+
271+
def test_keep_long_encoded_newlines(self):
272+
msg = self.msgmaker(self.typ(textwrap.dedent("""\
273+
To: nobody
274+
Subject: Bad subject=?UTF-8?Q?=0A?=Bcc: [email protected]
275+
276+
None
277+
""")))
278+
expected = textwrap.dedent("""\
279+
To: nobody
280+
Subject: Bad subject
281+
=?utf-8?q?=0A?=Bcc:
282+
283+
284+
None
285+
""")
286+
s = self.ioclass()
287+
g = self.genclass(s, policy=self.policy.clone(max_line_length=30))
288+
g.flatten(msg)
289+
self.assertEqual(s.getvalue(), self.typ(expected))
290+
252291

253292
class TestGenerator(TestGeneratorBase, TestEmailBase):
254293

@@ -273,6 +312,29 @@ def test_flatten_unicode_linesep(self):
273312
g.flatten(msg)
274313
self.assertEqual(s.getvalue(), self.typ(expected))
275314

315+
def test_verify_generated_headers(self):
316+
"""gh-121650: by default the generator prevents header injection"""
317+
class LiteralHeader(str):
318+
name = 'Header'
319+
def fold(self, **kwargs):
320+
return self
321+
322+
for text in (
323+
'Value\r\nBad Injection\r\n',
324+
'NoNewLine'
325+
):
326+
with self.subTest(text=text):
327+
message = message_from_string(
328+
"Header: Value\r\n\r\nBody",
329+
policy=self.policy,
330+
)
331+
332+
del message['Header']
333+
message['Header'] = LiteralHeader(text)
334+
335+
with self.assertRaises(email.errors.HeaderWriteError):
336+
message.as_string()
337+
276338

277339
class TestBytesGenerator(TestGeneratorBase, TestEmailBase):
278340

Lib/test/test_email/test_policy.py

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ class PolicyAPITests(unittest.TestCase):
2626
'raise_on_defect': False,
2727
'mangle_from_': True,
2828
'message_factory': None,
29+
'verify_generated_headers': True,
2930
}
3031
# These default values are the ones set on email.policy.default.
3132
# If any of these defaults change, the docs must be updated.
@@ -294,6 +295,31 @@ def test_short_maxlen_error(self):
294295
with self.assertRaises(email.errors.HeaderParseError):
295296
policy.fold("Subject", subject)
296297

298+
def test_verify_generated_headers(self):
299+
"""Turning protection off allows header injection"""
300+
policy = email.policy.default.clone(verify_generated_headers=False)
301+
for text in (
302+
'Header: Value\r\nBad: Injection\r\n',
303+
'Header: NoNewLine'
304+
):
305+
with self.subTest(text=text):
306+
message = email.message_from_string(
307+
"Header: Value\r\n\r\nBody",
308+
policy=policy,
309+
)
310+
class LiteralHeader(str):
311+
name = 'Header'
312+
def fold(self, **kwargs):
313+
return self
314+
315+
del message['Header']
316+
message['Header'] = LiteralHeader(text)
317+
318+
self.assertEqual(
319+
message.as_string(),
320+
f"{text}\nBody",
321+
)
322+
297323
# XXX: Need subclassing tests.
298324
# For adding subclassed objects, make sure the usual rules apply (subclass
299325
# wins), but that the order still works (right overrides left).
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
:mod:`email` headers with embedded newlines are now quoted on output. The
2+
:mod:`~email.generator` will now refuse to serialize (write) headers that
3+
are unsafely folded or delimited; see
4+
:attr:`~email.policy.Policy.verify_generated_headers`. (Contributed by Bas
5+
Bloemsaat and Petr Viktorin in :gh:`121650`.)

0 commit comments

Comments
 (0)