Commit 854f645

hartwork and gpshead authored
[3.8] gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) (GH-115623) (GH-116275)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from #115138 (comment).

Includes code suggested-by: Snild Dolkow <[email protected]> and by core dev Serhiy Storchaka.

Co-authored-by: Gregory P. Smith <[email protected]>
1 parent 4d58a1d commit 854f645

14 files changed, +435 -20 lines changed

Doc/library/pyexpat.rst

+36
@@ -196,6 +196,42 @@ XMLParser Objects
    :exc:`ExpatError` to be raised with the :attr:`code` attribute set to
    ``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.

+.. method:: xmlparser.SetReparseDeferralEnabled(enabled)
+
+   .. warning::
+
+      Calling ``SetReparseDeferralEnabled(False)`` has security implications,
+      as detailed below; please make sure to understand these consequences
+      prior to using the ``SetReparseDeferralEnabled`` method.
+
+   Expat 2.6.0 introduced a security mechanism called "reparse deferral"
+   where instead of causing denial of service through quadratic runtime
+   from reparsing large tokens, reparsing of unfinished tokens is now delayed
+   by default until a sufficient amount of input is reached.
+   Due to this delay, registered handlers may, depending on the sizing of
+   input chunks pushed to Expat, no longer be called right after pushing new
+   input to the parser. Where immediate feedback and taking over responsibility
+   of protecting against denial of service from large tokens are both wanted,
+   calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
+   for the current Expat parser instance, temporarily or altogether.
+   Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
+   deferral.
+
+   Note that :meth:`SetReparseDeferralEnabled` has been backported to some
+   prior releases of CPython as a security fix. Check for availability of
+   :meth:`SetReparseDeferralEnabled` using :func:`hasattr` if used in code
+   running across a variety of Python versions.
+
+   .. versionadded:: 3.8.19
+
+.. method:: xmlparser.GetReparseDeferralEnabled()
+
+   Returns whether reparse deferral is currently enabled for the given
+   Expat parser instance.
+
+   .. versionadded:: 3.8.19
+
+
 :class:`xmlparser` objects have the following attributes:
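
As a rough illustration of the documentation added in this file, the following sketch shows how calling code might opt out of reparse deferral while staying compatible with Python builds that lack the new methods. The helper name `create_immediate_parser` is hypothetical and not part of the commit; the `hasattr` guard follows the advice in the added documentation.

```python
import xml.parsers.expat as expat


def create_immediate_parser():
    """Create an Expat parser with reparse deferral disabled where possible.

    Hypothetical helper (not part of the commit). Disabling deferral means
    the caller takes over protection against denial of service from large
    tokens.
    """
    parser = expat.ParserCreate()
    # The method was backported as a security fix, so probe for it
    # instead of comparing Python version numbers.
    if hasattr(parser, "SetReparseDeferralEnabled"):
        parser.SetReparseDeferralEnabled(False)
    return parser


started = []
parser = create_immediate_parser()
parser.StartElementHandler = lambda name, attrs: started.append(name)

for chunk in (b"<doc", b"/>"):
    parser.Parse(chunk, False)
# Snapshot of handler calls made before end of input was signalled.
fired_before_end = list(started)

parser.Parse(b"", True)  # signal end of input
```

With deferral disabled, `fired_before_end` would be expected to contain `'doc'` already, which mirrors `test_reparse_deferral_disabled` in this commit.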

Doc/library/xml.etree.elementtree.rst

+39
@@ -163,6 +163,11 @@ data but would still like to have incremental parsing capabilities, take a look
 at :func:`iterparse`. It can be useful when you're reading a large XML document
 and don't want to hold it wholly in memory.

+Where *immediate* feedback through events is wanted, calling method
+:meth:`XMLPullParser.flush` can help reduce delay;
+please make sure to study the related security notes.
+
+
 Finding interesting elements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -1352,6 +1357,24 @@ XMLParser Objects

       Feeds data to the parser. *data* is encoded data.

+
+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      Note that :meth:`flush` has been backported to some prior releases of
+      CPython as a security fix. Check for availability of :meth:`flush`
+      using :func:`hasattr` if used in code running across a variety of Python
+      versions.
+
+      .. versionadded:: 3.8.19
+
+
    :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
    for each opening tag, its ``end(tag)`` method for each closing tag, and data
    is processed by method ``data(data)``. For further supported callback
@@ -1413,6 +1436,22 @@ XMLPullParser Objects

       Feed the given bytes data to the parser.

+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      Note that :meth:`flush` has been backported to some prior releases of
+      CPython as a security fix. Check for availability of :meth:`flush`
+      using :func:`hasattr` if used in code running across a variety of Python
+      versions.
+
+      .. versionadded:: 3.8.19
+
    .. method:: close()

       Signal the parser that the data stream is terminated. Unlike
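
To make the `XMLPullParser.flush` documentation above more concrete, here is a minimal sketch of feeding small chunks and flushing after each one. The helper `feed_with_flush` is a hypothetical name introduced for illustration; the `hasattr` check is there because `flush` may be missing on Python versions that did not receive the backport.

```python
import xml.etree.ElementTree as ET


def feed_with_flush(parser, data):
    """Feed data, then flush when the parser offers flush().

    Hypothetical helper (not part of the commit). flush() temporarily
    disables reparse deferral, so the security notes apply here as well.
    """
    parser.feed(data)
    if hasattr(parser, "flush"):
        parser.flush()


parser = ET.XMLPullParser(events=("start", "end"))
seen = []

for chunk in ("<doc", ">", "</doc>"):
    feed_with_flush(parser, chunk)
    # Drain whatever events are available after this chunk.
    seen.extend((event, elem.tag) for event, elem in parser.read_events())

parser.close()
# Pick up anything that was only reported at end of stream.
seen.extend((event, elem.tag) for event, elem in parser.read_events())
```

Flushing after every chunk trades the protection offered by reparse deferral for lower latency, so it is probably best reserved for trusted or size-limited input.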

Include/pyexpat.h

+3 -1
@@ -48,8 +48,10 @@ struct PyExpat_CAPI
     enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
     int (*DefaultUnknownEncodingHandler)(
         void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
-    /* might be none for expat < 2.1.0 */
+    /* might be NULL for expat < 2.1.0 */
     int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
+    /* might be NULL for expat < 2.6.0 */
+    XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
     /* always add new stuff to the end! */
 };
Lib/test/test_pyexpat.py

+54
@@ -729,5 +729,59 @@ def resolve_entity(context, base, system_id, public_id):
         self.assertEqual(handler_call_args, [("bar", "baz")])


+class ReparseDeferralTest(unittest.TestCase):
+    def test_getter_setter_round_trip(self):
+        parser = expat.ParserCreate()
+        enabled = (expat.version_info >= (2, 6, 0))
+
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+        parser.SetReparseDeferralEnabled(False)
+        self.assertIs(parser.GetReparseDeferralEnabled(), False)
+        parser.SetReparseDeferralEnabled(True)
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+
+    def test_reparse_deferral_enabled(self):
+        if expat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {expat.version_info} does not '
+                          'support reparse deferral')
+
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        self.assertTrue(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: no.
+        self.assertEqual(started, [])
+
+        parser.Parse(b'', True)
+
+        self.assertEqual(started, ['doc'])
+
+    def test_reparse_deferral_disabled(self):
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        if expat.version_info >= (2, 6, 0):
+            parser.SetReparseDeferralEnabled(False)
+        self.assertFalse(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: yes.
+        self.assertEqual(started, ['doc'])
+
+
 if __name__ == "__main__":
     unittest.main()

Lib/test/test_sax.py

+51
@@ -18,6 +18,7 @@
 from io import BytesIO, StringIO
 import codecs
 import os.path
+import pyexpat
 import shutil
 from urllib.error import URLError
 from test import support
@@ -1206,6 +1207,56 @@ def test_expat_incremental_reset(self):

         self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")

+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')
+
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assertEqual(result.getvalue(), start) # i.e. no elements started
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
+    def test_flush_reparse_deferral_disabled(self):
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assertEqual(result.getvalue(), start) # i.e. no elements started
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
     # ===== Locator support

     def test_expat_locator_noinfo(self):
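
Outside the test suite, the SAX-side `flush` exercised above might be used roughly as follows. This is only a sketch: `TagRecorder` is a hypothetical handler written for this example, and the `hasattr` guard assumes the code may run on a Python build without `ExpatParser.flush`.

```python
from xml.sax.expatreader import create_parser
from xml.sax.handler import ContentHandler


class TagRecorder(ContentHandler):
    """Hypothetical handler that records the names of started elements."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


parser = create_parser()
recorder = TagRecorder()
parser.setContentHandler(recorder)

for chunk in ("<doc", ">"):
    parser.feed(chunk)

# Ask for immediate processing of already-fed data where supported.
if hasattr(parser, "flush"):
    parser.flush()
# Snapshot of what the handler has seen before the document is complete.
tags_after_flush = list(recorder.tags)

parser.feed("</doc>")
parser.close()
```

On a build with Expat >=2.6.0 and `flush` available, `tags_after_flush` would be expected to contain `'doc'`, in line with `test_flush_reparse_deferral_enabled` above.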

Lib/test/test_xml_etree.py

+63 -17
@@ -105,11 +105,6 @@
 """


-fails_with_expat_2_6_0 = (unittest.expectedFailure
-                          if pyexpat.version_info >= (2, 6, 0) else
-                          lambda test: test)
-
-
 def checkwarnings(*filters, quiet=False):
     def decorator(test):
         def newtest(*args, **kwargs):
@@ -1250,12 +1245,14 @@ def test_tree_write_attribute_order(self):

 class XMLPullParserTest(unittest.TestCase):

-    def _feed(self, parser, data, chunk_size=None):
+    def _feed(self, parser, data, chunk_size=None, flush=False):
         if chunk_size is None:
             parser.feed(data)
         else:
             for i in range(0, len(data), chunk_size):
                 parser.feed(data[i:i+chunk_size])
+        if flush:
+            parser.flush()

     def assert_events(self, parser, expected, max_events=None):
         self.assertEqual(
@@ -1273,34 +1270,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
         self.assertEqual([(action, elem.tag) for action, elem in events],
                          expected)

-    def test_simple_xml(self, chunk_size=None):
+    def test_simple_xml(self, chunk_size=None, flush=False):
         parser = ET.XMLPullParser()
         self.assert_event_tags(parser, [])
-        self._feed(parser, "<!-- comment -->\n", chunk_size)
+        self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
         self.assert_event_tags(parser, [])
         self._feed(parser,
                    "<root>\n <element key='value'>text</element",
-                   chunk_size)
+                   chunk_size, flush)
         self.assert_event_tags(parser, [])
-        self._feed(parser, ">\n", chunk_size)
+        self._feed(parser, ">\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'element')])
-        self._feed(parser, "<element>text</element>tail\n", chunk_size)
-        self._feed(parser, "<empty-element/>\n", chunk_size)
+        self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
+        self._feed(parser, "<empty-element/>\n", chunk_size, flush)
         self.assert_event_tags(parser, [
             ('end', 'element'),
             ('end', 'empty-element'),
             ])
-        self._feed(parser, "</root>\n", chunk_size)
+        self._feed(parser, "</root>\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'root')])
         self.assertIsNone(parser.close())

-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_1(self):
-        self.test_simple_xml(chunk_size=1)
+        self.test_simple_xml(chunk_size=1, flush=True)

-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_5(self):
-        self.test_simple_xml(chunk_size=5)
+        self.test_simple_xml(chunk_size=5, flush=True)

     def test_simple_xml_chunk_22(self):
         self.test_simple_xml(chunk_size=22)
@@ -1499,6 +1494,57 @@ def test_unknown_event(self):
         with self.assertRaises(ValueError):
             ET.XMLPullParser(events=('start', 'end', 'bogus'))

+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not '
+                          'support reparse deferral')
+
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assert_event_tags(parser, []) # i.e. no elements started
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])
+
+    def test_flush_reparse_deferral_disabled(self):
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            if not ET is pyET:
+                self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
+                              'methods not available in C')
+            parser._parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assert_event_tags(parser, []) # i.e. no elements started
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])

 #
 # xinclude tests (samples from appendix C of the xinclude specification)

Lib/xml/etree/ElementTree.py

+14
@@ -1303,6 +1303,11 @@ def read_events(self):
             else:
                 yield event

+    def flush(self):
+        if self._parser is None:
+            raise ValueError("flush() called after end of stream")
+        self._parser.flush()
+

 def XML(text, parser=None):
     """Parse XML document from string constant.
@@ -1711,6 +1716,15 @@ def close(self):
             del self.parser, self._parser
             del self.target, self._target

+    def flush(self):
+        was_enabled = self.parser.GetReparseDeferralEnabled()
+        try:
+            self.parser.SetReparseDeferralEnabled(False)
+            self.parser.Parse(b"", False)
+        except self._error as v:
+            self._raiseerror(v)
+        finally:
+            self.parser.SetReparseDeferralEnabled(was_enabled)

 # --------------------------------------------------------------------
 # C14N 2.0
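
The save, disable, reparse, restore pattern used by `XMLParser.flush` above can also be applied to a bare `xml.parsers.expat` parser. The sketch below is not part of the commit: `flush_expat` is a hypothetical name, and the fallback branch for builds without the getter/setter pair is an assumption made for portability.

```python
import xml.parsers.expat as expat


def flush_expat(parser):
    """Trigger parsing of already-fed data on a raw Expat parser.

    Hypothetical helper mirroring the flush() logic in the diff above.
    """
    if not hasattr(parser, "GetReparseDeferralEnabled"):
        # Assumption: without the API, an empty non-final parse is the
        # closest available equivalent.
        parser.Parse(b"", False)
        return
    was_enabled = parser.GetReparseDeferralEnabled()
    try:
        parser.SetReparseDeferralEnabled(False)
        parser.Parse(b"", False)  # empty, non-final chunk forces a reparse
    finally:
        # Restore the previous setting even if parsing raised an error.
        parser.SetReparseDeferralEnabled(was_enabled)


started = []
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: started.append(name)

has_api = hasattr(parser, "GetReparseDeferralEnabled")
state_before = parser.GetReparseDeferralEnabled() if has_api else None

for chunk in (b"<doc", b">"):
    parser.Parse(chunk, False)
flush_expat(parser)

state_after = parser.GetReparseDeferralEnabled() if has_api else None
parser.Parse(b"</doc>", True)  # finish the document
```

Restoring the previous setting in `finally` keeps the parser's default protection in place for subsequent input, which appears to be the intent of the implementation in the commit.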
