
Commit 6a95676

pythongh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) (pythonGH-115623)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from python#115138 (comment).

### Notes

- Please treat as a security fix related to CVE-2023-52425.

Includes code suggested-by: Snild Dolkow and by core dev Serhiy Storchaka.
1 parent d01886c commit 6a95676

16 files changed (+435, -21 lines)

Doc/library/pyexpat.rst (+31)

```diff
@@ -196,6 +196,37 @@ XMLParser Objects
    :exc:`ExpatError` to be raised with the :attr:`code` attribute set to
    ``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.

+.. method:: xmlparser.SetReparseDeferralEnabled(enabled)
+
+   .. warning::
+
+      Calling ``SetReparseDeferralEnabled(False)`` has security implications,
+      as detailed below; please make sure to understand these consequences
+      prior to using the ``SetReparseDeferralEnabled`` method.
+
+   Expat 2.6.0 introduced a security mechanism called "reparse deferral"
+   where, instead of causing denial of service through quadratic runtime
+   from reparsing large tokens, reparsing of unfinished tokens is now delayed
+   by default until a sufficient amount of input is reached.
+   Due to this delay, registered handlers may, depending on the sizing of
+   input chunks pushed to Expat, no longer be called right after pushing new
+   input to the parser. Where immediate feedback and taking over responsibility
+   for protecting against denial of service from large tokens are both wanted,
+   calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
+   for the current Expat parser instance, temporarily or altogether.
+   Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
+   deferral.
+
+   .. versionadded:: 3.13
+
+.. method:: xmlparser.GetReparseDeferralEnabled()
+
+   Returns whether reparse deferral is currently enabled for the given
+   Expat parser instance.
+
+   .. versionadded:: 3.13
+
+
 :class:`xmlparser` objects have the following attributes:


```

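To see the behaviour described in the `SetReparseDeferralEnabled` documentation, the sketch below feeds the same two chunks used by `ReparseDeferralTest` in `Lib/test/test_pyexpat.py` and records which start tags have fired after each chunk. `parse_chunks` is a hypothetical helper, and the `hasattr` guard is an assumption added so the sketch also runs on interpreters that predate this change.

```python
import xml.parsers.expat as expat


def parse_chunks(chunks, defer=True):
    """Hypothetical helper: feed *chunks* to a fresh Expat parser.

    Returns (start tags seen after each chunk, start tags seen in total).
    """
    started = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)

    # Assumption: the getter/setter may be missing on older interpreters.
    if not defer and hasattr(parser, "SetReparseDeferralEnabled"):
        # Trade-off: protection against quadratic reparsing of huge tokens
        # is now the caller's responsibility.
        parser.SetReparseDeferralEnabled(False)

    seen_after_each = []
    for chunk in chunks:
        parser.Parse(chunk, False)  # isfinal=False: more input may follow
        seen_after_each.append(list(started))

    parser.Parse(b"", True)  # final call processes everything still buffered
    return seen_after_each, started


deferred, _ = parse_chunks([b"<doc", b"/>"], defer=True)
immediate, total = parse_chunks([b"<doc", b"/>"], defer=False)
```

Going by `test_reparse_deferral_enabled` and `test_reparse_deferral_disabled`, with Expat >= 2.6.0 one would expect `deferred == [[], []]` and `immediate[-1] == ['doc']`; with older Expat both modes should report `doc` as soon as `/>` arrives.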
Doc/library/xml.etree.elementtree.rst (+29)

```diff
@@ -166,6 +166,11 @@ data but would still like to have incremental parsing capabilities, take a look
 at :func:`iterparse`. It can be useful when you're reading a large XML document
 and don't want to hold it wholly in memory.

+Where *immediate* feedback through events is wanted, calling
+:meth:`XMLPullParser.flush` can help reduce delay;
+please make sure to study the related security notes.
+
+
 Finding interesting elements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -1387,6 +1392,19 @@ XMLParser Objects

       Feeds data to the parser. *data* is encoded data.

+
+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      .. versionadded:: 3.13
+
+
    :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
    for each opening tag, its ``end(tag)`` method for each closing tag, and data
    is processed by method ``data(data)``. For further supported callback
@@ -1448,6 +1466,17 @@ XMLPullParser Objects

       Feed the given bytes data to the parser.

+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      .. versionadded:: 3.13
+
    .. method:: close()

       Signal the parser that the data stream is terminated. Unlike
```

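The `XMLParser.flush()` entry above can be exercised with a minimal parser target. In this sketch `TagRecorder` and `feed_with_flush` are hypothetical names, the chunks mirror those in the new `test_flush_reparse_deferral_*` tests, and the `hasattr` check is an assumption for interpreters that do not have `flush()` yet.

```python
import xml.etree.ElementTree as ET


class TagRecorder:
    """Hypothetical parser target that only records start tags."""

    def __init__(self):
        self.started = []

    def start(self, tag, attrib):
        self.started.append(tag)

    def end(self, tag):
        pass

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.started


def feed_with_flush(chunks):
    """Feed chunks, flushing after each one when flush() is available.

    Returns (tags started before close(), result of close()).
    """
    target = TagRecorder()
    parser = ET.XMLParser(target=target)
    for chunk in chunks:
        parser.feed(chunk)
        if hasattr(parser, "flush"):
            # Temporarily disables reparse deferral; see the security note.
            parser.flush()
    seen_before_close = list(target.started)
    return seen_before_close, parser.close()


before_close, result = feed_with_flush(["<doc", ">", "</doc>"])
```

Without `flush()`, the start event for `doc` may only reach the target once more input arrives or `close()` is called.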
Doc/whatsnew/3.13.rst (+11)

```diff
@@ -174,6 +174,17 @@ Other Language Changes

   (Contributed by Victor Stinner in :gh:`114570`.)

+* Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425)
+  by adding five new methods:
+
+  * :meth:`xml.etree.ElementTree.XMLParser.flush`
+  * :meth:`xml.etree.ElementTree.XMLPullParser.flush`
+  * :meth:`xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
+  * :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
+  * :meth:`!xml.sax.expatreader.ExpatParser.flush`
+
+  (Contributed by Sebastian Pipping in :gh:`115623`.)
+

 New Modules
 ===========
```

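Code that must also run on interpreters without these additions can probe for them at runtime. The sketch below is a hypothetical helper (the function name and dictionary keys are made up for illustration); it relies only on `expat.version_info`, attribute checks for the five methods listed above, and the getter itself.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat
import xml.sax.expatreader


def reparse_deferral_status():
    """Hypothetical helper: report what the running interpreter supports."""
    parser = expat.ParserCreate()
    has_expat_api = hasattr(parser, "GetReparseDeferralEnabled")
    return {
        "expat_version": expat.version_info,
        "expat_api": has_expat_api,
        # None means "cannot query": the getter is missing on this interpreter.
        "deferral_on_by_default": (
            parser.GetReparseDeferralEnabled() if has_expat_api else None
        ),
        "etree_flush": (hasattr(ET.XMLParser, "flush")
                        and hasattr(ET.XMLPullParser, "flush")),
        "sax_flush": hasattr(xml.sax.expatreader.ExpatParser, "flush"),
    }


status = reparse_deferral_status()
```

The new `test_getter_setter_round_trip` expects the default reported by the getter to equal `expat.version_info >= (2, 6, 0)`, so `deferral_on_by_default` should follow the linked Expat version rather than the Python version.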
Include/pyexpat.h (+3, -1)

```diff
@@ -48,8 +48,10 @@ struct PyExpat_CAPI
     enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
     int (*DefaultUnknownEncodingHandler)(
         void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
-    /* might be none for expat < 2.1.0 */
+    /* might be NULL for expat < 2.1.0 */
     int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
+    /* might be NULL for expat < 2.6.0 */
+    XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
     /* always add new stuff to the end! */
 };

```
Lib/test/test_pyexpat.py (+54)

```diff
@@ -755,5 +755,59 @@ def resolve_entity(context, base, system_id, public_id):
         self.assertEqual(handler_call_args, [("bar", "baz")])


+class ReparseDeferralTest(unittest.TestCase):
+    def test_getter_setter_round_trip(self):
+        parser = expat.ParserCreate()
+        enabled = (expat.version_info >= (2, 6, 0))
+
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+        parser.SetReparseDeferralEnabled(False)
+        self.assertIs(parser.GetReparseDeferralEnabled(), False)
+        parser.SetReparseDeferralEnabled(True)
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+
+    def test_reparse_deferral_enabled(self):
+        if expat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {expat.version_info} does not '
+                          'support reparse deferral')
+
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        self.assertTrue(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: no.
+        self.assertEqual(started, [])
+
+        parser.Parse(b'', True)
+
+        self.assertEqual(started, ['doc'])
+
+    def test_reparse_deferral_disabled(self):
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        if expat.version_info >= (2, 6, 0):
+            parser.SetReparseDeferralEnabled(False)
+        self.assertFalse(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: yes.
+        self.assertEqual(started, ['doc'])
+
+
 if __name__ == "__main__":
     unittest.main()
```

Lib/test/test_sax.py (+51)

```diff
@@ -19,6 +19,7 @@
 from io import BytesIO, StringIO
 import codecs
 import os.path
+import pyexpat
 import shutil
 import sys
 from urllib.error import URLError
@@ -1214,6 +1215,56 @@ def test_expat_incremental_reset(self):

         self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")

+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')
+
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assertEqual(result.getvalue(), start)  # i.e. no elements started
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
+    def test_flush_reparse_deferral_disabled(self):
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assertEqual(result.getvalue(), start)  # i.e. no elements started
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
     # ===== Locator support

     def test_expat_locator_noinfo(self):
```

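Outside the test suite, the same `flush()` call works with an ordinary SAX content handler. The following is a minimal sketch: `StartRecorder` and `sax_feed` are hypothetical names, and it assumes `xml.sax.make_parser()` returns the default Expat-based `ExpatParser`, as it does in a stock CPython install.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class StartRecorder(ContentHandler):
    """Hypothetical handler that records element names as they start."""

    def __init__(self):
        super().__init__()
        self.started = []

    def startElement(self, name, attrs):
        self.started.append(name)


def sax_feed(chunks):
    """Feed chunks incrementally; return (names seen after each chunk, all names)."""
    handler = StartRecorder()
    parser = xml.sax.make_parser()  # ExpatParser by default
    parser.setContentHandler(handler)
    seen_after_each = []
    for chunk in chunks:
        parser.feed(chunk)
        if hasattr(parser, "flush"):  # absent on interpreters without this change
            parser.flush()
        seen_after_each.append(list(handler.started))
    parser.close()
    return seen_after_each, handler.started


progress, started = sax_feed(["<doc", ">", "</doc>"])
```

Flushing after every chunk gives the lowest latency, but it also means reparse deferral never gets a chance to protect against oversized tokens.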
Lib/test/test_xml_etree.py (+63, -16)

```diff
@@ -121,10 +121,6 @@
 </foo>
 """

-fails_with_expat_2_6_0 = (unittest.expectedFailure
-                          if pyexpat.version_info >= (2, 6, 0) else
-                          lambda test: test)
-
 def checkwarnings(*filters, quiet=False):
     def decorator(test):
         def newtest(*args, **kwargs):
@@ -1462,12 +1458,14 @@ def test_attlist_default(self):

 class XMLPullParserTest(unittest.TestCase):

-    def _feed(self, parser, data, chunk_size=None):
+    def _feed(self, parser, data, chunk_size=None, flush=False):
         if chunk_size is None:
             parser.feed(data)
         else:
             for i in range(0, len(data), chunk_size):
                 parser.feed(data[i:i+chunk_size])
+        if flush:
+            parser.flush()

     def assert_events(self, parser, expected, max_events=None):
         self.assertEqual(
@@ -1485,34 +1483,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
         self.assertEqual([(action, elem.tag) for action, elem in events],
                          expected)

-    def test_simple_xml(self, chunk_size=None):
+    def test_simple_xml(self, chunk_size=None, flush=False):
         parser = ET.XMLPullParser()
         self.assert_event_tags(parser, [])
-        self._feed(parser, "<!-- comment -->\n", chunk_size)
+        self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
         self.assert_event_tags(parser, [])
         self._feed(parser,
                    "<root>\n <element key='value'>text</element",
-                   chunk_size)
+                   chunk_size, flush)
         self.assert_event_tags(parser, [])
-        self._feed(parser, ">\n", chunk_size)
+        self._feed(parser, ">\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'element')])
-        self._feed(parser, "<element>text</element>tail\n", chunk_size)
-        self._feed(parser, "<empty-element/>\n", chunk_size)
+        self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
+        self._feed(parser, "<empty-element/>\n", chunk_size, flush)
         self.assert_event_tags(parser, [
             ('end', 'element'),
             ('end', 'empty-element'),
             ])
-        self._feed(parser, "</root>\n", chunk_size)
+        self._feed(parser, "</root>\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'root')])
         self.assertIsNone(parser.close())

-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_1(self):
-        self.test_simple_xml(chunk_size=1)
+        self.test_simple_xml(chunk_size=1, flush=True)

-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_5(self):
-        self.test_simple_xml(chunk_size=5)
+        self.test_simple_xml(chunk_size=5, flush=True)

     def test_simple_xml_chunk_22(self):
         self.test_simple_xml(chunk_size=22)
@@ -1711,6 +1707,57 @@ def test_unknown_event(self):
         with self.assertRaises(ValueError):
             ET.XMLPullParser(events=('start', 'end', 'bogus'))

+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not '
+                          'support reparse deferral')
+
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assert_event_tags(parser, [])  # i.e. no elements started
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])
+
+    def test_flush_reparse_deferral_disabled(self):
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            if not ET is pyET:
+                self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
+                              'methods not available in C')
+            parser._parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assert_event_tags(parser, [])  # i.e. no elements started
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])

 #
 # xinclude tests (samples from appendix C of the xinclude specification)
```

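The pattern these tests check, feeding a chunk and then calling `flush()` so that events become readable right away, can be wrapped in a small generator for streaming input such as network reads. `stream_events` is a hypothetical name and the `hasattr` guard is an assumption for older interpreters. Because flushing after every chunk gives up the protection reparse deferral provides, this is presumably best reserved for trusted or size-limited input.

```python
import xml.etree.ElementTree as ET


def stream_events(chunks):
    """Hypothetical generator: yield (chunk_index, event, tag) as early as possible.

    Events still pending at close() are reported with index len(chunks).
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    can_flush = hasattr(parser, "flush")
    for i, chunk in enumerate(chunks):
        parser.feed(chunk)
        if can_flush:
            parser.flush()  # trade deferral protection for lower latency
        for event, elem in parser.read_events():
            yield i, event, elem.tag
    parser.close()
    for event, elem in parser.read_events():
        yield len(chunks), event, elem.tag


events = list(stream_events(["<doc", ">", "</doc>"]))
```

With `flush()` available, the `start` event for `doc` should be reported while processing chunk index 1, matching `test_flush_reparse_deferral_enabled`.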
Lib/xml/etree/ElementTree.py (+14)

```diff
@@ -1320,6 +1320,11 @@ def read_events(self):
             else:
                 yield event

+    def flush(self):
+        if self._parser is None:
+            raise ValueError("flush() called after end of stream")
+        self._parser.flush()
+

 def XML(text, parser=None):
     """Parse XML document from string constant.
@@ -1726,6 +1731,15 @@ def close(self):
             del self.parser, self._parser
             del self.target, self._target

+    def flush(self):
+        was_enabled = self.parser.GetReparseDeferralEnabled()
+        try:
+            self.parser.SetReparseDeferralEnabled(False)
+            self.parser.Parse(b"", False)
+        except self._error as v:
+            self._raiseerror(v)
+        finally:
+            self.parser.SetReparseDeferralEnabled(was_enabled)

 # --------------------------------------------------------------------
 # C14N 2.0
```

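`XMLParser.flush()` above follows a save, disable, parse, restore pattern. The same idea can be packaged as a context manager for code that drives `xml.parsers.expat` directly. This is a sketch, not part of the commit: `reparse_deferral_disabled` and `flush_expat` are hypothetical names, and unlike `XMLParser.flush()` it does not convert Expat errors into `ParseError` the way `_raiseerror()` does.

```python
import contextlib
import xml.parsers.expat as expat


@contextlib.contextmanager
def reparse_deferral_disabled(parser):
    """Hypothetical helper: disable reparse deferral on *parser* for the
    duration of the block, then restore whatever setting it had before."""
    was_enabled = parser.GetReparseDeferralEnabled()
    parser.SetReparseDeferralEnabled(False)
    try:
        yield parser
    finally:
        parser.SetReparseDeferralEnabled(was_enabled)


def flush_expat(parser):
    """Process input Expat has buffered, mirroring XMLParser.flush()."""
    with reparse_deferral_disabled(parser):
        parser.Parse(b"", False)  # empty, non-final chunk triggers a reparse
```

For example, after feeding `b"<doc"` and `b">"` to an `expat.ParserCreate()` parser, `flush_expat(parser)` should cause the start handler for `doc` to run while leaving the deferral setting as it was; the `try`/`finally` ensures the restore happens even if parsing raises.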