Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement stack overflow protection for supported platforms #91079

Open
markshannon opened this issue Mar 4, 2022 · 17 comments
Open

Implement stack overflow protection for supported platforms #91079

markshannon opened this issue Mar 4, 2022 · 17 comments
Assignees
Labels
deferred-blocker interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@markshannon
Copy link
Member

markshannon commented Mar 4, 2022

BPO 46923
Nosy @gpshead, @ronaldoussoren, @stevendaprano, @markshannon, @corona10, @brandtbucher

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/markshannon'
closed_at = None
created_at = <Date 2022-03-04.16:54:19.358>
labels = []
title = 'Implement stack overflow protection for supported platforms'
updated_at = <Date 2022-03-14.14:27:17.389>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2022-03-14.14:27:17.389>
actor = 'ronaldoussoren'
assignee = 'Mark.Shannon'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2022-03-04.16:54:19.358>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 46923
keywords = []
message_count = 3.0
messages = ['414538', '414553', '415143']
nosy_count = 6.0
nosy_names = ['gregory.p.smith', 'ronaldoussoren', 'steven.daprano', 'Mark.Shannon', 'corona10', 'brandtbucher']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue46923'
versions = []

Linked PRs

@markshannon
Copy link
Member Author

python/steering-council#102 (definitely not PEP-651 ;))

We should implement efficient stack checks on those platforms that allow us to introspect stack extents.
Windows and posix systems allow us to do this.

C allows addresses of stack variables to be taken. This means that C stacks cannot be moved.
In theory they might be discontinuous, although I suspect that they are always contiguous.

My plan is to maintain a per-thread "safe region" of 32or 64k. We can check if the current stack pointer (or near enough) is in that region cheaply.
If we are not in that region, we can ask the O/S for a stack limit to determine a new "safe region". If we cannot find a safe region, then we raise a MemoryError.

Personally I'd prefer a new exception StackOverflow to MemoryError but, thanks to stackoverflow.com, it is now impossible for new programmers to do a web search to determine what a "stack overflow" is.
So, I guess MemoryError will have to do.

@markshannon markshannon self-assigned this Mar 4, 2022
@stevendaprano
Copy link
Member

Personally I'd prefer a new exception StackOverflow to MemoryError

+1 on a new exception (presumably a subclass of MemoryError).

How about using OverflowedStack instead?

The situation is not quite as bad as you suggest. Googling for "stack overflow" alone (with a space and no other qualifications):

  • on Bing, scroll halfway down the first page of results to find the "People also ask..."

    How do you get a stack overflow?
    How to prevent a stack overflow error?

  • also on Bing at the bottom of the first page of results is a question on stackoverflow.com asking what causes memory stack overflows;

  • on DuckDuckGo, the first page of search results fails to suggest anything useful;

  • on Google itself, on the first page is the People Also Ask

    What causes stack overflows?

  • as well as a link to Wikipedia's page on stack overflows.

I expect that going forward, "python stack overflow exception" will be sufficient to hit the Python docs somewhere on the first page.

Besides, presumably this OverflowedStack exception is likely to be rare, so I expect that few people will need to google it.

@ronaldoussoren
Copy link
Contributor

bpo-33955 is an older issue about implementing the current functionality for this on macOS, which has an API for querying stack limits.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@markshannon
Copy link
Member Author

markshannon commented Aug 12, 2022

It looks like OverflowedStack (or whatever it ends up being called) will have to subclass RecursionError, not MemoryError.

The use of RecursionError, rather than the more general RuntimeError is extensive in the tests, so might well be elsewhere.
But the more serious problem is that MemoryError implies that you have run out of memory, which is not the case with a stack overflow. MemoryError is treated specially in the VM in ways that are not appropriate for a stack overflow.
E.g. no traceback, a fixed number of instances.

So, I think the way to go is to add two subclasses of RecursionError, one for excessively deep Python code, and another for consuming too much C stack.

Another reason for making them both subclasses of RecursionError is that it cannot be determined statically which will occur first.
If the call stack is a mixture of Python and builtin functions, either the Python stack, or the C stack could reach it limits first.
Raising a RecursionError in both cases makes it easier to handle this case.

Given that we are subclassing RecursionError, my preferred names would be CRecursionError and PyRecursionError.
Which brings me on to the other issue.

Portability of stack overflows.

Since we do not use the C stack for Python recursion, and we will hopefully modify the compiler to use less stack, we could reduce the amount of C stack available on linux to more closely match Windows, increasing the portability of code. Code that raises a CRecursionError on one platform, will likely raise the same exception on other platforms.

@gpshead
Copy link
Member

gpshead commented Aug 12, 2022

I'm not sure we should differentiate at the Exception class level between C and Py RecursionError. There is no action someone can take in code catching one vs the other. So just call them all RecursionError and differentiate reasons within the error message string itself in case the user wants some hint as to if there's anything they can tweak to avoid it at the moment. Implementation of APIs can be migrated to/from C or Python and implementations vary how much space gets used over time so anyone only catching one of those instead of the more general exception isn't going to be in a good position for future change compatibility that might alter which one their code sees.

@gpshead
Copy link
Member

gpshead commented Aug 12, 2022

we could reduce the amount of C stack available on linux to more closely match Windows

I don't recommend this. It would break someones code that is working just fine in their existing environment who has no interest in other environments.

A nicer long term consistent goal would be to allow RecursionError to be eliminated for pure Python recursion entirely by those who want to turn off our default limit. (ultimately they'd hit a MemoryError if their process wasn't OOM killed)

@markshannon
Copy link
Member Author

OK. Let's reuse RecursionError for now, we can also add subclasses later.
There is no need to change the amount C stack either, but I think it is worth considering later.

A nicer long term consistent goal would be to allow RecursionError to be eliminated for pure Python recursion

Eliminating recursion limits is probably a bad idea, we need memory to build all those frame objects and traceback objects.
Try setting the recursion limit to 100_000_000 to see what happens. It takes a long time to build 100M frame objects and 100M traceback objects, and if you hit ctrl-C while they are being created, it just gets worse.

@markshannon
Copy link
Member Author

markshannon commented Sep 1, 2022

  • Implement portable version that doesn't request information from the O/S, so that we have a solid fallback for unsupported platforms
  • Allow use of full stack on linux (and other posix) systems
  • Allow use of full stack on Windows
  • Allow use of full stack on MacOS.
  • Better support on webassembly platforms

mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
vstinner added a commit to vstinner/cpython that referenced this issue Aug 26, 2023
Symbols of the C API should be prefixed by "Py_" to avoid conflict
with existing names in 3rd party C extensions on #include <Python.h>.
vstinner added a commit to vstinner/cpython that referenced this issue Sep 8, 2023
Symbols of the C API should be prefixed by "Py_" to avoid conflict
with existing names in 3rd party C extensions on #include <Python.h>.
vstinner added a commit that referenced this issue Sep 8, 2023
Symbols of the C API should be prefixed by "Py_" to avoid conflict
with existing names in 3rd party C extensions on "#include <Python.h>".

test.pythoninfo now logs Py_C_RECURSION_LIMIT constant and other
_testcapi and _testinternalcapi constants.
@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023
@GanerCodes
Copy link

Having the c recursion limit being constant seems like a pretty major blunder imo, this code has no reason to fail

import sys
sys.setrecursionlimit(1_000_000)

class funner:
    def __call__(𝕊, x):
        if x:
            𝕊(x-1)

funner()(498) # works
funner()(499) # fails

@encukou
Copy link
Member

encukou commented Feb 10, 2025

Marking as deferred blocker, since the Windows implementation is blocking #113655.

@picnixz picnixz added the type-feature A feature request or enhancement label Feb 11, 2025
markshannon added a commit that referenced this issue Feb 19, 2025
…-130007)

* Implement C recursion protection with limit pointers

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
@encukou
Copy link
Member

encukou commented Feb 20, 2025

Since this was merged, several buildbots started failing:

Will you have time today to look into these?

@markshannon
Copy link
Member Author

markshannon commented Feb 20, 2025

Probably not today, I'll see what I can do. Definitely tomorrow morning, though.

The nogil windows buildbot supposedly has 624683 lines of stdio logs, which seems odd. I've not tried to download it.

@encukou
Copy link
Member

encukou commented Feb 21, 2025

Let's revert for now, then: #130413

encukou added a commit that referenced this issue Feb 24, 2025
… not counters. (GH-130007)" for now (GH130413)

Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now

Unfortunatlely, the change broke some buildbots.

This reverts commit 2498c22.
@encukou
Copy link
Member

encukou commented Feb 24, 2025

I've reverted for now, as buildbot failures can masking other bugs that sneak in to main.
Like any revert, this doesn't mean the feature or the implementation is bad -- just that it needs more work for cases.

Use git cherry-pick 2498c22 to add the change back locally.

If you'd like me to go through this and add #ifdefs everywhere so we can introduce it gradually (on known-good platforms), let me know.

@markshannon
Copy link
Member Author

I really wish you hadn't reverted when I had a mergeable fix: #130398

markshannon added a commit to faster-cpython/cpython that referenced this issue Feb 24, 2025
…its using addresses, not counters. (pythonGH-130007)" for now (GH130413)"

This reverts commit ef29104.
@markshannon
Copy link
Member Author

Actually, its not a big deal. I've just rebased that PR on the revert of the revert.

@brettcannon
Copy link
Member

I really wish you hadn't reverted when I had a mergeable fix: #130398

Unfortunately, that PR didn't reference this issue so you couldn't tell from the timeline that a PR had been created. Also, the last comment suggested a potential fix on Friday which didn't happen. That left at least the WASI buildbot red for 5 days when PEP 11 policy is 24 hours.

I understand the frustration, but based on the signals in this issue there wasn't a quick fix coming, so unblocking a tier 2 platform took precedence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deferred-blocker interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
Development

No branches or pull requests

9 participants