-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Unify Apify and Scrapy to use single event loop & remove nest-asyncio
#390
Conversation
nest-asyncio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice beautiful! I scanned the changes and they look like the stuff I've been through for the past two weeks. I didn't spot anything suspicious except two things where I commented.
This wasn't a thorough review of all the lines, so I'm just "commenting", not approving or disapproving.
197026c
to
f1a7fd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid job! I didn't manage to read the whole thing yet, here's a bunch of random comments.
Please merge from latest master to your branch and double check that the code in examples is not longer than 90. |
@Pijukatel So please provide the same PR as for the Crawlee (apify/crawlee-python#973) here and to the Apify Python client as well 🙂. |
Haha, I see this is different repo. I will prepare the PR for this repo as well and merge it after you to not block you. |
e785bf1
to
f1e8bd5
Compare
nest-asyncio
nest-asyncio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two documentation-related notes, otherwise it looks good!
@@ -270,8 +270,8 @@ async def finalize() -> None: | |||
self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in IPython') | |||
elif os.getenv('PYTEST_CURRENT_TEST', default=False): # noqa: PLW1508 | |||
self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in an unit test') | |||
elif hasattr(asyncio, '_nest_patched'): | |||
self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in a nested event loop') | |||
elif os.getenv('SCRAPY_SETTINGS_MODULE'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention that we now depend on this to be present in the upgrading guide, just to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the upgrading guide mentioning minor versions. Is it okay if I add it to the Scrapy guide doc page?
Edit: it is there
|
||
The fastest way to start using Scrapy in Apify Actors is by leveraging the [Scrapy Actor template](https://apify.com/templates/categories/python). This template provides a pre-configured structure and setup necessary to integrate Scrapy into your Actors seamlessly. It includes: setting up the Scrapy settings, `asyncio` reactor, Actor logger, and item pipeline as necessary to make Scrapy spiders run in Actors and save their outputs in Apify datasets. | ||
## Key integration components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome if we could have a nice diagram here 🙂 In the ideal case, it would show how the asyncio event loop and twisted reactor interact. We can definitely postpone that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that Asyncio and Twisted share the same event loop. I'm not familiar with the details of how Asyncio and Twisted exactly interact together. Other than that, we have a separate thread with an Asyncio event loop for the Scheduler's synchronous calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I chose the best diagram for this, maybe https://swimlanes.io/ would be better
logger.exception('Exception occurred while shutting down.') | ||
|
||
finally: | ||
logger.debug(f'{self.__class__.__name__} closed successfully.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Successfully 🤔 I think inside finally
we shouldn't say that word.
Description
nest-asyncio
has been completely removed.ApifyScheduler
, which is synchronous, now executes asyncio coroutines (communication with RQ) in a separate thread with its own asyncio event loop.scrapy
subpackage.Issues
actor-templates
is merged.Tests
Next steps
actor-templates
.Follow-up issues
apify
logger in Scrapy-Apify integration #391