-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mimecast: change the way to consume S3 objects #1316
Conversation
Reviewer's Guide by SourceryThis pull request changes the way the Mimecast connector consumes S3 objects by using generators instead of lists. This change is intended to fix memory issues observed on the connector. The events are now consumed in batches of 10000 items by default. Sequence diagram for fetching and processing events with generatorssequenceDiagram
participant Connector
participant MimecastAPI
participant S3Object
participant EventProcessor
Connector->>MimecastAPI: Request event URLs
MimecastAPI-->>Connector: Returns list of S3 object URLs
loop For each S3 object URL
Connector->>S3Object: Download S3 object content (gzipped JSON lines)
S3Object-->>Connector: Returns gzipped JSON lines
Connector->>EventProcessor: Yield JSON events from lines
end
loop For each batch of events (size: EVENTS_BATCH_SIZE)
Connector->>Connector: Filter events based on cursor (if applicable)
Connector->>Connector: Increment INCOMING_MESSAGES metric
Connector->>Connector: Yield batch of events
end
Updated class diagram for AsyncGeneratorConverterclassDiagram
class AsyncGeneratorConverter {
-async_iterator: AsyncIterator
-loop: asyncio.BaseEventLoop
__init__(async_generator: AsyncGenerator, loop: asyncio.BaseEventLoop)
+__iter__()
+get_anext() : Any
+__next__()
}
note for AsyncGeneratorConverter "Converts an async generator to a synchronous iterator."
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @squioc - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a more specific type hint for the
iterable
parameter in thebatched
function, e.g.,Iterable[Any]
. - The AsyncGeneratorConverter seems like it could be a generally useful utility - consider whether it belongs in a separate library.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 4 issues found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
66623d4
to
55b8ecf
Compare
Test Results15 tests - 45 15 ✅ - 41 1s ⏱️ - 2m 15s Results for commit 55b8ecf. ± Comparison against base commit 32b1484. This pull request removes 60 and adds 15 tests. Note that renamed tests count towards both.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1316 +/- ##
==========================================
+ Coverage 89.83% 90.94% +1.11%
==========================================
Files 77 77
Lines 3029 3048 +19
Branches 141 144 +3
==========================================
+ Hits 2721 2772 +51
+ Misses 257 224 -33
- Partials 51 52 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
… a generator of n batched items
…rom the batch urls
55b8ecf
to
014be2b
Compare
I close this PR in favor of #1318 |
Use generators, instead of lists, to consume the content of the S3 objects returned by the API.
The events are consumed in batch of 10000 items by default.
This behavior should fix the memory issue observed on the connector.
Summary by Sourcery
Bug Fixes: