Mimecast: change the way to consume S3 objects #1316

Closed
wants to merge 6 commits into from

Conversation

squioc
Collaborator

@squioc squioc commented Mar 12, 2025

Use generators instead of lists to consume the content of the S3 objects returned by the API.
Events are consumed in batches of 10000 items by default.
This change should fix the memory issue observed on the connector.

Summary by Sourcery

Bug Fixes:

  • Fix memory issues by using generators to process S3 objects, which prevents loading large lists of events into memory at once.

@squioc squioc added the bug Something isn't working label Mar 12, 2025
@squioc squioc requested review from lvoloshyn-sekoia and a team March 12, 2025 17:33

sourcery-ai bot commented Mar 12, 2025

Reviewer's Guide by Sourcery

This pull request changes the way the Mimecast connector consumes S3 objects by using generators instead of lists. This change is intended to fix memory issues observed on the connector. The events are now consumed in batches of 10000 items by default.

Sequence diagram for fetching and processing events with generators

sequenceDiagram
  participant Connector
  participant MimecastAPI
  participant S3Object
  participant EventProcessor

  Connector->>MimecastAPI: Request event URLs
  MimecastAPI-->>Connector: Returns list of S3 object URLs
  loop For each S3 object URL
    Connector->>S3Object: Download S3 object content (gzipped JSON lines)
    S3Object-->>Connector: Returns gzipped JSON lines
    Connector->>EventProcessor: Yield JSON events from lines
  end
  loop For each batch of events (size: EVENTS_BATCH_SIZE)
    Connector->>Connector: Filter events based on cursor (if applicable)
    Connector->>Connector: Increment INCOMING_MESSAGES metric
    Connector->>Connector: Yield batch of events
  end
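
The first loop of the diagram can be sketched as a generator that decompresses an object and yields one event per JSON line, so only one decoded line is held at a time. This is a minimal sketch, not the connector's code: the function name `iter_json_lines` is hypothetical, and the payload is built in memory to stand in for a downloaded S3 object.

```python
import gzip
import io
import json
from typing import Any, Dict, Iterator


def iter_json_lines(compressed: bytes) -> Iterator[Dict[str, Any]]:
    """Lazily yield one decoded JSON object per non-empty line.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not taken from the pull request.
    """
    # gzip.open accepts a file object; text mode decodes line by line.
    with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# In-memory payload standing in for the content of one S3 object.
payload = gzip.compress(b'{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
events = iter_json_lines(payload)
first = next(events)  # only the first line has been parsed at this point
```

Nothing is parsed until the caller asks for the next event, which is what keeps memory usage independent of the number of events in the object.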

Updated class diagram for AsyncGeneratorConverter

classDiagram
    class AsyncGeneratorConverter {
        -async_iterator: AsyncIterator
        -loop: asyncio.BaseEventLoop
        __init__(async_generator: AsyncGenerator, loop: asyncio.BaseEventLoop)
        +__iter__()
        +get_anext() : Any
        +__next__()
    }
    note for AsyncGeneratorConverter "Converts an async generator to a synchronous iterator."
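
A class with the members listed in the diagram could look roughly like the sketch below. The attribute and method names come from the diagram; the method bodies are an assumption about how such a converter might work (driving the async iterator one item at a time with `run_until_complete`), not the code from this pull request.

```python
import asyncio
from typing import Any, AsyncGenerator, Iterator


class AsyncGeneratorConverter:
    """Expose an async generator as a plain synchronous iterator."""

    def __init__(self, async_generator: AsyncGenerator, loop: asyncio.BaseEventLoop) -> None:
        self.async_iterator = async_generator.__aiter__()
        self.loop = loop

    def __iter__(self) -> Iterator[Any]:
        return self

    async def get_anext(self) -> Any:
        # Await the next item produced by the wrapped async iterator.
        return await self.async_iterator.__anext__()

    def __next__(self) -> Any:
        try:
            # Block until the next item is available.
            return self.loop.run_until_complete(self.get_anext())
        except StopAsyncIteration:
            # Translate async exhaustion into the synchronous protocol.
            raise StopIteration from None


async def countdown(start: int):
    """Toy async generator used only for this example."""
    for value in range(start, 0, -1):
        await asyncio.sleep(0)
        yield value


loop = asyncio.new_event_loop()
try:
    values = list(AsyncGeneratorConverter(countdown(3), loop))
finally:
    loop.close()
```

A converter written this way must be called from synchronous code: `run_until_complete` raises an error if the given loop is already running.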

File-Level Changes

Change: Refactored the download_batches function to use generators instead of lists to reduce memory usage.
Details:
  • Modified __fetch_content, sync_download_batch, and async_download_batch to be generators.
  • Added AsyncGeneratorConverter to convert async generators to synchronous iterables.
  • Updated download_batches to return a generator that yields events.
  • Added a new batched function that yields batches of items from an iterable.
Files:
  • Mimecast/mimecast_modules/helpers.py
  • Mimecast/tests/test_helpers.py

Change: Modified the event processing logic in __fetch_next_events to consume events in batches using the batched function.
Details:
  • Introduced EVENTS_BATCH_SIZE to control the number of events processed in each batch.
  • Modified __fetch_next_events to use download_batches as a generator and process events in batches.
  • Events are consumed in batches of 10000 items by default.
Files:
  • Mimecast/mimecast_modules/connector_mimecast_siem.py

Change: Updated the connector version in manifest.json and added a changelog entry.
Details:
  • Updated the connector version to 1.1.7 in manifest.json.
  • Added a changelog entry in CHANGELOG.md to document the fix for memory issues.
Files:
  • Mimecast/CHANGELOG.md
  • Mimecast/manifest.json
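
The batching step described above can be illustrated with a small helper built on `itertools.islice`. This is a sketch under assumptions: the names `batched` and `EVENTS_BATCH_SIZE` appear in the summary, but the signature, the list return type and the `fake_events` generator are illustrative and may differ from the code in helpers.py.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

EVENTS_BATCH_SIZE = 10000  # default batch size stated in the PR description


def batched(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull at most `size` items; earlier batches are not retained.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_events(count: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for the generator returned by download_batches."""
    for index in range(count):
        yield {"id": index}


sizes = [len(batch) for batch in batched(fake_events(25000), EVENTS_BATCH_SIZE)]
letters = list(batched("abcde", 2))
```

With this shape, at most one batch of events is in memory at a time, regardless of how many events the source generator produces. Python 3.12 ships a similar `itertools.batched`, which yields tuples rather than lists.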


@sourcery-ai sourcery-ai bot left a comment

Hey @squioc - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a more specific type hint for the iterable parameter in the batched function, e.g., Iterable[Any].
  • The AsyncGeneratorConverter seems like it could be a generally useful utility - consider whether it belongs in a separate library.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 4 issues found
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


@squioc squioc force-pushed the fix/MimecastS3ObjectConsumption branch from 66623d4 to 55b8ecf Compare March 12, 2025 17:36

Test Results

15 tests   - 45   15 ✅  - 41   1s ⏱️ - 2m 15s
 1 suites ± 0    0 💤  -  4 
 1 files   ± 0    0 ❌ ± 0 

Results for commit 55b8ecf. ± Comparison against base commit 32b1484.

This pull request removes 60 and adds 15 tests. Note that renamed tests count towards both.
tests.agents.test_init_scan ‑ test_init_scan
tests.agents.test_isolation ‑ test_endpoint_deisolation
tests.agents.test_isolation ‑ test_endpoint_isolation
tests.agents.test_isolation ‑ test_isolation_and_deisolation_actions
tests.deep_visibility.test_query ‑ test_list_remote_scripts_integration
tests.deep_visibility.test_query ‑ test_query
tests.deep_visibility.test_query ‑ test_query_canceled
tests.deep_visibility.test_query ‑ test_query_exhausted_retries
tests.deep_visibility.test_query ‑ test_query_failed
tests.iocs.test_create_iocs ‑ test_create_iocs
…
tests.test_helpers ‑ test_async_generator_converter
tests.test_helpers ‑ test_batched[abcde-2-expected1]
tests.test_helpers ‑ test_batched[abcde-3-expected0]
tests.test_helpers ‑ test_batched[iterable2-2-expected2]
tests.test_helpers ‑ test_download_batches_asynchronously
tests.test_helpers ‑ test_download_batches_synchronously
tests.test_mimecast_siem_logs ‑ test_authentication_failed
tests.test_mimecast_siem_logs ‑ test_fetch_batches
tests.test_mimecast_siem_logs ‑ test_old_cursor
tests.test_mimecast_siem_logs ‑ test_permission_denied
…


codecov bot commented Mar 12, 2025

Codecov Report

Attention: Patch coverage is 89.13043% with 5 lines in your changes missing coverage. Please review.

Project coverage is 90.94%. Comparing base (32b1484) to head (55b8ecf).

Files with missing lines                                  Patch %   Lines
...mecast/mimecast_modules/connector_mimecast_siem.py     80.00%    2 Missing and 1 partial ⚠️
Mimecast/mimecast_modules/helpers.py                      93.54%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1316      +/-   ##
==========================================
+ Coverage   89.83%   90.94%   +1.11%     
==========================================
  Files          77       77              
  Lines        3029     3048      +19     
  Branches      141      144       +3     
==========================================
+ Hits         2721     2772      +51     
+ Misses        257      224      -33     
- Partials       51       52       +1     
Flag       Coverage Δ
Mimecast   89.34% <89.13%> (+10.66%) ⬆️


@squioc squioc force-pushed the fix/MimecastS3ObjectConsumption branch from 55b8ecf to 014be2b Compare March 13, 2025 14:43
@squioc
Collaborator Author

squioc commented Mar 13, 2025

I'm closing this PR in favor of #1318.

@squioc squioc closed this Mar 13, 2025