
[CORL-3223]: Disposable email accounts #4750

Open
wants to merge 28 commits into base: develop

Conversation

@kabeaty (Contributor) commented Feb 24, 2025

What does this PR do?

These changes add an option to the admin's Configure --> Moderation --> Users section to turn on a feature that sets a new user's status to always pre-moderate comments if they use a disposable email address. The feature is disabled by default.

There is also a button added to the admin that allows users to refresh the disposable email domains in Redis. Otherwise, they are updated at server startup.

These changes will impact:

  • commenters
  • moderators
  • admins
  • developers

What changes to the GraphQL/Database Schema does this PR introduce?

These changes add a disposableEmailDomains setting, which records whether the feature is enabled.
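
As a rough sketch only (the field name comes from the description above; the exact shape is an assumption, not the actual schema):

  // hypothetical shape of the new settings field
  interface DisposableEmailDomainsConfig {
    enabled: boolean;
  }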

Does this PR introduce any new environment variables or feature flags?

no

If any indexes were added, were they added to INDEXES.md?

n/a

How do I test this PR?

This PR can be tested by first starting up the server (which will fetch the disposable email domains and load them into Redis).

You can see what is added by going into Redis: docker exec -it redis redis-cli
and then looking up :disposable keys: KEYS *:disposable

Go to admin and enable the disposable email domains feature under Configure --> Moderation --> Users --> Email domain.
Then attempt to register a new user whose email domain is included on the disposable emails list. See that this user is set to always pre-moderate. Test both local auth users and SSO users with the multi-site test. Add an email from the disposable emails list to the Exceptions list under the Email domain config. See that this email is now not pre-moderated when the disposable email domains feature is enabled.

(Note that if pre-moderation is removed by a moderator after being initially set, this is already handled: the check that a user wasn't already premoderatedBecauseOfEmailAt means they wouldn't be pre-modded again.)

See that users whose email domains are not on the disposable emails list are created and not set to always pre-moderate.

Press the Update disposable emails button in the admin. Then check a key in Redis (pick a disposable domain + :disposable to check). See that it's set to an updated timestamp.

Go to admin and disable the disposable email domains feature. See that users, disposable email domain or not, are not set to always pre-moderate. Test local auth and SSO.

Were any tests migrated to React Testing Library?

no

How do we deploy this PR?

@kabeaty kabeaty requested a review from nick-funk February 24, 2025 21:05
netlify bot commented Feb 24, 2025

Deploy Preview for gallant-galileo-14878c canceled.

🔨 Latest commit: 554a1ef
🔍 Latest deploy log: https://app.netlify.com/sites/gallant-galileo-14878c/deploys/67c096cb5c2ba200086b12fc

@@ -121,26 +95,43 @@ export const shouldPremodDueToLikelySpamEmail = async (
return false;
}

if (!user.email) {
@kabeaty (Contributor, Author) commented Feb 24, 2025

fyi, consolidated some checks for user email, email split length, etc., into here. This way, each premod filter check can focus on its more specific check.
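
As an illustration of that consolidation (a sketch only; the guard order and variable names are assumed, not the exact code):

  // shared guards up front, so each premod filter can focus
  // on its more specific check
  if (!user.email) {
    return false;
  }

  const split = user.email.split("@");
  if (split.length !== 2) {
    // malformed email; nothing to check against domain lists
    return false;
  }

  const domain = split[1].toLowerCase();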


// if domain is included in protected email domains, we should
// not pre-moderate it
if (tenant?.protectedEmailDomains?.includes(domain)) {
@kabeaty (Contributor, Author) commented:

This updates the check so that protected email domains are never pre-moderated.

redis: AugmentedRedis,
logger: Logger
) {
const now = new Date();
@kabeaty (Contributor, Author) commented:

We could add a check here for whether the feature is enabled on any tenant. If not, we don't read/add these to Redis. In that case, we'd make sure to always update/add the disposable domains if/when the feature is enabled.

@nick-funk (Contributor) commented:

probably not a huge issue if it always loads them into redis on pod startup, but it's also nice if it reloads it when you flick the feature on?

I'm okay with either, as long as we never get into a state where the domains aren't loaded and redis is sitting with an empty key lookup

@kabeaty (Contributor, Author) commented:

yeah, kind of what I was thinking. I will leave as is for now so that it always loads on startup. There is also the update button that can be used to update the domains if an issue does come up.

`${domain}${DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY}`,
now.toISOString(),
"EX",
DISPOSABLE_EMAIL_DOMAIN_CACHE_DURATION
@nick-funk (Contributor) commented Feb 25, 2025

if you set an expiry, there is a possibility of it being set, pods staying alive for a long while, and then the keys disappearing; we're hooped, with no way to reload the domain lookup without a deployment

I think it might be wise to remove this expiry entirely and use an hset or similar that can be wiped every time it's updated.

@kabeaty (Contributor, Author) commented:

All right, I updated to remove the expiry and use hset here.

Also fyi -- there is a button in the admin in the configuration section for this feature that can be clicked to update the disposable domains, so we shouldn't be in a situation where there's no way to reload without a deployment :)
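
A minimal sketch of that change, assuming an ioredis-style client and the existing DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY constant:

  // before: one volatile key per domain, silently lost once the expiry fires
  // await redis.set(
  //   `${domain}${DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY}`,
  //   now.toISOString(),
  //   "EX",
  //   DISPOSABLE_EMAIL_DOMAIN_CACHE_DURATION
  // );

  // after: one persistent hash with a field per domain and no expiry;
  // the whole key can be deleted and repopulated on each update
  await redis.hset(DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY, domain, now.toISOString());

  // membership check at registration time
  const seenAt = await redis.hget(DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY, domain);
  const isDisposable = seenAt !== null;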


pipeline.on("data", async (data: { value: string }) => {
const domain = data.value;
await redis.set(
@nick-funk (Contributor) commented Feb 25, 2025

I think you'll want to use a hash set lookup for these values instead of a pipeline:
https://redis.io/docs/latest/commands/hset/

Hash set lookups cache within contiguous memory and are made specifically to allow the scheduler to pre-load and keep those keys organized if there is a lot of traffic to that "set" of values. Whereas separate keys rely on random memory lookups, which are slower than discrete blocked memory (which an hset will be allocated by).

Also, if you're going to go the raw key set route and call it many, many times, it might be better to use mset instead of pipelining multiple commands; it's orders of magnitude faster and gentler on the write queue in Redis, as it runs as a bulk op:
https://redis.io/docs/latest/commands/mset/

An mset will also be atomic in its scheduling for writing to the redis memory cache.
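
For reference, a hedged sketch of that mset alternative, reusing the per-domain key naming from the original code and assuming the domains have already been collected into an array:

  // one atomic bulk write instead of thousands of pipelined SETs
  const args: string[] = [];
  for (const domain of domains) {
    args.push(`${domain}${DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY}`, now.toISOString());
  }
  await redis.mset(...args);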

@kabeaty (Contributor, Author) commented:

thanks for the note/links on this! I updated to use the hset here, and it now gets wiped before being updated each time.

@nick-funk (Contributor) left a comment

I've left some suggestions for the disposableEmails service and how it interacts with Redis.

Comment on lines 13 to 45
export async function readDisposableEmailDomainsAndAddToRedis(
redis: AugmentedRedis,
logger: Logger
) {
const now = new Date().toISOString();

// clear previous disposable domains before loading in new
await redis.del(DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY);

try {
const stream = await axios.get(`${DISPOSABLE_EMAIL_DOMAINS_LIST_URL}`, {
responseType: "stream",
});

const pipeline = stream.data.pipe(new Parser()).pipe(streamArray());

pipeline.on("data", async (data: { value: string }) => {
const domain = data.value;
await redis.hset(DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY, domain, now);
});

pipeline.on("end", () => {
logger.info("Finished streaming disposable emails list");
});

pipeline.on("error", (error: any) => {
logger.error(
"Error reading and setting disposable emails in Redis:",
error
);
});
} catch (error) {
logger.error(
@nick-funk (Contributor) commented Feb 26, 2025

I think this redis op needs to be wrapped into a transaction via multi to be atomic, otherwise, you're going to get pods wiping out the hash set (hset) via race conditions when they all spin up.

This is also going to hammer Redis with a lot of requests...

I wrote a snippet that might be useful to fix this issue:

try {
  // the list is 242 KB; even if it were 1000 times larger,
  // that's still less than 250 MB, well within a pod's capability,
  // so it might be okay to load with fetch
  //
  // if you want, you can still use the axios pipeline stream to walk
  // through and append to the multi command below
  const response = await fetch(DISPOSABLE_EMAIL_DOMAINS_LIST_URL);
  if (!response) {
    logger.error("Error retrieving disposable emails from request");
    return;
  }

  const json = (await response.json()) as string[];
  if (!json) {
    logger.error(
      "Error reading disposable emails from request to json array"
    );
    return;
  }

  // need to use a transaction to make the delete followed by
  // populating the hash set an atomic operation so that
  // pods don't race-condition delete the set out from under
  // the writes
  //
  // this also makes only one request to redis instead of thousands
  // like the previous code would do
  const transaction = redis.multi();

  transaction.del(DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY);

  for (const value of json) {
    transaction.hset(DISPOSABLE_EMAIL_DOMAINS_REDIS_KEY, value, now);
  }

  await transaction.exec();
} catch (error) {
  logger.error(
    "Error streaming and setting disposable emails in Redis:",
    error
  );
}

@nick-funk (Contributor) commented Feb 26, 2025

Further, note that the current code, running without a multi transaction, will hammer Redis at startup on SBN.

We have ~175 pods and there are ~42,000 records in this domain list.

So, if we do the math, that means the current code will issue ~7,350,000 individual requests to redis at startup as all the pods start up. Redis can handle that, but we can easily be more optimal.

If we do a multi transaction instead, it will only issue 1 request per pod — 175 requests total instead of 7,350,000 — which is 42,000x more performant for deployment startup.

@kabeaty (Contributor, Author) commented:

thank you for the snippet and detailed feedback here! This all makes sense, and I've updated to use the multi transaction to fetch and add to redis. I updated to just use fetch as well.

@nick-funk (Contributor) left a comment

Please see my comment about the use of multi to transact wrap the populating of the Redis hash set for disposable emails.

Should avoid a race condition and make the code ~40,000 times faster, with ~40,000x fewer network calls to Redis 😃

@nick-funk (Contributor) left a comment

Looks good, thank you for all the updates!
