Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluesky.from_as1: extract any non-linked @-mention of a valid handle #890

Merged
merged 5 commits into from
Mar 3, 2025

Conversation

Daft-Freak
Copy link
Contributor

This is quite possibly a Bad Idea, but I've typed out the code so might as well open this.

My motivation here is that it's currently almost impossible to get a non-bridged users attention, but also that this is what would happen if you put the exact same text into a post using the bluesky app.

(This doesn't affect any existing tests, but I'm sure it'll have plenty of weird edge cases)

Copy link
Owner

@snarfed snarfed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, interesting! Hmm. I definitely get the motivation, and I definitely also agree that it seems a bit risky. I'm mixed on doing this in general, but I don't really have crisp arguments either for or against yet. I'll think about it.

A couple edge cases here come out of matching on HTML content, eg at least as written, 1) I think this would match a mention inside a link if the link text has leading whitespace, and 2) Mastodon sometimes interjects HTML tags inside text in odd ways, eg <span class="foo">@</span>bar.baz, which would trip this up.

If you instead do this later, on the converted Bluesky post's text field, I think that would fix those issues? And also check that you're not duplicating or overlapping a facet that already exists?

@snarfed
Copy link
Owner

snarfed commented Feb 19, 2025

Thanks, getting closer! We probably want to move it even later, after generating all other facets, and only do this if it doesn't overlap any of those facets.

I also wonder about this in a multi-protocol world. Apart from Bluesky, Bridgy Fed currently does fediverse and web, and we hope to add Nostr (snarfed/bridgy-fed#446), Farcaster (snarfed/bridgy-fed#447), and maybe others in the future. We don't really know the author's intent here, ie which user on which protocol they intend to @-mention, eg @domain.com could be a Bluesky handle, a Nostr NIP-05 handle, a Farcaster ENS domain, or something else entirely.

The mention inference in this PR will only happen when we bridge to Bluesky, and for DNS domains, the same person will own it regardless of which network we're on, so I think I'm still comfortable with this as is, but I'm still thinking about it. Lmk if you have any thoughts!

@Daft-Freak
Copy link
Contributor Author

Daft-Freak commented Feb 19, 2025

Thanks, getting closer! We probably want to move it even later, after generating all other facets, and only do this if it doesn't overlap any of those facets.

Hah, I realised that and was working on it when I found that there's already a bit of an issue with links with leading spaces. If you have <a href="..."> @you.com</a>:

  • Source.postprocess_object adds a mention tag with a displayName of @you.com
  • The link extraction after converting from HTML adds a link tag with a displayName of @you.com (leading space)

Anyway, I was about to commit the latest version.

@snarfed
Copy link
Owner

snarfed commented Feb 19, 2025

Lol yes, I noticed the space thing too, #890 (review), whee.

New version looks pretty good! Last thing I can think of right now: could you do the overlap check first, and only resolve the handle if it passes that check, to minimize external network requests? (Thanks for your patience with all these changes btw!)

I think that gets us to a change that we could reasonable merge. Should we, ie is this a net good feature to add...maybe? Thoughts?

@Daft-Freak
Copy link
Contributor Author

I'm not sure I have many other thoughts here, though I admit I'm mostly thinking from my perspective of Mastodon -> Bluesky...

This behaviour might be a little surprising sometimes I guess (If you wanted to write @some.domain for some other reason?), but then I could also consider not behaving like a post made through bsky.app surprising. 🤔

@Daft-Freak
Copy link
Contributor Author

Added an extra patch for links with leading whitespace (though that happens without these changes so I maybe should've put that in it's own PR)

@snarfed
Copy link
Owner

snarfed commented Feb 20, 2025

Oh thanks, nice!

Another minor thought here, we could switch to \b (word boundary special sequence) for the beginning and end of AT_MENTION_PATTERN, might be a bit more robust.

Not important though, can happen in a separate PR. Otherwise, let me think a bit more about this feature overall. cc @anujahooja too.

@snarfed
Copy link
Owner

snarfed commented Mar 3, 2025

Sorry for the long turnaround here @Daft-Freak due to #895 (comment), thank you for your patience! We decided we're comfortable with both of these features right now, at least fediverse => Bluesky, so I'll merge. Great changes and PRs (and tests!), thank you again!

@snarfed snarfed merged commit d136024 into snarfed:main Mar 3, 2025
2 checks passed
@Daft-Freak Daft-Freak deleted the patch-1 branch March 4, 2025 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants