Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude specific line #316

Open
jakobod opened this issue Jul 30, 2021 · 21 comments
Open

Exclude specific line #316

jakobod opened this issue Jul 30, 2021 · 21 comments
Labels
C-enhancement Category: Raise on the bar on expectations S-waiting-on-design Status: Waiting on user-facing design to be resolved before implementing

Comments

@jakobod
Copy link

jakobod commented Jul 30, 2021

Hi!

I'm wondering why there is no directive that can be used to ignore a specific line? I have some files that have intentional typos in them that should be ignored, while the rest of the file should still be checked!
I'm thinking of a solution like clang-formats:

// clang-format off
...
// clang-format on

Did I not find that feature or has it been left out intentionally?

greetings!

@epage epage added the C-enhancement Category: Raise on the bar on expectations label Jul 30, 2021
@epage
Copy link
Collaborator

epage commented Jul 30, 2021

Just hasn't been implemented left.

Different approaches

  • File-level exclude
    • Already supported, does block spell checking the file name
    • Would be better with layered config, see Layered config #193
  • File-wide ignore directive
  • Line-trailing ignore directive
  • Ignore block directives, like mentioned above
  • External ignore directives
    • Looks like codespell uses this, seems like it'd be hard to maintain

Considerations for inline directives

  • Comment styles are language-specific
  • For global and trailing directives, we need to scan ahead

@davidsneighbour
Copy link

How about an ignore system the way tools like stylelint implements it:

  • ignore following line
  • ignore on and ignore off for blocks (multiple lines) of code

Sometimes there is an requirement of having "wrong" code and adding all these items (see #544) to the ignore list might lead to missed fixes down the road.

@Blacksmoke16
Copy link
Contributor

Following up on #613 (comment), my main use case is wanting to allow a specific word as allowed, while not preventing it from being flagged elsewhere.

Inline comment would be the most focused. From an implementation perspective, different comment syntax between languages shouldn't really matter if you just look for some unique string exists in a given line. That way each lang can start the comment with whatever syntax they use. Granted this would be harder if parsing is done token by token, not line by line however.

The start/end directive, or even as mentioned later a "ignore next line" could also work well, allowing the parser to be aware of what is coming up, versus trying to act upon something it already parsed.

Scoping extended-words in the config file to a specific line would be another solution that would provide the same benefit, but possibly easier to implement. However, it would be more likely to get out of sync from the code so 😕. Raising it up to specific file, could be a good middle ground, similar to #193, but a bit less verbose since you wouldn't need to add extra files in those contexts.

This is definitely something I'd love to see. Feels so hacky globally ignoring a misspelling when it only matters in a handful of contexts. E.g. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer

@neiljp
Copy link

neiljp commented Mar 9, 2023

I just started looking at using typos today, and it's very fast :)

Line-specific ignores are the main feature I'd look for that isn't present already. For me this mainly applies to tests, which can include intentional errors. I agree with others that excluding an entire file (or splitting content out into another file), or an entire word, is rather a large hammer right now.

I found this is also missing in codespell, though being worked on, and various options were brought up to integrate with multiple languages too (codespell-project/codespell#1212).

@epage
Copy link
Collaborator

epage commented Mar 9, 2023

Huh, seeing the discussion about using cspell's syntax is interesting. I like the idea of a cross-tool syntax, much like burntsushi helped create a cross-search tool ignore file.

I'm still not a fan of supporting different comment styles. If anything, I'd prefer a common shibboleth that works independent of comment styles.

epage added a commit to epage/typos that referenced this issue Mar 22, 2023
Typos primarily works off of identifiers and words.  We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc.  This opens it up for
for user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.

This works differently than any of the previous ways of ignoring thing
because the regexes require extra parse passes.  Under the assumption
that (1) actual typos are rare and (2) number of files relying on
`extend-ignore-re` are rare, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.

While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question.  Ideally, we'd implement document settings / directives for
these cases (crate-ci#316).
@epage
Copy link
Collaborator

epage commented Mar 22, 2023

FYI #695 provides a new workaround for false positives

epage added a commit to epage/typos that referenced this issue Mar 22, 2023
Typos primarily works off of identifiers and words.  We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc.  This opens it up for
for user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.

This works differently than any of the previous ways of ignoring thing
because the regexes require extra parse passes.  Under the assumption
that (1) actual typos are rare and (2) number of files relying on
`extend-ignore-re` are rare, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.

While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question.  Ideally, we'd implement document settings / directives for
these cases (crate-ci#316).
@Delgan
Copy link
Contributor

Delgan commented Sep 24, 2023

I believe that using inline exclusion is not a workable approach for a rather simple reason: it would be ineffective with file types that do not support comments, such as .json and .txt.

I think we need some kind of ignore configuration list that includes elements in the format "<file>:<line>:<word>" for example.

It's true that maintaining such a list will be cumbersome. However, to mitigate this problem, we could consider introducing a command-line option like --update-ignore-list that would conveniently regenerate the configuration file based on the detected typos.

I agree that inline comments have high appeal, but unfortunately they may not be universally applicable (although they could certainly complement the proposed approach).

@epage
Copy link
Collaborator

epage commented Sep 25, 2023

An "ignore all" mode sounds intriguing. I hesitate slightly because external excludes seems like a path of last resort and making it easy will likely incentivize people to not improve things, either within typos or in their repos.

I think in normal modes we should warn if we check a file and the ignore is invalid (not an error in case its reasonable for two people to run different typos versions on the same code base). If the file for an external exclude is within scope of the run and doesn't exist, we should probably make that a warning (not an error in case it is a "sometimes there" file).

If we can, the syntax for this should be easy for git add -p to split and add incrementally as our "interactive mode" just like with typos -w.

@Blacksmoke16
Copy link
Contributor

Blacksmoke16 commented Sep 26, 2023

it would be ineffective with file types that do not support comments, such as .json and .txt.

I don't think it worth sacrificing the feature just because some file types don't have comments. Supporting inline exclusion would work and be helpful for many other file types. I could see supporting something like an external list in addition to inline exclusions for those kinds of files. But I don't really want to have to deal with it when 99% of files I use support comments.

@epage
Copy link
Collaborator

epage commented Sep 26, 2023

To be clear, I do not see them as mutually exclusive but that we'd benefit from both

phip1611 pushed a commit to phip1611/typos that referenced this issue Dec 5, 2023
Typos primarily works off of identifiers and words.  We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc.  This opens it up for
for user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.

This works differently than any of the previous ways of ignoring thing
because the regexes require extra parse passes.  Under the assumption
that (1) actual typos are rare and (2) number of files relying on
`extend-ignore-re` are rare, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.

While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question.  Ideally, we'd implement document settings / directives for
these cases (crate-ci#316).
@Lillecarl
Copy link

@epage it's a can of worms, but with tree-sitter you can run language specific parsers on filetypes and see what's actually a comment and what isn't. This could also be used for spicy options like

  • Don't spellcheck variables
  • Don't spellcheck function names
  • Don't spellcheck this scope
  • Don't spellcheck this class
  • Don't spellcheck this essentially

https://tree-sitter.github.io/tree-sitter/playground Have a look at the playground, it's amazing tech. Though it'd have to be a feature flag, tree-sitter is more correct than fast 😄

@epage
Copy link
Collaborator

epage commented Jan 30, 2024

I am also always put off from tree sitter in dealing with all of the individual plugins and allowing people to extend a program with more.

@Natim
Copy link

Natim commented Apr 3, 2024

We are also looking for this kind of feature on our side, currently, we add exceptions such as hd → has and KMS → km globally but it would be nicer to put online exceptions,

@epage
Copy link
Collaborator

epage commented Apr 3, 2024

Forgot to mention it in this issue but default.extend-ignore-re exists which can be used to make your own inline ignores.

Some suggested regexes (including line ignore and block ignore) can be found at https://github.com/crate-ci/typos/blob/master/docs/reference.md#config-fields

@liby
Copy link

liby commented Dec 19, 2024

Hi @epage 👋

I'm wondering if you've considered adding document directives support in typos by default, similar to Prettier's ignore? We could support three types of directives:

  1. Ignore next line:
# typos-ignore-next-line
my_mispeled_variable = 123
  1. Ignore current line:
my_mispeled_variable = 123 # typos-ignore-this-line
  1. Range ignore:
# typos-ignore-start
def handle_speical_case():
return "mispeled"
# typos-ignore-end

This could even be implemented based on the current extend-ignore-re mechanism, just covering common comment styles would be sufficient. For file types that don't support comments, they can still use extend-ignore-re.

I think this could serve as an "escape hatch" for users who:

  • Are not familiar with writing regex patterns

  • Don't want to create a separate config file just to ignore a specific case

  • Prefer a more straightforward, out-of-the-box solution

I'm curious about the current decision not to implement this - is it because you see potential issues with this approach, or are there other considerations I might be missing?

@epage
Copy link
Collaborator

epage commented Dec 19, 2024

Earlier in the thread I mentioned the problem of knowing the comment style for each language. Some of our extensions map to multiple languages. Some formats embed other languages. Some languages only support block comments and I'd like to avoid having a full parser for those and instead we'd probably limit what comment syntax is used.

Or as I said, maybe we go with a special sequence without checking for comments at all.

With giving people to control what they want, this is a lower priority issue for me for designing and implementing.

@chrisgrieser
Copy link

chrisgrieser commented Dec 19, 2024

Or as I said, maybe we go with a special sequence without checking for comments at all.

I think this is the reasonable way to go. No need to check whether something is written in a comment, just check for a sequence like typos-ignore-next-line, which I assume will not occur in a document naturally.

With giving people to control what they want, this is a lower priority issue for me for designing and implementing.

It may not be high priority, but I think it should be regarded as more important than "low priority", for various reasons:

  • Using the typos config file, you are not able to deal with cases where a string is a misspelling in one location and correct use in another location. For example, the sentence In Scrabble, the letters "ot" are useful ot do this and that. → the first "ot" is correct, the second a misspelling.
  • ignore-comments make a document more portable. You can copy a document to a different folder or even different project, without having to remember that you also need to copy the accompanying typos.toml.
  • Using an ignore regex may work, but is quite inconvenient and tedious if you have multiple words you want to ignore.
  • Ignore comments are in practice almost a "standard" feature that every modern linter or LSP supports. Out of all the tooling I use, I think typos is the only one that does not support it.
  • And finally, there seems to be quite a lot of demand for it – with currently 30 likes, it is one of most upvoted feature request among all open issues in this repo, if not the most upvoted one.

@kzaitsev
Copy link

First of all, thank you for the great tool.

I have an issue such as this:

error: `ba` should be `by`, `be`
  --> my_spec.rb:18:71
   |
18 |     sequence(:ssh_fp) { |n| "97:c1:12:7e:1c:88:f6:77:6c:9b:cf:8d:ba:bb:8e:f#{n}" }
   |                                                                       ^^
   |

Of course, I don't want to add "ba" as an exception to the dictionary, so I tried to use extend-ignore-re to exclude it as the pattern, but I faced a lack of examples. Writing such a regexp seems like a rocket since whenever you need to make an exception.

The ability to exclude the line will be very helpful.

@Delgan
Copy link
Contributor

Delgan commented Dec 19, 2024

@kzaitsev I believe you can workaround your issue by defining extend-ignore-re as follows:

extend-ignore-re = ["(?Rm)^.*(#|//)\\s*spellchecker:disable-line$"]

Then, in your code, add:

sequence(:ssh_fp) { |n| "97:c1:12:7e:1c:88:f6:77:6c:9b:cf:8d:ba:bb:8e:f#{n}" }  // spellchecker:disable-line

See examples in the documentation: typos Reference.

@liby
Copy link

liby commented Dec 19, 2024

Or as I said, maybe we go with a special sequence without checking for comments at all.

Regarding the special sequences approach, I see some potential challenges:

  • JSON and other files that don't support comments

  • Config files where the sequence might appear as valid content

  • String literals containing the sequence

  • Documentation about this feature

@epage epage added S-triage Status: New; needs maintainer attention. S-waiting-on-design Status: Waiting on user-facing design to be resolved before implementing and removed S-triage Status: New; needs maintainer attention. labels Feb 19, 2025
@DanCardin
Copy link

For what it's worth, preferring a # typos: ignore syntax akin to type: ignore/pyright: ignore etc; I set mine to something like this:

[tool.typos.default]
extend-ignore-re = [".*#.*typos: ignore[^\\n]*\\n"]

Idk if you're necessarily interested including such potentially opinionated language-specific regexes baked in, could the solution not "just" be to include default values for the file-specific variants? e.g.

[tool.typos.py]
extend-ignore-re = ...  # or ignore-re as a way to make it user-override-able if preferable?

etc, for each filetype that makes sense to have an inline-"comment" style ignore ability?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: Raise on the bar on expectations S-waiting-on-design Status: Waiting on user-facing design to be resolved before implementing
Projects
None yet
Development

No branches or pull requests