
Idea: don't re-analyze files depending on their SHA #169

vemv opened this issue Feb 19, 2021 · 2 comments
vemv commented Feb 19, 2021

Problem statement

Any given linter (particularly Eastwood) can be slow.

In the use case of "running branch-formatter integrated with the REPL", one can observe that files that haven't been modified may end up being linted again.

For example:

  • I'm working in a long-lived branch that has touched 30 files
    • accordingly, the branch-formatter strategy dictates that all 30 files must be analyzed per `lint!` invocation
  • But I'm only actively working on a smaller subset, say 1 or 2 files
    • The other files touched by the branch may already have been analyzed, which makes re-analyzing them redundant.

Proposal

Implement a strategy that skips files when all of these conditions are met:

  • the given file has already been successfully analyzed
    • e.g. it has warnings, or an empty set of warnings, but no exception was thrown
  • the SHA of the file's contents hasn't changed
    • (`File#lastModified` could also work)
  • no linter warnings have been emitted for this particular file
    • this is important: otherwise one would show linter warnings just once, and then elide them in subsequent runs because the file hadn't changed.
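To make the conditions above concrete, here's a minimal sketch of the skip predicate. The `file-sha`/`skippable?` names and the cache shape are hypothetical, not part of the codebase:

```clojure
(ns sketch.skip
  (:import (java.io File)
           (java.nio.file Files)
           (java.security MessageDigest)))

(defn file-sha
  "SHA-256 of a file's contents, as a hex string."
  [^File file]
  (let [bytes  (Files/readAllBytes (.toPath file))
        digest (.digest (MessageDigest/getInstance "SHA-256") bytes)]
    (apply str (map #(format "%02x" %) digest))))

;; Hypothetical cache shape: {canonical-path {:sha "..." :warnings [...]}}
(defn skippable?
  "True iff `file` was already analyzed successfully (an entry exists,
  meaning no exception was thrown), its contents are unchanged, and no
  warnings were emitted for it on the previous run."
  [cache ^File file]
  (let [{previous-sha :sha, warnings :warnings} (get cache (.getCanonicalPath file))]
    (boolean (and previous-sha
                  (= previous-sha (file-sha file))
                  (empty? warnings)))))
```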

Caveats

It's plausible that fixing a given file's linting result does not depend entirely on the file itself: the fix might be accomplished by modifying a different file.

That could cause stale caches or similar problems.

However, I'm not immediately aware of such a case. Still worth thinking through, on a per-linter basis.

Applicability

This optimization is only potentially useful in a local dev REPL, so out of caution, I would not add the proposed strategy to the stack if (System/getenv "CI") is set.
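As a hedged sketch, the CI gate could look like the following. `strategies`, the strategy keywords, and the `ci?` flag are illustrative names, not the project's actual API:

```clojure
(defn strategies
  "Appends the skip-unchanged-files strategy to the default stack,
  unless we're running in CI."
  [default-strategies skip-strategy ci?]
  (cond-> default-strategies
    (not ci?) (conj skip-strategy)))

;; At the call site, one would pass (some? (System/getenv "CI")) as `ci?`.
```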

Other

I think that the cache should be keyed per git branch, and possibly deleted on each detected git branch change anyway. This is a basic precaution against the Caveats mentioned above.

Alternatives and comparison

Cache linters' reports, e.g. `(caching-linter/new (linters.kondo/new {}))`.

This might perform better.

cc/ @thumbnail


vemv commented Mar 19, 2021

@thumbnail I'm tempted to give the alternative you suggested a shot. I have the vague feeling that it could be thin to implement.

IOW, this wouldn't conflict with architectural redesigns - we could always scrap my work, but keep the integration tests.

Assuming the resulting PR is indeed thin, does this plan SGTY?

@thumbnail

I think a wrapper is the most viable option: simply use `lastModified` as a cache key to prevent unchanged files from being linted, and merge the previous/cached reports into the end result.

I think the branch/project formatter shouldn't use the cached versions by default; that way CI doesn't either.
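A sketch of what such a wrapper might look like, assuming a minimal single-method `Linter` protocol where `lint` returns a map of canonical path to warnings. The protocol and record names are hypothetical, not this repo's actual API:

```clojure
(defprotocol Linter
  (lint [this files] "Returns {canonical-path [warnings]}."))

;; cache: an atom holding {canonical-path {:modified long, :report [warnings]}}
(defrecord CachingLinter [inner cache]
  Linter
  (lint [_ files]
    (let [cached? (fn [^java.io.File f]
                    (= (.lastModified f)
                       (get-in @cache [(.getCanonicalPath f) :modified])))
          {hits true, misses false} (group-by cached? files)
          fresh (if (seq misses) (lint inner misses) {})]
      ;; remember fresh reports, keyed by lastModified, for the next run
      (doseq [^java.io.File f misses]
        (swap! cache assoc (.getCanonicalPath f)
               {:modified (.lastModified f)
                :report   (get fresh (.getCanonicalPath f) [])}))
      ;; merge previous/cached reports into the end result
      (merge fresh
             (into {}
                   (for [^java.io.File f hits]
                     [(.getCanonicalPath f)
                      (get-in @cache [(.getCanonicalPath f) :report])]))))))
```

Usage would then resemble the `(caching-linter/new (linters.kondo/new {}))` shape mentioned in the issue, e.g. `(->CachingLinter inner-linter (atom {}))`.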
