
Idea: don't re-analyze files depending on their SHA #169

vemv opened this issue Feb 19, 2021 · 2 comments
vemv commented Feb 19, 2021

Problem statement

Any given linter (particularly Eastwood) can be slow.

In the use case of "running branch-formatter integrated with the REPL", one can observe that files that haven't been modified may end up being linted again.

For example:

  • I'm working in a long-lived branch that has touched 30 files
    • accordingly, the branch-formatter strategy dictates that all 30 files must be analyzed per `lint!` invocation
  • But I'm only actively working on a smaller subset, say 1 or 2 files
    • The other files touched by the branch may already have been analyzed, which makes re-analyzing them redundant.

Proposal

Implement a strategy that skips files when all of these conditions are met:

  • the given file has already been successfully analyzed
    • e.g. it has warnings, or an empty set of warnings, but no exception was thrown
  • the SHA of the file's contents hasn't changed
    • (`File#lastModified` could also work)
  • no linter warnings have been emitted for this particular file
    • this is important: otherwise one would show linter warnings just once, and then elide them in subsequent runs because the file hadn't changed.
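To make the conditions above concrete, here's a minimal sketch of the skip predicate. The `file-sha`/`skippable?` names and the cache shape are hypothetical, not part of the codebase:

```clojure
(ns sketch.skip
  (:import (java.io File)
           (java.nio.file Files)
           (java.security MessageDigest)))

(defn file-sha
  "SHA-256 of a file's contents, as a hex string."
  [^File file]
  (let [bytes  (Files/readAllBytes (.toPath file))
        digest (.digest (MessageDigest/getInstance "SHA-256") bytes)]
    (apply str (map #(format "%02x" %) digest))))

;; Hypothetical cache shape: {canonical-path {:sha "..." :warnings [...]}}
(defn skippable?
  "True iff `file` was already analyzed successfully (an entry exists,
  meaning no exception was thrown), its contents are unchanged, and no
  warnings were emitted for it on the previous run."
  [cache ^File file]
  (let [{previous-sha :sha, warnings :warnings} (get cache (.getCanonicalPath file))]
    (boolean (and previous-sha
                  (= previous-sha (file-sha file))
                  (empty? warnings)))))
```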

Caveats

It's plausible that fixing a given file's linting result does not depend entirely on the file itself: the fix might be accomplished by modifying a different file.

That could cause stale caches or similar problems.

However, I'm not immediately aware of such a case. Still worth thinking through, on a per-linter basis.

Applicability

This optimization is only potentially useful in a local dev REPL, so out of caution, I would not add the proposed strategy to the stack if (System/getenv "CI") is set.
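As a hedged sketch, the CI gate could look like the following. `strategies`, the strategy keywords, and the `ci?` flag are illustrative names, not the project's actual API:

```clojure
(defn strategies
  "Appends the skip-unchanged-files strategy to the default stack,
  unless we're running in CI."
  [default-strategies skip-strategy ci?]
  (cond-> default-strategies
    (not ci?) (conj skip-strategy)))

;; At the call site, one would pass (some? (System/getenv "CI")) as `ci?`.
```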

Other

I think that the cache should be keyed per git branch, and possibly deleted on each detected git branch change anyway. This is a basic precaution against the Caveats mentioned above.

Alternatives and comparison

Cache linters' reports, e.g. `(caching-linter/new (linters.kondo/new {}))`.

This might perform better.

cc/ @thumbnail


vemv commented Mar 19, 2021

@thumbnail I'm tempted to give the alternative you suggested a shot. I have the vague feeling that it could be thin to implement.

IOW, this wouldn't conflict with architectural redesigns - we could always scrap my work, but keep the integration tests.

Assuming the resulting PR is indeed thin, does this plan SGTY?

@thumbnail

I think a wrapper is the most viable option: simply use `lastModified` as a cache key to prevent unchanged files from being linted, and merge the previous/cached reports into the end result.

I think the branch/project formatter shouldn't use the cached versions by default; that way CI doesn't either.
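A sketch of what such a wrapper might look like, assuming a minimal single-method `Linter` protocol where `lint` returns a map of canonical path to warnings. The protocol and record names are hypothetical, not this repo's actual API:

```clojure
(defprotocol Linter
  (lint [this files] "Returns {canonical-path [warnings]}."))

;; cache: an atom holding {canonical-path {:modified long, :report [warnings]}}
(defrecord CachingLinter [inner cache]
  Linter
  (lint [_ files]
    (let [cached? (fn [^java.io.File f]
                    (= (.lastModified f)
                       (get-in @cache [(.getCanonicalPath f) :modified])))
          {hits true, misses false} (group-by cached? files)
          fresh (if (seq misses) (lint inner misses) {})]
      ;; remember fresh reports, keyed by lastModified, for the next run
      (doseq [^java.io.File f misses]
        (swap! cache assoc (.getCanonicalPath f)
               {:modified (.lastModified f)
                :report   (get fresh (.getCanonicalPath f) [])}))
      ;; merge previous/cached reports into the end result
      (merge fresh
             (into {}
                   (for [^java.io.File f hits]
                     [(.getCanonicalPath f)
                      (get-in @cache [(.getCanonicalPath f) :report])]))))))
```

Usage would then resemble the `(caching-linter/new (linters.kondo/new {}))` shape mentioned in the issue, e.g. `(->CachingLinter inner-linter (atom {}))`.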
