-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix duplicated code format #126183
Fix duplicated code format #126183
Conversation
Who says that? Isn't that a bug of the formatter and we shouldn't cover up? |
The formatting provider is given the ranges one by one, so it process each one without any knowledge of the list of ranges. In some cases, it makes sense for the formatter to provide an edit spans further than the input range to avoid awkward formatting. See this example, some basic C++, formatting provider is clangd: #include <vector>
int main(int, char**)
{
std::vector<int> v = {
1, // A
2, // B
3, // C
};
return 0;
} I'll use the comments A, B and C to reference lines. #include <vector>
int main(int, char**)
{
std::vector<int> v = {
1, // A
2, // B
3, // C
};
return 0;
} Let's say I use git and I modified lines A and C, I use "Format modified lines". VSCode will ask clangd to format 2 ranges independently: one is line A, and the second is line C. For both, clangd will format lines A, B and C because it considers that these 3 lines form a single block that should be indented consistently. This results in 2 identical edits. Both what clangd and the user expect is pretty trivial, yet vscode will reject the formatting. Basically you have to make a choice, either you:
Note that I draw a line as to what should be allowed: edits must be strictly identical. It's not up to the IDE to fix an inconsistent formatter. |
That's a goal we all strive for but I think you meant "I am happy" because cases where edits overlap but aren't identical aren't covered. Different alternatives:
I believe that would work and is a more complete and pragmatic approach to the problem |
I like number 2., I'll implement that when I find some time |
28538e7
to
40f4aef
Compare
Here's my take on this |
model.getFormattingOptions(), | ||
cts.token | ||
); | ||
return (await workerService.computeMoreMinimalEdits(model.uri, rawEdits)) || []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
computeMoreMinimalEdits
could be costly to compute for very large documents (at least the prettier formatter always replaces the entire document).
Maybe you should check if the token is still alive after the provideDocumentRangeFormattingEdits
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added token check and called computeMoreMinimalEdits only at the end
if (hasIntersectingEdit(rawEditsList[i], rawEditsList[j])) { | ||
// Merge ranges i and j into a single range, recompute the associated edits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are 1000 modified ranges and the formatter reports 1000 edits for each format range request (which at least prettier does sometimes even for simple format range requests), you run 1000^4 operations here.
I think it should be fine if you just test if any two ranges touch the same line. Then you could use a set of touched line numbers and get an algorithm than runs in O(line numbers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also compute more minimal edits when you are done. There should be a guarantee that computing more minimal edits doesn't out grow the range (and therefore doesn't create overlap). Usually, extensions compute very few edits and making them more minimal often creates many, many edits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now compute minimal edits only at the end.
I also added a quick exit to the hasIntersectingEdit
function.
I think it should be fine if you just test if any two ranges touch the same line. Then you could use a set of touched line numbers and get an algorithm than runs in O(line numbers).
It's not that easy, not only do I need to know that 2 edits intersect, but I also need to know which 2 ranges are at the original of these, and the list of all edits produced by these 2 ranges so I can merged them.
So I need to iterate all combination of ranges in any case. That's already O(n^2). Then I need to know if the edits intersect. That's O(m^2). I guess the overall complexity of O(n^2*m^2) is what you refer to as O(n^4).
I don't think I can do much better while retaining correctness and completeness, at least not at the cost of an overly complicated algorithm (usefulness vs maintainability).
Now that minimal edits are only computed at the end, I expect the problem size to stay rather small, which means complexity should not be an issue that much.
If you're worried I still have a few tricks:
- Quick exit on
hasIntersectingEdits
, brings down O(m^2) down to O(m) in most cases (not really O then, agreed), worst case is not affected, but we improve most use cases while retaining correctness. (already implemented) - Assume only consecutive ranges can produce intersecting edits: although not guaranteed, I don't see any case where a sane formatter would not respect this. That would bring O(n^2) down to O(n) at the cost of potentially not merging 2 ranges...which would be just as good as what we're doing before this PR
- Skip this algorithm if the problem size is too big. Format range merging would still work as long as there are not too many edits. Again, strictly not worse than the current state of things. The problem with this is finding the correct threshold.
Let me know what you think, or if you have any other ideas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the overall complexity of O(n^2*m^2) is what you refer to as O(n^4).
Yes, if the number of edits equals the number of modified lines, we have n = m. Can you try out your algorithm on checker.ts when you edit roughly 10% of lines? Can you compare the time with how long it takes without this change (though I guess just formatting a single range in that file with prettier might be already slow)?
Quick exit on hasIntersectingEdits, brings down O(m^2) down to O(m) in most cases
That is a good idea!
Skip this algorithm if the problem size is too big. Format range merging would still work as long as there are not too many edits.
I think we should do this if it becomes slow for really large files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An easy trick would be to count of often we have spliced and merged ranges and simply stop in case a threshold has been reached. Tho, I would suggest we wait for that bug to be reported. I am happy with the changes and we can still improve it later.
@qchateau Thanks for contributing this PR! ❤️ It also fixes an issue of the prettier extension when formatting only modified lines. |
Nice to see we all have the same problems :D |
When formatting multiple ranges, there may be duplicated TextEdits.
Make sure to dedupe them to avoid encountering the error
"Overlapping ranges are not allowed!"