Fix duplicated code format #126183

qchateau · 2021-06-13T21:43:15Z

When formatting multiple ranges, there may be duplicated TextEdits.
Make sure to dedupe them to avoid encountering the error
"Overlapping ranges are not allowed!"

jrieken · 2021-06-15T07:48:28Z

> When formatting multiple ranges, there may be duplicated TextEdits.

Who says that? Isn't that a bug in the formatter that we shouldn't cover up?

qchateau · 2021-06-15T19:59:10Z

The formatting provider is given the ranges one by one, so it processes each one without any knowledge of the full list of ranges. In some cases, it makes sense for the formatter to return an edit that spans further than the input range to avoid awkward formatting.
Problems arise when 2 of the input ranges are close to each other and the resulting edit for each one includes the other. That is totally fine as long as both edits are identical.

See this example, some basic C++, formatting provider is clangd:

#include <vector>

int main(int, char**)
{
    std::vector<int> v = {
                1, // A
            2, // B
                 3, // C
    };
    return 0;
}

I'll use the comments A, B and C to reference lines.
For reference, clangd wants to format it like this (note that lines A, B and C changed):

#include <vector>

int main(int, char**)
{
    std::vector<int> v = {
        1, // A
        2, // B
        3, // C
    };
    return 0;
}

Let's say I use git and I modified lines A and C, then I use "Format modified lines". VS Code will ask clangd to format 2 ranges independently: one is line A, and the second is line C. For both, clangd will format lines A, B and C because it considers that these 3 lines form a single block that should be indented consistently. This results in 2 identical edits. What both clangd and the user expect is pretty obvious, yet VS Code will reject the formatting.

Basically you have to make a choice. Either you:

- do nothing
  - users (me) cannot format the document at all, and are spammed with pop-ups
- require formatters to be strict about range
  - users will be mad because they end up with awkward formatting, which may be even worse than no formatting at all
- allow and merge strictly identical edits
  - users are happy and will send you love

Note that I draw a line as to what should be allowed: edits must be strictly identical. It's not up to the IDE to fix an inconsistent formatter.
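
As a rough illustration of what "merge strictly identical edits" could mean, here is a minimal sketch. The `Range` and `TextEdit` shapes and the `editKey` and `dedupeIdenticalEdits` helpers are hypothetical names made up for this example, not VS Code's actual types or the code in this PR. Two edits count as duplicates only when both the range and the replacement text match exactly.

```typescript
// Hypothetical minimal shapes for illustration; the editor's real types carry more fields.
interface Range {
    startLine: number;
    startColumn: number;
    endLine: number;
    endColumn: number;
}

interface TextEdit {
    range: Range;
    text: string;
}

// Builds a key that is equal only for edits with the same range and the same text.
function editKey(edit: TextEdit): string {
    const r = edit.range;
    return JSON.stringify([r.startLine, r.startColumn, r.endLine, r.endColumn, edit.text]);
}

// Keeps the first occurrence of each strictly identical edit and preserves order.
// Edits that overlap without being identical are left untouched on purpose.
function dedupeIdenticalEdits(edits: TextEdit[]): TextEdit[] {
    const seen = new Set<string>();
    const result: TextEdit[] = [];
    for (const edit of edits) {
        const key = editKey(edit);
        if (!seen.has(key)) {
            seen.add(key);
            result.push(edit);
        }
    }
    return result;
}
```

In the clangd example above, the requests for line A and line C would each yield the same edit covering lines A to C, and only one copy would survive.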

jrieken · 2021-06-16T09:00:48Z

> users are happy and will send you love

That's a goal we all strive for but I think you meant "I am happy" because cases where edits overlap but aren't identical aren't covered. Different alternatives:

1. Add a new API like DocumentRange_S_FormattingEditProvider which is always called with an array of ranges to format. This would be the best solution because all the control and power is with extensions. However, it would be a new API, it is unlikely that extensions would adopt it, and we would still be stuck with how to use the other, single-range API.
2. A different idea is this: check for which ranges the returned edits overlap, then discard those edits, merge the ranges into a single, bigger range, and ask for edits again. In your sample above that would mean:
   1. make a format request for line A, make a format request for line C
   2. detect overlap between the edits for A and C
   3. merge the ranges into a single range A,B,C and ask again for edits
   4. repeat until edits are free of overlap or only a single range is left

I believe that would work and is a more complete and pragmatic approach to the problem
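
The four steps above can be sketched roughly as follows. This is a simplified, synchronous illustration under stated assumptions, not the code that was merged: `LineRange`, `LineEdit`, `RangeFormatter` and `formatRanges` are hypothetical names, ranges are reduced to inclusive line numbers, and the real provider call is asynchronous.

```typescript
// Hypothetical shapes: ranges are inclusive line spans, edits replace a line span.
interface LineRange { start: number; end: number; }
interface LineEdit { range: LineRange; text: string; }
// Stand-in for a single-range formatting provider (assumed synchronous here).
type RangeFormatter = (range: LineRange) => LineEdit[];

function rangesOverlap(a: LineRange, b: LineRange): boolean {
    return a.start <= b.end && b.start <= a.end;
}

function editsOverlap(a: LineEdit[], b: LineEdit[]): boolean {
    return a.some(x => b.some(y => rangesOverlap(x.range, y.range)));
}

function formatRanges(ranges: LineRange[], format: RangeFormatter): LineEdit[] {
    const pending = ranges.map(r => ({ ...r }));
    // Step 1: one format request per input range.
    const edits = pending.map(r => format(r));
    let merged = true;
    // Step 4: repeat until no overlap is found or a single range is left.
    while (merged && pending.length > 1) {
        merged = false;
        for (let i = 0; i < pending.length && !merged; i++) {
            for (let j = i + 1; j < pending.length && !merged; j++) {
                // Step 2: detect overlap between the edits of two ranges.
                if (editsOverlap(edits[i], edits[j])) {
                    // Step 3: merge both ranges into one and ask again for edits.
                    pending[i] = {
                        start: Math.min(pending[i].start, pending[j].start),
                        end: Math.max(pending[i].end, pending[j].end),
                    };
                    pending.splice(j, 1);
                    edits.splice(j, 1);
                    edits[i] = format(pending[i]);
                    merged = true;
                }
            }
        }
    }
    return ([] as LineEdit[]).concat(...edits);
}
```

Every merge removes one range, so the loop ends after at most `ranges.length - 1` merges. Restarting the scan after each merge keeps the sketch short; it is not meant to be efficient.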

qchateau · 2021-06-16T17:07:06Z

I like number 2. I'll implement that when I find some time.

qchateau · 2021-06-19T13:45:23Z

Here's my take on this

hediet · 2021-06-21T07:55:10Z

src/vs/editor/contrib/format/format.ts

+			model.getFormattingOptions(),
+			cts.token
+		);
+		return (await workerService.computeMoreMinimalEdits(model.uri, rawEdits)) || [];


computeMoreMinimalEdits could be costly to compute for very large documents (at least the prettier formatter always replaces the entire document).

Maybe you should check if the token is still alive after the provideDocumentRangeFormattingEdits call?

qchateau · 2021-06-21

Added a token check and called computeMoreMinimalEdits only at the end.

hediet · 2021-06-21T09:08:00Z

src/vs/editor/contrib/format/format.ts

+				if (hasIntersectingEdit(rawEditsList[i], rawEditsList[j])) {
+					// Merge ranges i and j into a single range, recompute the associated edits


If there are 1000 modified ranges and the formatter reports 1000 edits for each format range request (which at least prettier does sometimes even for simple format range requests), you run 1000^4 operations here.

I think it should be fine if you just test if any two ranges touch the same line. Then you could use a set of touched line numbers and get an algorithm that runs in O(line numbers).
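
A minimal sketch of the touched-lines idea, assuming edits are reduced to inclusive line spans (`LineRange` and `anyLineTouchedTwice` are hypothetical names for this example). It only answers whether some line is touched by more than one edit; it does not say which format requests produced the colliding edits.

```typescript
// Hypothetical shape: an inclusive span of line numbers touched by one edit.
interface LineRange { start: number; end: number; }

// Returns true as soon as a line is touched by a second edit.
// Each loop iteration either records a new line or returns, so the work is
// bounded by the number of distinct lines touched, plus one.
function anyLineTouchedTwice(editRanges: LineRange[]): boolean {
    const touched = new Set<number>();
    for (const range of editRanges) {
        for (let line = range.start; line <= range.end; line++) {
            if (touched.has(line)) {
                return true;
            }
            touched.add(line);
        }
    }
    return false;
}
```

The check is conservative: two edits on the same line are reported even if their columns do not overlap.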

jrieken · 2021-06-21

You could also compute more minimal edits when you are done. There should be a guarantee that computing more minimal edits doesn't outgrow the range (and therefore doesn't create overlap). Usually, extensions compute very few edits, and making them more minimal often creates many, many edits.

qchateau · 2021-06-21

I now compute minimal edits only at the end.
I also added a quick exit to the hasIntersectingEdit function.

> I think it should be fine if you just test if any two ranges touch the same line. Then you could use a set of touched line numbers and get an algorithm that runs in O(line numbers).

It's not that easy: not only do I need to know that 2 edits intersect, but I also need to know which 2 ranges are at the origin of these, and the list of all edits produced by these 2 ranges so I can merge them.
So I need to iterate over all combinations of ranges in any case. That's already O(n^2). Then I need to know if the edits intersect. That's O(m^2). I guess the overall complexity of O(n^2*m^2) is what you refer to as O(n^4).
I don't think I can do much better while retaining correctness and completeness, at least not without an overly complicated algorithm (usefulness vs maintainability).
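
For a sense of scale, the estimate can be written out for a single pass, assuming every pair of ranges is compared and every pair of edits within it is checked exhaustively (n is the number of ranges, m the number of edits per range):

```latex
\[
\underbrace{\binom{n}{2}}_{\text{pairs of ranges}} \cdot
\underbrace{m^{2}}_{\text{edit comparisons per pair}}
= \frac{n(n-1)}{2}\, m^{2} \in O(n^{2} m^{2}),
\qquad
n = m = 1000 \;\Rightarrow\; \frac{1000 \cdot 999}{2} \cdot 1000^{2} \approx 5 \times 10^{11}.
\]
```

That is the same order of magnitude as the 1000^4 mentioned above, up to the constant factor of one half.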

Now that minimal edits are only computed at the end, I expect the problem size to stay rather small, which means complexity should not be an issue that much.
If you're worried I still have a few tricks:

- Quick exit on hasIntersectingEdit, brings O(m^2) down to O(m) in most cases (not really O then, agreed). The worst case is not affected, but we improve most use cases while retaining correctness. (already implemented)

- Assume only consecutive ranges can produce intersecting edits: although not guaranteed, I don't see any case where a sane formatter would not respect this. That would bring O(n^2) down to O(n) at the cost of potentially not merging 2 ranges... which would be just as good as what we were doing before this PR.

- Skip this algorithm if the problem size is too big. Format range merging would still work as long as there are not too many edits. Again, strictly not worse than the current state of things. The problem with this is finding the correct threshold.
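
One way the quick exit from the first trick could work is sketched below. This is an assumption for illustration, not necessarily what the PR does: the function name matches the diff, but the body, the `LineRange` shape and the `intersects` helper are made up here. The idea is to compare one list's bounding range against the other list in linear time, and to fall back to the pairwise check only when that cheap test cannot rule out an intersection.

```typescript
// Hypothetical shape: an inclusive span of line numbers covered by one edit.
interface LineRange { start: number; end: number; }

function intersects(a: LineRange, b: LineRange): boolean {
    return a.start <= b.end && b.start <= a.end;
}

function hasIntersectingEdit(a: LineRange[], b: LineRange[]): boolean {
    if (a.length === 0 || b.length === 0) {
        return false;
    }
    // Quick exit, linear in a.length + b.length: if no edit of b touches the
    // bounding range of a, the two lists cannot intersect.
    const hull = a.reduce(
        (acc, r) => ({ start: Math.min(acc.start, r.start), end: Math.max(acc.end, r.end) }),
        a[0]
    );
    if (!b.some(r => intersects(hull, r))) {
        return false;
    }
    // Fallback: full pairwise check, quadratic in the worst case.
    return a.some(x => b.some(y => intersects(x, y)));
}
```

When the edits of two ranges sit in unrelated parts of the file, which is presumably the common case, only the linear part runs.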

Let me know what you think, or if you have any other ideas

hediet · 2021-06-22

> I guess the overall complexity of O(n^2*m^2) is what you refer to as O(n^4).

Yes, if the number of edits equals the number of modified lines, we have n = m. Can you try out your algorithm on checker.ts when you edit roughly 10% of lines? Can you compare the time with how long it takes without this change (though I guess just formatting a single range in that file with prettier might be already slow)?

> Quick exit on hasIntersectingEdit, brings O(m^2) down to O(m) in most cases

That is a good idea!

> Skip this algorithm if the problem size is too big. Format range merging would still work as long as there are not too many edits.

I think we should do this if it becomes slow for really large files.

jrieken · 2021-06-23

An easy trick would be to count how often we have spliced and merged ranges and simply stop once a threshold has been reached. Tho, I would suggest we wait for that bug to be reported. I am happy with the changes and we can still improve it later.

hediet · 2021-06-23T09:43:54Z

@qchateau Thanks for contributing this PR! ❤️ It also fixes an issue of the prettier extension when formatting only modified lines.

jrieken · 2021-06-23T09:46:14Z

Yes, many thanks to @qchateau. You creating the PR was a lucky coincidence because @hediet and me had brainstormed about this exact problem a day earlier.

qchateau · 2021-06-24T17:12:43Z

Nice to see we all have the same problems :D

sandy081 assigned jrieken Jun 14, 2021

jrieken added the info-needed label Jun 15, 2021

jrieken added the formatting label and removed the info-needed label Jun 16, 2021

Merge formatting ranges that result in conflicting edits

40f4aef

qchateau force-pushed the fix-duplicated-format branch from 28538e7 to 40f4aef June 19, 2021 13:44

hediet reviewed Jun 21, 2021

jrieken added this to the June 2021 milestone Jun 21, 2021

hediet reviewed Jun 21, 2021

Optimize format range merging

974fce6

jrieken merged commit 7a06a53 into microsoft:main Jun 23, 2021

github-actions bot locked and limited conversation to collaborators Aug 7, 2021
