Fix character offset and byte offset problem in No-English language and issuse #6 #8

owent · 2016-04-25T08:20:29Z

detail: microsoft/vscode#5735

We should replace this code if microsoft/vscode#5735 is accepted.

> detail: microsoft/vscode#5735 > > We should replace this code if microsoft/vscode#5735 is accepted.

owent · 2016-04-25T09:01:59Z

@xaverh please notice that "buffer": "^4.6.0" is in vscode 1.0.0, I'm noe sure which version is it in vscode 0.10.10. If the dependency of vscode version shoud be changed to 1.0.0?

@ioachim This PR solve #6 and #7.

ioachim · 2016-04-25T12:48:02Z

@owt5008137 Thanks for correcting the issue! Is there any issue why you didn't use the Node Buffer class, but the buffer module? That way we could decode the string just once (as you can send the buffer directly to ChildProcess.stdin, instead of the string, avoiding encoding the string twice.

xaverh · 2016-04-25T13:06:20Z

@owt5008137 Thanks for your help. Very appreciated! I pushed a new version to the Marketplace.

owent · 2016-04-25T13:13:21Z

@ioachim buffer is just the Node Buffer class.
Sorry I didn't read the document https://nodejs.org/dist/latest-v5.x/docs/api/buffer.html carefully. It seems not need to use

import buffer = require('buffer')

I'm not sure if I can use all the node.js inner modules, so I add it to package.json. Just remove it if it's already included in VSCode.Decoding the string twice means VSCode decode the string once and this plugin decode it again.

If the text in VSCode editor is not a utf8 string, the getText api still get a utf8 content, so it must be decoded and encoded to utf8 once.

I have read some code in clang-format, it seems only support utf-8.And it only accepts byte offset and the output is also byte offset and byte length.But VSCode's API only accept character offset and length.

So we still need to know what's the character offset of the byte offset(a utf8 character has 1-6 bytes), and then I decode it again. What I can only do is remember last relationship of byte offset and character offset to avoid decode the same content more than once.

Of course it's better if we can use VSCode's decode result, but I did not find and API to get these.Do you have any idea to do it?

ioachim · 2016-04-25T13:39:46Z

The decoding is needed in any case, when writing to stdin it is done internally.

The codeContent parameter should be the Buffer instead of a string, and the value given to it should be the same written to the clang-format stdin, to avoid decoding the whole source twice.

What I started writing was code to get the char-offset from the byte-offset by reading the buffer directly, instead of encoding back to string, by parsing each byte of the slice. That way we avoid allocating a string each time we want to measure the length in characters.

owent · 2016-04-25T14:00:14Z

@ioachim

var codeBuffer = new Buffer(codeContent);

This code also run only once every time we format the content.
I'm not sure, but all the buffer objects created by slice(start, end) should be reference to the same buffer block.Because them are all readonly.The only problem is I created so many strings.

Your solution seems a better way by converting byte-offset to char-offset one by one.But I wonder how many memory it will cost to make index of every byte-offset.Or just index every diffrent length? Is it will cost O(log(n)) to get the char-offset from the byte-offset?

ioachim · 2016-04-25T14:17:56Z

About the first part: yes, slice() makes a reference to the original buffer, not a copy. Passing the buffer is a just a minor improvement.

About the string allocation thing, my proposal wasn't to save each position, as you say it doesn't make much sense cost wise, but replacing the buffer.toString('utf8').length calls with calls to a function that gets the length by checking the bytes in that slice.

owent · 2016-04-25T14:31:18Z

@ioachim Oh, I see.It's really a good idea.
It's really no need to allocate so many string objects or make any index.Especially clang-format always has a monotonic increasing output.

ioachim · 2016-04-25T14:32:46Z

@owt5008137 I'll write that changes sometime this week 👍

owent · 2016-04-25T15:07:36Z

@ioachim OK,I will pull your code then.

owent added 2 commits April 25, 2016 16:19

Fix character offset and byte offset problem in No-English language.

86e773b

> detail: microsoft/vscode#5735 > > We should replace this code if microsoft/vscode#5735 is accepted.

fix [issues xaverh#6](xaverh#6)

f407966

owent changed the title ~~Fix character offset and byte offset problem in No-English language.~~ Fix character offset and byte offset problem in No-English language and issuse #6 Apr 25, 2016

This was referenced Apr 25, 2016

clang-format.style in Windows? #6

Closed

Formatting problem with special characters #7

Closed

xaverh merged commit 83d8fca into xaverh:master Apr 25, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix character offset and byte offset problem in No-English language and issuse #6 #8

Fix character offset and byte offset problem in No-English language and issuse #6 #8

owent commented Apr 25, 2016

owent commented Apr 25, 2016

ioachim commented Apr 25, 2016

xaverh commented Apr 25, 2016

owent commented Apr 25, 2016 •

edited

Loading

ioachim commented Apr 25, 2016 •

edited

Loading

owent commented Apr 25, 2016 •

edited

Loading

ioachim commented Apr 25, 2016

owent commented Apr 25, 2016

ioachim commented Apr 25, 2016

owent commented Apr 25, 2016

Fix character offset and byte offset problem in No-English language and issuse #6 #8

Fix character offset and byte offset problem in No-English language and issuse #6 #8

Conversation

owent commented Apr 25, 2016

owent commented Apr 25, 2016

ioachim commented Apr 25, 2016

xaverh commented Apr 25, 2016

owent commented Apr 25, 2016 • edited Loading

ioachim commented Apr 25, 2016 • edited Loading

owent commented Apr 25, 2016 • edited Loading

ioachim commented Apr 25, 2016

owent commented Apr 25, 2016

ioachim commented Apr 25, 2016

owent commented Apr 25, 2016

owent commented Apr 25, 2016 •

edited

Loading

ioachim commented Apr 25, 2016 •

edited

Loading

owent commented Apr 25, 2016 •

edited

Loading