I ran into an issue where reading a JSON file of file paths sometimes duplicates text when the input is run through tokens. The following code, run with the attached sample text, reproduces the issue: with a chunkSize of 55, the second token is parsed as \pathpath\to\some\other\file.txt instead of \path\to\some\other\file.txt.
I added some code to print the chunks to see where they start and stop, and found that when a chunk ends with an escape character, the text just before it is duplicated at the start of the next chunk.
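To make the symptom concrete, here is a small toy model of how a chunk boundary that falls right after an escape character could produce this kind of duplication. It is not the library's code and not necessarily the actual cause: the function `unescape_chunks` and its `advance_mark` flag are hypothetical, it is written in Python purely for illustration, and it only handles escapes where the escaped character is kept as-is.

```python
def unescape_chunks(chunks, advance_mark=True):
    """Toy decoder for backslash escapes in text that arrives in chunks.

    Hypothetical illustration only: `advance_mark=False` simulates a decoder
    that forgets to move its "start of uncopied text" marker past a backslash.
    """
    out = []                # decoded pieces collected so far
    pending_escape = False  # True if the last character seen was an unconsumed backslash
    for chunk in chunks:
        mark = 0            # start of the literal run not yet copied to `out`
        for i, ch in enumerate(chunk):
            if pending_escape:
                out.append(ch)             # keep the escaped character literally
                pending_escape = False
                mark = i + 1
            elif ch == "\\":
                out.append(chunk[mark:i])  # flush the literal run before the backslash
                pending_escape = True
                if advance_mark:
                    mark = i + 1           # the step the buggy variant skips
        # End of chunk: flush the remaining literal run, excluding a trailing
        # backslash whose escaped character is still to come.
        end = len(chunk) - 1 if pending_escape else len(chunk)
        out.append(chunk[mark:end])
    return "".join(out)


raw = r"\\path\\to\\file.txt"  # escaped form, as it would appear inside a JSON string
chunks = [raw[:7], raw[7:]]    # first chunk ends on the backslash right after "path"

print(unescape_chunks(chunks))                      # marker advanced
print(unescape_chunks(chunks, advance_mark=False))  # marker left behind
print(unescape_chunks([raw], advance_mark=False))   # same variant, single chunk
```

In this model the stale marker only matters when the backslash is the last character of a chunk, because the end-of-chunk flush then copies the run before the backslash a second time. That is consistent with the duplication showing up only at certain chunk sizes.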
Thank you for reporting the issue and for the reproducer!
I think I found the issue and fixed it in #516; at least it fixes your example. Let's see what @satabin has to say when reviewing, maybe there are more edge cases like it.
Sample.txt