Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ReplaceLineEndings and EnumerateLines #53115

Merged
merged 4 commits into from
Jun 7, 2021

Conversation

GrabYourPitchforks
Copy link
Member

Resolves #40315.

Note to reviewers - the API surface, devdoc, and general behavior are complete. However, the implementation is naïve, as it's just a simple iterator instead of SIMD-optimized. The reason for this is that there's currently no overload of IndexOfAny that fulfills the algorithmic complexity requirements we have.

Once #49397 comes online it should be fairly straightforward to SIMD-optimize this without introducing layers of ISA special-casing or littering the workhorse method with unsafe code. But for now, let's get the basic functionality out and worry about optimizing it later.

@ghost
Copy link

ghost commented May 22, 2021

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

Issue Details

Resolves #40315.

Note to reviewers - the API surface, devdoc, and general behavior are complete. However, the implementation is naïve, as it's just a simple iterator instead of SIMD-optimized. The reason for this is that there's currently no overload of IndexOfAny that fulfills the algorithmic complexity requirements we have.

Once #49397 comes online it should be fairly straightforward to SIMD-optimize this without introducing layers of ISA special-casing or littering the workhorse method with unsafe code. But for now, let's get the basic functionality out and worry about optimizing it later.

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Runtime

Milestone: 6.0.0

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@GrabYourPitchforks
Copy link
Member Author

I need to send a follow-up iteration which removes some of the duplicated code and calls these new APIs instead.

@GrabYourPitchforks
Copy link
Member Author

Changes in latest iteration

  • Removed custom iterator logic from NewLineUtility now that Reduce worst-case alg complexity of MemoryExtensions.IndexOfAny #53652 is checked in, and just riding directly atop IndexOfAny
  • Small optimization in string.ReplaceNewLines to avoid two memcpy operations
  • Updated unit tests which had their own custom line split logic and had them ride atop the newly introduced methods

This last item in particular simplifies some of the unit test logic (particularly noticeable for System.Data). I only updated unit tests that appeared that they were not testing newline behavior as part of a protocol. For example, RFC 2616 (HTTP/1.1) gives a specific definition of newline chars which is incompatible with the Unicode's definition of newline chars, so I didn't update the System.Net.Http unit tests to use the new helper.

Area owners, please verify that the changes I made to your unit tests are valid.
(/cc a bunch of people)

s = s.Replace("\r", "");
return s;
}
private static string NormalizeNewLines(string s) => s.ReplaceLineEndings("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would s.ReplaceLineEndings() work the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I just read the docs says it will replace with Environment.NewLine but I personally don't find it particularly obvious (I was expecting to remove them)

Copy link
Member

@krwq krwq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML part looks good to me

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkInformation LGTM

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: from the json perspective

@GrabYourPitchforks
Copy link
Member Author

Thanks all for the input! I'm going ahead and committing this for now, assuming everything is ok.

System.Data crew (@ajcvickers @cheenamalhotra @David-Engel), if this feels inappropriate then please ping me, or open a new tracking issue, or simply revert the relevant portions of this commit. Whatever is easiest for you all. :)

@GrabYourPitchforks GrabYourPitchforks merged commit ad41de4 into dotnet:main Jun 7, 2021
@GrabYourPitchforks GrabYourPitchforks deleted the str_lineend branch June 7, 2021 20:29
@David-Engel
Copy link
Contributor

if this feels inappropriate then please ping me, or open a new tracking issue, or simply revert the relevant portions of this commit. Whatever is easiest for you all. :)

@GrabYourPitchforks Looks good to me. Plus the new code is so much cleaner! :)

@ghost ghost locked as resolved and limited conversation to collaborators Jul 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] string.NormalizeLineEndings
8 participants