-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explore regex source generator #7598
Comments
It would be nice if the .NET Framework version of msbuild could go away for .NET Core based ones and have Visual Studio use the one from only the .NET SDK so it does not have to bundle a version of msbuild as well to where the .NET Core msbuild can compile both .NET SDK style projects and non-sdk style projects. This would also help improve the quality of msbuild (and Visual Studio) as a whole. |
@rainersigwald How much of perf. win do you expect here? |
It is mostly a startup win. |
I wouldn't expect a huge win on anything, but some of the pattern matches are in warm inner loops:
As Dan mentioned, startup time is reduced and there may be possibilities for inlining/JIT magic. I wish we could use the generator for our .NET Framework target for unification, but alas. |
Cc @stephentoub fyi |
Yeah, you'll end up wanting to do something like: #if NET7_0_OR_GREATER
[RegexGenerator(Pattern)]
public static partial Regex AwesomeSauce();
#else
public static Regex AwesomeSauce() => s_regex;
private static readonly Regex s_regex = new Regex(Pattern, RegexOptions.Compiled);
#endif |
Just noting this is blocked on #7790 |
Looks like #7790 has been merged, so this is unblocked? 🙂 |
@Bosch-Eli-Black Yes, but to be clear it remains low-priority for the core team. |
@rainersigwald Understood 🙂 |
@Bosch-Eli-Black I expect they'd accept a PR? (and related #7499 too) |
@danmoseley Haha, ah, yes, I'd love to! 🙂 I've just begun working through my company's procedures for getting authorization to contribute to open source projects, though, and I'm afraid that may take a very long time indeed. |
Do you need any help I don't mind helping so just let me know if you do cuz I just did one of myself last night my project I got done with it last night |
@nikijohnson77 We'd love some! Thanks for volunteering; I just assigned it to you. Submit a PR whenever you're ready, and feel free to ask questions. |
Do you have a definition to distinguish different target currently? such as NET7_0_OR_GREATER. @Forgind |
|
only if the project targets .NET 7 in its list of target frameworks otherwise the maximum defined is |
In the MSBuild codebase we generally try to avoid generic ifdefs like msbuild/src/Directory.BeforeCommon.targets Lines 88 to 95 in ee6f71d
This is helpful because it self-documents why the code is conditionally compiled; while I don't expect it in this case it's conceivable that the regex source generator could someday work on .NET Framework 4.7.2 targets, and if it did adopting it would mean searching for the specific feature, rather than having to look at all net7+ stuff. |
Highly unlikely ;-) |
But if I keep whining about it in this thread . . . 😇 We got most of our benefit from the granular ifdefs in the other direction: there were lots of features that were not in .NET Core 1.0 that got ported later, and the ifdefs let us gradually reunify a bunch of code. I still prefer the granular-ifdef strategy for its "documentation" of why things are conditional. |
Your voice will get tired. 😛 |
Alternatively, one can't stop them from defining their own version of the .NET 7 attribute for the source generator and install the generator as a nuget package on non-.NET 7 targets, right? If so, that might also be an option to reunify the code. |
The generator emits code that uses APIs new to .NET 7, including how the generated code plugs into Regex itself. |
If the pattern is relatively straightforward and stable another option is to inspect the generated code and adjust it to make your own hardcoded pattern matcher, no Regex involved. |
Just noting that I edited the OP to add a new location, where startup time might be a noticeable win: in solution parsing msbuild/src/Build/Construction/Solution/SolutionFile.cs Lines 37 to 70 in 01532a9
Since that's one of the first things we do in a |
Reducing JITting is also helpful if Native AOT is ever a goal. (But given MSBuild reflects over assemblies it dynamically discovers, perhaps that would be challenging..) |
@nikijohnson77 how is it going on this? |
Wait why is the regex used for solution parsing, if only there was a library that msbuild could use that uses a much better way of parsing solution files that is just as fast as .NET 7's regex. |
@AraHaan the change to use a source generated regex is quite localized and straightforward. If you're proposing replacing the component, that is best discussed in an issue specific to that (which I'm guessing exists) |
OK, so I played with this and it's quite doable, but a little less than straightforward because of the complexity of MSBuild regexes, and the fact that they are built up from strings (which sometimes themselves are built up from strings). As an extreme example, take this regex msbuild/src/Build/Evaluation/Expander.cs Lines 3040 to 3049 in a6f6699
which includes strings that are themselves built up, eg., msbuild/src/Build/Evaluation/Expander.cs Line 3060 in a6f6699
The generator can work on this, creating [GeneratedRegex("((?<=@\\(\\s*[A-Za-z_][A-Za-z_0-9\\-]*\\s*->\\s*'[^']*)%\\(\\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\)(?![^']*'(\\s*,\\s*'[^']*')?\\s*\\)))|((?<!@\\(\\s*[A-Za-z_][A-Za-z_0-9\\-]*\\s*->\\s*'[^']*)%\\(\\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\)(?=[^']*'(\\s*,\\s*'[^']*')?\\s*\\)))|((?<!@\\(\\s*[A-Za-z_][A-Za-z_0-9\\-]*\\s*->\\s*'[^']*@)%\\(\\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\)(?![^']*'(\\s*,\\s*'[^']*')?\\s*\\)))",RegexOptions.ExplicitCapture|RegexOptions.Compiled|RegexOptions.IgnorePatternWhitespace)] It's probably worth converting to a verbatim string (put a @ at the front and replace \ with ) in order to make them more readable. I opened suggestion dotnet/runtime#79895 for that. now I have [GeneratedRegex(@"((?<=@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']*)%\(\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)(?![^']*'(\s*,\s*'[^']*')?\s*\)))|((?<!@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']*)%\(\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)(?=[^']*'(\s*,\s*'[^']*')?\s*\)))|((?<!@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']*@)%\(\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)(?![^']*'(\s*,\s*'[^']*')?\s*\)))",RegexOptions.ExplicitCapture|RegexOptions.Compiled|RegexOptions.IgnorePatternWhitespace)] The names of the substring variables had been acting as self documentation, so potentially one could manually add those as comments, something like this: [GeneratedRegex(@"
((?<=@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']* # ItemVectorWithTransformLHS
)
%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*) \s*\) # ItemMetadataSpecification
(?!
[^']*'(\s*,\s*'[^']*')?\s*\) # ItemVectorWithTransformRHS
)) | ((?<!
@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']* # ItemVectorWithTransformLHS
)
%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*) \s*\) # ItemMetadataSpecification
(?=
[^']*'(\s*,\s*'[^']*')?\s*\) # ItemVectorWithTransformRHS
)) | ((?<!
@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']* # ItemVectorWithTransformLHS
@)
%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*) \s*\) # ItemMetadataSpecification
(?!
[^']*'(\s*,\s*'[^']*')?\s*\) # ItemVectorWithTransformRHS
))
", RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace)] TBH I'm not sure that's clearer. I don't think I would bother. It would be better to just memorialize the existing fragments in comments, rather than in the regex. Note one other wrinkle - a number of these files compile for older versions than .NET 7. So it is necessary to wrap these in #if, so something like internal static readonly Lazy<Regex> ItemMetadataPattern = new Lazy<Regex>(
() => new Regex(ItemMetadataSpecification,
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.Compiled)); becomes internal static readonly Lazy<Regex> ItemMetadataPattern = new Lazy<Regex>(() =>
#if NET7_0_OR_GREATER
GetItemMetadataPatternRegex());
#else
new Regex(ItemMetadataSpecification,
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.Compiled));
#endif
#if NET7_0_OR_GREATER
[GeneratedRegex("%\\(\\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*) \\s*\\)", RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace)]
private static partial Regex GetItemMetadataPatternRegex();
#endif These #if's aren't pretty, but it's worth it to eliminate all this JIT work at runtime when running on .NET 7+. I note that most of the Regexes in the code are constructed lazily, since #1199 to delay generating the IL. I don't think that's necessary for the source generated ones, as constructing them is cheap. However, it's probably simplest to still put them inside the lazy construction, as I've done above, so that all the places that use them can remain unchanged. |
I also hit a couple of issues using the fixer Neither stop it working, just make a little more manual tuning necessary. @nikijohnson77 does this make sense? LMK if you need more help. |
Doesn't the regex source generator support references to named string constants in the attribute argument? |
I feel like the regex string can be further simplified and made smaller (by defining common parts in each as const and then use string interpolation). So then even in the .NET 7's attribute code, it would use the constant used for the input regex string. |
If that works - great. |
the issues I linked above are now mostly fixed. Is anyone (@AraHaan @KalleOlaviNiemitalo?) interested in doing this? might start with just enabling one or two to see how it goes. |
I been busy lately but I could see what could be done on this when I have more time. |
Ty |
Done |
.NET 7 introduces a regular expression source generator that can generate a bunch of code at compile time to make a more optimal matcher.
Since MSBuild uses many regular expressions and knows a bunch of them at compile time, we should adopt this.
Possible locations:
CanonicalError
ConditionEvaluator
SolutionFile.cs
(non-exhaustive)
The text was updated successfully, but these errors were encountered: