Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bypass Format on simple constant interpolated strings #6738

Closed
wants to merge 1 commit into from
Closed

bypass Format on simple constant interpolated strings #6738

wants to merge 1 commit into from

Conversation

bbarry
Copy link

@bbarry bbarry commented Nov 12, 2015

When an interpolated string doesn't have alignment or formatString parameters it is faster to instead use Concat. Further optimizations can be done on calls to Concat with only one parameter.

Fixes #6669

When an interpolated string doesn't have alignment or formatString parameters it is faster to instead use Concat. Further optimizations can be done on calls to Concat with only one parameter.

Fixes #6669
@khellang
Copy link
Member

Will this also take care of #4678?

You might also be interested in #4678 (comment) to remove some boxing, but that can wait for another time 😄

@bbarry
Copy link
Author

bbarry commented Nov 12, 2015

this does not take care of #4678 (comment) (boxing of value types for the purposes of calling .ToString()) but does optimize $"{nameof(argument)}" to be the string literal "argument".

There are many pedantic cases it doesn't take care of either, for example

  • $"{""} {""} {""}" becomes string.Concat("", " ", "", " ", "") when it could be " "
  • $"{a,0}{b,0}" could be string.Concat(a, b) ?? "" (possibly without ?? "")

Further it does not attempt to intercept and look at other calls of string.Concat or string.Format that were not interpolated strings. I think those are worthwhile optimizations to consider looking into but I don't know where to start. I figured in the event that I get busy on other work it would be worth cleaning this patch up in case it gets decided these changes are worth including in spite of not cleaning up poor calls to those methods.

It does handle:

  • $"{a}" when a is a string constant and not null => a
  • $"{a}" when a is the constant null => ""
  • $"{a}" when a is a string variable => a ?? ""
  • $"{a}" when a is one of the common built in primitive value types other than Nullable<> identified by the SpecialType enum => a.ToString()
  • $"{a}" when a is a value type that is not Nullable<> and not one of the previous types => a.ToString() ?? ""
  • any interpolated string without alignment and formatString parameters with at least 1 value such that a?.ToString() is known to not be null => string.Concat(...)
  • any interpolated string without alignment and formatString parameters without that invariant => string.Concat(...) ?? ""

@gafter
Copy link
Member

gafter commented Nov 12, 2015

I can see just from your description that the implementation is not correct. Please see the specification for composite formatting.

For an argument whose runtime type implements the IFormattable interface (as the primitive types do), one does not call ToString directly, but instead there are different cases for numeric types, date and time types, and other types. For numeric types, for example, one must call IFormattable.ToString(string, IFormatProvider). The first parameter will normally be null in the cases of interest (no format string provided) but, according to the specification, for the second argument "the NumberFormatInfo object for the current thread culture is used". You'll have to generate code to fish that out of the current thread culture and call that interface method. In order to do that you may have to force the user to add additional (compile-time) assembly references for the required APIs.

Even for simple strings, it is not correct to simply insert the value into the result. One must first get an ICustomFormatter from the IFormatProvider for the current culture by calling IFormatProvider.GetFormat, and if it is not null call its ICustomFormatter.ToString(string format, IFormatProvider formatProvider) method, with a first argument corresponding to the (presumably null) format string for the interpolated string insertion. Of course all of the usual and common values for the current culture will short-cut this and just return the original string, but you cannot depend on that unless you have some way of knowing what the behavior of the current culture will be.

I believe that composite formatting is complicated enough that it is not worth attempting this category of "optimization". Generating correct code for even the simplest cases is unreasonably complex and will generate a huge amount of code.

@gafter gafter added Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on. Area-Compilers labels Nov 12, 2015
@bbarry
Copy link
Author

bbarry commented Nov 12, 2015

@gafter based on the reference source and core clr it looks like I have to process each arg as:

(arg as IFormattable)?.ToString(null, null) ?? arg?.ToString()

unless it is a TimeSpan and FEATURE_LEGACYNETCF is defined and CompatibilitySwitches.IsAppEarlierThanWindowsPhone8 is set (because then it is arg.ToString()). I believe this is doable (I can explicitly fall back to string.Format for TimeSpan to let different runtimes handle that situation).

Is that worth doing?

Edit: in this case I could handle formatString parameters as well:

(arg as IFormattable)?.ToString(sFmt, null) ?? arg?.ToString()

Edit2: technically I could do alignment as well with a helper method emitted in source or some other way to create a reference:

align not provided or 0:

(arg as IFormattable)?.ToString(sFmt, null) ?? arg?.ToString()

align positive:

var s = (arg as IFormattable)?.ToString(sFmt, null) ?? arg?.ToString()
if ((s?.Length ?? 0) >= align) return s;
return new string(' ', align - s.Length) + s;

align negative:

var s = (arg as IFormattable)?.ToString(sFmt, null) ?? arg?.ToString()
if ((s?.Length ?? 0) >= align) return s;
return s + new string(' ', align - s.Length);

@gafter
Copy link
Member

gafter commented Nov 12, 2015

According to the specification you must get an ICustomFormatter from the IFormatProvider for the current culture. I don't know how that gets threaded down into StringBuilder, or if it is a bug that it does not. Examples in the docs I sent a pointer to show how the current culture is supposed to affect the result. I would be careful trying to copy code (and therefore bugs) rather than referring to the specification here.

That code is not necessarily the code that will run on all platforms. Different platforms factor the lookup of the current culture into different layers of the API calls. The compiler must obey the specification to ensure it can interoperate correctly with any factoring of the platform implementation.

@tmat
Copy link
Member

tmat commented Nov 12, 2015

👎 I'm voting for not making the compiler more complex. Who writes $"{a}"? We should rather write analyzer that points out that the code can be simplified and more efficient.

@bbarry
Copy link
Author

bbarry commented Nov 12, 2015

When you call string.Format(string,...), the IFormatProvider instance is null and ICustomFormatter never gets assigned anything other than null.

I'm not saying it is worth doing, even the expressions above would add significant weight to the CIL for any nontrivial input.

Edit: even though this code (StringBuilder.AppendFormat and every call stack back to a string.Format(string instance) is very straightforward and easy to follow, I am not sure those extracted expressions are worth pulling out and embedded in compiled user code. I would feel more comfortable if they were shipped alongside StringBuilder in case they needed to change for some reason. That's my argument for 👎

I could still do something about interpolated strings containing strings and null, but that seems so specific to the point I agree with @tmat.

@bbarry
Copy link
Author

bbarry commented Nov 12, 2015

Perhaps if we really wanted to speed this up, a core library change that created those expressions as aggressively inlined functions and then those functions were used, it might be a decent path. It all sounds messy.

@bbarry
Copy link
Author

bbarry commented Nov 13, 2015

Per the documentation for IFormattable, when the second parameter to ToString is null, it is up to the implementation of that interface to get the correct one for the current thread.

I think a base class library (or something equivalent) set of methods:

public static class FormatHelpers
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string ToString<T>(T input, string format)
    {
        //handle Mango TimeSpan?
        return (input as IFormattable)?.ToString(format, null) ?? input?.ToString()
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string ToStringPadLeft<T>(T input, string format, int pad)
    {
        string s = ToString<T>(input, format);
        if ((s?.Length ?? 0) >= pad) return s;
        return new string(' ', pad - s.Length) + s;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string ToStringPadRight<T>(T input, string format, int pad)
    {
        string s = ToString<T>(input, format);
        if ((s?.Length ?? 0) >= pad) return s;
        return s + new string(' ', pad - s.Length);
    }
}

(perhaps a few others to make inlining work right; I am not well versed on JIT details to make the suggestion)

Then where this is defined, interpolated literals could reduce to a string.Concat call and potential ?? "" if no literals were discovered by calling the correct method on each hole. The result might be slightly more performant than the current implementation, but we're talking about the difference of N instructions per character on the format string that don't need to be done at compile time. I'd venture it might be one or two dozen?

I'm going to pass on trying to get that done.

@bbarry bbarry closed this Nov 13, 2015
@AdamSpeight2008
Copy link
Contributor

@gafter
This method is what ultimately gets called.
@tmat @bbarry
That what the StringFormatDiagnostics is potentially capable of doing.
I'm currently in the progress of updating it to VS2015

@bbarry
Copy link
Author

bbarry commented Nov 13, 2015

@AdamSpeight2008 I'm not particularly concerned about how someone uses interpolated strings, so much as the idea that this very useful feature to make your code easier to follow is currently a bad refactoring decision to make inside a performance critical scenario (places like this one in Kestral).

Interpolated strings even as simple as $"{a}" are useful in cases where the object is converted to FormattableString and then the format specifier gets replaced by a language resource somewhere along the way before the resulting string gets used somewhere. It is a mental load to know you want to use them in this case and yet there is a second case where they may equally make the code cleaner but need to be avoided for reasons not immediately obvious to those of us who occasionally forget about things like specifications for composite string formatting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers cla-already-signed Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants