Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String interpolation can be optimized #22594

Closed
KrisVandermotten opened this issue Oct 9, 2017 · 20 comments
Closed

String interpolation can be optimized #22594

KrisVandermotten opened this issue Oct 9, 2017 · 20 comments
Labels
Area-Compilers Feature Request Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented
Milestone

Comments

@KrisVandermotten
Copy link
Contributor

KrisVandermotten commented Oct 9, 2017

String interpolation has proven to be a very successful feature. Some people even replace efficient concatenation by interpolation, assuming the new syntax is just as efficient. However, that is not the case.

The LDM of May 21, 2014 noted:

The compiler would be free to not call String.Format, if it knows how to do things more optimally. This would typically be the case when there are no format specifiers in the string.

So far, nothing has been done to optimize interpolated strings in this way.

I believe there are five scenarios:

  1. The interpolated string has no fill-ins. This case is handled already.
  2. None of the fill-ins have alignment or format specifiers, and they are all strings.
  3. None of the fill-ins have alignment or format specifiers, and they are all of reference type, some of which are not string (and may or may not implement IFormattable).
  4. None of the fill-ins have alignment or format specifiers, and some of them are value types.
  5. Some of the fill-ins have alignment or format specifiers.

Case 2 is surprisingly common. Last time I counted, Roslyn.sln had 1355 cases. It is also the easiest one to optimize. Basically, the call to string.Format can be replaced with string concatenation. That string concatenation is itself then lowered, often to a call to string.Concat. That lowering will perform constant folding, so in some cases no call is needed at all. Performance differences can be significant.

I have prepared pull request #22595 to handle this case.

@AdamSpeight2008
Copy link
Contributor

Alignment can optimised too, if expression's type is string.

 $"{HelloWorldString,-20}"
 /* can transformed into */
HelloWorldString.PadLeft(20," "c) 

and

 $"{HelloWorldString,20}"
 /* can transformed into */
HelloWorldString.PadRight(20," "c) 

@KrisVandermotten
Copy link
Contributor Author

@AdamSpeight2008 I know, but that will significantly increase the size of the generated code. Also, in my experience, this happens far less frequently in real world code.

More importantly, I think that is a different case. I wanted to tackle this case first. Alignment can be handled later if deemed useful.

@khellang
Copy link
Member

khellang commented Oct 9, 2017

This sounds like a variation of all the other (closed) issues, like #9212 (see the web of linked issues as well)

@KrisVandermotten
Copy link
Contributor Author

@khellang It is a variation on #9212, but the approach taken is fundamentally different. #9212 was about replacing the lowering to string.Format completely. This is about leaving the existing lowering for all but one specific (but very common) situation, that can be easily optimized. Also, this suggestion does not require the presence of any method, except for string.Concat which is already required by the compiler anyway.

@khellang
Copy link
Member

khellang commented Oct 9, 2017

So #6669? Hence why I said "see the web of linked issues as well" 😉

@KrisVandermotten
Copy link
Contributor Author

The key difference between this suggestion and similar suggestions in the past (including #6669), is that this modifies only the case where all fill-ins are strings. Those do not have the complexity of culture-sensitive formatting.

@HaloFour
Copy link

HaloFour commented Oct 9, 2017

My only real concern with such an optimization is that it encodes assumptions regarding the implementation of String.Format into the compiler, namely that string.Format("foo{0}baz", "bar") will always be functionally equivalent to string.Concat("foo", "bar", "baz"). Who's to say that some future version of the framework won't make System.String implement IFormattable and offer support for formatting based on the current culture just to satisfy the needs of some Fortune 100 with a penchant for Pig Latin?

Beyond that I think it's a good idea, though.

Or would this also fall under the post-compilation optimization step that has been suggested for inlining LINQ queries?

@KrisVandermotten
Copy link
Contributor Author

@HaloFour

My only real concern with such an optimization is that it encodes assumptions regarding the implementation of String.Format into the compiler, namely that string.Format("foo{0}baz", "bar") will always be functionally equivalent to string.Concat("foo", "bar", "baz").

To be precise, the way I implemented my PR, it encodes the assumption that string.Format("foo{0}baz", "bar") will always be functionally equivalent to "foo" + "bar" + "baz".

The assumption that "foo" + s + "baz" (where s is a non-constant string) is equivalent to string.Concat("foo", s, "baz") was already encoded in the compiler, as well as the assumptions that "foo" + ("bar" + "baz") is equivalent to "foo" + "bar" + "baz" and that "foo" + "bar" + "baz" is equivalent to "foobarbaz". The same is true for the assumption that "ba" + 'r' is equivalent to "ba" + "r", and others.

More broadly speaking, every constant folding optimization depends on the assumption that a calculation done at compile time is equivalent to doing the same (or an equivalent) calculation at runtime. That also applies to the assumption that 1 + 1 is equivalent to 2.

Who's to say that some future version of the framework won't [...] offer support for formatting based on the current culture just to satisfy the needs of some Fortune 100 with a penchant for Pig Latin?

If any of the above assumptions are broken, it would be a massive breaking change, impacting virtually all programs running on .NET. That is very unlikely to happen, to say the least. And when it does, the compiler is in trouble anyway.

Beyond that I think it's a good idea, though.

I agree :-)

Or would this also fall under the post-compilation optimization step that has been suggested for inlining LINQ queries?

No. The way I implemented this in #22595 is in the lowering of interpolated strings, similarly to how the optimizations of string concatenation are implemented.

@gafter
Copy link
Member

gafter commented May 4, 2018

I took a deep dive into the question whether this optimization is correct a couple of years ago. It looks to me like the specification for string formatting allows for the possibility of a custom culture that provides its own custom formatting for a string value when used as an interpolation. Based on that I concluded that it would not be correct to do this optimization. See #9212 (comment)

However. I tried constructing such a custom culture and found that the implementation never would call it for a value of type string. Technically, this is a bug in the implementation of string formatting, but it is probably something that will never change. If anything, I would expect the spec (or my understanding of it) to be changed. Therefore the optimization is probably OK.

This was previously considered and rejected in #9212, #6669, #4678, #194, and #76. But now @KrisVandermotten has used up his accumulated Karma and forced us to reconsider. Karma reset to zero. 😉

@HaloFour
Copy link

HaloFour commented May 4, 2018

@gafter

Might that open the door to potentially using interpolation for string constants as long as the arguments are also string constants?

@gafter
Copy link
Member

gafter commented May 4, 2018

@HaloFour Seems plausible but possibly complex (the formatting would have to be specified in the language spec, and the constant is still convertible to FormattableString with the fill-ins separately preserved). I suggest opening a new issue for it. I'm not sure what would motivate us to want to do that (over other things we could do with our time).

@HaloFour
Copy link

HaloFour commented May 4, 2018

@gafter

dotnet/csharplang#384

There are a couple of issues on this repo too although they seem to focus on the performance aspect rather than possibly opening up interpolation up to other scenarios, such as use with attributes. Granted, interpolation in this limited form isn't appreciably different from just using + to concat. The argument has always revolved around the aspect of interpolation being an expression that must be evaluated at runtime in the context of the current culture.

@alrz
Copy link
Member

alrz commented May 4, 2018

@gafter

I'm not sure what would motivate us to want to do that (over other things we could do with our time).

I wish you could throw these "nice to have" features to X.X milestone to get feedback and prioritize, at least it shows that the proposal is accepted in principle. It seems like the discussions are more constructive and helpful if the issue has the word "champion" in the title. The fact that there's an official thread for a potential language feature helps to keep the discussion in one place instead of multiple issues scattered all over the repo.

@KrisVandermotten
Copy link
Contributor Author

KrisVandermotten commented May 4, 2018

@gafter

However. I tried constructing such a custom culture and found that the implementation never would call it for a value of type string. Technically, this is a bug in the implementation of string formatting, but it is probably something that will never change.

Indeed. The way I understand it is that the ToString methods on String simply don't allow for culture dependent formatting:

    public override String ToString() {
        Contract.Ensures(Contract.Result<String>() != null);
        Contract.EndContractBlock();
        return this;
    }

    public String ToString(IFormatProvider provider) {
        Contract.Ensures(Contract.Result<String>() != null);
        Contract.EndContractBlock();
        return this;
    }

That is what makes this optimization safe. And honestly, I cannot imagine this to ever change.

Furthermore, as I said before, an optimization with a similar assumption already exists in string concatenation today: the assumption that s + 'a' == s + "a".

See LocalRewriter_StringConcat.cs line 370:

    // Is the expression a literal char?  If so, we can 
    // simply make it a literal string instead and avoid any  
    // allocations for converting the char to a string at run time. 
    if (operand.Kind == BoundKind.Literal) 
    { 
        ConstantValue cv = ((BoundLiteral)operand).ConstantValue; 
        if (cv != null && cv.SpecialType == SpecialType.System_Char) 
        { 
            return _factory.StringLiteral(cv.CharValue.ToString()); 
        } 
    } 

A similar reasoning applies: for both char and string, the result of .ToString() at compile time or at runtime is the same.

@CyrusNajmabadi
Copy link
Member

That is what makes this optimization safe. And honestly, I cannot imagine this to ever change.

Perhaps if we can get the Core people to guarantee that (maybe with a doc change), then this would be more palatable? It seems like something they'd be willing to say and doc.

@KrisVandermotten
Copy link
Contributor Author

@CyrusNajmabadi

Not sure what kind of guarantee you're looking for. The behavior is documented:

Returns this instance of String; no actual conversion is performed.

@CyrusNajmabadi
Copy link
Member

I would consider that a good guarantee :)

@jcouv
Copy link
Member

jcouv commented Jun 1, 2018

@KrisVandermotten Should this issue remain open for further optimizations? Or can it be closed (fixed by #26612)?

@KrisVandermotten
Copy link
Contributor Author

@jcouv Further optimizations are certainly not impossible, but what was described here has been done. New work, if any, can be done through new issues and/or pull requests.

@jcouv jcouv added the Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented label Jun 1, 2018
@jcouv jcouv modified the milestones: Unknown, 15.8 Jun 1, 2018
@jcouv
Copy link
Member

jcouv commented Jun 1, 2018

Thanks again.
The fix will be in 15.8 preview 3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Feature Request Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented
Projects
None yet
Development

No branches or pull requests

9 participants