Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are 32-bit wide-char strings on Linux intended? #4742

Closed
MarijnS95 opened this issue Oct 25, 2022 · 17 comments
Closed

Are 32-bit wide-char strings on Linux intended? #4742

MarijnS95 opened this issue Oct 25, 2022 · 17 comments

Comments

@MarijnS95
Copy link
Contributor

CC @riverar @kennykerr, microsoft/windows-rs#1874
CC @pow2clk for being the resident Linux maintainer 👍

Context

I interface with libdxcompiler.so from Rust, on both Windows and Linux. Since the DXC library provides a cross-platform COM API, it only makes sense for our DXC Rust wrapper crate called hassle-rs to use one of the existing Rust crates providing COM wrapper support. Unfortunately the current one has fallen severely out of date, and so has its "official replacement" from Microsoft that provides much improved bindings. This has been deprecated in favour of Microsofts "new" windows-rs project which provides even more refined bindings, and autogenerates them based on official metadata.

My goal is to upgrade hassle-rs to the windows-rs crate and take advantage of everything it provides:

  • Actively, officially maintained;
    • No longer depending on code that has been marked as deprecated years ago;
  • Autogenerated language bindings, no more redefining COM structures and types which is error-prone;
    • Allowing us to use newer DXC APIs without hand-writing their COM vtables and type representation;
  • Much improved API handling, including COM refcounting / lifetime management;
  • Safer API surface, no more random pointers and casting.

Alas, these inner (Rust) details aren't all that relevant to DXC, but some questions need answering on the DXC side before windows-rs can proceed to become viable for this:

The problem

In order to support these DXC bindings - through windows-rs - on both Windows and Linux (or any non-Windows platform as it stands), windows-rs needs a minimal change to utilize 32-bit wide-chars when compiling for Linux (not to be confused with cross-compiling from Linux for a Windows target). In my eyes it makes total sense to use the platform-native wide-string convention (for e.g. compatibility with the standard library), but the windows-rs maintainers aren't convinced and demand extra confirmation that 32-bit wide-chars are needed on Linux.

Concrete questions / action points

  • Is this 32-bit "fallback" for wchar strings intended on non-Windows platforms?
  • Are there plans to create non-wide variants for the current API functions, e.g. the *W and *A symbol suffixes typically found in win32 APIs (Feature request: non-wchar/wstring interface #3210)?
  • Are there other Microsoft API's available on Linux (perhaps dxcore?) that use wide chars/strings in their API, and would also benefit from windows-rs supporting this? (Or conflict, if they end up using 16-bit wide-chars instead...)

Expected resolution

Either this discussion chain leads to my 32-bit wide-char change for non-Windows being allowed into the windows-rs codebase, if it turns out that this usage is prevalent. Otherwise, hopefully DXC can be extended with ANSI or UTF-8 APIs to complement every wchar-only API.

Related

#3210
#4242

@tim-weis
Copy link

Is this 32-bit "fallback" for wchar strings intended on non-Windows platforms?

I cannot imagine that this was a conscious decision. Indeed, I wouldn't call it a "fallback" as much as an "ABI contract violation".

From a cursory look it appears as though WinAdapter.h provides ABI type aliases for platforms that aren't _WIN32. Amongst those ABI types is LPCWSTR, aliasing const wchar_t*. With wchar_t meaning different things on different platforms we now have COM interfaces identified by the same IID that disagree on their respective binary contracts (yes, plural, that's bad).

This may not appear to be an immediate issue, with Linux COM objects living inside Linux, and Windows COM objects living inside Windows. But then there's DCOM. All of a sudden machine boundaries no longer serve as communication firewalls, and everyone needs to follow the same protocol.

I don't know how much of DCOM is implemented in various OS'. But with a public specification readily available I would guess that RPCs straddling machine boundaries isn't just something that could exist, in theory.

Whether inconsequential in practice or not, DXC's decision to delegate the ABI contract to "whatever the target system likes" is rather questionable. If anything, this would be something that needs fixing, or at least a thorough analysis of its impact.

Though I understand the frustration of not getting a seemingly trivial change pushed through, I would strongly favor confining the damage (the ABI contract violation here) over letting it bleed into the ecosystem as a whole.

@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Oct 27, 2022

@tim-weis thanks for your insights! Sounds like DXC backtracking on its decision to use the platform-native wide char/string convention is most preferable.
However, then I don't know how prevalent UTF-16 helpers on non-Windows platforms are, compared to UTF-32. This may or may not make things harder.

Fwiw I'm not so much interested in getting a trivial change pushed through, as getting my specific use-case to work one way or the other (the frustration more so being caused by this trivial change getting dismissed as "we don't want to support non-Windows platforms" (at least initially) rather than "what if your conventions are incorrect"). Maybe I was biased towards UTF-32 as that's the platforms' widestring convention; COM talks about wide chars/strings after all, not UTF-16 strings but maybe that's just a convention issue that trickled in over time (even though the underlying definition of wchar clearly states "16-bit Unicode character"). I didn't know about/considered DCOM at the time.

Would love to hear about this from the DXC maintainer side!

@jenatali
Copy link
Member

jenatali commented Nov 2, 2022

Hello,

Chiming in with a second point with a Windows COM API that we've made available for Linux which has wide characters: We support D3D12 as a Linux API. The headers for targeting Linux are identical to the headers that are used on Windows: https://github.com/microsoft/DirectX-Headers, with the exception that we've followed DXC's example here with a set of "shim" headers to provide COM-like types, including wide chars, which we define to wchar_t.

This decision was not an accident, it was very much intentional, and was difficult. Fundamentally it comes down to "what do apps do with these strings," and the answer is that often you'll want to pass them into C or C++ functions to manipulate them, such as wcslen or wrapping it in a std::wstring. That doesn't work if the wide strings are 16-bit. On other hand, some of these strings come from / are marshalled to Windows, which speaks 16-bit wide chars, so we have a thunking/conversion layer in our Linux code to deal with that. We chose to bear the brunt of the conversion rather than force it onto any app/component consuming our API.

Regarding DCOM - that is a good point. DirectX has always used a subset, or maybe even a variant of COM that's become known as "nano-COM." Unlike traditional COM which requires server registration and can support out-of-proc RPC execution, nano-COM only supports inproc execution, and uses explicit DLL entrypoints to retrieve interfaces. So when it comes to DirectX headers... DCOM is not a concern. When it comes to windows-rs as a whole providing a definition for what a wide character is... then it's going to depend on what your target is. For Linux-native nano-COM components, wchar_t is the right type, but if your target is RPC to Windows, then yeah you probably want to write 16-bit wide chars over the network.

For what it's worth, when designing DXCore, we were aware of this problem (and others) plaguing cross-platform software and explicitly opted for all strings to be narrow UTF-8.

@pow2clk
Copy link
Member

pow2clk commented Nov 4, 2022

Per the above, while we recognize the complexity of the situation leads to no perfect options, we feel we've made the right decision and we're sticking with it.

@pow2clk pow2clk closed this as completed Nov 4, 2022
@MarijnS95
Copy link
Contributor Author

@pow2clk note that I'm in no way implying that this may have been the wrong choice, but your colleagues over at the Rust teams insist on at least getting a proper rationale from DX(S)C before allowing (me) to make any progress towards a workable solution.

At the same time @jenatali above only rationalized the decision made for DXCore, but not DX(S)C ("Chiming in with a second point with a Windows COM API that we've made available"). Can I interpret your statement meaning "the stance of DX(S)C mirrors @jenatali's for DXCore"?

In that sense closing this issue is still premature; if DXC states this is very much intended, yet the Rust windows-rs folks refuse to accommodate for multiple official Microsoft APIs using 32-bit wchar on Linux, there's always the suggested middle ground of a *A + *W ANSI and widechar API? Or should windows-rs be able to deal with 32- and 16-bit wchar APIs on Linux based on "some" configuration (per-API, described in the metadata)?

@jenatali repeats a valid point of native wchar being more convenient for users. Also, @jenatali and @pow2clk, is it safe to imply that DX(S)C is also nano-COM (since there's no RPC/CoCreateInstance etc)? It seems the internal header in DXC that provides this minimal support is called microcom, though.

@tim-weis @riverar @kennykerr what direction do you suggest we head in next?

@jenatali
Copy link
Member

jenatali commented Nov 4, 2022

Nano-COM/micro-COM would be the same, and yes DXC's API follows this pattern. As I understand it, DXC's use of native wchar_t was not necessarily intentional at the time, but in light of the discoveries made over time by other DirectX efforts, has been rationalized after-the-fact to have been the right choice.

In my opinion, any new APIs should opt for narrow UTF-8 strings only. Whether existing wide-only APIs get narrow wrappers... seems not worth the time or effort. I think for windows-rs the ideal situation would be able to handle both native wchar_t and 16-bit only char16_t versions of "wide." Per-API metadata sounds like a reasonable way to achieve that. Lacking that, using native OS wide strings only seems like the right choice, given that there are viable targets which have chosen that path, and there are not (to my knowledge) targets for the 16-bit string.

@pow2clk
Copy link
Member

pow2clk commented Nov 7, 2022

@MarijnS95 I want to give you what you need here. Tex, Jesse and I discussed this prior to Jesse's comment. The reasons for using native wchar on DXC are identical to those of Dxcore with the addition of being consistent with Dxcore. Do you need me to repeat his arguments so you have a single comment to point to?

I don't know why the (S) is there when you say DX(S)C. Am I missing something?

I am unfamiliar and unacquainted with the Microsoft employees you refer to who were involved in this discussion regarding rust. While I wish we were always consistent, it's a large company with many different engineers. In practice, it differs little from other users of our API. If they want to use it, that's great. If they don't want to use it, that's their decision. The API is how it is for good and valid reasons related to the platforms in question that we've outlined above. We're reluctant to devote time and resources to adding a complicated shim layer to that API to accommodate one user who wishes things were other than they are.

At any rate, I agree this issue is not resolved. I'm reopening the issue, but it doesn't seem like we're going to get anywhere with the current participants. I leave it to you to identify and induce other commenters to have their say in light of the above validation.

@pow2clk pow2clk reopened this Nov 7, 2022
@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Nov 7, 2022

I don't know why the (S) is there when you say DX(S)C. Am I missing something?

Because I am used to call this DXC (and so do you, apparently) yet windows-rs (and other) folks call it DXSC (S for Shader) to not possibly confuse it with a short-hand for DXCore :)

Thanks for reopening. I agree that it doesn't seem like this would move anywhere any time soon, as those who requested me to file this issue remain curiously silent (especially now that a stance has been taken on wchar_t)...

They're not all Microsoft employees, some are just contributing to the project and provided their say. I'd again like to invite @riverar and @kennykerr to comment here, after discussing this with me prior on the windows-rs issue tracker. I understand that it's a large company and that contact across different employees may feel just as "distanced" as regular contributors talking across GitHub; main point was that I didn't want to be the messenger between the two parties, but on the other hand it is my wish for this to be resolved one way or another 😬

I still see two possible ways to resolve my specific issue:

  1. Provide additional DXC (wrapper) API that is purely UTF-8;
  2. Somehow represent 32-bit wide-chars in windows-rs;
    • Am I allowed to submit a change to that effect?
    • Can such a solution be implemented for everything not(windows), or should we drive it from metadata per-API, in case another API (outside DXC/DXCore) shows up that retains 16-bit wide chars/strings on not(windows)?

@kennykerr
Copy link

As I mentioned here and here, the changes needed to support non-Windows targets with windows-rs are just too far out of scope at this time.

@pow2clk
Copy link
Member

pow2clk commented Nov 9, 2022

I've read those comments @kennykerr. As I read them, it sounds like irrespective of UTF-8,16, or 32, you are not interested in pursuing nor facilitating the addition of non-windows support at this time. Is that accurate? If so, I'm not sure this is so much an issue with the compiler interface per se.

@kennykerr
Copy link

That's correct. I don't have any issue with the DirectXShaderCompiler project. As I've stated before, I am merely unwilling to further complicate the windows-rs project with support for non-Windows platforms.

@MarijnS95
Copy link
Contributor Author

In other words, this detour and extra noise on DirectXShaderCompiler requested by you was totally unnecessary and avoidable.

@kennykerr
Copy link

This is what I said:

So if this is something that you need to support the DirectX Shader Compiler, I suggest you reach out to that team to see how they want to support this further. Happy to work with them as needed.

In other words, if the DirectX Shader Compiler team would like to support Rust I'd be happy to assist them. Anyway, sorry if I caused any confusion. If I haven't been clear in the past hopefully this is now cleared up.

@llvm-beanz
Copy link
Collaborator

This has been inactive for a while, but I don't think the DXC team is going to do any work here.

Long-term we do have plans to implement a standard C API for the Clang-based HLSL compiler(see: llvm/llvm-project#63631). That project is a ways off, but is probably the only effort we're going to work on in this space.

We would accept patches to this area to address your immediate needs, but I'm unsure if we have a good path forward here.

@llvm-beanz llvm-beanz closed this as not planned Won't fix, can't repro, duplicate, stale Jul 13, 2023
@llvm-beanz llvm-beanz removed the needs-triage Awaiting triage label Jul 13, 2023
@MarijnS95
Copy link
Contributor Author

but I'm unsure if we have a good path forward here.

It seems that neither DXC nor windows-rs is willing to accommodate for and/or has a good view on supporting the direct need of invoking the DXC library trough the (mini)COM API on Linux/macOS from Rust.

Even if we did extend DXC with ASCII/UTF-8 API's, windows-rs might still struggle in its error handling and formatting paths where the same wchar use and issue is prevalent.

Perhaps the only way forward is to maintain a fork of windows-rs to accommodate this.

but I don't think the DXC team is going to do any work here.

Alternatively, if it turns out windows-rs would work when DXC is fully usable via ASCII/UTF-8 APIs (should be trivial now that the major IDxcCompiler APIs take a list of arguments and not much more), would DXC be open to accept contributions that add said API entrypoints?

@llvm-beanz
Copy link
Collaborator

I want to clarify. It isn't that we're not willing to accommodate your needs. We're just not going to do the work. Our team is too small and has too many other higher priority tasks on our roadmap at the moment including the work to implement HLSL support in Clang.

Alternatively I could have left the issue open for the next 5+ years until we archive the DXC repo, but I don't think that's actually a better outcome for you.

We can't help you with windows-rs, but we would likely accept patches to support your use case in DXC. We're definitely open to reviewing contributions that add UTF-8 entry points, within the constraints that we can't break the existing entry points.

@MarijnS95
Copy link
Contributor Author

@llvm-beanz let me clarify: with and/or I was differentiating between windows-rs not willing to accommodate for this use-case at all, versus DXC being open to it but having no immediate plans/vision for doing so if it is not for an external contributor.

Glad to hear that you're open to such a contribution, I'l keep it in mind when finding some time and after evaluating that there are no limitations elsewhere (inside windows-rs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

6 participants