Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The ContentType implementation of IContentType.isAssociatedWith is case insensitive and I find no case sensitive alternative #673

Open
jantje opened this issue Sep 9, 2023 · 9 comments

Comments

@jantje
Copy link

jantje commented Sep 9, 2023

The ContentType implementation of IContentType.isAssociatedWith is case insensitive which means that if your content type contains .c and the presented file is a .C file ContentType.isAssociatedWith will return true.

There are 2 locations where ContentType.isAssociatedWith uses equalsIgnoreCase

return ((!strict && getBasicType(type) == getBasicType(otherType)) || type == otherType) && this.text.equalsIgnoreCase(text);

and

Unfortunately for me .c files are associated with c compilers and .C files are associated with C++ compilers which means CDT can not use ContentType.isAssociatedWith and chose to implemented a private method to provide the functionality.
https://github.com/eclipse-cdt/cdt/blob/7c8bb9f00ef0d5e3ff493bc0c944519a3e476da2/core/org.eclipse.cdt.core/src/org/eclipse/cdt/internal/core/CContentTypes.java#L130

As I need a case sensitive implementation of IContentType.isAssociatedWith I have the option to make the private method of CDT public or have a solution in (I)ContentType.
I'd prefer the second.

Note 1) I did not find any documentation that says the comparison is case insensitive.
Note 2) I think file name comparison is case insensitive as well.
Note 3) I can create a PR but I need some guidance on what the change should look like.

@merks
Copy link
Contributor

merks commented Sep 9, 2023

Why not use .cpp which is pretty standard and works on case insensitive file systems like on Windows?

@jantje
Copy link
Author

jantje commented Sep 9, 2023

Because it is not my call. I write a CDT extension not a cpp project.

@merks
Copy link
Contributor

merks commented Sep 10, 2023

Certainly you can't just make the comparison case-sensitive without breaking everyone relying on that not being the case-sensitive. Given that Window's file system is case-insensitive, relying on case seems like a poor design strategy in the first place. (You have to ask yourself, why has this not been needed/implementation for the past 18 years?)

At best I could image specifying an additional option for whether to be case-sensitive or not, with not being the default...

@laeubi
Copy link
Contributor

laeubi commented Sep 10, 2023

Rely on some characters at the end of a name to get the file type is already bad but distinguish them by their case is really bad I can't imagine why one would choose that by gcc explicitly list that case with special explanation:

https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Overall-Options.html

file.C
C++ source code which must be preprocessed. Note that in .cxx, the last two letters must both be literally x. Likewise, .C refers to a literal capital C.

I think the only solution would be to add an option if a filetype should be handled case sensitive or not.

Also note that Windows is not unable to store the case, it just by default treat them the same file so you can't create two files that differ in case but of course you can create:

  • plain.c
  • cplussplus.C

in the same directory without a problem!

@jantje
Copy link
Author

jantje commented Sep 10, 2023

You have to ask yourself, why has this not been needed/implementation for the past 18 years?

Seems someone made a pull request 19 years ago
8cee0a0

As I said CDT made a workaround probably more than 10 years ago. I guess they tried to have this changed to.

Given that Window's file system is case-insensitive, relying on case seems like a poor design strategy in the first place.

Windows file system is not case insensitive. If you save a file named AaBB.cC and you do dir you will get AaBB.cC.
In the windows file system in a folder all files must be different case insensitive.

Solutions I see to this problem

  1. Document that IContentType.isAssociatedWith is case in sensitve
  2. add a method IContentType.isAssociatedWithTrict which is case sensitive
  3. Add a flag to IContentType itself to specify it as case sensitive

@merks
Copy link
Contributor

merks commented Sep 10, 2023

On Windows, if I do this in gitbash:

$touch a.c a.C && ls a*

it only shows:

a.c

That's because the file system is case insensitive:

image

Yes, you can have a.c or a.C (case preserving) but you cannot have both a.c and a.C in the file system because they are treated as having the same name.

I haven't looked closely, but can't such a default (to do 3.) be "hidden" in the org.eclipse.core.internal.content.FileSpec implementation?

@jonahgraham

Do you have an opinion about this?

@jantje
Copy link
Author

jantje commented Sep 10, 2023

Microsoft windows is case preserving but not case sensitive seems a correct description.
afbeelding

On Windows, if I do this in gitbash:

$touch a.c a.C && ls a*

it only shows:

a.c

exactly but if you do
rm a.*
$touch a.C a.c && ls a*

it will show:

a.C

@jonahgraham
Copy link
Contributor

TL;DR

AFAICT the bug here is that 8cee0a0 should have added to content type extension point documentation that the patterns are case insensitive.

I would be happy for a pull request that added that documentation.

It would be reasonable to add a feature request to allow defining case-sensitive extensions to content type definitions, but I am not sure who (else) needs it.


The rest of this is a trip down memory lane, but fundamentally feels off-topic the the issue, but on-topic to where the discussion has gone, hence I made it collapsible so that you don't all have to see it.

Many years ago (40+) a decision was made that uppercase extensions in C universe is treated as non-C files that need pre-processing (.C needed pre processing from the new C with classes language to c, uppercase S means preprocessing before being passed as lowercase s to the assembler, similar for fortran (F vs f)). IMHO that was a silly decision, but one that has been around forever. In practice uppercase C is rarely used as using case sensitivity to distinguish between names is silly and the alternatives of using cpp, cc, cxx, etc makes more sense.

This discussion/issue doesn't have to do with Windows really. It has to do with what extensions mean. In many contexts extensions are case insensitive, e.g. .jpg vs. JPG should be treated the same. I don't know for sure if this is what Rafael was doing nearly 20 years ago with content types in 8cee0a0 - but it is an educated guess.

Note the issue that @jantje has raised doesn't affect CDT as he points out - CDT has had proprietary handling of content types since early days of CDT (~17 years ago .C vs .c support was added)

CDT has a more complete and complex type system than is available in Eclipse Platform, allowing users to add arbitrary file patterns as different content types, and allows different projects to have the same file name patterns for different content types (or vice-versa). (FWIW this feature of CDT is very much used in the field - ask if you want more details)

The above mentioned feature request would not be useful for CDT itself (although perhaps it would be for @jantje's extension) as an additional particularity of CDT's handling is that .h (lower case only) can sometimes map to C++ content type and sometimes to C content type. See this code and related methods for the details.

This ycombinator discussion reminded me that .CPP and .cpp are both understood by GCC as C++ files.

@jantje
Copy link
Author

jantje commented Sep 11, 2023

@jonahgraham
Thanks for the trip down memory lane.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants