Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null termination issues in StaticString #2

Closed
tshort opened this issue Feb 28, 2022 · 8 comments
Closed

Null termination issues in StaticString #2

tshort opened this issue Feb 28, 2022 · 8 comments

Comments

@tshort
Copy link
Collaborator

tshort commented Feb 28, 2022

Here are some notes:

  • getindex doesn't add a null termination as noted in a comment.
  • display is broken without null termination. Maybe this should use length instead of the termination.
  • length should be N-1 and not count the null at the end.
  • Should codeunits include the termination? It doesn't for a String.

Could the null termination be removed? I don't think Julia strings need it.

@brenhinkeller
Copy link
Owner

brenhinkeller commented Feb 28, 2022

Ah, these are good questions. Fixing display would be easy enough and we should definitely do that one way or another.

The big question is whether we should try to be more consistent with including null termination or if we should try to eliminate it altogether.

In principle, dropping it would be great -- because it'll be hard to have getindex return maximally efficient views / substrings if we have to append. In practice, it may be tricky, since even LLVM is in practice expecting strings to be null-terminated (i.e., can't use puts or printf, even via direct llvmcall, without null termination).

We could maybe get around that by only ever using putc / putchar, but that'd be a nontrivial undertaking at least.

And, while Julia strings are officially "not" null-terminated, in practice they actually are every time I've checked, i.e.:

julia> str = "Hello there, I am a string"
"Hello there, I am a string"

julia> Base.unsafe_load(pointer(str), length(str)+1)
0x00

julia> substr = str[1:5]
"Hello"

julia> Base.unsafe_load(pointer(substr), length(substr)+1)
0x00

@brenhinkeller
Copy link
Owner

My initial reasoning for including null-termination in codeunits and in length and etc. is basically just that this is a low-level package where implementation details really matter, so my general instinct is to expose the nitty-gritty details that are often otherwise hidden -- but I could probably be convinced to go the other way on this

@brenhinkeller
Copy link
Owner

Also: hooray, first issue 🎉

@brenhinkeller
Copy link
Owner

brenhinkeller commented Feb 28, 2022

One option may be to have getindex on ranges of a StaticString return a StaticSubString or StaticStringView view type which is explicitly not null-terminated and dispatches to methods which are aware of that

@tshort
Copy link
Collaborator Author

tshort commented Mar 1, 2022

Null termination for output is a good consideration. But, we could add the null termination prior to passing to output functions.

I don't think getindex should be a view. That doesn't mesh well with other defaults in Base.

Regarding codeunits and length, I don't think we want the null termination to show up in these. They don't for normal Strings. We also want c"abc" == "abc". That's currently broken.

Also, I see that MallocString has these same issues. Maybe we could do what Base does and allocate the null termination but keep it hidden.

Semi-relevant Julia issues: JuliaLang/julia#10994, JuliaLang/julia#16731, JuliaLang/julia#7036

@brenhinkeller
Copy link
Owner

Yeah, MallocString and StaticString have more or less identical interfaces, just different implementations of backing memory so the former can be on the heap and the latter on the stack.

WIP here: https://github.com/brenhinkeller/StaticTools.jl/tree/refactor-strings

That'll make c"abc" == "abc" which we definitely want, and make length equal to that of base strings, which we probably also want -- I do have some mixed feelings though.

@brenhinkeller
Copy link
Owner

brenhinkeller commented Aug 7, 2022

So just an update on the current status of this:

julia> length(c"asdf") == length(m"asdf") == length("asdf") == 4
true

julia> c"asdf" == m"asdf" == "asdf"
true

julia> s = c"this is a longer string"
c"this is a longer string"

julia> s[9:23]
15-byte StringView:
 "a longer string"

We still differ from Base, however, in including the null termination in codeunits, ncodeunits, and sizeof.

What do people think, should we change those to hide the null termination as well?

@brenhinkeller
Copy link
Owner

Given no objections, I'm going to go ahead and close this as completed, but feel free to restart the discussion if it seems like we should go farther!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants