null termination for strings #148

Open
guybedford opened this issue Jan 5, 2023 · 2 comments
Comments

@guybedford
Collaborator

guybedford commented Jan 5, 2023

Strings in the component model are represented as pointers with a length and are allocated into component memory via the cabi_realloc function provided by the component.

When interfacing with APIs that work with "legacy" null-terminated strings, there is a need to perform an additional realloc to add the null termination to the string pointer, which can result in an unnecessary copy.
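As a rough illustration of that extra step, here is a minimal sketch in C. It assumes the string arrives as a `(ptr, len)` pair that the callee owns. `demo_realloc` is a hypothetical stand-in that only mirrors the four-parameter shape of `cabi_realloc` and is backed by libc `realloc` so the sketch can run natively; `to_c_string` is likewise an illustrative name, not part of any bindings generator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in with the same parameter shape as cabi_realloc
   (ptr, old_size, align, new_size). Backed by libc realloc for this sketch;
   the alignment argument is ignored here. */
static void *demo_realloc(void *ptr, size_t old_size, size_t align, size_t new_size) {
    (void)old_size;
    (void)align;
    return realloc(ptr, new_size);
}

/* Illustrative helper: grows a length-delimited string by one byte and
   appends the terminator. The allocator may move the data, which is the
   potential extra copy. Returns NULL on allocation failure, in which case
   the original ptr is still valid. */
char *to_c_string(char *ptr, size_t len) {
    char *grown = demo_realloc(ptr, len, 1, len + 1);
    if (grown == NULL) {
        return NULL;
    }
    grown[len] = '\0';
    return grown;
}
```

Whether the data actually moves depends on the allocator; the point is only that the glue code has to ask for the extra byte after the string has already been placed in memory.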

If the component model were to explicitly define strings with null terminators for these kinds of legacy API use cases, that would avoid the need for this extra interop work.

It's come up a couple of times for me already now in practical component model workflows. I think it would be useful, but I don't have a very strong opinion on it. I thought it worth discussing as an option.

@guybedford
Collaborator Author

guybedford commented Jan 6, 2023

One issue with this is that null characters are valid in USV (Unicode scalar value) strings, so null-termination semantics still wouldn't be guaranteed.

Admittedly, when I've come across these use cases, there has also been an alternative approach: using APIs that take an explicit length.
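To make the difference concrete, the following sketch compares two strings that contain an embedded null character. The `str_view` type and `str_view_eq` function are hypothetical names introduced only for illustration; `memcmp` and `strcmp` are the standard C library functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, length) view of a string; not a component model type. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Length-aware comparison: every byte counts, including embedded NULs. */
bool str_view_eq(str_view a, str_view b) {
    if (a.len != b.len) {
        return false;
    }
    return a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

With `"ab\0cd"` and `"ab\0xx"` (both of length 5), `strcmp` stops at the first null character and reports the strings as equal, while `str_view_eq` sees the differing tail and reports them as different.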

@lukewagner
Member

Great question! Because of the reason you mentioned with null being a valid character, it seemed like bindings generators would necessarily need to keep a string length and use string operations that took an explicit length rather than relying on null-termination. Thinking about it some more though, I could imagine cases where you first validate the string (in the process, rejecting null characters) and then want to call a syscall that expects null-termination on those same bytes. Is that what you're seeing or is it something else?
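A minimal sketch of that validate-then-terminate pattern, assuming the caller provides a destination buffer. `copy_as_c_string` is a hypothetical helper name; only `memchr` and `memcpy` are real library calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: validates that src[0..len) has no null characters,
   then copies it into dst and appends the terminator.
   Returns 0 on success, -1 if src contains a null character or if dst
   cannot hold len + 1 bytes. */
int copy_as_c_string(const char *src, size_t len, char *dst, size_t cap) {
    if (cap <= len) {
        return -1; /* no room for the data plus the terminator */
    }
    if (len > 0 && memchr(src, '\0', len) != NULL) {
        return -1; /* embedded null would silently truncate the C string */
    }
    if (len > 0) {
        memcpy(dst, src, len);
    }
    dst[len] = '\0';
    return 0;
}
```

Once validation has passed, the terminated buffer can be handed to an API that expects a null-terminated string without risk of truncation. The copy here is exactly the cost that a built-in null-termination option would aim to avoid.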

As for how to do this, I could imagine a few options:

  • Add a canonopt specific to strings that asks for the null termination (orthogonal to string-encoding).
  • Allow specifying different realloc functions for different types, so that string realloc could leave room at the end for a null terminator (manually added by the glue code).

The former is the more targeted fix, but the latter would allow more flexibility, such as allocating space before or after the string or putting different allocations in different arenas. I lean toward the former, but I'd be interested if the latter seemed actually useful.
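For the second option, a string-specific realloc might look roughly like the sketch below. This is purely an assumption about how such a function could be written: per-type realloc functions are only being floated in this thread, and `string_realloc` and `terminate_in_place` are hypothetical names. The sketch uses libc `realloc` and ignores alignment so it can run natively.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string-only realloc: same parameter shape as cabi_realloc,
   but it always reserves one extra byte past new_size for a terminator. */
void *string_realloc(void *ptr, size_t old_size, size_t align, size_t new_size) {
    (void)old_size;
    (void)align;
    if (new_size == SIZE_MAX) {
        return NULL; /* new_size + 1 would overflow */
    }
    return realloc(ptr, new_size + 1);
}

/* Glue code: writes the terminator into the reserved byte, with no
   further allocation or copy. */
void terminate_in_place(char *ptr, size_t len) {
    ptr[len] = '\0';
}
```

The host would still write exactly `len` bytes of string data; the reserved byte only becomes meaningful once the glue code has checked for embedded null characters and written the terminator.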
