Not able to convert between byte index and UTF indices #12216

Closed

@yegappan yegappan commented Apr 1, 2023

The Language Server Protocol (LSP) supports specifying offsets in text documents in UTF-8, UTF-16, or UTF-32 code units.
UTF-16 is the default.

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

Language servers differ in which of these code units they support. Vim uses UTF-32
code units for offsets, which makes it difficult for a Vim LSP plugin to support different language servers.
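
To illustrate why the choice of code unit matters, here is a minimal Python sketch (not Vim code) that computes the offset of the same position in each of the three schemes. The helper name `offsets_before` is hypothetical and exists only for this example.

```python
# Offsets of one position in the three LSP position encodings.
# '😊' is U+1F60A, which lies outside the Basic Multilingual Plane.
line = "a😊b"


def offsets_before(text: str, char_index: int) -> dict:
    """Return the offset of text[char_index] counted in each code unit."""
    prefix = text[:char_index]
    return {
        "utf-8": len(prefix.encode("utf-8")),            # bytes
        "utf-16": len(prefix.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(prefix),                           # code points
    }


print(offsets_before(line, 2))
# prints {'utf-8': 5, 'utf-16': 3, 'utf-32': 2}
```

The character `b` sits at three different offsets depending on the unit, so a client and a server that assume different units will disagree on where it is.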

The following changes are introduced in this PR:

  1. Add the utf16idx() function to return the UTF-16 offset in a string, given either a byte or a character offset.
  2. Add a UTF-16 flag to the byteidx(), byteidxcomp() and charidx() functions so that they accept a UTF-16 offset and return the corresponding byte or character offset.
  3. Add the strutf16len() function to return the length of a string in UTF-16 code units.
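
The conversions listed above can be sketched in Python as follows. This is only an illustration of the index arithmetic, not the C implementation in this PR; the function names are hypothetical, and returning -1 for an out-of-range index is an assumption of this sketch.

```python
def _utf16_width(ch: str) -> int:
    """UTF-16 code units needed for one code point (2 means a surrogate pair)."""
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units."""
    return sum(_utf16_width(ch) for ch in text)


def utf16_to_char_index(text: str, utf16_index: int) -> int:
    """Map a UTF-16 offset to a character index (-1 if out of range: assumed)."""
    if utf16_index < 0:
        return -1
    units = 0
    for char_index, ch in enumerate(text):
        width = _utf16_width(ch)
        if utf16_index < units + width:
            # An offset inside a surrogate pair maps to the character it belongs to.
            return char_index
        units += width
    return len(text) if utf16_index == units else -1


def utf16_to_byte_index(text: str, utf16_index: int) -> int:
    """Map a UTF-16 offset to a UTF-8 byte index (-1 if out of range: assumed)."""
    char_index = utf16_to_char_index(text, utf16_index)
    if char_index < 0:
        return -1
    return len(text[:char_index].encode("utf-8"))


def char_to_utf16_index(text: str, char_index: int) -> int:
    """Map a character index to a UTF-16 offset (-1 if out of range: assumed)."""
    if not 0 <= char_index <= len(text):
        return -1
    return utf16_len(text[:char_index])


print(utf16_len("😊"))                    # prints 2
print(utf16_to_byte_index("a😊😊", 3))    # prints 5
print(char_to_utf16_index("a😊😊", 2))    # prints 3
```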

codecov bot commented Apr 1, 2023

Codecov Report

Merging #12216 (5ffc5ba) into master (f39d9e9) will decrease coverage by 0.09%.
The diff coverage is 82.10%.

❗ Current head 5ffc5ba differs from pull request most recent head 67ea267. Consider uploading reports for the commit 67ea267 to get more accurate results

@@            Coverage Diff             @@
##           master   #12216      +/-   ##
==========================================
- Coverage   82.04%   81.96%   -0.09%     
==========================================
  Files         160      164       +4     
  Lines      193181   194254    +1073     
  Branches    43367    43869     +502     
==========================================
+ Hits       158505   159229     +724     
- Misses      21807    22184     +377     
+ Partials    12869    12841      -28     
Flag Coverage Δ
huge-clang-none 82.68% <80.00%> (-0.04%) ⬇️
huge-gcc-none 53.88% <80.00%> (?)
huge-gcc-testgui 51.97% <80.00%> (?)
huge-gcc-unittests 0.29% <0.00%> (?)
linux 82.40% <80.00%> (-0.32%) ⬇️
mingw-x64-HUGE 76.56% <80.00%> (+0.01%) ⬆️
mingw-x86-HUGE 77.02% <80.00%> (+0.01%) ⬆️
windows 78.15% <80.00%> (+0.01%) ⬆️

Impacted Files Coverage Δ
src/evalfunc.c 90.38% <ø> (+0.09%) ⬆️
src/strings.c 92.26% <82.10%> (-0.55%) ⬇️

... and 121 files with indirect coverage changes

brammool commented Apr 12, 2023 via email

DominiquePelle-TomTom commented Apr 12, 2023

This feature looks related to one of my earlier posts at https://groups.google.com/g/vim_dev/c/AVpp8DT2_Vc/m/L_p6gzATBQAJ

I will probably find it useful to have this feature for my vim-LanguageTool plugin.

@@ -604,6 +606,7 @@ strptime({format}, {timestring})
strridx({haystack}, {needle} [, {start}])
Number last index of {needle} in {haystack}
strtrans({expr}) String translate string to make it printable
strutfindex({expr} [, {index}]) List byte index to utf-32 and ut-16 indices

ut-16? I assume you meant utf-16.

@@ -8975,8 +8978,22 @@ str2nr({string} [, {base} [, {quoted}]]) *str2nr()*

Can also be used as a |method|: >
GetText()->str2nr()
<
strbyteindex({string} [, {index} [, {use_utf16}]) *strbyteindex()*
Convert a UTF-32 or UTF-16 {index} to a byte index. If

Sometimes the doc in the PR uses "UTF-16" and sometimes "utf-16".
Let's be consistent (the capitalized one is better IMO).

brammool commented Apr 12, 2023 via email

@Shane-XB-Qian

This feature looks related to one of my earlier posts at https://groups.google.com/g/vim_dev/c/AVpp8DT2_Vc/m/L_p6gzATBQAJ

This is for LSP implementations. The default encoding of an LSP server is UTF-16, so symbols on lines that contain, for example, characters needing two UTF-16 code units may be located incorrectly on the client side if Vim itself provides no such functions (e.g. the ones from this PR).
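
A small Python sketch of the failure mode described here: a column reported in UTF-16 code units is used directly as a character index. The line content, the reported column, and the helper name are made up for illustration.

```python
line = "😊 = foo()"
utf16_col = 5  # hypothetical column of "foo" as a UTF-16 server would report it


def utf16_col_to_char_index(text: str, col: int) -> int:
    """Convert a UTF-16 column to a character index.

    A column that falls inside a surrogate pair is rounded up to the next
    character (a choice made for this sketch).
    """
    units = 0
    for i, ch in enumerate(text):
        if units >= col:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


# Treating the UTF-16 column as a character index lands one character late.
naive = line[utf16_col:utf16_col + 3]

# Converting first finds the symbol the server meant.
start = utf16_col_to_char_index(line, utf16_col)
correct = line[start:start + 3]

print(naive)    # prints oo(
print(correct)  # prints foo
```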

vim-ml commented Apr 13, 2023 via email

@yegappan yegappan force-pushed the vimlsp branch 2 times, most recently from ab0ac01 to 51281f4 Compare April 14, 2023 04:52

vim-ml commented Apr 14, 2023 via email

vim-ml commented Apr 14, 2023 via email

@yegappan yegappan force-pushed the vimlsp branch 10 times, most recently from bf8424a to 61cbea7 Compare April 22, 2023 01:40
… the byteidx(), byteidxcomp() and charidx() functions
@brammool brammool closed this in 67672ef Apr 24, 2023
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Apr 26, 2023
Problem:    no functions for converting from/to UTF-16 index.
Solution:   Add UTF-16 flag to existing functions and add strutf16len() and
            utf16idx(). (Yegappan Lakshmanan, closes vim/vim#12216)

vim/vim@67672ef

Co-authored-by: Christian Brabandt <[email protected]>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Apr 26, 2023
Problem:    no functions for converting from/to UTF-16 index.
Solution:   Add UTF-16 flag to existing functions and add strutf16len() and
            utf16idx(). (Yegappan Lakshmanan, closes vim/vim#12216)

vim/vim@67672ef

Co-authored-by: Yegappan Lakshmanan <[email protected]>
zeertzjq added a commit to neovim/neovim that referenced this pull request Apr 26, 2023
…23318)

vim-ml commented May 2, 2023 via email

folke pushed a commit to folke/neovim that referenced this pull request May 22, 2023
…eovim#23318)