Skip to content

A problem for editing Non-ASCII characters #282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tkfm-yamaguchi opened this issue Feb 7, 2019 · 6 comments
Closed

A problem for editing Non-ASCII characters #282

tkfm-yamaguchi opened this issue Feb 7, 2019 · 6 comments

Comments

@tkfm-yamaguchi
Copy link

Hi,

When I edit a file which includes Non-ASCII characters as the comments, string literals, ..etc., vim-lsp goes wrong.

For example:

export enum E {
  S = '𐐀',
}

When I remove 𐐀 (assigning a empty string to E.S) in the code above,

  • Unterminated literal string error is shown on that line (the line S is defined).
  • The completion for E does not work.

After some investigation, what I've got is:

  • The number of character in the parameter to the server should be counted on a UTF-16 string representation (LSP SPEC).
  • strlen() is used in vim-lsp for the calculation.

=> vim-lsp sends the wrong character parameter to the server when the non-ascii characters (whose code point is more than 0x10000) are edited.

A solution for this problem is, I think, replacing strlen() with the function like below:

function! s:count_utf16_code_units(str) abort
  let l:len = strchars(a:str)
  let l:i = 0
  let l:cnt = 0

  while l:i < l:len
    let l:chr = strcharpart(a:str, l:i, 1)
    if char2nr(l:chr) > 0x10000
      let l:cnt = l:cnt + 2
    else
      let l:cnt = l:cnt + 1
    endif

    let l:i = l:i + 1
  endwhile

  return l:cnt
endfunction

I'm not good at Vim script, so I hope these information helps fixing the problem.

@tkfm-yamaguchi tkfm-yamaguchi changed the title A problem while editing Non-ASCII characters A problem for editing Non-ASCII characters Feb 7, 2019
mattn added a commit that referenced this issue Feb 7, 2019
@mattn
Copy link
Collaborator

mattn commented Feb 7, 2019

Thanks for your report about this. Could you please try #284 ?

@tkfm-yamaguchi
Copy link
Author

tkfm-yamaguchi commented Feb 7, 2019

@mattn
Thank you for the quick response.

Could you please try #284 ?

I've tried it in my TS project and tiny Python scripts (which occurred the similar errors), and both work well in my env.

I found the benchmarks in PR. Do you think this introduce the undue overhead ?

@mattn
Copy link
Collaborator

mattn commented Feb 7, 2019

Thanks your confirm. I have to make another benchmark case. Current code use small string. It should use long string too.

@tkfm-yamaguchi
Copy link
Author

It should use long string too.

I see, that should be done. If there are anything I can do, let me know.

I wish if this proposal submitted to official LSP was accepted ...

@tkfm-yamaguchi
Copy link
Author

tkfm-yamaguchi commented Feb 8, 2019

Could you please try #284 ?

Sorry, my inspection was insufficient, and it seems to need some more process for the 2 code units characters.

Adding such characters is now OK on count-utf16 branch, but removing is still not.

I compare the ts-server logs, for the '𐐀' removal on the example of this issue, which comes from vim(-lsp) and VisualStudioCode, and found that the position calculation for removing the 2 code units characters should also be aware the size of code units.

I mean, even though text is "", it should be treated as 2 size of characters are changed when the 2 code unit character is removed.

Here is the logs of ts-server:

vim(-lsp):

Info 104  [10:39:18.394] request:
    {"command":"change","seq":8,"type":"request","arguments":{"file":"[masked]","line":2,"offset":9,"endLine":2,"endOffset":10,"insertString":""}}

VSC:

[Trace  - 10:41:09] Sending request: change (218). Response expected: no. Current queue length: 0
Arguments: {
    "insertString": "",
    "file": "[masked]",
    "line": 2,
    "offset": 9,
    "endLine": 2,
    "endOffset": 11
}

(the values of "endOffset" are different)

I'm not sure, but it seems hard to detect the length of the removed character's code units (from the both point of implementation and efficiency).

@mattn
Copy link
Collaborator

mattn commented Jul 26, 2019

Fixed by #447

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants