Correctly map between UTF-8 and UTF-16 positions #227
Current concerns:
Note: the LSP supports three line ending styles:
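For reference, the LSP specification lists `\n`, `\r\n` and `\r` as end-of-line sequences. The sketch below shows one way to find line starts that honors all three. The function name `line_starts` is hypothetical and is not part of this PR. Scanning bytes is safe here because `\r` and `\n` are ASCII and never occur inside a multi-byte UTF-8 sequence.

```rust
/// Hypothetical helper: returns the byte offset at which each line starts,
/// treating "\n", "\r\n" and a lone "\r" as line terminators.
fn line_starts(text: &str) -> Vec<usize> {
    let bytes = text.as_bytes();
    let mut starts = vec![0]; // the first line always starts at offset 0
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => starts.push(i + 1),
            b'\r' => {
                // "\r\n" counts as a single terminator, so skip the '\n'.
                if i + 1 < bytes.len() && bytes[i + 1] == b'\n' {
                    i += 1;
                }
                starts.push(i + 1);
            }
            _ => {}
        }
        i += 1;
    }
    starts
}
```

For example, `line_starts("a\nb\r\nc\rd")` yields `[0, 2, 5, 7]`: one entry per line, regardless of which terminator ended the previous line.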
Yeah, that's a lot of duplicated boilerplate. I would rather have one `file_line_index`-style method instead of both
Yeah, let's make
crates/ra_editor/src/col_index.rs
let mut utf16_chars = Vec::new();
let mut line = 0;
let mut curr = 0.into();
for c in text.chars() {
I think this loop could work better as
for (line_idx, line) in text.lines().enumerate() {
    // ...
}
That way, we don't need to worry about one line's state interfering with the next.
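A minimal sketch of the suggested per-line loop, assuming the index only needs to record the characters that are wider than one byte in UTF-8 (ASCII-only lines get no entry). The names `WideChar` and `build_wide_char_index` are hypothetical and are not the PR's code. Because columns come from `char_indices()` on each line slice, they are always relative to their own line. Note that `str::lines` splits on `\n` and `\r\n`, but not on a lone `\r`.

```rust
use std::collections::HashMap;

/// Hypothetical record of one non-ASCII character on a line.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WideChar {
    start: usize,     // UTF-8 byte offset from the start of its line
    len_utf8: usize,  // length in UTF-8 bytes (2..=4)
    len_utf16: usize, // length in UTF-16 code units (1 or 2)
}

/// Hypothetical builder: maps a 0-based line number to the wide chars on it.
fn build_wide_char_index(text: &str) -> HashMap<usize, Vec<WideChar>> {
    let mut index = HashMap::new();
    for (line_idx, line) in text.lines().enumerate() {
        let wide: Vec<WideChar> = line
            .char_indices()
            .filter(|(_, c)| c.len_utf8() > 1) // skip ASCII characters
            .map(|(start, c)| WideChar {
                start,
                len_utf8: c.len_utf8(),
                len_utf16: c.len_utf16(),
            })
            .collect();
        if !wide.is_empty() {
            index.insert(line_idx, wide);
        }
    }
    index
}
```

Skipping ASCII-only lines keeps the index small for typical source files, where most lines need no conversion at all.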
crates/ra_editor/src/col_index.rs
ColIndex { utf16_lines }
}

pub fn utf8_to_utf16_col(&self, mut line_col: LineCol) -> LineCol {
Hm, this `LineCol -> LineCol` API seems a bit error-prone, because it uses the same type for different units of measure. This can lead to errors: just yesterday my bank showed me the amount of money in my account in rubles, while using the euro as the currency sign :D
I think a lower-level API might be safer:
pub fn col_as_utf16(&self, line_col: LineCol) -> usize {...}
Note that, by definition, `TextUnit` is always a UTF-8 length, so using it for UTF-16 is not correct.
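A sketch of the suggested lower-level shape, under simplifying assumptions: the `LineCol` and `ColIndex` below are hypothetical stand-ins that simply store each line's text, not the editor's real types. The point it illustrates is that `col_as_utf16` returns a plain `usize` in UTF-16 code units, so the result cannot be passed back where a UTF-8 `LineCol` is expected.

```rust
/// Hypothetical stand-in for the editor's LineCol.
/// Both fields are 0-based; `col` is measured in UTF-8 bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LineCol {
    line: usize,
    col: usize,
}

/// Hypothetical index that just keeps the text of every line.
struct ColIndex {
    lines: Vec<String>,
}

impl ColIndex {
    fn new(text: &str) -> ColIndex {
        ColIndex { lines: text.lines().map(str::to_owned).collect() }
    }

    /// Converts the UTF-8 column of `line_col` to a UTF-16 column.
    /// Panics if the line does not exist or `col` is not on a char boundary.
    fn col_as_utf16(&self, line_col: LineCol) -> usize {
        let line = &self.lines[line_col.line];
        // Sum the UTF-16 widths of every char before the UTF-8 column.
        line[..line_col.col].chars().map(char::len_utf16).sum()
    }
}
```

A real implementation would avoid storing line text, for example by consulting a per-line table of wide characters, but the signature is the part the comment is about.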
crates/ra_editor/src/col_index.rs
assert!(col_index.utf16_to_utf8_col(line_col) == line_col);

// UTF-16 to UTF-8
assert!(
There's an `assert_eq!` macro.
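A sketch of a round-trip test written with `assert_eq!`, which on failure reports both the left and right values instead of just "assertion failed". The free functions below are hypothetical single-line variants that take the line text directly; they borrow the PR's method names but not its `LineCol`-based signatures.

```rust
/// Hypothetical: converts a UTF-8 byte column within `line` to a UTF-16 column.
fn utf8_to_utf16_col(line: &str, utf8_col: usize) -> usize {
    line[..utf8_col].chars().map(char::len_utf16).sum()
}

/// Hypothetical: converts a UTF-16 column within `line` to a UTF-8 byte column.
/// A column inside a surrogate pair rounds up to the next char boundary;
/// a column past the end of the line clamps to `line.len()`.
fn utf16_to_utf8_col(line: &str, utf16_col: usize) -> usize {
    let mut utf16_seen = 0;
    for (byte_idx, c) in line.char_indices() {
        if utf16_seen >= utf16_col {
            return byte_idx;
        }
        utf16_seen += c.len_utf16();
    }
    line.len()
}

#[test]
fn utf8_utf16_round_trip() {
    // 'a' (1 byte), U+03B3 (2 bytes), U+1D538 (4 bytes, surrogate pair), 'b'
    let line = "a\u{3B3}\u{1D538}b";
    // (UTF-8 column, UTF-16 column) at every char boundary.
    let cases = [(0, 0), (1, 1), (3, 2), (7, 4), (8, 5)];
    for &(utf8, utf16) in cases.iter() {
        assert_eq!(utf8_to_utf16_col(line, utf8), utf16);
        assert_eq!(utf16_to_utf8_col(line, utf16), utf8);
    }
}
```

Checking both directions at every char boundary catches off-by-one errors around the surrogate pair, which is where UTF-8 and UTF-16 columns diverge most.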
bors r+
227: Correctly map between UTF-8 and UTF-16 positions r=aochagavia a=aochagavia Fixes #202 Co-authored-by: Adolfo Ochagavía <[email protected]>
bors r-
Canceled
I think it's important to mark somehow that
Otherwise, LGTM! 👍
@matklad In the values of the
@aochagavia I was thinking about
Just pushed a commit to update
bors r+
227: Correctly map between UTF-8 and UTF-16 positions r=aochagavia a=aochagavia Fixes #202 Co-authored-by: Adolfo Ochagavía <[email protected]> Co-authored-by: Adolfo Ochagavía <[email protected]>
Canceled
bors r+
227: Correctly map between UTF-8 and UTF-16 positions r=aochagavia a=aochagavia Fixes #202 Co-authored-by: Adolfo Ochagavía <[email protected]> Co-authored-by: Adolfo Ochagavía <[email protected]>
Build succeeded
Fixes #202