This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Character normalization bug #495

Closed
joshbetz opened this issue Jul 21, 2017 · 6 comments

@joshbetz
Contributor

There seems to be a normalization bug in the Status() function. I put together a gist to demonstrate the issue.

git clone https://gist.github.com/joshbetz/fe8d6af45f7801c1f4e96e1cc41b5062
cd fe8d6af45f7801c1f4e96e1cc41b5062
go run main.go

If you run git status after that, you'll see a new file reported. Its name is a different byte sequence, but it renders identically to the name of a file that already exists.

https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization

@mcuadros
Contributor

mcuadros commented Jul 21, 2017

git:master@fe8d6af45f7801c1f4e96e1cc41b5062> go run main.go
git:master@fe8d6af45f7801c1f4e96e1cc41b5062> code main.go
git:master@fe8d6af45f7801c1f4e96e1cc41b5062> git status   
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

@joshbetz
Contributor Author

joshbetz commented Jul 21, 2017

Interesting. I get:

$ go version
go version go1.8 darwin/amd64
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
$ go run main.go
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   "\341\204\221\341\205\246"
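
For readers following along: the octal-escaped name in that output can be decoded to see which code points it contains. The sketch below is only an illustration, not part of the gist. It assumes Go's `strconv.Unquote` is a close enough match for git's C-style path quoting in this case (it handles the octal escapes shown here), and the helper names `decodeQuotedPath` and `composeHangul` are hypothetical. The composition arithmetic is the standard Hangul syllable formula from the Unicode standard.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeQuotedPath turns a quoted, octal-escaped path as printed by
// `git status` into its Unicode code points. Assumption: the quoting
// is compatible with Go string-literal syntax, which holds for the
// octal escapes in this issue.
func decodeQuotedPath(quoted string) ([]rune, error) {
	s, err := strconv.Unquote(quoted)
	if err != nil {
		return nil, err
	}
	return []rune(s), nil
}

// composeHangul combines a leading consonant (U+1100..U+1112) and a
// vowel (U+1161..U+1175) into the precomposed syllable, using the
// arithmetic defined by the Unicode standard.
func composeHangul(l, v rune) (rune, bool) {
	const (
		sBase, lBase, vBase = 0xAC00, 0x1100, 0x1161
		lCount, vCount, tCount = 19, 21, 28
	)
	li, vi := l-lBase, v-vBase
	if li < 0 || li >= lCount || vi < 0 || vi >= vCount {
		return 0, false
	}
	return sBase + (li*vCount+vi)*tCount, true
}

func main() {
	runes, err := decodeQuotedPath(`"\341\204\221\341\205\246"`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, r := range runes {
		fmt.Printf("%U\n", r) // U+1111, then U+1166
	}
	if len(runes) == 2 {
		if s, ok := composeHangul(runes[0], runes[1]); ok {
			fmt.Printf("precomposed: %U\n", s) // precomposed: U+D398
		}
	}
}
```

So the reported name is a two-code-point decomposed sequence (conjoining jamo), which is canonically equivalent to the single precomposed syllable U+D398. Two byte-wise different names can therefore look the same on screen.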

@mcuadros
Contributor

Wow, this is very weird. Maybe it is related to macOS?

go version go1.8.3 linux/amd64

@joshbetz
Contributor Author

Seems like it might be related. I'm doing some testing.

@mcuadros mcuadros added the bug label Jul 25, 2017
@mcuadros
Contributor

@joshbetz
Contributor Author

I think we'd need to use something like https://godoc.org/golang.org/x/text/unicode/norm to detect equivalent filenames.

mcarmonaa pushed a commit to mcarmonaa/go-git that referenced this issue Aug 4, 2017
Some multibyte characters can have multiple representations. Before
comparing strings, we need to normalize them. In this case we're
normalizing to normalized form C, but it shouldn't matter as long as
both strings are normalized to the same form.

Fixes src-d#495
traidare pushed a commit to traidare/go-git that referenced this issue Oct 26, 2024
Some multibyte characters can have multiple representations. Before
comparing strings, we need to normalize them. In this case we're
normalizing to normalized form C, but it shouldn't matter as long as
both strings are normalized to the same form.

Fixes src-d/go-git#495