x/text/unicode/bidi: Perhaps incorrect implementation of algorithm #71809
Comments
CC @mpvl

This is my go.mod file
The following should fix the problem:

a) in

// A Direction indicates the overall flow of text.
type Direction int

const (
	// Neutral means that text contains no left-to-right and right-to-left
	// characters and that no default direction has been set.
	Neutral Direction = iota
	// LeftToRight indicates the text contains no right-to-left characters and
	// that either there are some left-to-right characters or the option
	// DefaultDirection(LeftToRight) was passed.
	LeftToRight
	// RightToLeft indicates the text contains no left-to-right characters and
	// that either there are some right-to-left characters or the option
	// DefaultDirection(RightToLeft) was passed.
	RightToLeft
	// Mixed indicates text contains both left-to-right and right-to-left
	// characters.
	Mixed
)

(make sure that the neutral direction is 0, the zero value of the type Direction)

b) in bidi.go, change the function Order:

func (p *Paragraph) Order() (Ordering, error) {
	if len(p.types) == 0 {
		return Ordering{}, nil
	}
	for _, fn := range p.opts {
		fn(&p.options)
	}
	lvl := level(-1)
	switch p.options.defaultDirection {
	case RightToLeft:
		lvl = 1
	case LeftToRight:
		lvl = 0
	}
	para, err := newParagraph(p.types, p.pairTypes, p.pairValues, lvl)
	if err != nil {
		return Ordering{}, err
	}
	levels := para.getLevels([]int{len(p.types)})
	p.o = calculateOrdering(levels, p.runes)
	return p.o, nil
}

The change:

	switch p.options.defaultDirection {
	case RightToLeft:
		lvl = 1
	case LeftToRight:
		lvl = 0
	}

So the variable lvl stays at -1 when no default direction has been set, becomes 0 when the paragraph direction is set to left-to-right, and becomes 1 when it is set to right-to-left. All tests pass. A new test could be:

func TestN2(t *testing.T) {
	str := `ع a`
	p := Paragraph{}
	p.SetString(str, DefaultDirection(LeftToRight))
	order, err := p.Order()
	if err != nil {
		// Report through the test instead of log.Fatal, which would exit the whole test binary.
		t.Fatal(err)
	}
	// The Arabic letter forms the first run; the space and "a" form the second.
	expectedRuns := []runInformation{
		{"ع", RightToLeft, 0, 0},
		{" a", LeftToRight, 1, 2},
	}
	if nr, want := order.NumRuns(), len(expectedRuns); nr != want {
		t.Errorf("order.NumRuns() = %d; want %d", nr, want)
	}
	for i, want := range expectedRuns {
		r := order.Run(i)
		if got := r.String(); got != want.str {
			t.Errorf("Run(%d) = %q; want %q", i, got, want.str)
		}
		if s, e := r.Pos(); s != want.start || e != want.end {
			t.Errorf("Run(%d).start = %d, .end = %d; want start = %d, end = %d", i, s, e, want.start, want.end)
		}
		if d := r.Direction(); d != want.dir {
			t.Errorf("Run(%d).Direction = %d; want %d", i, d, want.dir)
		}
	}
}

The complete diff
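
For readers who want to try the idea outside the package, here is a minimal, self-contained sketch of the mapping described above. The `Direction` type is a local stand-in that only copies the proposed constant order, and `paragraphLevel` is a hypothetical helper, not a function of `x/text/unicode/bidi`. It also assumes that -1 means "derive the paragraph level from the text itself", which is how the comment above uses it.

```go
package main

import "fmt"

// Direction is a local stand-in that mirrors the constant order proposed
// above, so that the zero value means "no default direction set".
type Direction int

const (
	Neutral Direction = iota
	LeftToRight
	RightToLeft
	Mixed
)

// paragraphLevel is a hypothetical helper: it maps a default direction to
// the paragraph embedding level, with -1 standing for "not set explicitly".
func paragraphLevel(d Direction) int {
	switch d {
	case LeftToRight:
		return 0
	case RightToLeft:
		return 1
	default:
		// Neutral (the zero value) and Mixed: no explicit level.
		return -1
	}
}

func main() {
	var unset Direction // zero value, i.e. Neutral
	fmt.Println(paragraphLevel(unset))       // -1
	fmt.Println(paragraphLevel(LeftToRight)) // 0
	fmt.Println(paragraphLevel(RightToLeft)) // 1
}
```

Because `Neutral` is the zero value, an explicit `LeftToRight` can be told apart from a direction that was never set.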

This might break library compatibility, so there should be another solution. I will think of it.

edit: The current implementation does not make much sense. Having left-to-right as the default rather than "unset" makes it impossible to distinguish an explicit left-to-right setting from no setting at all, although the two should lead to different outcomes. IMO this should be changed.

The Readme in bidi package https://pkg.go.dev/golang.org/x/[email protected]/unicode/bidi reads:

NOTE: UNDER CONSTRUCTION. This API may change in backwards incompatible ways and without notice.

I'd like to have advice on how to fix this problem.
Go version
go version go1.24.0 darwin/arm64
Output of go env in your module/workspace:
What did you do?
What did you see happen?
What did you expect to see?
Notice the space now belongs to the second output.
See https://util.unicode.org/UnicodeJsps/bidi.jsp?a=ع+a&p=LTR and https://util.unicode.org/UnicodeJsps/bidic.jsp?s=ع+a&b=0&u=140&d=2
The output of the first page is:

| Memory Position | 0 | 1 | 2 |
| --- | --- | --- | --- |
| Character | ع | (space) | a |
| Bidi Class | AL | WS | L |
| Rules Applied | W3→R | N2→L | |
| Resulting Level | L1 | L0 | L0 |

The resulting levels are 1, 0 and 0, so the first run contains the Arabic letter, the second run contains both the space character and the letter a.
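
The step from bidi classes to levels to runs can be reproduced with a small standalone sketch. It is deliberately reduced to this example: it assumes a left-to-right paragraph (level 0), knows only the classes L, R, AL and WS, and applies only rules W3, N1/N2 and I1. All names (`class`, `resolveLevels`, `splitRuns`) are hypothetical and not part of `x/text/unicode/bidi`.

```go
package main

import "fmt"

// class is a heavily reduced set of bidi classes, enough for this example.
type class int

const (
	L  class = iota // strong left-to-right
	R               // strong right-to-left
	AL              // Arabic letter
	WS              // whitespace (neutral)
)

// resolveLevels returns one level per character for an LTR paragraph.
func resolveLevels(classes []class) []int {
	const paraDir = L // assumption: paragraph level 0
	res := append([]class(nil), classes...)
	// W3: treat every Arabic letter as R.
	for i, c := range res {
		if c == AL {
			res[i] = R
		}
	}
	// N1/N2: a whitespace sequence takes the direction of its neighbours
	// if they agree (N1), otherwise the paragraph direction (N2).
	for i := 0; i < len(res); {
		if res[i] != WS {
			i++
			continue
		}
		j := i
		for j < len(res) && res[j] == WS {
			j++
		}
		before, after := paraDir, paraDir // paragraph boundaries count as L here
		if i > 0 {
			before = res[i-1]
		}
		if j < len(res) {
			after = res[j]
		}
		dir := paraDir
		if before == after {
			dir = before
		}
		for k := i; k < j; k++ {
			res[k] = dir
		}
		i = j
	}
	// I1: at an even embedding level, R characters go up one level.
	levels := make([]int, len(res))
	for i, c := range res {
		if c == R {
			levels[i] = 1
		}
	}
	return levels
}

// run is a maximal span of equal levels; end is inclusive.
type run struct{ start, end, level int }

// splitRuns groups consecutive characters that share a level.
func splitRuns(levels []int) []run {
	var runs []run
	for i, lv := range levels {
		if n := len(runs); n > 0 && runs[n-1].level == lv {
			runs[n-1].end = i
			continue
		}
		runs = append(runs, run{i, i, lv})
	}
	return runs
}

func main() {
	levels := resolveLevels([]class{AL, WS, L}) // the classes of "ع a"
	fmt.Println(levels)                         // [1 0 0]
	fmt.Println(splitRuns(levels))              // [{0 0 1} {1 2 0}]
}
```

With these assumptions the space ends up at level 0 and therefore in the same run as the letter a, matching the output of the Unicode utility pages.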