Skip to content

x/text/unicode/bidi: Perhaps incorrect implementation of algorithm #71809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pgundlach opened this issue Feb 18, 2025 · 4 comments
Open

x/text/unicode/bidi: Perhaps incorrect implementation of algorithm #71809

pgundlach opened this issue Feb 18, 2025 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@pgundlach
Copy link
Contributor

Go version

go version go1.24.0 darwin/arm64

Output of go env in your module/workspace:

AR='ar'
CC='cc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='c++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/var/folders/md/l2nnr5490tq114003qtxfnk40000gn/T//gocache'
GOCACHEPROG=''
GODEBUG=''
GOENV='/Users/patrick/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/md/l2nnr5490tq114003qtxfnk40000gn/T/go-build220397128=/tmp/go-build -gno-record-gcc-switches -fno-common'
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMOD='/Users/patrick/prog/go/segmentize/go.mod'
GOMODCACHE='/Users/patrick/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/patrick/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.24.0/libexec'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/patrick/Library/Application Support/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.24.0/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.24.0'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

package main

import (
	"fmt"
	"log"

	"golang.org/x/text/unicode/bidi"
)

func dothings() error {
	text := "ع a"
	p := bidi.Paragraph{}
	_, err := p.SetString(text, bidi.DefaultDirection(bidi.LeftToRight))
	if err != nil {
		return err
	}

	order, err := p.Order()
	if err != nil {
		return err
	}
	for v := range order.NumRuns() {
		thisrun := order.Run(v)
		fmt.Println(thisrun.Direction())
		fmt.Printf("~~> thisrun.String() %#v\n", thisrun.String())
	}
	return nil
}

func main() {
	if err := dothings(); err != nil {
		log.Fatal(err)
	}
}

What did you see happen?

1
~~> thisrun.String() "ع "
0
~~> thisrun.String() "a"

What did you expect to see?

1
~~> thisrun.String() "ع"
0
~~> thisrun.String() " a"

Notice the space now belongs to the second output.

See https://util.unicode.org/UnicodeJsps/bidi.jsp?a=ع+a&p=LTR and https://util.unicode.org/UnicodeJsps/bidic.jsp?s=ع+a&b=0&u=140&d=2

The output of these pages are (the first page)

Memory Position 0 1 2
Character ‎ع‎ a
Bidi Class AL WS L
Rules Applied W3R
N2L
Resulting Level
L1 L0 L0

The resulting levels are 1, 0 and 0, so the first run contains the Arabic letter, the second run contains both the space character and the letter a.

@gopherbot gopherbot added this to the Unreleased milestone Feb 18, 2025
@ianlancetaylor
Copy link
Member

CC @mpvl

@pgundlach
Copy link
Contributor Author

This is my go.mod file

module segm

go 1.24.0

require golang.org/x/text v0.22.0

@pgundlach
Copy link
Contributor Author

The following should fix the problem:

a) in bidi.go

// A Direction indicates the overall flow of text.
type Direction int

const (
	// Neutral means that text contains no left-to-right and right-to-left
	// characters and that no default direction has been set.
	Neutral Direction = iota

	// LeftToRight indicates the text contains no right-to-left characters and
	// that either there are some left-to-right characters or the option
	// DefaultDirection(LeftToRight) was passed.
	LeftToRight

	// RightToLeft indicates the text contains no left-to-right characters and
	// that either there are some right-to-left characters or the option
	// DefaultDirection(RightToLeft) was passed.
	RightToLeft

	// Mixed indicates text contains both left-to-right and right-to-left
	// characters.
	Mixed
)

(make sure that the neutral direction is 0, the zero value of the type Direction)

b) in bidi.go, change the function Order:

func (p *Paragraph) Order() (Ordering, error) {
	if len(p.types) == 0 {
		return Ordering{}, nil
	}

	for _, fn := range p.opts {
		fn(&p.options)
	}
	lvl := level(-1)
	switch p.options.defaultDirection {
	case RightToLeft:
		lvl = 1
	case LeftToRight:
		lvl = 0
	}
	para, err := newParagraph(p.types, p.pairTypes, p.pairValues, lvl)
	if err != nil {
		return Ordering{}, err
	}

	levels := para.getLevels([]int{len(p.types)})

	p.o = calculateOrdering(levels, p.runes)
	return p.o, nil
}

The change:

	switch p.options.defaultDirection {
	case RightToLeft:
		lvl = 1
	case LeftToRight:
		lvl = 0
	}

so the variable lvl is set to -1 in the uninitialised state, 0 in case the paragraph level is set to left to right, 1 otherwise.

All tests pass.

A new test could be:

func TestN2(t *testing.T) {
	str := `ع a`
	p := Paragraph{}
	p.SetString(str, DefaultDirection(LeftToRight))
	order, err := p.Order()
	if err != nil {
		log.Fatal(err)
	}

	expectedRuns := []runInformation{
		{"ع", RightToLeft, 0, 0},
		{" a", LeftToRight, 1, 2},
	}

	if nr, want := order.NumRuns(), len(expectedRuns); nr != want {
		t.Errorf("order.NumRuns() = %d; want %d", nr, want)
	}

	for i, want := range expectedRuns {
		r := order.Run(i)
		if got := r.String(); got != want.str {
			t.Errorf("Run(%d) = %q; want %q", i, got, want.str)
		}
		if s, e := r.Pos(); s != want.start || e != want.end {
			t.Errorf("Run(%d).start = %d, .end = %d; want start = %d, end = %d", i, s, e, want.start, want.end)
		}
		if d := r.Direction(); d != want.dir {
			t.Errorf("Run(%d).Direction = %d; want %d", i, d, want.dir)
		}
	}
}

The complete diff

diff.txt

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 18, 2025
@pgundlach
Copy link
Contributor Author

pgundlach commented Feb 18, 2025

This might break library compatibility, so there should be another solution. I will think of it.

edit:

The current implementation does not make much sense. Having left to right as a default rather than "unset" makes it difficult to distinguish an explicit setting which leads to a different outcome.

IMO this should be changed. The Readme in bidi package https://pkg.go.dev/golang.org/x/[email protected]/unicode/bidi reads:

NOTE: UNDER CONSTRUCTION. This API may change in backwards incompatible ways and without notice.

I'd like to have advice on how to fix this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants