Skip to content

Fix actual master and add -race in CI #907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 10, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion driver/driver.go
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,7 @@ func (c *conn) CheckNamedValue(nv *sqldriver.NamedValue) error {
} else {
// we've found an error, if the error is driver.ErrSkip then
// keep looking otherwise return the unknown error
if !goErrors.Is(sqldriver.ErrSkip, err) {
if !goErrors.Is(err, sqldriver.ErrSkip) {
return err
}
}
Expand Down
16 changes: 9 additions & 7 deletions driver/driver_options_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,15 +10,17 @@ import (
"reflect"
"strconv"
"strings"
"sync/atomic"
"testing"
"time"

"github.com/go-mysql-org/go-mysql/client"
"github.com/go-mysql-org/go-mysql/mysql"
"github.com/go-mysql-org/go-mysql/server"
"github.com/pingcap/errors"
"github.com/siddontang/go/log"
"github.com/stretchr/testify/require"

"github.com/go-mysql-org/go-mysql/client"
"github.com/go-mysql-org/go-mysql/mysql"
"github.com/go-mysql-org/go-mysql/server"
)

var _ server.Handler = &mockHandler{}
Expand All @@ -32,7 +34,7 @@ type testServer struct {

type mockHandler struct {
// the number of times a query executed
queryCount int
Copy link
Collaborator

@lance6716 lance6716 Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we have this race. (Although I can reproduce it with -race test in my local environment.)

In the unit test (I choose TestDriverOptions_SetRetriesOff), the happened-before relation should be conn.QueryContentstarts -> mockHandler.handleQuery modifies h.queryCount -> conn.QueryContent ends -> TestDriverOptions_SetRetriesOff reads srv.handler.queryCount. I need more time to check it 😢

Because even if we use atomic variable to avoid race on this variable, the race of testing logic will let the check fail if atomic.Load happens before atomic.Add


==================
WARNING: DATA RACE
Read at 0x00c0000147e8 by goroutine 19:
  github.com/go-mysql-org/go-mysql/driver.TestDriverOptions_SetRetriesOff()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:78 +0x26d
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/lib/go/src/testing/testing.go:1742 +0x44

Previous write at 0x00c0000147e8 by goroutine 23:
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).handleQuery()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:313 +0xb2
  github.com/go-mysql-org/go-mysql/driver.(*mockHandler).HandleQuery()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:370 +0x44
  github.com/go-mysql-org/go-mysql/server.(*Conn).dispatch()
      /home/lance/Projects/go-mysql/server/command.go:78 +0x3e9
  github.com/go-mysql-org/go-mysql/server.(*Conn).HandleCommand()
      /home/lance/Projects/go-mysql/server/command.go:53 +0xd6
  github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1.1()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:287 +0x96

Goroutine 19 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/lib/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/lib/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/lib/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:67 +0x2bd

Goroutine 23 (running) created at:
  github.com/go-mysql-org/go-mysql/driver.CreateMockServer.func1()
      /home/lance/Projects/go-mysql/driver/driver_options_test.go:281 +0x53
==================

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this more closely too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stop the mock server before reading the queryCount:

func TestDriverOptions_SetRetriesOff(t *testing.T) {
	log.SetLevel(log.LevelDebug)
	srv := CreateMockServer(t)
	defer srv.Stop()
...
	rows, err := conn.QueryContext(context.TODO(), "select * from slow;")
...
	require.EqualValues(t, 1, srv.handler.queryCount.Load())
}
func TestDriverOptions_SetRetriesOff(t *testing.T) {
	log.SetLevel(log.LevelDebug)
	srv := CreateMockServer(t)
	defer srv.Stop() // it's ok here to call it twice
...
	rows, err := conn.QueryContext(context.TODO(), "select * from slow;")
	srv.Stop() // stop here before accessing to queryCount
...
	require.EqualValues(t, 1, srv.handler.queryCount.Load())
}

What do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the original int variable, I see the data race is still reported even if we manually stop the mock server. So it's likely that with atomic variables the Load may still happen before Add

Sorry I'm oncall this week, I think I can start to locate the problem in next week.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have unsynchronized access to this int from different goroutines that are created here:

handler := &mockHandler{}

go func() {
	for {
		conn, _ := l.Accept()
		go func() {
			co, _ := server.NewCustomizedConn(conn, defaultServer, inMemProvider, handler)
			for {
				_ = co.HandleCommand() // common `handler` is reused by different goroutines
			}
		}()
	}
}()

TestDriverOptions_SetRetriesOn sends 3 queries, and we panic on the second.

Copy link
Collaborator

@lance6716 lance6716 Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I have understood now. We have set readTimeout for the tests TestDriverOptions_SetRetriesOn and TestDriverOptions_SetRetriesOff. So when conn.QueryContext returns it's caused by the timeout, rather than receieved the response. It means when conn.QueryContext returns (*mockHandler).handleQuery may not be executed to generate the response.

I was mistakenly thought that there's a happened-before relation of h.queryCount++ -> (*mockHandler).handleQuery finished -> mock server write response -> conn.QueryContext read the response and return -> testing goroutine reads srv.handler.queryCount. But it's not true because the returning of conn.QueryContext is not caused by getting a response. So the write and read of queryCount indeed has no happened-before relation which is a data race.

And I suggest this fix

diff --git a/driver/driver_options_test.go b/driver/driver_options_test.go
index eaa863f..204d2bd 100644
--- a/driver/driver_options_test.go
+++ b/driver/driver_options_test.go
@@ -10,6 +10,7 @@ import (
        "reflect"
        "strconv"
        "strings"
+       "sync"
        "testing"
        "time"
 
@@ -33,12 +34,16 @@ type testServer struct {
 type mockHandler struct {
        // the number of times a query executed
        queryCount int
+       modifier   *sync.WaitGroup
 }
 
 func TestDriverOptions_SetRetriesOn(t *testing.T) {
        log.SetLevel(log.LevelDebug)
        srv := CreateMockServer(t)
        defer srv.Stop()
+       var wg sync.WaitGroup
+       srv.handler.modifier = &wg
+       wg.Add(3)
 
        conn, err := sql.Open("mysql", "[email protected]:3307/test?readTimeout=1s")
        defer func() {
@@ -52,6 +57,7 @@ func TestDriverOptions_SetRetriesOn(t *testing.T) {
        // we want to get a golang database/sql/driver ErrBadConn
        require.ErrorIs(t, err, sqlDriver.ErrBadConn)
 
+       wg.Wait()
        // here we issue assert that even though we only issued 1 query, that the retries
        // remained on and there were 3 calls to the DB.
        require.Equal(t, 3, srv.handler.queryCount)
@@ -61,6 +67,9 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {
        log.SetLevel(log.LevelDebug)
        srv := CreateMockServer(t)
        defer srv.Stop()
+       var wg sync.WaitGroup
+       srv.handler.modifier = &wg
+       wg.Add(1)
 
        conn, err := sql.Open("mysql", "[email protected]:3307/test?readTimeout=1s&retries=off")
        defer func() {
@@ -73,6 +82,7 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {
        // we want the native error from this driver implementation
        require.ErrorIs(t, err, mysql.ErrBadConn)
 
+       wg.Wait()
        // here we issue assert that even though we only issued 1 query, that the retries
        // remained on and there were 3 calls to the DB.
        require.Equal(t, 1, srv.handler.queryCount)
@@ -309,6 +319,11 @@ func (h *mockHandler) UseDB(dbName string) error {
 }
 
 func (h *mockHandler) handleQuery(query string, binary bool, args []interface{}) (*mysql.Result, error) {
+       defer func() {
+               if h.modifier != nil {
+                       h.modifier.Done()
+               }
+       }()
        h.queryCount++
        ss := strings.Split(query, " ")
        switch strings.ToLower(ss[0]) {

We add synchronization and still use int. It does not rely on time interval 1 second (readTimeout) or 5 seconds (we inject the sleep in around line 345 of driver_options_test.go). And we can reduce these two values now.

Copy link
Member Author

@atercattus atercattus Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we don't need int, because WaitGroup.Wait() will hang until the required number of requests.

The only case is if there are more retries than necessary (eg. h.queryCount==4 and wg.Add(3)). But here will we get unsynchronized access?

Copy link
Collaborator

@lance6716 lance6716 Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can remove the int now. And maybe add -race to ci.yml to find other unsynchronized access and fail that PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll update this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, if you are busy (not response in a few days) I can update this PR.

queryCount atomic.Int32
}

func TestDriverOptions_SetRetriesOn(t *testing.T) {
Expand All @@ -54,7 +56,7 @@ func TestDriverOptions_SetRetriesOn(t *testing.T) {

// here we issue assert that even though we only issued 1 query, that the retries
// remained on and there were 3 calls to the DB.
require.Equal(t, 3, srv.handler.queryCount)
require.EqualValues(t, 3, srv.handler.queryCount.Load())
}

func TestDriverOptions_SetRetriesOff(t *testing.T) {
Expand All @@ -75,7 +77,7 @@ func TestDriverOptions_SetRetriesOff(t *testing.T) {

// here we issue assert that even though we only issued 1 query, that the retries
// remained on and there were 3 calls to the DB.
require.Equal(t, 1, srv.handler.queryCount)
require.EqualValues(t, 1, srv.handler.queryCount.Load())
}

func TestDriverOptions_SetCollation(t *testing.T) {
Expand Down Expand Up @@ -309,7 +311,7 @@ func (h *mockHandler) UseDB(dbName string) error {
}

func (h *mockHandler) handleQuery(query string, binary bool, args []interface{}) (*mysql.Result, error) {
h.queryCount++
h.queryCount.Add(1)
ss := strings.Split(query, " ")
switch strings.ToLower(ss[0]) {
case "select":
Expand Down
3 changes: 1 addition & 2 deletions mysql/util.go
Original file line number Diff line number Diff line change
Expand Up @@ -197,11 +197,10 @@ func PutLengthEncodedInt(n uint64) []byte {
case n <= 0xffffff:
return []byte{0xfd, byte(n), byte(n >> 8), byte(n >> 16)}

case n <= 0xffffffffffffffff:
default: // n <= 0xffffffffffffffff:
return []byte{0xfe, byte(n), byte(n >> 8), byte(n >> 16), byte(n >> 24),
byte(n >> 32), byte(n >> 40), byte(n >> 48), byte(n >> 56)}
}
return nil
}

// LengthEncodedString returns the string read as a bytes slice, whether the value is NULL,
Expand Down