Replication uses huge amounts of memory (~53GB after ~20 hours running) #498


Open

snacker81 opened this issue May 10, 2020 · 2 comments

@snacker81

Hi everyone,

I am using the go-mysql replication feature, and after running for roughly 20 hours it already consumes ~53GB of RAM.

go tool pprof http://10.10.23.91:6060/debug/pprof/heap:

Fetching profile over HTTP from http://10.10.23.91:6060/debug/pprof/heap
Saved profile in /home/stefan.becker/pprof/pprof.searchd.alloc_objects.alloc_space.inuse_objects.inuse_space.039.pb.gz
File: searchd
Type: inuse_space
Time: May 10, 2020 at 8:04am (CEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top3
Showing nodes accounting for 72.24GB, 99.40% of 72.67GB total
Dropped 42 nodes (cum <= 0.36GB)
Showing top 3 nodes out of 17
      flat  flat%   sum%        cum   cum%
   53.16GB 73.15% 73.15%    53.18GB 73.18%  github.com/siddontang/go-mysql/packet.(*Conn).ReadPacket
   11.18GB 15.38% 88.53%    11.18GB 15.38%  xxx/memorydb.newBuffer
    7.90GB 10.87% 99.40%     7.90GB 10.87%  github.com/allegro/bigcache/v2/queue.(*BytesQueue).allocateAdditionalMemory
(pprof)

Using "list ReadPacket" in pprof I get the following:

(pprof) list ReadPacket
Total: 73.75GB
ROUTINE ======================== github.com/siddontang/go-mysql/packet.(*Conn).ReadPacket in /home/stefan.becker/go/pkg/mod/github.com/siddontang/[email protected]/packet/conn.go
   54.25GB    54.26GB (flat, cum) 73.57% of Total
         .          .     77:   return c
         .          .     78:}
         .          .     79:
         .          .     80:func (c *Conn) ReadPacket() ([]byte, error) {
         .          .     81:   // Here we use `sync.Pool` to avoid allocate/destroy buffers frequently.
         .     6.05MB     82:   buf := c.bufPool.Get()
         .          .     83:   defer c.bufPool.Return(buf)
         .          .     84:
         .        2MB     85:   if err := c.ReadPacketTo(buf); err != nil {
         .          .     86:           return nil, errors.Trace(err)
         .          .     87:   } else {
   54.25GB    54.25GB     88:           result := append([]byte{}, buf.Bytes()...)
         .          .     89:           return result, nil
         .          .     90:   }
         .          .     91:}
         .          .     92:
         .          .     93:func (c *Conn) ReadPacketTo(w io.Writer) error {

My environment:

A friend of mine is running version be37886 and does not see the same issue. Comparing the two versions, I see that sync.Pool has been introduced:

https://github.com/siddontang/go-mysql/compare/be3788669bb4..803944a6e4ea#diff-ff50afcdad18dfa11cc6ca7509e863d6R82
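
For context, that change introduces a buffer pool so packet buffers are recycled instead of being allocated per read. A minimal sketch of the pattern, for illustration only (not the library's exact code):

package main

import (
	"bytes"
	"sync"
)

// BufPool recycles *bytes.Buffer instances via sync.Pool to reduce
// per-packet allocations and GC pressure.
type BufPool struct {
	pool sync.Pool
}

func NewBufPool() *BufPool {
	return &BufPool{
		pool: sync.Pool{
			New: func() interface{} { return new(bytes.Buffer) },
		},
	}
}

// Get returns a (possibly reused) buffer from the pool.
func (p *BufPool) Get() *bytes.Buffer {
	return p.pool.Get().(*bytes.Buffer)
}

// Return resets the buffer and hands it back to the pool for reuse.
func (p *BufPool) Return(b *bytes.Buffer) {
	b.Reset()
	p.pool.Put(b)
}

func main() {
	p := NewBufPool()
	buf := p.Get()
	buf.WriteString("one packet")
	_ = buf.Bytes() // consume the data before returning the buffer
	p.Return(buf)
}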

To get a feeling for the amount of replication traffic, here is a list of my binlogs:

-rw-r----- 1 mysql mysql  101M Mai 10 08:06 mysql-bin.083923
-rw-r----- 1 mysql mysql  109M Mai 10 08:07 mysql-bin.083924
-rw-r----- 1 mysql mysql  101M Mai 10 08:07 mysql-bin.083925
-rw-r----- 1 mysql mysql  101M Mai 10 08:08 mysql-bin.083926
-rw-r----- 1 mysql mysql  101M Mai 10 08:08 mysql-bin.083927
-rw-r----- 1 mysql mysql  101M Mai 10 08:09 mysql-bin.083928
-rw-r----- 1 mysql mysql  101M Mai 10 08:09 mysql-bin.083929
-rw-r----- 1 mysql mysql  101M Mai 10 08:09 mysql-bin.083930
-rw-r----- 1 mysql mysql  109M Mai 10 08:10 mysql-bin.083931
-rw-r----- 1 mysql mysql  101M Mai 10 08:10 mysql-bin.083932
-rw-r----- 1 mysql mysql  107M Mai 10 08:11 mysql-bin.083933
-rw-r----- 1 mysql mysql  102M Mai 10 08:11 mysql-bin.083934
-rw-r----- 1 mysql mysql  101M Mai 10 08:11 mysql-bin.083935
-rw-r----- 1 mysql mysql  109M Mai 10 08:12 mysql-bin.083936

Any idea how to fix this? I know you use sync.Pool to lower GC pressure, of course. A quick fix would probably be to remove sync.Pool, but I do not know whether that is the right way.

Thanks for your help!

@snacker81
Author

snacker81 commented May 21, 2020

Hi everyone,

I looked deeper into this issue and found the root cause. Because of the following hack, all string values from rows (when selecting rows from MySQL) reference the data slice ("buffer") that you use for reading packets from the MySQL connection:

https://github.com/siddontang/go/blob/master/hack/hack.go#L10
https://github.com/siddontang/go-mysql/blob/7a62847fcdb5f46399ad0bc58cccca4a0f7fbb0d/replication/row_event.go#L835
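
To illustrate the effect (a hedged sketch, not the library's exact code): the hack reinterprets a []byte as a string without copying, so the resulting string shares its backing array with the packet buffer, and that whole buffer stays reachable for as long as the string is kept.

package main

import (
	"fmt"
	"unsafe"
)

// unsafeString reinterprets b as a string without copying; an illustrative
// stand-in for the hack linked above.
func unsafeString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	packet := make([]byte, 1<<20) // pretend this is a large packet buffer
	copy(packet, "column value")

	v := unsafeString(packet[:12])
	fmt.Println(v) // prints "column value"

	// As long as v is retained (e.g. in a long-lived in-memory store), the
	// entire 1 MiB backing array of packet stays reachable and cannot be
	// garbage collected.
}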

Because I put these strings into a kind of long-term in-memory database, the data slices ("buffer") are never garbage collected! I verified this in detail by printing the underlying storage pointers of the data slice and of the final string after reading the rows: they are the same!
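
A rough sketch of that kind of check (illustrative only, using Go's reflect headers): compare the data pointer of the packet slice with the data pointer of the decoded string.

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	packet := []byte("row data read from the connection")

	// Zero-copy conversion, as in the hack above (illustrative).
	value := *(*string)(unsafe.Pointer(&packet))

	slicePtr := (*reflect.SliceHeader)(unsafe.Pointer(&packet)).Data
	stringPtr := (*reflect.StringHeader)(unsafe.Pointer(&value)).Data

	// Both addresses are identical: the string aliases the packet buffer.
	fmt.Printf("slice data: %#x, string data: %#x\n", slicePtr, stringPtr)
}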

For now, I have fixed the problem by making a string copy, which copies the underlying storage of the string:

copiedColumnValue = string([]byte(columnValueFromGoMysqlLibrary))

(I did not find another way.)
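
For what it's worth, on newer Go versions (1.18+) the standard library's strings.Clone (import "strings") performs the same copy and detaches the value from the packet buffer; the variable names below are the same illustrative ones as above:

copiedColumnValue = strings.Clone(columnValueFromGoMysqlLibrary)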

This one was really nasty, but I learned a lot about Go's memory handling :)

From my point of view, this should be fixed in one of the following ways:

  • Remove the hack and make a real copy (of the storage) inside your library (see the sketch after this list). This will generate more garbage, of course; I do not know how much it would hurt the overall performance of the library and its users.
  • Add an option somewhere: copy or no copy.
  • Extend the documentation either way: whether it stays like this or the default for the option above is "no copy", this behaviour should be highlighted in the documentation.
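
To make the first option concrete, here is a minimal sketch of a copying decode (illustrative only; the package name, decodeString, and UseStringCopy are hypothetical, not actual go-mysql APIs):

// Illustrative sketch only, not a patch against the actual row_event.go code.
package rowdecode

import "github.com/siddontang/go/hack"

// UseStringCopy toggles between the current zero-copy behaviour and a
// copying decode (the "copy or no copy" option suggested above).
var UseStringCopy = true

func decodeString(data []byte) string {
	if UseStringCopy {
		// The plain conversion copies the bytes, so the returned string does
		// not alias (and therefore does not pin) the packet buffer.
		return string(data)
	}
	// Zero-copy: cheap, but the string shares storage with the packet buffer.
	return hack.String(data)
}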

Thanks for your help,
Stefan

@atercattus atercattus self-assigned this Jul 17, 2020
@meshenka

Here is my pprof of the same issue:

Showing nodes accounting for 344.78MB, 99.12% of 347.84MB total
Dropped 23 nodes (cum <= 1.74MB)
      flat  flat%   sum%        cum   cum%
  155.18MB 44.61% 44.61%   234.68MB 67.47%  github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeRows
   71.55MB 20.57% 65.18%    71.55MB 20.57%  github.com/siddontang/go-mysql/packet.(*Conn).ReadPacketReuseMem
   50.50MB 14.52% 79.70%    79.50MB 22.86%  github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeValue
   13.53MB  3.89% 83.59%   248.21MB 71.36%  github.com/siddontang/go-mysql/replication.(*RowsEvent).Decode
      13MB  3.74% 87.33%       13MB  3.74%  time.Time.Format
       9MB  2.59% 89.91%       22MB  6.32%  github.com/siddontang/go-mysql/replication.(*RowsEvent).parseFracTime
    8.51MB  2.45% 92.36%     8.51MB  2.45%  sync.(*Pool).pinSlow

Hope it helps.
