performance improvement for ReadUvarint() #14

jwinkler2083233 · 2022-03-09T21:18:42Z

ReadUvarint() is time-sensitive because it runs while a mutex is held, and it's called very often.
I'd like to offer a faster version that should improve latency:

Previous version:

func ReadUvarint(r io.ByteReader) (uint64, error) {
	// Modified from the go standard library. Copyright the Go Authors and
	// released under the BSD License.
	var x uint64
	var s uint
	for i := 0; ; i++ {
		b, err := r.ReadByte()
		if err != nil {
			if err == io.EOF && i != 0 {
				// "eof" will look like a success.
				// If we've read part of a value, this is not a
				// success.
				err = io.ErrUnexpectedEOF
			}
			return 0, err
		}
		if (i == 8 && b >= 0x80) || i >= MaxLenUvarint63 {
			// this is the 9th and last byte we're willing to read, but it
			// signals there's more (1 in MSB).
			// or this is the >= 10th byte, and for some reason we're still here.
			return 0, ErrOverflow
		}
		if b < 0x80 {
			if b == 0 && s > 0 {
				return 0, ErrNotMinimal
			}
			return x | uint64(b)<<s, nil
		}
		x |= uint64(b&0x7f) << s
		s += 7
	}
}

New version:

func ReadUvarint(r io.ByteReader) (uint64, error) {
	// Modified from the go standard library. Copyright the Go Authors and
	// released under the BSD License.
	var x uint64
	// s is the bit shift for the current byte. It advances by 7 per byte
	// read, so byte index i corresponds to s == 7*i.
	for s := uint(0); ; s += 7 {
		b, err := r.ReadByte()
		if err != nil {
			if err == io.EOF && s != 0 {
				// "eof" will look like a success.
				// If we've read part of a value, this is not a
				// success.
				err = io.ErrUnexpectedEOF
			}
			return 0, err
		}
		if (s == 56 && b >= 0x80) || s >= (7 * MaxLenUvarint63) {
			// this is the 9th and last byte we're willing to read, but it
			// signals there's more (1 in MSB).
			// or this is the >= 10th byte, and for some reason we're still here.
			return 0, ErrOverflow
		}
		if b < 0x80 {
			if b == 0 && s > 0 {
				return 0, ErrNotMinimal
			}
			return x | uint64(b)<<s, nil
		}
		x |= uint64(b&0x7f) << s
	}
}

The 'i == 8' check is replaced with 's == 56', and 'i >= MaxLenUvarint63' with 's >= 7 * MaxLenUvarint63'. MaxLenUvarint63 is currently 9, so the product (63) is not an overflow problem.
This change removes the separate counter 'i' and the addition needed to maintain it.
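To make the equivalence concrete, here is a minimal sketch that checks the index-based and shift-based bound checks against each other for every byte index and byte value. The names overflowByIndex, overflowByShift, checksAgree, and maxLenUvarint63 are hypothetical and exist only for this illustration; the constant assumes the value 9 quoted above.

```go
package main

import "fmt"

// maxLenUvarint63 mirrors the value of MaxLenUvarint63 quoted in this issue (9).
// It is an assumption for this sketch, not an import of the real constant.
const maxLenUvarint63 = 9

// overflowByIndex is the bound check from the previous version, keyed on the byte index i.
func overflowByIndex(i int, b byte) bool {
	return (i == 8 && b >= 0x80) || i >= maxLenUvarint63
}

// overflowByShift is the same check keyed on the shift s, where s == 7*i.
func overflowByShift(s uint, b byte) bool {
	return (s == 56 && b >= 0x80) || s >= 7*maxLenUvarint63
}

// checksAgree reports whether both forms give the same answer for every
// byte index from 0 through maxIndex and every possible byte value.
func checksAgree(maxIndex int) bool {
	for i := 0; i <= maxIndex; i++ {
		for b := 0; b < 256; b++ {
			if overflowByIndex(i, byte(b)) != overflowByShift(uint(7*i), byte(b)) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(checksAgree(12)) // prints "true"
}
```

Because s only ever takes the values 0, 7, 14, ..., comparing it against 56 and 63 carries the same information as comparing i against 8 and 9.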


jwinkler2083233 · 2022-03-09T22:17:19Z

Since the Go compiler won't unroll this loop for us automatically, here's the method unrolled by hand, which is even more optimized. Most of the variables are not really necessary:

func ReadUvarint(r io.ByteReader) (uint64, error) {
        // Modified from the go standard library. Copyright the Go Authors and
        // released under the BSD License.
        var x uint64

        // byte index 0  (i = 0)
        b, err := r.ReadByte()
        if err != nil {
                return 0, err
        }
        if b < 0x80 {
                return x | uint64(b), nil
        }
        x |= uint64(b & 0x7f)

        // byte index 1 (i = 1)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        // "eof" will look like a success.
                        // If we've read part of a value, this is not a
                        // success.
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<7, nil
        }
        x |= uint64(b&0x7f) << 7

        // byte index 2 (i = 2, s = 14)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<14, nil
        }
        x |= uint64(b&0x7f) << 14

        // byte index 3 (i = 3, s = 21)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<21, nil
        }
        x |= uint64(b&0x7f) << 21

        // byte index 4 (i = 4, s = 28)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<28, nil
        }
        x |= uint64(b&0x7f) << 28

        // byte index 5 (i = 5, s = 35)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<35, nil
        }
        x |= uint64(b&0x7f) << 35

        // byte index 6 (i = 6, s = 42)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<42, nil
        }
        x |= uint64(b&0x7f) << 42

        // byte index 7 (i = 7, s = 49)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<49, nil
        }
        x |= uint64(b&0x7f) << 49

        // byte index 8 (i = 8, s = 56)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b >= 0x80 {
                // this is the 9th and last byte we're willing to read, but it
                // signals there's more (1 in MSB).
                return 0, ErrOverflow
        }
        if b == 0 {
                return 0, ErrNotMinimal
        }
        return x | uint64(b)<<56, nil
}

jwinkler2083233 · 2022-03-16T22:48:52Z

Testing shows that the unrolled version cut the time spent inside the mutex by 30-60%.

Stebalien · 2022-03-20T20:13:29Z

Your first version is faster (~20%) but the loop unrolling didn't seem to help. In general, loop unrolling only helps with tight loops. In this case, the indirect call to ReadByte is likely removing the benefits from the unrolling.

Stebalien added a commit that referenced this issue Mar 20, 2022

feat: optimize decoding

2281d72

This speeds up decoding by about 20%. Suggested by @jwinkler2083233 in Fixes #14.

Stebalien mentioned this issue Mar 20, 2022

feat: optimize decoding #15

Merged

Stebalien added a commit that referenced this issue Mar 20, 2022

feat: optimize decoding

6a12e0b

This speeds up decoding by about 20%. Suggested by @jwinkler2083233 in Fixes #14.

Stebalien closed this as completed in #15 Nov 23, 2022

Stebalien added a commit that referenced this issue Nov 23, 2022

feat: optimize decoding (#15)

df67645

This speeds up decoding by about 20%. Suggested by @jwinkler2083233 in Fixes #14.
