Skip to content

performance improvement for ReadUvarint() #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jwinkler2083233 opened this issue Mar 9, 2022 · 3 comments · Fixed by #15
Closed

performance improvement for ReadUvarint() #14

jwinkler2083233 opened this issue Mar 9, 2022 · 3 comments · Fixed by #15

Comments

@jwinkler2083233
Copy link

This ReadUvarint() method is time-sensitive, because it's running inside a mutex. It's called very often.
I'd like to offer a more performant version, that will improve latencies:

Previous version:

func ReadUvarint(r io.ByteReader) (uint64, error) {
	// Modified from the go standard library. Copyright the Go Authors and
	// released under the BSD License.
	var x uint64
	var s uint
	for i := 0; ; i++ {
		b, err := r.ReadByte()
		if err != nil {
			if err == io.EOF && i != 0 {
				// "eof" will look like a success.
				// If we've read part of a value, this is not a
				// success.
				err = io.ErrUnexpectedEOF
			}
			return 0, err
		}
		if (i == 8 && b >= 0x80) || i >= MaxLenUvarint63 {
			// this is the 9th and last byte we're willing to read, but it
			// signals there's more (1 in MSB).
			// or this is the >= 10th byte, and for some reason we're still here.
			return 0, ErrOverflow
		}
		if b < 0x80 {
			if b == 0 && s > 0 {
				return 0, ErrNotMinimal
			}
			return x | uint64(b)<<s, nil
		}
		x |= uint64(b&0x7f) << s
		s += 7
	}
}

New version:

func ReadUvarint(r io.ByteReader) (uint64, error) {
	// Modified from the go standard library. Copyright the Go Authors and
	// released under the BSD License.
	var x uint64
	var s uint
	for s = 0; ; s+=7 {
		b, err := r.ReadByte()
		if err != nil {
			if err == io.EOF && i != 0 {
				// "eof" will look like a success.
				// If we've read part of a value, this is not a
				// success.
				err = io.ErrUnexpectedEOF
			}
			return 0, err
		}
		if (s == 56 && b >= 0x80) || s >= (7 * MaxLenUvarint63) {
			// this is the 9th and last byte we're willing to read, but it
			// signals there's more (1 in MSB).
			// or this is the >= 10th byte, and for some reason we're still here.
			return 0, ErrOverflow
		}
                 if b < 0x80 {
			if b == 0 && s > 0 {
				return 0, ErrNotMinimal
			}
			return x | uint64(b)<<s, nil
		}
		x |= uint64(b&0x7f) << s
		s += 7
	}
}

The 'i == 8' is replaced with 's == 56'. MaxLenUvarint63 is '9', currently, so that's not an overflow problem.
This change removes the unnecessary addition operation for 'i'.

@jwinkler2083233
Copy link
Author

Since Golang compilers don't allow us to unroll automatically, here's this method even more optimized. Most of the variables are not really necessary:

func ReadUvarint(r io.ByteReader) (uint64, error) {
        // Modified from the go standard library. Copyright the Go Authors and
        // released under the BSD License.
        var x uint64

        // byte index 0  (i = 0)
        b, err := r.ReadByte()
        if err != nil {
                return 0, err
        }
        if b < 0x80 {
                return x | uint64(b), nil
        }
        x |= uint64(b & 0x7f)

        // byte index 1 (i = 1)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        // "eof" will look like a success.
                        // If we've read part of a value, this is not a
                        // success.
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<7, nil
        }
        x |= uint64(b&0x7f) << 7

        // byte index 2 (i = 2, s = 14)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<14, nil
        }
        x |= uint64(b&0x7f) << 14

        // byte index 3 (i = 3, s = 21)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<21, nil
        }
        x |= uint64(b&0x7f) << 21

        // byte index 4 (i = 4, s = 28)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<28, nil
        }
        x |= uint64(b&0x7f) << 28

        // byte index 5 (i = 5, s = 35)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<35, nil
        }
        x |= uint64(b&0x7f) << 35

        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<42, nil
        }
        x |= uint64(b&0x7f) << 42

        // byte index 7 (i = 7, s = 49)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b < 0x80 {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<49, nil
        }
        x |= uint64(b&0x7f) << 49

        // byte index 8 (i = 8, s = 56)
        b, err = r.ReadByte()
        if err != nil {
                if err == io.EOF {
                        err = io.ErrUnexpectedEOF
                }
                return 0, err
        }
        if b >= 0x80 {
                // this is the 9th and last byte we're willing to read, but it
                // signals there's more (1 in MSB).
                // or this is the >= 10th byte, and for some reason we're still here.
                return 0, ErrOverflow
        } else {
                if b == 0 {
                        return 0, ErrNotMinimal
                }
                return x | uint64(b)<<56, nil
        }
}

@jwinkler2083233
Copy link
Author

Testing shows that the unrolled version cut the time spent inside a mutex by 30-60%

@Stebalien
Copy link
Member

Your first version is faster (~20%) but the loop unrolling didn't seem to help. In general, loop unrolling only helps with tight loops. In this case, the indirect call to ReadByte is likely removing the benefits from the unrolling.

Stebalien added a commit that referenced this issue Mar 20, 2022
This speeds up decoding by about 20%. Suggested by @jwinkler2083233 in

Fixes #14.
Stebalien added a commit that referenced this issue Mar 20, 2022
This speeds up decoding by about 20%. Suggested by @jwinkler2083233 in

Fixes #14.
Stebalien added a commit that referenced this issue Nov 23, 2022
This speeds up decoding by about 20%. Suggested by @jwinkler2083233 in

Fixes #14.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants