feature/add-benchmark-case-for-deserialize-and-large #241

KennethanCeyer · 2021-05-19T08:49:24Z

Background

When attempting to parse the List[bytes] type as betterproto, approx. 1,000 elements or more. 500ms is required.
Our service needs to parse more than 1,000 unit bytes using betterproto, and it was determined that it would be a problem for the business to require more than 500ms of resources for each request, so I decided to improve the parse function of betterproto.

Therefore, it is necessary to know the baseline for deserialize performance improvement before working on improvement proposals in subsequent PR. In betterproto, we added benchmark cases for three message cases (simple message, nested message, repeated message) and worked on this PR. First, we want to merge them to get the baseline.

In the future, I will proceed with the following attempts.

Analysis and improvement of profiling and time complexity for parse method
If the above method determines that work improvement is impossible due to the influence of the betterproto lifecycle, proceed as follows.
- Fork betterproto and parse itself through CDLL Extension in C language, and perform computation-intensive work in C native-level by extracting .so and .dll (I don't know if merger is possible, the impact on multi-platform increases. )
We will also review whether the parse method can direct-dump struct memory, and look at issues and PRs for discussions in advance.

Features

Add benchmark cases as follows
- time_deserialize
- time_serialize_nested
- time_deserialize_nested
- time_serialize_repeated
- time_deserialize_repeated
How to test

<spec>
- Darwin 19.6.0
- x86_64
- Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
- num cores: 16
- 32GB mem

asv run
· Creating environments
· Discovering benchmarks
· Running 12 total benchmarks (1 commits * 1 environments * 12 benchmarks)
[  0.00%] · For python-betterproto commit 02e41afd <v2.0.0b3>:
[  0.00%] ·· Benchmarking virtualenv-py3.9
[  4.17%] ··· Running (benchmarks.BenchMessage.time_attribute_access--)...........
[ 54.17%] ··· benchmarks.BenchMessage.time_attribute_access                          879±70ns
[ 58.33%] ··· benchmarks.BenchMessage.time_attribute_setting                       4.82±0.2μs
[ 62.50%] ··· benchmarks.BenchMessage.time_deserialize                             17.1±0.2μs
[ 66.67%] ··· benchmarks.BenchMessage.time_deserialize_nested                         217±4μs
[ 70.83%] ··· benchmarks.BenchMessage.time_deserialize_repeated                      99.3±2μs
[ 75.00%] ··· benchmarks.BenchMessage.time_init_with_values                        7.71±0.3μs
[ 79.17%] ··· benchmarks.BenchMessage.time_instantiation                           7.57±0.2μs
[ 83.33%] ··· benchmarks.BenchMessage.time_overhead                                   319±8μs
[ 87.50%] ··· benchmarks.BenchMessage.time_serialize                               10.4±0.3μs
[ 91.67%] ··· benchmarks.BenchMessage.time_serialize_nested                           172±2μs
[ 95.83%] ··· benchmarks.BenchMessage.time_serialize_repeated                        50.0±2μs
[100.00%] ··· benchmarks.MemSuite.mem_instance                                            408

Gobot1234 · 2021-05-19T11:33:54Z

Just FYI I'm currently working on seeing if I can use mypyc to compile the code I'm interested to see what performance it yields.

Gobot1234 · 2021-05-29T08:57:37Z

As an update here are the numbers, they aren't very good although I think it's just because mypyc doesn't support native byte(array) methods

# c
# time_attribute_access ran in 0.09705529699999982
# time_attribute_setting ran in 0.21463662300000008
# time_deserialize ran in 0.49555612699999996
# time_deserialize_nested ran in 6.588607696
# time_deserialize_repeated ran in 2.2661100229999995
# time_init_with_values ran in 0.2958655099999987
# time_instantiation ran in 0.27691258399999974
# time_overhead ran in 10.969843118
# time_serialize ran in 0.35886646899999874
# time_serialize_nested ran in 5.025464720999999
# time_serialize_repeated ran in 1.277189609999997

# no c
# time_attribute_access ran in 0.04139304299999996
# time_attribute_setting ran in 0.043615654000000004
# time_deserialize ran in 0.146516945
# time_deserialize_nested ran in 1.219142667
# time_deserialize_repeated ran in 0.7966103840000001
# time_init_with_values ran in 0.03998950099999998
# time_instantiation ran in 0.0585064639999997
# time_overhead ran in 1.5101277
# time_serialize ran in 0.04643382699999954
# time_serialize_nested ran in 0.75196694
# time_serialize_repeated ran in 0.25223746999999985

KennethanCeyer · 2021-05-29T12:25:04Z

@Gobot1234 that's... Interesting
Could I receive the code permalink written in mypyc?
I would like to refer to the native C dll extension and performance change factors.

Gobot1234 · 2021-05-29T12:49:12Z

https://github.com/Gobot1234/python-betterproto/tree/mypyc

KennethanCeyer · 2021-06-08T16:02:00Z

Could I get a review for this PR?
After this PR is merged, we want to proceed with the optimization PR work by comparing the metric change with asv.

nat-n

Hi @KennethanCeyer Thanks for working on this and sorry for the slow response.

Have you considered using Cython? All else being equal I think it would be the preferable way to compiled performance gains for parsing. The reason being that adding other languages would complicate distribution and maintenance of betterproto going forward.

Gobot1234 · 2021-08-06T14:46:57Z

mypyc now supports native operations on bytearray, the numbers for this now look a lot better.

KennethanCeyer · 2021-08-18T03:03:41Z

@nat-n
I agree with your consideration,
For maintain purpose, I'll try to dig down around into Cython to speed up betterproto deserialize scenarios

Gobot1234 · 2021-10-18T20:20:52Z

@KennethanCeyer Any update on the cython front? I'd be more than happy to do this myself with the new cython pep 484 type hint support stuff to avoid a lot of duplication.

feat: add cases for nested, repeat and deserialize

b8348e9

nat-n approved these changes Jun 21, 2021

View reviewed changes

nat-n merged commit ad8b917 into danielgtaylor:master Jun 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature/add-benchmark-case-for-deserialize-and-large #241

feature/add-benchmark-case-for-deserialize-and-large #241

KennethanCeyer commented May 19, 2021 •

edited

Loading

Gobot1234 commented May 19, 2021

Gobot1234 commented May 29, 2021

KennethanCeyer commented May 29, 2021 •

edited

Loading

Gobot1234 commented May 29, 2021

KennethanCeyer commented Jun 8, 2021

nat-n left a comment

Gobot1234 commented Aug 6, 2021

KennethanCeyer commented Aug 18, 2021 •

edited

Loading

Gobot1234 commented Oct 18, 2021

feature/add-benchmark-case-for-deserialize-and-large #241

feature/add-benchmark-case-for-deserialize-and-large #241

Conversation

KennethanCeyer commented May 19, 2021 • edited Loading

Background

Features

Gobot1234 commented May 19, 2021

Gobot1234 commented May 29, 2021

KennethanCeyer commented May 29, 2021 • edited Loading

Gobot1234 commented May 29, 2021

KennethanCeyer commented Jun 8, 2021

nat-n left a comment

Choose a reason for hiding this comment

Gobot1234 commented Aug 6, 2021

KennethanCeyer commented Aug 18, 2021 • edited Loading

Gobot1234 commented Oct 18, 2021

KennethanCeyer commented May 19, 2021 •

edited

Loading

KennethanCeyer commented May 29, 2021 •

edited

Loading

KennethanCeyer commented Aug 18, 2021 •

edited

Loading