Skip to content

feature/add-benchmark-case-for-deserialize-and-large #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

KennethanCeyer
Copy link
Contributor

@KennethanCeyer KennethanCeyer commented May 19, 2021

Background

When attempting to parse the List[bytes] type as betterproto, approx. 1,000 elements or more. 500ms is required.
Our service needs to parse more than 1,000 unit bytes using betterproto, and it was determined that it would be a problem for the business to require more than 500ms of resources for each request, so I decided to improve the parse function of betterproto.

Therefore, it is necessary to know the baseline for deserialize performance improvement before working on improvement proposals in subsequent PR. In betterproto, we added benchmark cases for three message cases (simple message, nested message, repeated message) and worked on this PR. First, we want to merge them to get the baseline.

In the future, I will proceed with the following attempts.

  • Analysis and improvement of profiling and time complexity for parse method
  • If the above method determines that work improvement is impossible due to the influence of the betterproto lifecycle, proceed as follows.
    • Fork betterproto and parse itself through CDLL Extension in C language, and perform computation-intensive work in C native-level by extracting .so and .dll (I don't know if merger is possible, the impact on multi-platform increases. )
  • We will also review whether the parse method can direct-dump struct memory, and look at issues and PRs for discussions in advance.

Features

  • Add benchmark cases as follows

    • time_deserialize
    • time_serialize_nested
    • time_deserialize_nested
    • time_serialize_repeated
    • time_deserialize_repeated
  • How to test

<spec>
- Darwin 19.6.0
- x86_64
- Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
- num cores: 16
- 32GB mem
asv run
· Creating environments
· Discovering benchmarks
· Running 12 total benchmarks (1 commits * 1 environments * 12 benchmarks)
[  0.00%] · For python-betterproto commit 02e41afd <v2.0.0b3>:
[  0.00%] ·· Benchmarking virtualenv-py3.9
[  4.17%] ··· Running (benchmarks.BenchMessage.time_attribute_access--)...........
[ 54.17%] ··· benchmarks.BenchMessage.time_attribute_access                          879±70ns
[ 58.33%] ··· benchmarks.BenchMessage.time_attribute_setting                       4.82±0.2μs
[ 62.50%] ··· benchmarks.BenchMessage.time_deserialize                             17.1±0.2μs
[ 66.67%] ··· benchmarks.BenchMessage.time_deserialize_nested                         217±4μs
[ 70.83%] ··· benchmarks.BenchMessage.time_deserialize_repeated                      99.3±2μs
[ 75.00%] ··· benchmarks.BenchMessage.time_init_with_values                        7.71±0.3μs
[ 79.17%] ··· benchmarks.BenchMessage.time_instantiation                           7.57±0.2μs
[ 83.33%] ··· benchmarks.BenchMessage.time_overhead                                   319±8μs
[ 87.50%] ··· benchmarks.BenchMessage.time_serialize                               10.4±0.3μs
[ 91.67%] ··· benchmarks.BenchMessage.time_serialize_nested                           172±2μs
[ 95.83%] ··· benchmarks.BenchMessage.time_serialize_repeated                        50.0±2μs
[100.00%] ··· benchmarks.MemSuite.mem_instance                                            408

@Gobot1234
Copy link
Collaborator

Just FYI I'm currently working on seeing if I can use mypyc to compile the code I'm interested to see what performance it yields.

@Gobot1234
Copy link
Collaborator

As an update here are the numbers, they aren't very good although I think it's just because mypyc doesn't support native byte(array) methods

# c
# time_attribute_access ran in 0.09705529699999982
# time_attribute_setting ran in 0.21463662300000008
# time_deserialize ran in 0.49555612699999996
# time_deserialize_nested ran in 6.588607696
# time_deserialize_repeated ran in 2.2661100229999995
# time_init_with_values ran in 0.2958655099999987
# time_instantiation ran in 0.27691258399999974
# time_overhead ran in 10.969843118
# time_serialize ran in 0.35886646899999874
# time_serialize_nested ran in 5.025464720999999
# time_serialize_repeated ran in 1.277189609999997

# no c
# time_attribute_access ran in 0.04139304299999996
# time_attribute_setting ran in 0.043615654000000004
# time_deserialize ran in 0.146516945
# time_deserialize_nested ran in 1.219142667
# time_deserialize_repeated ran in 0.7966103840000001
# time_init_with_values ran in 0.03998950099999998
# time_instantiation ran in 0.0585064639999997
# time_overhead ran in 1.5101277
# time_serialize ran in 0.04643382699999954
# time_serialize_nested ran in 0.75196694
# time_serialize_repeated ran in 0.25223746999999985

@KennethanCeyer
Copy link
Contributor Author

KennethanCeyer commented May 29, 2021

@Gobot1234 that's... Interesting
Could I receive the code permalink written in mypyc?
I would like to refer to the native C dll extension and performance change factors.

@Gobot1234
Copy link
Collaborator

https://github.com/Gobot1234/python-betterproto/tree/mypyc

@KennethanCeyer
Copy link
Contributor Author

Could I get a review for this PR?
After this PR is merged, we want to proceed with the optimization PR work by comparing the metric change with asv.

Copy link
Collaborator

@nat-n nat-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @KennethanCeyer Thanks for working on this and sorry for the slow response.

Have you considered using Cython? All else being equal I think it would be the preferable way to compiled performance gains for parsing. The reason being that adding other languages would complicate distribution and maintenance of betterproto going forward.

@nat-n nat-n merged commit ad8b917 into danielgtaylor:master Jun 21, 2021
@Gobot1234
Copy link
Collaborator

mypyc now supports native operations on bytearray, the numbers for this now look a lot better.

@KennethanCeyer
Copy link
Contributor Author

KennethanCeyer commented Aug 18, 2021

@nat-n
I agree with your consideration,
For maintain purpose, I'll try to dig down around into Cython to speed up betterproto deserialize scenarios

@Gobot1234
Copy link
Collaborator

@KennethanCeyer Any update on the cython front? I'd be more than happy to do this myself with the new cython pep 484 type hint support stuff to avoid a lot of duplication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants