# Benchmarks

Benchmarks are tests to measure the performance of pandas. There are two different
kinds of benchmarks relevant to pandas:

* Internal pandas benchmarks to measure speed and memory usage over time
* Community benchmarks comparing the speed or memory usage of different tools at
  doing the same job

## pandas benchmarks

pandas benchmarks are implemented in the [asv_bench](https://github.com/pandas-dev/pandas/tree/main/asv_bench)
directory of our repository. The benchmarks are written for the
[airspeed velocity](https://asv.readthedocs.io/en/v0.6.1/) (asv for short) framework.
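
As a sketch of the structure asv expects, a benchmark is a class whose `time_`-prefixed
methods are timed, with per-benchmark setup kept out of the measurement (the class,
operation, and sizes below are illustrative, not an actual pandas benchmark):

```python
# Illustrative asv-style benchmark, in the spirit of asv_bench/benchmarks/
# (the class name and operation are made up for this example)
import pandas as pd


class ConcatFrames:
    # setup() runs before each benchmark method and is excluded from the timing
    def setup(self):
        self.df = pd.DataFrame({"a": range(1_000)})

    # methods prefixed with time_ are what asv times
    def time_concat_ten_copies(self):
        pd.concat([self.df] * 10)
```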

The benchmarks can be run locally by any pandas developer with the `asv run`
command. Running the benchmarks before and after a change is a useful way to
detect whether local changes have an impact on performance.
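
For example, a before/after comparison can be sketched as follows (run from the
`asv_bench` directory; the `^groupby` filter is illustrative, and `asv continuous`
runs the selected benchmarks on both revisions and reports changes above the given
factor):

```shell
cd asv_bench
# compare the current branch (HEAD) against upstream main,
# running only benchmarks whose name matches ^groupby,
# and report results that changed by more than 1.1x
asv continuous -f 1.1 upstream/main HEAD -b ^groupby
```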

Note that benchmarks are not deterministic: running them on different hardware, or
on the same hardware under different levels of load, has a big impact on the
results. Even running the benchmarks on identical hardware under almost identical
conditions produces significant differences for the exact same code.

## pandas benchmarks servers

We currently have two physical servers running the pandas benchmarks for every
(or almost every) commit to the `main` branch. The servers run independently from
each other. The original server has been running for a long time and is physically
located with one of the pandas maintainers. The newer server is in a datacenter
kindly sponsored by [OVHCloud](https://www.ovhcloud.com/). More information about
pandas sponsors, and how your company can support the development of pandas, is
available on the [pandas sponsors]({{ base_url }}about/sponsors.html) page.

Results of the benchmarks are available at:

- Original server: [asv](https://asv-runner.github.io/asv-collection/pandas/)
- OVH server: [asv](https://pandas.pydata.org/benchmarks/asv/) and [conbench](https://pandas.pydata.org/benchmarks/conbench/)

### Original server configuration

The machine can be configured with the Ansible playbook in
[tomaugspurger/asv-runner](https://github.com/tomaugspurger/asv-runner).
The results are published to another GitHub repository,
[tomaugspurger/asv-collection](https://github.com/tomaugspurger/asv-collection).

The benchmarks are scheduled by [Airflow](https://airflow.apache.org/), which
provides a dashboard for viewing and debugging the results. You'll need to set up
an SSH tunnel to view it:

```
ssh -L 8080:localhost:8080 [email protected]
```

### OVH server configuration

TODO

## Community benchmarks

The main benchmarks comparing dataframe tools that include pandas are:

- [H2O.ai benchmarks](https://h2oai.github.io/db-benchmark/)
- [TPC-H benchmarks](https://pola.rs/posts/benchmarks/)