Commit 301c5c7

WEB: Add page about benchmarks (#56907)
1 parent e14a9bd commit 301c5c7

3 files changed

+81
-28
lines changed

doc/source/development/maintaining.rst

-28
@@ -326,34 +326,6 @@ a milestone before tagging, you can request the bot to backport it with:
 @Meeseeksdev backport <branch>


-.. _maintaining.asv-machine:
-
-Benchmark machine
------------------
-
-The team currently owns dedicated hardware for hosting a website for pandas' ASV performance benchmark. The results
-are published to https://asv-runner.github.io/asv-collection/pandas/
-
-Configuration
-`````````````
-
-The machine can be configured with the `Ansible <http://docs.ansible.com/ansible/latest/index.html>`_ playbook in https://github.com/tomaugspurger/asv-runner.
-
-Publishing
-``````````
-
-The results are published to another GitHub repository, https://github.com/tomaugspurger/asv-collection.
-Finally, we have a cron job on our docs server to pull from https://github.com/tomaugspurger/asv-collection, to serve them from ``/speed``.
-Ask Tom or Joris for access to the webserver.
-
-Debugging
-`````````
-
-The benchmarks are scheduled by Airflow. It has a dashboard for viewing and debugging the results. You'll need to setup an SSH tunnel to view them
-
-ssh -L 8080:localhost:8080 [email protected]
-
-
 .. _maintaining.release:

 Release process

web/pandas/community/benchmarks.md

+79
@@ -0,0 +1,79 @@
+# Benchmarks
+
+Benchmarks are tests to measure the performance of pandas. There are two different
+kinds of benchmarks relevant to pandas:
+
+* Internal pandas benchmarks to measure speed and memory usage over time
+* Community benchmarks comparing the speed or memory usage of different tools at
+  doing the same job
+
+## pandas benchmarks
+
+pandas benchmarks are implemented in the [asv_bench](https://github.com/pandas-dev/pandas/tree/main/asv_bench)
+directory of our repository. The benchmarks are implemented for the
+[airspeed velocity](https://asv.readthedocs.io/en/v0.6.1/) (asv for short) framework.
+
+The benchmarks can be run locally by any pandas developer. This can be done
+with the `asv run` command, and it can be useful to detect if local changes have
+an impact on performance, by running the benchmarks before and after the changes.
+More information on running the performance test suite is found
+[here](https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#running-the-performance-test-suite).
+
+Note that benchmarks are not deterministic, and running on different hardware or
+running on the same hardware under different levels of load has a big impact on
+the result. Even running the benchmarks with identical hardware and almost identical
+conditions produces significant differences when running the same exact code.
+
+## pandas benchmarks servers
+
+We currently have two physical servers running the benchmarks of pandas for every
+(or almost every) commit to the `main` branch. The servers run independently from
+each other. The original server has been running for a long time, and it is physically
+located with one of the pandas maintainers. The newer server is in a datacenter
+kindly sponsored by [OVHCloud](https://www.ovhcloud.com/). More information about
+pandas sponsors, and how your company can support the development of pandas, is
+available at the [pandas sponsors]({{ base_url }}about/sponsors.html) page.
+
+Results of the benchmarks are available at:
+
+- Original server: [asv](https://asv-runner.github.io/asv-collection/pandas/)
+- OVH server: [asv](https://pandas.pydata.org/benchmarks/asv/) (benchmark results can
+  also be visualized in this [Conbench PoC](http://57.128.112.95:5000/))
+
+### Original server configuration
+
+The machine can be configured with the Ansible playbook in
+[tomaugspurger/asv-runner](https://github.com/tomaugspurger/asv-runner).
+The results are published to another GitHub repository,
+[tomaugspurger/asv-collection](https://github.com/tomaugspurger/asv-collection).
+
+The benchmarks are scheduled by [Airflow](https://airflow.apache.org/).
+It has a dashboard for viewing and debugging the results.
+You’ll need to set up an SSH tunnel to view them:
+
+```
+ssh -L 8080:localhost:8080 [email protected]
+```
+
+### OVH server configuration
+
+The server used to run the benchmarks has been configured to reduce system
+noise and maximize the stability of the benchmark times.
+
+The details on how the server is configured can be found in the
+[pandas-benchmarks repository](https://github.com/pandas-dev/pandas-benchmarks).
+There is a quick summary here:
+
+- CPU isolation: Prevent user space tasks from executing on the same CPU as the benchmarks, where they could interrupt them during execution (this includes all virtual CPUs that share a physical core)
+- NoHZ: Stop the kernel tick that enables context switching on the isolated CPU
+- IRQ affinity: Ban the benchmarks CPU to avoid many (but not all) kernel interruptions on the isolated CPU
+- TurboBoost: Disable CPU scaling based on high CPU demand
+- P-States: Use "performance" governor to disable P-States and CPU frequency changes based on them
+- C-States: Set C-State to 0 and disable changes to avoid a slower CPU after system inactivity
+
+## Community benchmarks
+
+The main benchmarks comparing dataframe tools that include pandas are:
+
+- [H2O.ai benchmarks](https://h2oai.github.io/db-benchmark/)
+- [TPCH benchmarks](https://pola.rs/posts/benchmarks/)
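
Note on `benchmarks.md` (not part of this commit): the page warns that benchmarks are not deterministic and that running the same code twice can give noticeably different timings. The sketch below is one possible way to illustrate this with the standard library only. It is not how asv compares runs. The helper names `measure`, `classify` and `relative_spread`, and the `factor=1.1` threshold, are assumptions made up for this example.

```python
import statistics
import timeit


def measure(func, repeat=5, number=1000):
    """Return per-call timings in seconds, one value per repetition."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return [total / number for total in totals]


def relative_spread(samples):
    """Rough noise indicator: (max - min) / median of the sample."""
    return (max(samples) - min(samples)) / statistics.median(samples)


def classify(before, after, factor=1.1):
    """Compare two timing samples by the ratio of their medians.

    A change is reported only when the ratio leaves the band
    [1 / factor, factor]; anything inside it is treated as noise.
    The threshold is a hypothetical value chosen for illustration.
    """
    ratio = statistics.median(after) / statistics.median(before)
    if ratio > factor:
        return "slower"
    if ratio < 1 / factor:
        return "faster"
    return "unchanged"


if __name__ == "__main__":
    data = list(range(1000, 0, -1))
    before = measure(lambda: sorted(data))
    after = measure(lambda: sorted(data))  # identical code, measured again
    print("relative spread (before):", relative_spread(before))
    print("relative spread (after): ", relative_spread(after))
    print("verdict:", classify(before, after))
```

On a busy machine the printed spread can be large even though both samples time the same code, which is the reason a comparison of this kind needs a tolerance band rather than a strict "after > before" check.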

web/pandas/config.yml

+2
@@ -54,6 +54,8 @@ navbar:
       target: community/coc.html
     - name: "Ecosystem"
       target: community/ecosystem.html
+    - name: "Benchmarks"
+      target: community/benchmarks.html
   - name: "Contribute"
     target: contribute.html
 blog:
