Commit 59fb94b

WEB: Add page about benchmarks
1 parent e379692 commit 59fb94b

3 files changed, +66 -28 lines changed

Diff for: doc/source/development/maintaining.rst

-28
@@ -326,34 +326,6 @@ a milestone before tagging, you can request the bot to backport it with:
     @Meeseeksdev backport <branch>
 
 
-.. _maintaining.asv-machine:
-
-Benchmark machine
------------------
-
-The team currently owns dedicated hardware for hosting a website for pandas' ASV performance benchmark. The results
-are published to https://asv-runner.github.io/asv-collection/pandas/
-
-Configuration
-`````````````
-
-The machine can be configured with the `Ansible <http://docs.ansible.com/ansible/latest/index.html>`_ playbook in https://github.com/tomaugspurger/asv-runner.
-
-Publishing
-``````````
-
-The results are published to another GitHub repository, https://github.com/tomaugspurger/asv-collection.
-Finally, we have a cron job on our docs server to pull from https://github.com/tomaugspurger/asv-collection, to serve them from ``/speed``.
-Ask Tom or Joris for access to the webserver.
-
-Debugging
-`````````
-
-The benchmarks are scheduled by Airflow. It has a dashboard for viewing and debugging the results. You'll need to setup an SSH tunnel to view them
-
-    ssh -L 8080:localhost:8080 [email protected]
-
-
 .. _maintaining.release:
 
 Release process

Diff for: web/pandas/community/benchmarks.md

+64
@@ -0,0 +1,64 @@
+# Benchmarks
+
+Benchmarks are tests to measure the performance of pandas. There are two different
+kinds of benchmarks relevant to pandas:
+
+* Internal pandas benchmarks to measure speed and memory usage over time
+* Community benchmarks comparing the speed or memory usage of different tools
+  doing the same job
+
+## pandas benchmarks
+
+pandas benchmarks are implemented in the [asv_bench](https://github.com/pandas-dev/pandas/tree/main/asv_bench)
+directory of our repository. The benchmarks are implemented for the
+[airspeed velocity](https://asv.readthedocs.io/en/v0.6.1/) (asv for short) framework.
+
+The benchmarks can be run locally by any pandas developer. This can be done
+with the `asv run` command, and it can be useful to detect if local changes have
+an impact on performance, by running the benchmarks before and after the changes.
+
+Note that benchmarks are not deterministic, and running on different hardware or
+running on the same hardware with different levels of stress has a big impact on
+the result. Even running the benchmarks with identical hardware and almost identical
+conditions can produce noticeably different results for the exact same code.
+
+## pandas benchmark servers
+
+We currently have two physical servers running the benchmarks of pandas for every
+(or almost every) commit to the `main` branch. The servers run independently from
+each other. The original server has been running for a long time, and it is physically
+located with one of the pandas maintainers. The newer server is in a datacenter
+kindly sponsored by [OVHCloud](https://www.ovhcloud.com/). More information about
+pandas sponsors, and how your company can support the development of pandas, is
+available at the [pandas sponsors]({{ base_url }}about/sponsors.html) page.
+
+Results of the benchmarks are available at:
+
+- Original server: [asv](https://asv-runner.github.io/asv-collection/pandas/)
+- OVH server: [asv](https://pandas.pydata.org/benchmarks/asv/) [conbench](https://pandas.pydata.org/benchmarks/conbench/)
+
+### Original server configuration
+
+The machine can be configured with the Ansible playbook in
+[tomaugspurger/asv-runner](https://github.com/tomaugspurger/asv-runner).
+The results are published to another GitHub repository,
+[tomaugspurger/asv-collection](https://github.com/tomaugspurger/asv-collection).
+
+The benchmarks are scheduled by [Airflow](https://airflow.apache.org/).
+It has a dashboard for viewing and debugging the results.
+You’ll need to set up an SSH tunnel to view them:
+
+```
+ssh -L 8080:localhost:8080 [email protected]
+```
+
+### OVH server configuration
+
+TODO
+
+## Community benchmarks
+
+The main benchmarks comparing dataframe tools that include pandas are:
+
+- [H2O.ai benchmarks](https://h2oai.github.io/db-benchmark/)
+- [TPC-H benchmarks](https://pola.rs/posts/benchmarks/)
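
Note on the new page's remark that benchmarks are not deterministic: the sketch below is one way to see that on a local machine. It follows the asv convention of a class with a `setup` method and `time_*` methods, but everything else is an assumption made for illustration. The class name `TimeSorting`, the helper `measure`, and the list-sorting workload are hypothetical and do not come from `asv_bench`. The timing loop uses the standard-library `timeit` module as a stand-in, which is not how asv itself collects samples.

```python
import statistics
import timeit


class TimeSorting:
    """asv-style benchmark: setup() prepares state, time_* methods get timed."""

    def setup(self):
        # Hypothetical workload; a real pandas benchmark would build pandas objects.
        self.values = list(range(50_000, 0, -1))

    def time_sorted(self):
        sorted(self.values)


def measure(bench, repeat=5, number=10):
    """Return one per-call timing (in seconds) for each of `repeat` rounds."""
    bench.setup()
    raw = timeit.repeat(bench.time_sorted, repeat=repeat, number=number)
    return [total / number for total in raw]


timings = measure(TimeSorting())
# Relative gap between the slowest and fastest round of identical code.
spread = (max(timings) - min(timings)) / min(timings)
print(
    f"min={min(timings):.6f}s "
    f"median={statistics.median(timings):.6f}s "
    f"spread={spread:.1%}"
)
```

Running the script several times on the same machine usually prints a different spread each time, which is why a single before/after comparison of two numbers is weak evidence of a regression.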

Diff for: web/pandas/config.yml

+2
@@ -54,6 +54,8 @@ navbar:
       target: community/coc.html
     - name: "Ecosystem"
       target: community/ecosystem.html
+    - name: "Benchmarks"
+      target: community/benchmarks.html
   - name: "Contribute"
     target: contribute.html
 blog:
