# Benchmarks

Benchmarks are tests to measure the performance of pandas. There are two different
kinds of benchmarks relevant to pandas:

* Internal pandas benchmarks to measure speed and memory usage over time
* Community benchmarks comparing the speed or memory usage of different tools at
  doing the same job

## pandas benchmarks

pandas benchmarks are implemented in the [asv_bench](https://github.com/pandas-dev/pandas/tree/main/asv_bench)
directory of our repository. The benchmarks are written for the
[airspeed velocity](https://asv.readthedocs.io/en/v0.6.1/) (asv for short) framework.
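
As a sketch of the structure asv expects, a benchmark is a class whose `time_`-prefixed
methods are timed, with per-benchmark setup kept out of the measurement (the class,
operation, and sizes below are illustrative, not an actual pandas benchmark):

```python
# Illustrative asv-style benchmark, in the spirit of asv_bench/benchmarks/
# (the class name and operation are made up for this example)
import pandas as pd


class ConcatFrames:
    # setup() runs before each benchmark method and is excluded from the timing
    def setup(self):
        self.df = pd.DataFrame({"a": range(1_000)})

    # methods prefixed with time_ are what asv times
    def time_concat_ten_copies(self):
        pd.concat([self.df] * 10)
```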

The benchmarks can be run locally by any pandas developer with the `asv run`
command. Running the benchmarks before and after a change is a useful way to
detect whether local changes have an impact on performance.
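
For example, a before/after comparison can be sketched as follows (run from the
`asv_bench` directory; the `^groupby` filter is illustrative, and `asv continuous`
runs the selected benchmarks on both revisions and reports changes above the given
factor):

```shell
cd asv_bench
# compare the current branch (HEAD) against upstream main,
# running only benchmarks whose name matches ^groupby,
# and report results that changed by more than 1.1x
asv continuous -f 1.1 upstream/main HEAD -b ^groupby
```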

Note that benchmarks are not deterministic: running them on different hardware, or
on the same hardware under different levels of load, has a big impact on the
results. Even running the benchmarks on identical hardware under almost identical
conditions produces significant differences for the exact same code.

## pandas benchmarks servers

We currently have two physical servers running the pandas benchmarks for every
(or almost every) commit to the `main` branch. The servers run independently from
each other. The original server has been running for a long time and is physically
located with one of the pandas maintainers. The newer server is in a datacenter
kindly sponsored by [OVHCloud](https://www.ovhcloud.com/). More information about
pandas sponsors, and how your company can support the development of pandas, is
available on the [pandas sponsors]({{ base_url }}about/sponsors.html) page.

Results of the benchmarks are available at:

- Original server: [asv](https://asv-runner.github.io/asv-collection/pandas/)
- OVH server: [asv](https://pandas.pydata.org/benchmarks/asv/) and [conbench](https://pandas.pydata.org/benchmarks/conbench/)

### Original server configuration

The machine can be configured with the Ansible playbook in
[tomaugspurger/asv-runner](https://github.com/tomaugspurger/asv-runner).
The results are published to another GitHub repository,
[tomaugspurger/asv-collection](https://github.com/tomaugspurger/asv-collection).

The benchmarks are scheduled by [Airflow](https://airflow.apache.org/), which
provides a dashboard for viewing and debugging the results. You'll need to set up
an SSH tunnel to view it:

```
ssh -L 8080:localhost:8080 [email protected]
```

### OVH server configuration

TODO

## Community benchmarks

The main benchmarks comparing dataframe tools that include pandas are:

- [H2O.ai benchmarks](https://h2oai.github.io/db-benchmark/)
- [TPC-H benchmarks](https://pola.rs/posts/benchmarks/)