Commit 1b74d4d

weiji14 and seisman authored
Initialize data version control for managing test images (#1036)
Using a data version control package called [`dvc`](https://github.com/iterative/dvc) to manage the PNG test images in the PyGMT repo! In a nutshell, store only the hash of the PNG on GitHub (in a *.png.dvc file), while having the actual PNG stored on DAGsHub at https://dagshub.com/GenericMappingTools/pygmt.

* Initialize data version control
  Adding dvc package to environment.yml and running `dvc init` to get the barebones .dvcignore, .dvc/config & .dvc/.gitignore files.
* Set dvc remote as https://dagshub.com/GenericMappingTools/pygmt.dvc
* Temporarily installing dvc using pip instead of conda to make CI work
* Refactor test_logo to use mpl_image_compare and track png files in dvc
* Add dvc pull as a step in ci_tests.yaml to pull in data
* List files in pygmt/tests/baseline/ to see what happens after dvc pull
* Do `dvc pull` before `pip install dist/*` otherwise test PNGs aren't there
* First draft of instructions for using dvc to store baseline images
* Instruct to do `git push` first and then `dvc push`
  Technically the order shouldn't matter, but most tutorials seem to use `git push` first so follow that.
* New checklist item for maintainers to get added to DAGsHub dvc remote
* Move pygmt/tests/baseline/.gitignore to top-level
* Clarify that `git rm -r --cached` only needs to run during migration
* Try installing dvc from conda again now that there is a Py3.9 package
* Install dvc and do `dvc pull` on GMT dev tests too
* Refactor test_logo tests to be simpler and more unit-test like
* Mention dvc status command to see which files need staging
* Update test_image to use SI units and long aliases

Co-authored-by: Dongdong Tian <[email protected]>
1 parent 1a2289a commit 1b74d4d

17 files changed, +127 -27 lines changed

.dvc/.gitignore (+3)

@@ -0,0 +1,3 @@
+/config.local
+/tmp
+/cache

.dvc/config (+4)

@@ -0,0 +1,4 @@
+[core]
+    remote = upstream
+['remote "upstream"']
+    url = https://dagshub.com/GenericMappingTools/pygmt.dvc
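
Note: the config above names `upstream` as the default remote and gives its URL. As a rough illustration only, the sketch below reads the same text with Python's standard `configparser`. DVC parses this file with its own machinery, so the `default_remote_url` helper is hypothetical and merely shows how the two sections relate.

```python
import configparser

# Same content as the .dvc/config file added in this commit.
DVC_CONFIG = """\
[core]
    remote = upstream
['remote "upstream"']
    url = https://dagshub.com/GenericMappingTools/pygmt.dvc
"""


def default_remote_url(config_text: str) -> str:
    """Return the URL of the remote named by the [core] section (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser["core"]["remote"]
    # configparser keeps the section header verbatim, quotes included.
    section = f"'remote \"{name}\"'"
    return parser[section]["url"]


print(default_remote_url(DVC_CONFIG))
```

The credentials set with `dvc remote modify upstream --local ...` go into `.dvc/config.local`, which the `.dvc/.gitignore` above keeps out of git.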

.dvcignore (+3)

@@ -0,0 +1,3 @@
+# Add patterns of files dvc should ignore, which could improve
+# the performance. Learn more at
+# https://dvc.org/doc/user-guide/dvcignore

.github/workflows/ci_tests.yaml (+7 -1)

@@ -83,7 +83,7 @@ jobs:
       - name: Install dependencies
         run: |
           conda install gmt=6.1.1 numpy pandas xarray netCDF4 packaging \
-            codecov coverage[toml] ipython make \
+            codecov coverage[toml] dvc ipython make \
             pytest-cov pytest-mpl pytest>=6.0 \
             sphinx-gallery
 
@@ -109,6 +109,12 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt
 
+      # Pull baseline image data from dvc remote (DAGsHub)
+      - name: Pull baseline image data from dvc remote
+        run: |
+          dvc pull
+          ls -lhR pygmt/tests/baseline/
+
       # Install the package that we want to test
       - name: Install the package
         run: |

.github/workflows/ci_tests_dev.yaml (+11 -4)

@@ -77,11 +77,12 @@ jobs:
           channels: conda-forge
           miniconda-version: "latest"
 
-      # Install build dependencies from conda-forge
-      - name: Install build dependencies
+      # Install dependencies from conda-forge
+      - name: Install dependencies
         run: |
-          conda install ninja cmake libblas libcblas liblapack fftw gdal ghostscript \
-            libnetcdf hdf5 zlib curl pcre ipython pytest pytest-cov pytest-mpl
+          conda install ninja cmake libblas libcblas liblapack fftw gdal \
+            ghostscript libnetcdf hdf5 zlib curl pcre ipython \
+            dvc pytest pytest-cov pytest-mpl
 
       # Build and install latest GMT from GitHub
       - name: Install GMT ${{ matrix.gmt_git_ref }} branch (Linux/macOS)
@@ -113,6 +114,12 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt
 
+      # Pull baseline image data from dvc remote (DAGsHub)
+      - name: Pull baseline image data from dvc remote
+        run: |
+          dvc pull
+          ls -lhR pygmt/tests/baseline/
+
       # Install the package that we want to test
       - name: Install the package
         run: |

.gitignore (+3)

@@ -44,3 +44,6 @@ doc/tutorials/
 
 # macOS
 .DS_Store
+
+# Data files (tracked using dvc)
+pygmt/tests/baseline/test_*.png

CONTRIBUTING.md (+69 -1)

@@ -423,7 +423,75 @@ If it's correct, copy it (and only it) to `pygmt/tests/baseline`.
 When you run `make test` the next time, your test should be executed and
 passing.
 
-Don't forget to commit the baseline image as well.
+Don't forget to commit the baseline image as well!
+The images should be pushed up into a remote repository using `dvc` (instead of
+`git`) as will be explained in the next section.
+
+#### Using data version control ([dvc](https://dvc.org)) to manage test images
+
+As the baseline images are quite large blob files that can change often (e.g.
+with new GMT versions), it is not ideal to store them in `git` (which is meant
+for tracking plain text files). Instead, we will use [`dvc`](https://dvc.org)
+which is like `git` but for data. What `dvc` does is to store the hash (md5sum)
+of a file. For example, given an image file like `test_logo.png`, `dvc` will
+generate a `test_logo.png.dvc` plain text file containing the hash of the
+image. This `test_logo.png.dvc` file can be stored as usual on GitHub, while
+the `test_logo.png` file can be stored separately on our `dvc` remote at
+https://dagshub.com/GenericMappingTools/pygmt.
+
+To **pull** or sync files from the `dvc` remote to your local repository, use
+the commands below. Note how `dvc` commands are very similar to `git`.
+
+    dvc status  # should report any files 'not_in_cache'
+    dvc pull  # pull down files from DVC remote cache (fetch + checkout)
+
+Once the sync/download is complete, you should notice two things. There will be
+images stored in the `pygmt/tests/baseline` folder (e.g. `test_logo.png`) and
+these images are technically reflinks/symlinks/copies of the files under the
+`.dvc/cache` folder. You can now run the image comparison test suite as per
+usual.
+
+    pytest pygmt/tests/test_logo.py  # run only one test
+    make test  # run the entire test suite
+
+To **push** or sync changes from your local repository up to the `dvc` remote
+at DAGsHub, you will first need to set up authentication using the commands
+below. This only needs to be done once, i.e. the first time you contribute a
+test image to the PyGMT project.
+
+    dvc remote modify upstream --local auth basic
+    dvc remote modify upstream --local user "$DAGSHUB_USER"
+    dvc remote modify upstream --local password "$DAGSHUB_PASS"
+
+The configuration will be stored inside your `.dvc/config.local` file. Note
+that the $DAGSHUB_PASS token can be generated at
+https://dagshub.com/user/settings/tokens after creating a DAGsHub account
+(can be linked to your GitHub account). Once you have an account set up, please
+ask one of the PyGMT maintainers to add you as a collaborator at
+https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration before
+proceeding with the next steps.
+
+The entire workflow for generating or modifying baseline test images can be
+summarized as follows:
+
+    # Sync with both git and dvc remotes
+    git pull
+    dvc pull
+
+    # Generate new baseline images
+    pytest --mpl-generate-path=baseline pygmt/tests/test_logo.py
+    mv baseline/*.png pygmt/tests/baseline/
+
+    # Generate hash for baseline image and stage the *.dvc file in git
+    git rm -r --cached 'pygmt/tests/baseline/test_logo.png'  # only run if migrating existing image from git to dvc
+    dvc status  # check which files need to be added to dvc
+    dvc add pygmt/tests/baseline/test_logo.png
+    git add pygmt/tests/baseline/test_logo.png.dvc
+
+    # Commit changes and push to both the git and dvc remotes
+    git commit -m "Add test_logo.png into DVC"
+    git push
+    dvc push
 
 ### Documentation
 
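
Note: the workflow above relies on every baseline PNG having a companion `*.png.dvc` file staged in git. The sketch below is one possible pre-commit check for that pairing, not part of this commit. `untracked_baselines` is a hypothetical helper, and it only inspects file names, so it does not replace `dvc status`.

```python
from pathlib import Path


def untracked_baselines(baseline_dir):
    """List test_*.png files that have no matching .png.dvc pointer file."""
    baseline_dir = Path(baseline_dir)
    return sorted(
        png.name
        for png in baseline_dir.glob("test_*.png")
        if not png.with_name(png.name + ".dvc").exists()
    )


if __name__ == "__main__":
    # Path taken from the instructions above; run from the repository root.
    for name in untracked_baselines("pygmt/tests/baseline"):
        print(f"needs `dvc add`: {name}")
```

Any file it reports would still need `dvc add` followed by `git add` on the generated `.dvc` file, as in the workflow above.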

MAINTENANCE.md (+1)

@@ -22,6 +22,7 @@ If you want to make a contribution to the project, see the
 ## Onboarding Access Checklist
 
 - [ ] Added to [python-maintainers](https://github.com/orgs/GenericMappingTools/teams/python-maintainers) team in the [GenericMappingTools](https://github.com/orgs/GenericMappingTools/teams/) organization on GitHub (gives 'maintain' permissions)
+- [ ] Added as collaborator on [DAGsHub](https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration) (gives 'write' permission to dvc remote storage)
 - [ ] Added as moderator on [GMT forum](https://forum.generic-mapping-tools.org) (to see mod-only discussions)
 - [ ] Added as member on the [PyGMT devs Slack channel](https://pygmtdevs.slack.com) (for casual conversations)
 - [ ] Added as maintainer on [PyPI](https://pypi.org/project/pygmt/) and [Test PyPI](https://test.pypi.org/project/pygmt) [optional]

environment.yml (+1)

@@ -17,6 +17,7 @@ dependencies:
     - codecov
     - coverage[toml]
     - docformatter
+    - dvc
     - flake8
     - ipython
     - isort>=5

pygmt/tests/baseline/test_image.png (-13.9 KB)

Binary file not shown.

pygmt/tests/baseline/test_image.png.dvc (+4)

@@ -0,0 +1,4 @@
+outs:
+- md5: de86468aa453b14912c8362c67e51064
+  size: 10403
+  path: test_image.png

pygmt/tests/baseline/test_logo.png (-33.1 KB)

Binary file not shown.

pygmt/tests/baseline/test_logo.png.dvc (+4)

@@ -0,0 +1,4 @@
+outs:
+- md5: 905d5b9f0f8d8b809899dfe9e87d0e91
+  size: 33347
+  path: test_logo.png

pygmt/tests/baseline/test_logo_on_a_map.png (-113 KB)

Binary file not shown.

pygmt/tests/baseline/test_logo_on_a_map.png.dvc (+4)

@@ -0,0 +1,4 @@
+outs:
+- md5: 409119aeeec2680d106e32527009c255
+  size: 77366
+  path: test_logo_on_a_map.png
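
Note: each pointer file above records an md5 hash, a byte size, and a relative path. The sketch below checks a local file against such a pointer, assuming the recorded md5 is the plain digest of the file's bytes, which should hold for binary PNGs. `read_pointer` and `matches_pointer` are hypothetical helpers with a hand-rolled parser that only handles the simple one-output layout shown here. They are not how `dvc` itself reads these files.

```python
import hashlib
from pathlib import Path


def read_pointer(dvc_file):
    """Parse the md5/size/path fields of a simple one-output .dvc file."""
    fields = {}
    for line in Path(dvc_file).read_text().splitlines():
        # Drop indentation and the YAML list dash, then split "key: value".
        key, sep, value = line.strip().lstrip("- ").partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def matches_pointer(dvc_file):
    """Check that the tracked file has the recorded size and md5 hash."""
    dvc_file = Path(dvc_file)
    fields = read_pointer(dvc_file)
    # The path in the pointer is relative to the pointer file's folder.
    data = (dvc_file.parent / fields["path"]).read_bytes()
    return (
        len(data) == int(fields["size"])
        and hashlib.md5(data).hexdigest() == fields["md5"]
    )
```

After `dvc pull`, calling `matches_pointer("pygmt/tests/baseline/test_logo.png.dvc")` would be expected to return `True` if the download matches what the commit recorded.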

pygmt/tests/test_image.py (+1 -1)

@@ -17,5 +17,5 @@ def test_image():
     Place images on map.
     """
     fig = Figure()
-    fig.image(TEST_IMG, D="x0/0+w1i", F="+pthin,blue")
+    fig.image(TEST_IMG, position="x0/0+w2c", box="+pthin,blue")
     return fig

pygmt/tests/test_logo.py (+12 -20)

@@ -1,34 +1,26 @@
 """
 Tests for fig.logo.
 """
+import pytest
 from pygmt import Figure
-from pygmt.helpers.testing import check_figures_equal
 
 
-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo():
     """
-    Plot a GMT logo of a 2 inch width as a stand-alone plot.
+    Plot the GMT logo as a stand-alone plot.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.logo(D="x0/0+w2i")
-    fig_test.logo(position="x0/0+w2i")
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.logo()
+    return fig
 
 
-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo_on_a_map():
     """
-    Plot a GMT logo in the upper right corner of a map.
+    Plot the GMT logo at the upper right corner of a map.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.coast(R="-90/-70/0/20", J="M6i", G="chocolate", B="")
-    fig_ref.logo(D="jTR+o0.1i/0.1i+w3i", F="")
-
-    fig_test.coast(
-        region=[-90, -70, 0, 20], projection="M6i", land="chocolate", frame=True
-    )
-    fig_test.logo(position="jTR+o0.1i/0.1i+w3i", box=True)
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.basemap(region=[-90, -70, 0, 20], projection="M15c", frame=True)
+    fig.logo(position="jTR+o0.25c/0.25c+w7.5c", box=True)
+    return fig
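
Note: with `@pytest.mark.mpl_image_compare`, each test returns a single figure that is compared against the stored baseline PNG, rather than two figures compared with each other. The sketch below illustrates the general idea of a tolerance-based image comparison using a root-mean-square difference. It is a pure-Python illustration under stated assumptions, not pytest-mpl's code. The function names are hypothetical and the `tolerance=2.0` default is chosen for the example only.

```python
import math


def rms_difference(expected, actual):
    """Root-mean-square difference between two equal-length pixel sequences."""
    if not expected or len(expected) != len(actual):
        raise ValueError("images must be non-empty and the same size")
    total = sum((e - a) ** 2 for e, a in zip(expected, actual))
    return math.sqrt(total / len(expected))


def images_match(expected, actual, tolerance=2.0):
    """Pass when the RMS difference does not exceed the tolerance."""
    return rms_difference(expected, actual) <= tolerance


baseline = [0, 128, 255, 64]  # flattened pixel values of a stored baseline
result = [0, 130, 255, 62]    # pixel values produced by the test run
print(images_match(baseline, result))
```

A comparison of this kind is why the baseline images need to be regenerated when a new GMT version changes the rendered output beyond the tolerance.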
