Update 45 minute overview for OceanHackWeek 2022 #140

Merged
merged 2 commits into from
Aug 10, 2022
132 changes: 73 additions & 59 deletions overview/xarray-in-45-min.ipynb
Expand Up @@ -19,7 +19,7 @@
"\n",
"We'll start by reviewing the various components of the Xarray data model, represented here visually:\n",
"\n",
"<img src=\"https://docs.xarray.dev/en/stable/_images/dataset-diagram.png\" align=\"center\" width=\"80%\">"
"<img src=\"https://docs.xarray.dev/en/stable/_images/dataset-diagram.png\" align=\"center\" width=\"60%\">"
]
},
{
Expand All @@ -40,9 +40,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Xarray has a few small real-world tutorial datasets hosted in this GitHub repository https://github.com/pydata/xarray-data\n",
"Xarray has a few small real-world tutorial datasets hosted in the [xarray-data](https://github.com/pydata/xarray-data) GitHub repository.\n",
"\n",
"[xarray.tutorial.load_dataset](https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html#xarray.tutorial.open_dataset) is a convenience function to download and open DataSets by name. Here we'll use `air temperature` from National Centers for Environmental Prediction. Xarray objects have convenient HTML representations to give an overview of what we're working with:"
"[xarray.tutorial.load_dataset](https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html#xarray.tutorial.open_dataset) is a convenience function to download and open DataSets by name (listed at that link).\n",
"\n",
"Here we'll use `air temperature` from the [National Center for Environmental Prediction](https://www.weather.gov/ncep/). Xarray objects have convenient HTML representations to give an overview of what we're working with:"
]
},
{
Expand All @@ -59,7 +61,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that behind the scenes the [`xarray.open_dataset`](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html#xarray-open-dataset) function is opening this tutorial data with the \"netCDF engine\" because the data is stored in that format. A few things are done automatically upon opening, but controlled by keyword arguments. For example, try passing the keyword argument `mask_and_scale=False`... what happens?"
"Note that behind the scenes the `tutorial.open_dataset` downloads a file. It then uses [`xarray.open_dataset`](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html#xarray-open-dataset) function to open that file (which for this datasets is a [netCDF](https://www.unidata.ucar.edu/software/netcdf/) file). \n",
"\n",
"A few things are done automatically upon opening, but controlled by keyword arguments. For example, try passing the keyword argument `mask_and_scale=False`... what happens?"
]
},
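To make the `mask_and_scale` question above concrete without downloading anything, here is a minimal in-memory sketch. It assumes a hypothetical packed `int16` variable carrying the CF attributes `scale_factor`, `add_offset`, and `_FillValue` (the values are invented for illustration), and it uses `xr.decode_cf` as a stand-in for the decoding step that `xarray.open_dataset` applies to a file.

```python
import numpy as np
import xarray as xr

# Hypothetical packed variable: int16 values plus CF packing attributes.
# The numbers are invented for illustration only.
raw = xr.Dataset(
    {
        "air": (
            "x",
            np.array([1000, 2000, -999], dtype="int16"),
            {
                "scale_factor": 0.01,
                "add_offset": 273.15,
                "_FillValue": np.int16(-999),
            },
        )
    }
)

# Without masking and scaling, the stored integers are returned unchanged.
undecoded = xr.decode_cf(raw, mask_and_scale=False)

# With masking and scaling, values become raw * scale_factor + add_offset
# and the fill value is replaced by NaN.
decoded = xr.decode_cf(raw, mask_and_scale=True)

print(undecoded["air"].values)
print(decoded["air"].values)
```

The same keyword can be passed to `xarray.open_dataset`, which is what the exercise above asks you to try.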
{
Expand All @@ -70,7 +74,9 @@
"\n",
"*Many DataArrays!* \n",
"\n",
"Datasets are dictionay-like containers of DataArrays. They are a mapping of\n",
"What's a DataArray?\n",
"\n",
"Datasets are dictionary-like containers of DataArrays. They are a mapping of\n",
"variable name to DataArray:"
]
},
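As a rough illustration of the dictionary-like behaviour described above, the sketch below builds a small Dataset by hand. The variable names (`air`, `pressure`) and values are made up here and are not the tutorial data.

```python
import numpy as np
import xarray as xr

# Hypothetical two-variable Dataset; the values are placeholders.
ds_demo = xr.Dataset(
    {
        "air": (("time", "lat"), np.zeros((2, 3))),
        "pressure": (("time", "lat"), np.ones((2, 3))),
    },
    coords={"lat": [10.0, 20.0, 30.0]},
)

# Like a dict: list the keys, test membership, and index by name.
names = list(ds_demo.data_vars)
has_air = "air" in ds_demo
air = ds_demo["air"]  # a DataArray

print(names, has_air, type(air).__name__)
```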
Expand Down Expand Up @@ -233,7 +239,7 @@
"\n",
"<img src=\"https://raw.githubusercontent.com/numpy/numpy/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/branding/logo/primary/numpylogo.svg\" width=\"25%\">\n",
"\n",
"Xarray structures wrap underlying simpler data structures. This part of Xarray is quite extensible allowing for GPU arrays, sparse arrays, arrays with units etc. which we'll look at later in this tutorial."
"Xarray structures wrapunderlying simpler array-like data structures. This part of Xarray is quite extensible allowing for distributed array, GPU arrays, sparse arrays, arrays with units etc. We'll briefly look at this later in this tutorial."
]
},
{
Expand Down Expand Up @@ -355,7 +361,7 @@
"- label-based indexing using `.sel`\n",
"- position-based indexing using `.isel`\n",
"\n",
"See the documentation for more: https://docs.xarray.dev/en/stable/indexing.html\n"
"See the [user guide](https://docs.xarray.dev/en/stable/indexing.html) for more."
]
},
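The difference between `.sel` and `.isel` can be sketched on a small, hypothetical array (the coordinates below are invented, not those of the tutorial dataset):

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 array with labelled lat/lon coordinates.
da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0, 30.0], "lon": [200.0, 210.0, 220.0, 230.0]},
)

# Label-based: select by coordinate value.
by_label = da.sel(lat=20.0, lon=220.0)

# Position-based: select by integer index along each dimension.
by_position = da.isel(lat=1, lon=2)

# Label-based with inexact labels: pick the nearest coordinate value.
nearest = da.sel(lat=22.0, method="nearest")

print(by_label.item(), by_position.item(), nearest.lat.item())
```

Both selections address the same element here; `.sel` simply lets you say so with coordinate values instead of counting positions.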
{
Expand Down Expand Up @@ -476,7 +482,7 @@
"So the [area element for lat-lon coordinates](https://en.wikipedia.org/wiki/Spherical_coordinate_system#Integration_and_differentiation_in_spherical_coordinates) is\n",
"\n",
"\n",
"$$ \\delta A = R^2 \\delta \\phi \\delta \\lambda \\cos(\\phi) $$\n",
"$$ \\delta A = R^2 \\delta\\phi \\, \\delta\\lambda \\cos(\\phi) $$\n",
"\n",
"where $\\phi$ is latitude, $\\delta \\phi$ is the spacing of the points in latitude, $\\delta \\lambda$ is the spacing of the points in longitude, and $R$ is Earth's radius. (In this formula, $\\phi$ and $\\lambda$ are measured in radians)"
]
Expand All @@ -487,7 +493,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Earth's average radius\n",
"# Earth's average radius in meters\n",
"R = 6.371e6\n",
"\n",
"# Coordinate spacing for this dataset is 2.5 x 2.5 degrees\n",
Expand All @@ -498,6 +504,15 @@
"dlon = R * dλ * np.cos(np.deg2rad(ds.air.lat))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two concepts here:\n",
"1. you can call functions like `np.cos` and `np.deg2rad` ([\"numpy ufuncs\"](https://numpy.org/doc/stable/reference/ufuncs.html)) on Xarray objects and receive an Xarray object back.\n",
"2. We used [ones_like](https://docs.xarray.dev/en/stable/generated/xarray.ones_like.html) to create a DataArray that looks like `ds.air.lon` in all respects, except that the data are all ones"
]
},
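Both concepts can be checked on a tiny, hypothetical latitude array (two made-up points rather than the dataset's grid):

```python
import numpy as np
import xarray as xr

# Hypothetical latitude coordinate with two points, in degrees.
lat = xr.DataArray([0.0, 60.0], dims="lat", coords={"lat": [0.0, 60.0]}, name="lat")

# NumPy ufuncs applied to a DataArray return a DataArray with the same labels.
cos_lat = np.cos(np.deg2rad(lat))

# ones_like keeps dims and coords but fills the data with ones.
ones = xr.ones_like(lat)

print(type(cos_lat).__name__, cos_lat.values)
print(ones.dims, ones.values)
```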
{
"cell_type": "code",
"execution_count": null,
Expand All @@ -524,7 +539,7 @@
"source": [
"### Broadcasting: expanding data\n",
"\n",
"Our longitude and latitude length DataArrays are both 1D with different dimension names. If we multiple these DataArrays together the dimensionality is expanded to 2D via `broadcasting`:"
"Our longitude and latitude length DataArrays are both 1D with different dimension names. If we multiple these DataArrays together the dimensionality is expanded to 2D by _broadcasting_:"
]
},
{
Expand All @@ -545,11 +560,10 @@
"`lat` are different so it automatically \"broadcasts\" to get a 2D result. See the\n",
"last row in this image from _Jake VanderPlas Python Data Science Handbook_\n",
"\n",
"<img src=\"https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png\">\n",
"<img src=\"https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png\" align=\"center\">\n",
"\n",
"Because xarray knows about dimension names we avoid having to create unnecessary\n",
"size-1 dimensions using `np.newaxis` or `.reshape`. For more, see\n",
"https://docs.xarray.dev/en/stable/user-guide/computation.html#broadcasting-by-dimension-name\n"
"size-1 dimensions using `np.newaxis` or `.reshape`. For more, see the [user guide](https://docs.xarray.dev/en/stable/user-guide/computation.html#broadcasting-by-dimension-name)\n"
]
},
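Broadcasting by dimension name can be compared with the plain NumPy approach in a short sketch. The arrays `a` and `b` are hypothetical stand-ins for the two 1D length arrays discussed above.

```python
import numpy as np
import xarray as xr

# Two hypothetical 1D arrays with different dimension names.
a = xr.DataArray([1.0, 2.0, 3.0], dims="lat")
b = xr.DataArray([10.0, 20.0], dims="lon")

# Xarray broadcasts by name: the result has both dimensions.
prod = a * b

# The NumPy equivalent needs an explicit size-1 axis.
prod_np = a.values[:, np.newaxis] * b.values

print(prod.dims, prod.shape)
```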
{
Expand Down Expand Up @@ -613,7 +627,7 @@
"means that your xarray coordinates were not aligned _exactly_.\n",
"\n",
"For more, see\n",
"https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment\n"
"[the Xarray documentation](https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment). [This tutorial notebook](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html) also covers alignment and broadcasting.\n"
]
},
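A small sketch of automatic alignment, using two hypothetical arrays whose `lat` labels only partly overlap. Arithmetic keeps only the labels the operands share, while `xr.align` with `join="outer"` keeps them all and fills the gaps with NaN.

```python
import xarray as xr

# Hypothetical arrays with partly overlapping latitude labels.
x = xr.DataArray([1, 2, 3], dims="lat", coords={"lat": [10, 20, 30]})
y = xr.DataArray([10, 20, 30], dims="lat", coords={"lat": [20, 30, 40]})

# Arithmetic aligns on the shared labels (20 and 30) before adding.
total = x + y

# Explicit alignment with an outer join keeps every label.
x_outer, y_outer = xr.align(x, y, join="outer")

print(total.lat.values, total.values)
print(x_outer.lat.values)
```

If the labels had differed even slightly (for example 20.0 versus 20.0001), the shared set would shrink accordingly, which is the kind of surprise the text above warns about.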
{
Expand All @@ -638,7 +652,10 @@
"1. `coarsen` :\n",
" [Downsample your data](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n",
"1. `weighted` :\n",
" [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n"
" [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n",
"\n",
"\n",
"Below we quickly demonstrate these patterns. See the user guide links above and [the tutorial](https://tutorial.xarray.dev/intermediate/01-high-level-computation-patterns.html) for more."
]
},
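Two of the patterns listed above, `coarsen` and `weighted`, can be sketched on a small, hypothetical grid (values and coordinates are invented):

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 4 field on a lat/lon grid.
da = xr.DataArray(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    dims=("lat", "lon"),
    coords={"lat": [0.0, 60.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)

# coarsen: average non-overlapping blocks of two points along lon.
coarse = da.coarsen(lon=2).mean()

# weighted: weight each latitude by cos(lat) before averaging over lat.
weights = np.cos(np.deg2rad(da.lat))
weighted_mean = da.weighted(weights).mean("lat")

print(coarse.values)
print(weighted_mean.values)
```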
{
Expand Down Expand Up @@ -674,7 +691,7 @@
"metadata": {},
"source": [
"The seasons are out of order (they are alphabetically sorted). This is a common\n",
"annoyance. The solution is to use `.reindex`\n"
"annoyance. The solution is to use `.sel` to change the order of labels\n"
]
},
{
Expand All @@ -683,7 +700,7 @@
"metadata": {},
"outputs": [],
"source": [
"seasonal_mean = seasonal_mean.reindex(season=[\"DJF\", \"MAM\", \"JJA\", \"SON\"])\n",
"seasonal_mean = seasonal_mean.sel(season=[\"DJF\", \"MAM\", \"JJA\", \"SON\"])\n",
"seasonal_mean"
]
},
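The reordering step can be tried on a hypothetical monthly series (one value per month, invented here) rather than the tutorial data:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series for one year; each value is the month number.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
monthly = xr.DataArray(np.arange(1.0, 13.0), dims="time", coords={"time": time})

# Group by season and average.
seasonal_mean = monthly.groupby("time.season").mean()

# Select the season labels in the order we want them.
ordered = seasonal_mean.sel(season=["DJF", "MAM", "JJA", "SON"])

print(list(ordered.season.values))
print(ordered.values)
```

Passing a list of labels to `.sel` returns the entries in the order given, which is why it can be used to reorder.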
Expand Down Expand Up @@ -731,12 +748,10 @@
"\n",
"(`.plot`)\n",
"\n",
"For more see https://docs.xarray.dev/en/stable/plotting.html and\n",
"https://docs.xarray.dev/en/stable/examples/visualization_gallery.html\n",
"\n",
"We have seen very simple plots earlier. Xarray has some support for visualizing\n",
"We have seen very simple plots earlier. Xarray also lets you easily visualize\n",
"3D and 4D datasets by presenting multiple facets (or panels or subplots) showing\n",
"variations across rows and/or columns.\n"
"variations across rows and/or columns."
]
},
{
Expand Down Expand Up @@ -778,6 +793,13 @@
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more see the [user guide](https://docs.xarray.dev/en/stable/plotting.html), the [gallery](https://docs.xarray.dev/en/stable/examples/visualization_gallery.html), and [the tutorial material](https://tutorial.xarray.dev/fundamentals/04.0_plotting.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand All @@ -787,7 +809,7 @@
"## Reading and writing files\n",
"\n",
"Xarray supports many disk formats. Below is a small example using netCDF. For\n",
"more see https://docs.xarray.dev/en/stable/user-guide/io.html\n"
"more see the [documentation](https://docs.xarray.dev/en/stable/user-guide/io.html)\n"
]
},
{
Expand Down Expand Up @@ -835,9 +857,11 @@
"metadata": {},
"source": [
"**Tip:** A common use case to read datasets that are a collection of many netCDF\n",
"files. See\n",
"https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets for how\n",
"to handle that\n"
"files. See the [documentation](https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets) for how\n",
"to handle that.\n",
"\n",
"Finally to read other file formats, you might find yourself reading in the data using a different library and then creating a DataArray([docs](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#creating-a-dataarray), [tutorial](https://tutorial.xarray.dev/fundamentals/01.1_creating_data_structures.html)) from scratch. For example, you might use `h5py` to open an HDF5 file and then create a Dataset from that.\n",
"For MATLAB files you might use `scipy.io.loadmat` or `h5py` depending on the version of MATLAB file you're opening and then construct a Dataset."
]
},
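A minimal sketch of building Xarray objects from scratch, assuming the raw values have already been read by some other library. The NumPy array below is a stand-in for that step, and the names, coordinates, and units are hypothetical.

```python
import numpy as np
import xarray as xr

# Stand-in for values read by another library (e.g. h5py or scipy.io.loadmat).
values = np.array([[280.0, 281.5], [279.0, 282.0]])

# Attach dimension names, coordinates, a variable name, and metadata.
da = xr.DataArray(
    values,
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [200.0, 210.0]},
    name="air",
    attrs={"units": "K"},
)

# Promote the named DataArray to a Dataset.
ds_new = da.to_dataset()

print(ds_new)
```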
{
Expand All @@ -861,13 +885,8 @@
"source": [
"### Pandas: tabular data structures\n",
"\n",
"You can easily convert between xarray and pandas structures:\n",
"https://pandas.pydata.org/\n",
"\n",
"This allows you to conveniently use the extensive pandas ecosystem of packages\n",
"(like seaborn) for your work.\n",
"\n",
"See https://docs.xarray.dev/en/stable/pandas.html\n"
"You can easily [convert](https://docs.xarray.dev/en/stable/pandas.html) between xarray and [pandas](https://pandas.pydata.org/) structures. This allows you to conveniently use the extensive pandas \n",
"ecosystem of packages (like [seaborn](https://seaborn.pydata.org/)) for your work.\n"
]
},
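The round trip between Xarray and pandas can be sketched with a small, hypothetical DataArray (values invented for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical named 2 x 2 DataArray.
da = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0]],
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [200.0, 210.0]},
    name="air",
)

# Xarray -> pandas: dimensions become a MultiIndex.
series = da.to_series()
df = da.to_dataframe()

# pandas -> Xarray: index levels become dimensions again.
roundtrip = df.to_xarray()

print(df)
print(roundtrip)
```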
{
Expand Down Expand Up @@ -895,27 +914,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Numpy alternatives\n",
"### Alternative array types\n",
"\n",
"Xarray can wrap other array types! For example:\n",
"This notebook has focused on Numpy arrays. Xarray can wrap [other array](https://docs.xarray.dev/en/stable/user-guide/duckarrays.html) types! For example:\n",
"\n",
"<img src=\"https://docs.dask.org/en/latest/_static/images/dask-horizontal-white.svg\" width=\"25%\">\n",
"<img src=\"https://docs.dask.org/en/stable/_images/dask_horizontal.svg\" width=\"20%\"> [distributed parallel arrays](https://docs.dask.org/en/latest/array.html) & [Xarray user guide on Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html)\n",
"\n",
"**dask** : parallel arrays https://docs.xarray.dev/en/stable/user-guide/dask.html &\n",
"https://docs.dask.org/en/latest/array.html\n",
"\n",
"<img src=\"https://sparse.pydata.org/en/stable/_images/logo.png\" width=\"12%\">\n",
"<img src=\"https://sparse.pydata.org/en/stable/_images/logo.png\" width=\"15%\"> **pydata/sparse** : [sparse arrays](https://sparse.pydata.org)\n",
"\n",
"**pydata/sparse** : sparse arrays https://sparse.pydata.org\n",
"<img src=\"https://raw.githubusercontent.com/cupy/cupy.dev/master/images/cupy_logo.png\" width=\"22%\"> [GPU arrays](https://cupy.dev) & [cupy-xarray](https://cupy-xarray.readthedocs.io/)\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/cupy/cupy.dev/master/images/cupy_logo.png\" width=\"22%\">\n",
"\n",
"**cupy** : GPU arrays https://cupy.dev\n",
"\n",
"<img src=\"https://pint.readthedocs.io/en/stable/_images/logo-full.jpg\" width=\"10%\">\n",
"\n",
"**pint** : unit-aware computations https://pint.readthedocs.io &\n",
"https://github.com/xarray-contrib/pint-xarray\n"
"<img src=\"https://pint.readthedocs.io/en/stable/_images/logo-full.jpg\" width=\"10%\"> **pint** : [unit-aware arrays](https://pint.readthedocs.io) & [pint-xarray](https://github.com/xarray-contrib/pint-xarray)\n"
]
},
{
Expand All @@ -927,7 +937,7 @@
"Dask cuts up NumPy arrays into blocks and parallelizes your analysis code across\n",
"these blocks\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/dask/dask/main/docs/source/images/dask-array.svg\" style=\"width:55%\">\n"
"<img src=\"https://raw.githubusercontent.com/dask/dask/main/docs/source/images/dask-array.svg\" style=\"width:45%\">\n"
]
},
{
Expand Down Expand Up @@ -1045,7 +1055,7 @@
"outputs": [],
"source": [
"# describe cf attributes in dataset\n",
"ds.air.cf.describe()"
"ds.air.cf"
]
},
{
Expand Down Expand Up @@ -1096,23 +1106,27 @@
"- [MetPy](https://unidata.github.io/MetPy/latest/index.html) : tools for working\n",
" with weather data\n",
"\n",
"Check Xarray documentation for even more! https://docs.xarray.dev/en/stable/related-projects.html"
"Check the Xarray [Ecosystem](https://docs.xarray.dev/en/stable/ecosystem.html) page and [this tutorial](https://tutorial.xarray.dev/intermediate/xarray_ecosystem.html) for even more packages and demonstrations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## More information\n",
"## Next\n",
"\n",
"1. Read the [tutorial](https://tutorial.xarray.dev) material and [user guide](https://docs.xarray.dev/en/stable/user-guide/index.html)\n",
"1. See the description of [common terms](https://docs.xarray.dev/en/stable/terminology.html) used in the xarray documentation: \n",
"1. Answers to common questions on \"how to do X\" with Xarray are [here](https://docs.xarray.dev/en/stable/howdoi.html)\n",
"1. Ryan Abernathey has a book on data analysis with a [chapter on Xarray](https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html)\n",
"1. [Project Pythia](https://projectpythia.org/) has [foundational](https://foundations.projectpythia.org/landing-page.html) and more [advanced](https://cookbooks.projectpythia.org/) material on Xarray. Pythia also aggregates other [Python learning resources](https://projectpythia.org/resource-gallery.html).\n",
"1. The [Xarray Github Discussions](https://github.com/pydata/xarray/discussions) and [Pangeo Discourse](https://discourse.pangeo.io/) are good places to ask questions.\n",
"1. Tell your friends! Tweet!\n",
"\n",
"\n",
"## Welcome!\n",
"\n",
"1. A description of common terms used in the xarray documentation:\n",
" https://docs.xarray.dev/en/stable/terminology.html\n",
"1. For information on how to create a DataArray from an existing numpy array:\n",
" https://docs.xarray.dev/en/stable/user-guide/data-structures.html#creating-a-dataarray\n",
"1. Answers to common questions on \"how to do X\" are here:\n",
" https://docs.xarray.dev/en/stable/howdoi.html\n",
"1. Ryan Abernathey has a book on data analysis with a chapter on Xarray:\n",
" https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html\n"
"Xarray is an open-source project and gladly welcomes all kinds of contributions. This could include reporting bugs, discussing new enhancements, contributing code, helping answer user questions, contributing documentation (even small edits like fixing spelling mistakes or rewording to make the text clearer). Welcome!"
]
}
],
Expand Down