Commit a17d892

Consolidate and edit IEA EWEB docs
1 parent 47b9b2b commit a17d892

3 files changed: 121 additions and 122 deletions

doc/api/data-sources.rst

Lines changed: 119 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -1,72 +1,141 @@
11
Tools for specific data sources
22
*******************************
33

4-
IEA World Energy Balances
5-
=========================
4+
.. _tools-iea:
65

7-
.. currentmodule:: message_ix_models.tools.iea_web
6+
International Energy Agency (IEA) (:mod:`.tools.iea`)
7+
=====================================================
88

9-
.. automodule:: message_ix_models.tools.iea_web
10-
:members:
9+
The IEA publishes many kinds of data.
10+
Each distinct data source is handled by a separate submodule of :mod:`message_ix_models.tools.iea`.
1111

12-
The raw data are in CSV or compressed CSV format.
13-
They have file names like:
12+
Documentation for all module contents:
1413

15-
- :file:`cac5fa90-en.zip` —the complete, extended energy balances, ZIP compressed, containing a single file with a name like :file:`WBIG_2021-2021-1-EN-20211119T100005.csv`.
14+
.. currentmodule:: message_ix_models.tools
1615

17-
- :file:`WBAL_12052022124930839.csv` —a subset or ‘highlights’
16+
.. autosummary::
17+
:toctree: _autosummary
18+
:template: autosummary-module.rst
19+
:recursive:
1820

19-
The data have the following structure:
21+
iea
2022

21-
=========== ======================
22-
Column name Example value
23-
=========== ======================
24-
UNIT [1]_ KTOE
25-
Unit ktoe
26-
COUNTRY WLD
27-
Country World
28-
PRODUCT COAL
29-
Product Coal and coal products
30-
FLOW INDPROD
31-
Flow Production
32-
TIME 2012
33-
Time 2012
34-
Value 1234.5678
35-
Flag Codes M
36-
Flags Missing value; data cannot exist
37-
=========== ======================
23+
.. _tools-iea-web:
3824

39-
.. [1] the column is sometimes labelled "MEASURE", but the contents appear to be the same.
25+
(Extended) World Energy Balances (:mod:`.tools.iea.web`)
26+
--------------------------------------------------------
4027

41-
Code lists
42-
----------
43-
The following files, in :file:`message_ix_models/data/iea/`, contain code lists extracted from the paired columns of the raw data.
44-
The (longer, human-readable) names are not returned by :func:`.load_data`; only the (shorter) code IDs.
28+
.. contents::
29+
:local:
30+
:backlinks: none
4531

46-
These can be used with other package utilities:
32+
.. note:: These data are **proprietary** and require a paid subscription.
4733

48-
.. code-block:: python
34+
The approach to handling proprietary data is the same as in :mod:`.project.advance` and :mod:`.project.ssp`:
35+
36+
- Copies of the data are stored in the (private) :mod:`message_data` repository using Git LFS.
37+
This repository is accessible only to users who have a license for the data.
38+
- :mod:`message_ix_models` contains only a ‘fuzzed’ version of the data (same structure, random values) for testing purposes.
39+
- Non-IIASA users must obtain their own license to access and use the data; obtain the data themselves; and place it on the system where they use :mod:`message_ix_models`.
40+
41+
The module :mod:`message_ix_models.tools.iea.web` attempts to detect and support both the providers/formats described below.
42+
The code supports using data from either of these providers and formats, in multiple ways:
43+
44+
- Use :func:`.tools.iea.web.load_data` to load data as :class:`pandas.DataFrame` and apply further pandas processing.
45+
- Use :class:`.IEA_EWEB` via :func:`.tools.exo_data.prepare_computer` to use the data in :mod:`genno` structured calculations.
46+
47+
The **documentation** for the `2023 edition <https://iea.blob.core.windows.net/assets/0acb1453-1221-421b-9131-632ce71a4c1a/WORLDBAL_Documentation.pdf>`__ of the IEA source/format is publicly available.
4948

50-
from message_ix_models.util import as_codes, load_package_data
49+
Structure
50+
~~~~~~~~~
5151

52-
# a list of sdmx.model.Code objects
53-
cl = as_codes(load_package_data("iea", "product.yaml"))
52+
The data have the following conceptual dimensions, each enumerated by a different list of codes:
5453

55-
# …etc.
54+
- ``FLOW``, ``PRODUCT``: for both of these, the lists of codes appearing in the data are the same from 2021 to 2023 inclusive.
55+
- ``COUNTRY``: The data provided by IEA directly contain codes that are all-caps, abbreviated country names, for instance "DOMINICANR".
56+
The data provided by the OECD contain ISO 3166-1 alpha-3 codes, for instance "DOM".
57+
In both cases, there are additional labels denoting country groupings; these are defined in the documentation linked above.
5658

59+
Changes visible in these lists include:
5760

58-
.. literalinclude:: ../../message_ix_models/data/iea/country.yaml
59-
:language: yaml
60-
:caption: COUNTRY / node (:file:`country.yaml`)
61+
- 2022 → 2023:
6162

62-
.. literalinclude:: ../../message_ix_models/data/iea/product.yaml
63-
:language: yaml
64-
:caption: PRODUCT / commodity (:file:`product.yaml`)
63+
- New codes: ASEAN, BFA, GREENLAND, MALI, MRT, PSE, TCD.
64+
- Removed: MASEAN.
6565

66-
.. literalinclude:: ../../message_ix_models/data/iea/flag-codes.yaml
67-
:language: yaml
68-
:caption: FLAG (:file:`flag-codes.yaml`)
66+
- 2021 → 2022:
67+
68+
- New codes: GNQ, MDG, MKD, RWA, SWZ, UGA.
69+
- Removed: EQGUINEA, GREENLAND, MALI, MBURKINAFA, MCHAD, MMADAGASCA, MMAURITANI, MPALESTINE, MRWANDA, MUGANDA, NORTHMACED.
70+
71+
- ``TIME``: always a year.
72+
- ``MEASURE``: unit of measurement, either "TJ" or "ktoe".
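
The "New codes" and "Removed" entries listed above for ``COUNTRY`` amount to set differences between the code lists of two editions. A minimal sketch of that comparison follows; the helper name ``diff_codes`` and the two abbreviated code sets are hypothetical, chosen only for illustration, and are not the full lists from the data:

```python
def diff_codes(old, new):
    """Return (added, removed) codes between two editions, each sorted."""
    old, new = set(old), set(new)
    return sorted(new - old), sorted(old - new)


# Hypothetical, heavily abbreviated COUNTRY code lists for two editions
codes_2022 = {"DOM", "WLD", "MASEAN"}
codes_2023 = {"DOM", "WLD", "ASEAN", "BFA"}

added, removed = diff_codes(codes_2022, codes_2023)
print("New codes:", ", ".join(added))
print("Removed:", ", ".join(removed))
```

Sorting the results keeps the output stable from run to run, which makes it easier to paste into a change list like the one above.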
73+
74+
:mod:`message_ix_models` is packaged with SDMX structure data (stored in :file:`message_ix_models/data/sdmx/`) comprising code lists extracted from the raw data for the COUNTRY, FLOW, and PRODUCT dimensions.
75+
These can be used with other package utilities, for instance:
76+
77+
.. code-block:: python
6978
70-
.. literalinclude:: ../../message_ix_models/data/iea/flow.yaml
71-
:language: yaml
72-
:caption: FLOW (:file:`flow.yaml`)
79+
>>> from message_ix_models.util.sdmx import read
80+
81+
# Read a code list from file: codes used in the
82+
# 2022 edition data from the OECD provider
83+
>>> cl = read("IEA:PRODUCT_OECD(2022)")
84+
85+
# Show some of its elements
86+
>>> print("\n".join(sorted(cl.items)[:5]))
87+
ADDITIVE
88+
ANTCOAL
89+
AVGAS
90+
BIODIESEL
91+
BIOGASES
92+
93+
The documentation linked above has full descriptions of each code.
94+
95+
IEA provider/format
96+
~~~~~~~~~~~~~~~~~~~
97+
98+
From 2023 (or earlier), the data are provided directly on the IEA website at https://www.iea.org/data-and-statistics/data-product/world-energy-balances.
99+
These data are available in two formats: ‘IVT’ or “Beyond 20/20” format (not supported by this module), and fixed-width text files.
100+
The latter are characterized by:
101+
102+
- Multiple ZIP archives with names like :file:`WBIG[12].zip`, each containing a portion of the data and typically 110–130 MiB compressed size,
103+
- …each containing a single, fixed-width TXT file with a name like :file:`WORLDBIG[12].TXT`, typically 3–4 GiB uncompressed,
104+
- …with no column headers, but data resembling::
105+
106+
WORLD HARDCOAL 1960 INDPROD KTOE ..
107+
108+
…that appear to correspond, respectively, to the COUNTRY, PRODUCT, TIME, FLOW, and MEASURE dimensions described above and to a "Value" column.
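
As an illustration of the record layout just described, the sketch below parses the sample line into named fields. It is not the parser used by :mod:`.tools.iea.web`. The names ``Record`` and ``parse_line`` are hypothetical, and the sketch makes two assumptions not stated in this document: that fields never contain embedded spaces, so splitting on whitespace is enough, and that ".." marks a missing value. Real files are fixed-width, so a width-based reader such as :func:`pandas.read_fwf` may be more appropriate.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    country: str
    product: str
    time: int
    flow: str
    measure: str
    value: Optional[float]


def parse_line(line: str) -> Record:
    """Parse one record, assuming whitespace-separated fields."""
    country, product, time, flow, measure, value = line.split()
    # Assumption: ".." denotes a missing value
    parsed = None if value == ".." else float(value)
    return Record(country, product, int(time), flow, measure, parsed)


rec = parse_line("WORLD HARDCOAL 1960 INDPROD KTOE ..")
print(rec)
```

A line with more or fewer than six fields raises :class:`ValueError` at the unpacking step, which surfaces any violation of the whitespace assumption early.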
109+
110+
OECD provider/format
111+
~~~~~~~~~~~~~~~~~~~~
112+
113+
Up until 2023, the EWEB data were available from the OECD iLibrary with DOI `10.1787/enestats-data-en <https://doi.org/10.1787/enestats-data-en>`__.
114+
These files were characterized by:
115+
116+
- Single ZIP archives with names like :file:`cac5fa90-en.zip`; typically ~850 MiB compressed size,
117+
- …containing a single CSV file with a name like :file:`WBIG_2022-2022-1-EN-20230406T100006.csv`, typically >20 GiB uncompressed,
118+
- …with a particular list of columns like: "MEASURE", "Unit", "COUNTRY", "Country", "PRODUCT", "Product", "FLOW", "Flow", "TIME", "Time", "Value", "Flag Codes", "Flags",
119+
- …with contents that duplicated code IDs—for instance, in the "FLOW" column—with human-readable labels—for instance in the "Flow" column:
120+
121+
============ ===
122+
Column name Example value
123+
============ ===
124+
MEASURE [1]_ KTOE
125+
Unit ktoe
126+
COUNTRY WLD
127+
Country World
128+
PRODUCT COAL
129+
Product Coal and coal products
130+
FLOW INDPROD
131+
Flow Production
132+
TIME 2012
133+
Time 2012
134+
Value 1234.5678
135+
Flag Codes M
136+
Flags Missing value; data cannot exist
137+
============ ===
138+
139+
.. [1] The column is sometimes labelled "UNIT", but the contents appear to be the same.
140+
141+
This source is discontinued; no subsequent editions of the data will be published through it.
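
For the OECD CSV layout described in this section, each code ID column (upper case) is paired with a label column (mixed case). The following sketch shows one way to keep only the IDs and the value, using the standard library alone. It is not the implementation in :mod:`.tools.iea.web`. The function ``read_ids`` and the in-memory ``SAMPLE`` are hypothetical; the sample row reuses the example values from the table above, with the flag columns left empty.

```python
import csv
import io

# One hypothetical row in the OECD column layout
SAMPLE = (
    '"MEASURE","Unit","COUNTRY","Country","PRODUCT","Product",'
    '"FLOW","Flow","TIME","Time","Value","Flag Codes","Flags"\n'
    '"KTOE","ktoe","WLD","World","COAL","Coal and coal products",'
    '"INDPROD","Production","2012","2012","1234.5678","",""\n'
)

# Columns holding code IDs; their mixed-case twins hold labels
ID_COLUMNS = ("MEASURE", "COUNTRY", "PRODUCT", "FLOW", "TIME")


def read_ids(stream):
    """Yield one dict per row with only the code IDs and the value."""
    for row in csv.DictReader(stream):
        record = {name: row[name] for name in ID_COLUMNS}
        record["Value"] = float(row["Value"])
        yield record


rows = list(read_ids(io.StringIO(SAMPLE)))
print(rows[0])
```

Because ``read_ids`` is a generator, rows are processed one at a time, which matters for files that are tens of GiB uncompressed.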

doc/api/tools.rst

Lines changed: 0 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -98,78 +98,6 @@ IAMC data structures (:mod:`.tools.iamc`)
9898
.. automodule:: message_ix_models.tools.iamc
9999
:members:
100100

101-
.. _tools-iea:
102-
103-
International Energy Agency (IEA) data and structure (:mod:`.tools.iea`)
104-
========================================================================
105-
106-
.. currentmodule:: message_ix_models.tools
107-
108-
Documentation for all module contents:
109-
110-
.. autosummary::
111-
:toctree: _autosummary
112-
:template: autosummary-module.rst
113-
:recursive:
114-
115-
iea
116-
117-
(Extended) World Energy Balances (:mod:`.tools.iea.web`)
118-
--------------------------------------------------------
119-
120-
These data are proprietary and require a paid subscription.
121-
122-
Up until 2023, the EWEB data were available from the OECD iLibrary with DOI `10.1787/enestats-data-en <https://doi.org/10.1787/enestats-data-en>`__.
123-
These files were characterized by:
124-
125-
- Single ZIP archives with names like :file:`cac5fa90-en.zip`; typically ~850 MiB compressed size,
126-
- …containing a single CSV file with a name like :file:`WBIG_2022-2022-1-EN-20230406T100006.csv`, typically >20 GiB uncompressed,
127-
- …with a particular list of columns like: "MEASURE", "Unit", "COUNTRY", "Country", "PRODUCT", "Product", "FLOW", "Flow", "TIME", "Time", "Value", "Flag Codes", "Flags",
128-
- …with contents that duplicated code IDs—for instance, in the "FLOW" column—with human-readable labels—for instance in the "Flow" column.
129-
130-
This source is now discontinued.
131-
132-
From 2023 (or earlier), the data are also available directly from the IEA website at https://www.iea.org/data-and-statistics/data-product/world-energy-balances.
133-
This source is available in two formats; ‘IVT’ or “Beyond 20/20” format (not supported by this module) or fixed-width text files.
134-
The latter are characterized by:
135-
136-
- Multiple ZIP archives with names like :file:`WBIG[12].zip`, each containing a portion of the data and typically 110–130 MiB compressed size
137-
- …each containing a single, fixed-with TXT file with a name like :file:`WORLDBIG[12].TXT`, typically 3–4 GiB uncompressed,
138-
- …with no column headers, but data resembling::
139-
140-
WORLD HARDCOAL 1960 INDPROD KTOE ..
141-
142-
…that appear to correspond to, respectively, the COUNTRY, PRODUCT, TIME, FLOW, and MEASURE dimensions and "Value" column of the above data, respectively.
143-
144-
This source comes with documentation (`2023 edition <https://iea.blob.core.windows.net/assets/0acb1453-1221-421b-9131-632ce71a4c1a/WORLDBAL_Documentation.pdf>`__) that, unlike the data, *is* publicly accessible.
145-
146-
The module :mod:`message_ix_models.tools.iea.web` attempts to detect and support both formats.
147-
The approach to handling proprietary data is the same as in :mod:`.project.advance` and :mod:`.project.ssp`:
148-
149-
- A copy of the data are stored in :mod:`message_data`.
150-
Non-IIASA users must obtain their own license to access and use the data.
151-
- :mod:`message_ix_models` contains only a ‘fuzzed’ version of the data (same structure, random values) for testing purposes.
152-
153-
The data include the following dimensions:
154-
155-
- FLOW, PRODUCT: the lists of codes appearing in the data are identical between 2021 and 2023 inclusive.
156-
- COUNTRY: The data directly from IEA contain codes that are all caps, abbreviated country names, for instance "DOMINICANR".
157-
The data from the OECD source contain ISO 3166-1 alpha-3 codes, for instance "DOM".
158-
In both cases, there are additional labels denoting country groupings; these are defined in the documentation linked above.
159-
160-
Known changes:
161-
162-
- 2022 → 2023:
163-
164-
- New codes: ASEAN, BFA, GREENLAND, MALI, MRT, PSE, TCD,
165-
- Removed: MASEAN
166-
167-
- 2021 → 2022:
168-
169-
- New codes: GNQ, MDG, MKD, RWA, SWZ, UGA.
170-
- Removed: EQGUINEA, GREENLAND, MALI, MBURKINAFA, MCHAD, MMADAGASCA, MMAURITANI, MPALESTINE, MRWANDA, MUGANDA, NORTHMACED.
171-
172-
173101
.. _tools-wb:
174102

175103
World Bank structures (:mod:`.tools.wb`)

message_ix_models/tools/iea/web.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -83,6 +83,7 @@ class IEA_EWEB(ExoDataSource):
8383
extra_dims = ("product", "flow")
8484

8585
def __init__(self, source, source_kw):
86+
"""Initialize the data source."""
8687
if source != self.id:
8788
raise ValueError(source)
8889

@@ -103,6 +104,7 @@ def __init__(self, source, source_kw):
103104
raise ValueError(_kw)
104105

105106
def __call__(self):
107+
"""Load and process the data."""
106108
# - Load the data.
107109
# - Convert to pd.Series, then genno.Quantity.
108110
# - Map dimensions.
