
Missing Module Dependency - Tables #1252


Closed
zhammond147 opened this issue Jul 2, 2021 · 14 comments · Fixed by #1299

@zhammond147

In the pvlib.clearsky.lookup_linke_turbidity() function, you have the following error handling:

try:
    import tables
except ImportError:
    raise ImportError('The Linke turbidity lookup table requires tables. '
                      'You can still use clearsky.ineichen if you '
                      'supply your own turbidities.')

Unfortunately, the tables module has not been included as a dependency of the pvlib package.

@kandersolar
Member

It is an optional dependency: https://github.com/pvlib/pvlib-python/blob/master/setup.py#L56
See also https://pvlib-python.readthedocs.io/en/stable/installation.html#compatibility

Should this be explained better in the online docs? Or maybe mention something about pvlib[optional] in that error message?
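For the error-message idea, a sketch of what the amended message might look like (the exact wording here is hypothetical, not pvlib's):

    try:
        import tables
    except ImportError:
        raise ImportError(
            'The Linke turbidity lookup table requires tables. Install all '
            'optional dependencies with "pip install pvlib[optional]", or '
            'use clearsky.ineichen with your own turbidities.')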

@cwhanse
Member

cwhanse commented Jul 2, 2021

Is there a good reason that tables is still optional? This is a pothole frequently hit by users.

@kahemker
Contributor

I ran into many problems yesterday building a new environment on Windows, and they all seemed to revolve around getting tables installed properly. I eventually caved and just installed Ubuntu via WSL and ran through the environment creation on Linux.

I'll eventually figure out how to build the environment on Windows again since I prefer working with PyCharm and its debugger, but if there is any advice you all can provide for the Windows environment build process, that would be great.

@kandersolar
Member

@kahemker can you post details about the commands you're running and the error messages you got? I'm happy to help figure out a solution, but I can't seem to reproduce the issue myself -- I went to some effort in #1287 to make sure tables wouldn't be a barrier to installation (except OSX and py3.9), and setting up a new environment and installing pvlib/master completes successfully on both our Windows CI environments and my Windows computer:

conda create -n pvlib-windows python=3.8
conda activate pvlib-windows
pip install .

It's just grabbing the wheel off PyPI:

Collecting tables
  Downloading tables-3.6.1-2-cp38-cp38-win_amd64.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 2.2 MB/s

@kahemker
Contributor

Wow, thank you @kanderso-nrel. I think the problem was trying to install the environment on python=3.9 and following these instructions in the documentation for setting up a virtual environment.

The primary errors in building the wheel for tables-3.6.1 on Python 3.9 seem to revolve around the lzo decompression libraries. There are a ton of link errors like this:

LINK : fatal error LNK1181: cannot open input file 'lzo2.lib'
LINK : fatal error LNK1181: cannot open input file 'liblzo.lib'
LINK : fatal error LNK1181: cannot open input file 'bzip2.lib'
LINK : fatal error LNK1181: cannot open input file 'blosc.lib'
LINK : fatal error LNK1181: cannot open input file 'hdf5.lib'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\link.exe' failed with exit code 1181

The error list gets very long and starts to revolve around the Microsoft Visual Studio 2019 build tools.

I'll stick with Python 3.8 for now. Wish I had found #1287 yesterday! I did learn a lot about WSL, and it seems like a pretty solid option for developing on Linux without the overhead of VMs.

@kandersolar
Member

Oh right, I should have asked if you were using python 3.9. At the moment tables has only released 3.9 wheels for linux (ctrl-F cp39 here) and not Windows or OS X: PyTables/PyTables#823. So pip install tables on py39 can't install a pre-built binary on any platform except linux and has to attempt building one from source, which is very likely to fail on a normal Windows installation.
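As an aside, pip can be told to refuse source builds outright, which makes a missing wheel fail fast instead of dying mid-compile (standard pip behavior, nothing pvlib-specific):

    pip install tables --only-binary :all: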

For anyone running into an issue installing tables on python 3.9, here are some options to avoid the hassle of trying to build it from source, ordered from most to least recommended:

  1. If you have conda available, you can run conda install pytables before installing pvlib (see the sketch after this list). That way it is already available and pip can skip it instead of trying to build it from source. Note that the package is called pytables on conda but tables on PyPI. You could also install a complete environment using conda with the environment files listed here: https://github.com/pvlib/pvlib-python/tree/master/ci
  2. Use an older python version like 3.7 or 3.8 instead of 3.9, simply because PyPI has tables wheels available for those versions. This is easy to do with conda, though I strongly recommend creating a new environment for it instead of replacing an existing python installation.
  3. Windows users could try using Christoph Gohlke's wheels: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pytables
  4. Use WSL like @kahemker did, or even a VM (see this semi-relevant quote). In case it's not obvious: this option is significantly more complex than the previous ones and not something you should attempt without having done some reading to know what you're doing.
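A minimal sketch of option 1 (the environment name here is arbitrary):

    conda create -n pvlib-env python=3.9 pytables
    conda activate pvlib-env
    pip install pvlib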

@mikofski
Member

  • Maybe post this to the google group?
  • And/Or we should start either a discussion and pin it?
  • And/Or create a new "installation troubleshooting" wiki section?
  • Or ditto for the sphinx docs?

I think we're likely to run into this many, many times. We should probably do a Stack Overflow search for pvlib + tables or install. One constant source of confusion is that on conda it's called "pytables" but on PyPI it's just "tables".

@wholmgren
Member

wholmgren commented Aug 26, 2021

All great ideas. For the sphinx docs, I think adding a short note with a link to this would be fine. It would make sense to do that before 0.9 is tagged.

At the risk of issue scope creep, it's also worth considering if we should use a different format since this seems likely to repeat with python 3.10.

@mikofski
Member

mikofski commented Aug 27, 2021

I prefer h5py: it's a lot easier to use, more stable, better maintained imho, reuses the numpy API for structured arrays, and it's from the actual makers of HDF5. But pandas chose to use tables, which was unfortunate imo. It wouldn't be too hard to switch the raw Linke turbidity data to use h5py. Once extracted, the numpy API makes it super easy to create a DataFrame by just passing the structured array directly to pd.DataFrame(). Another advantage of h5py over pytables is that the h5 archive can be read from MATLAB, R, or any other codebase, not just Python.
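A minimal sketch of that h5py-to-pandas path, assuming the file stores a structured (compound-dtype) array; the file name and dataset key are placeholders:

    import h5py
    import pandas as pd

    # read the HDF5 dataset into a numpy structured array...
    with h5py.File('data.h5', 'r') as f:
        arr = f['my_dataset'][:]

    # ...and hand it straight to pandas; columns come from the dtype fields
    df = pd.DataFrame(arr)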

Another option is parquet, which is built into pandas and is quite popular.
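The pandas round trip for parquet is a one-liner in each direction (it does need pyarrow or fastparquet installed as the engine; the data here is a toy stand-in):

    import pandas as pd

    df = pd.DataFrame({'month': range(1, 13), 'tl': [3.0] * 12})
    df.to_parquet('tl.parquet')           # write
    df2 = pd.read_parquet('tl.parquet')   # read back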

And of course, we're already using netcdf4, which seems like the obvious choice, but I don't believe there's a pandas read_nc function either. And parsing and usage is a lot more difficult imo than with hdf5.

@adriesse
Member

I used to use pandas.to_hdf() all the time, and when I'm lazy I still do for short-term storage. For a brief time I thought h5py was the way to go for better sharing, but it's really pretty low level (e.g. you have to encode and parse timestamps and transpose 2-d structures in MATLAB). Currently I'm a big fan of netcdf4, made easy by xarray.
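The xarray route being described, as a minimal sketch (the file name is a placeholder):

    import xarray as xr

    ds = xr.open_dataset('data.nc')   # opens the netCDF file via the netcdf4 engine
    df = ds.to_dataframe()            # round-trip into pandas when needed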

@kandersolar
Member

Here is some basic exploration comparing packages for reading the TL data stored as .h5 and .nc files: https://gist.github.com/kanderso-nrel/09c320d08ef8daac80f3302e4b11b1ac

To summarize:

  • tables, h5py, netcdf4, and xarray all fetch the TL data in .h5 and .nc form plenty fast (<5ms) with nearly trivial code (for this simple TL table, anyway)
  • .h5 and .nc files are more or less the same size (<0.1% difference)

I did not try parquet. Does it support lazy loading/indexing like h5 and nc? I think pd.read_parquet requires pyarrow or fastparquet, so we'd still need a dependency for this.
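For reference, the kind of lazily indexed read those timings cover looks like this with h5py (a sketch; the indices are placeholders, and it assumes the dataset inside LinkeTurbidity.h5 is named 'LinkeTurbidity'):

    import h5py

    # placeholder grid indices; real code computes these from lat/lon
    lat_index, lon_index = 1000, 2000

    with h5py.File('LinkeTurbidity.h5', 'r') as f:
        # indexing the on-disk dataset reads only the requested
        # 12-month slice, not the whole global array
        lts = f['LinkeTurbidity'][lat_index, lon_index, :]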

@mikofski
Member

OK, I just assumed that we were using tables because it works with pandas, but it turns out that the LinkeTurbidity.h5 file is completely sane and pvlib doesn't use pandas at all with it, so there is ZERO reason to use tables here. I've just tested and it works absolutely fine with h5py. Thanks @kanderso-nrel for testing it with netcdf4 and xarray where it also works totally fine out of the box.

Since we already import netcdf4 and I believe we're about to start using xarray, I'm in favor of using one of those two. I'm slightly more in favor of xarray, and wondering if we can replace netcdf4 everywhere with xarray, even though it comes with extra latency, because then it's just one package and it plays well with pandas. I'd also be happy to use h5py. Just not tables.

Let's remove the tables dependency before we ship v0.9 so we don't have to make further changes. tables/PyTables is an unnecessary headache in my book.

@mikofski
Member

mikofski commented Aug 27, 2021

BTW: I thought the TL data was originally available for download as an .nc file, am I wrong? I guess it's not available from soda pro anymore?

Does anyone know whether we're using the 2003 or 2010 values? What's the difference?

@wholmgren
Member

Thanks @kanderso-nrel for the careful comparisons!

I think we should switch to h5py before releasing 0.9. I don't see any value in adding the netcdf4 layer on top of the hdf5 file for this data set. I also don't see any reason to use xarray for this simple file and read operation.

Two things that people probably already know but that I feel are not really addressed in the discussion above:

  1. netcdf4 uses hdf5 -- a netcdf4 file is an hdf5 file organized in a particular way, plus extra metadata (see the sketch after this list)
  2. pip install xarray[io] brings in h5py along with netcdf4 and h5netcdf; see details below
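A quick illustration of point 1: because a netcdf4 file is valid hdf5, h5py can open it directly (the file name is a placeholder):

    import h5py

    # netCDF variables show up as ordinary HDF5 datasets
    with h5py.File('data.nc', 'r') as f:
        print(list(f.keys()))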

Looks like we only went with pytables because that's what pandas used and we didn't put much more thought into it: #437 (comment).

$ pip install xarray[io]
$ pip list
Package            Version
------------------ -------------------
affine             2.3.0
appdirs            1.4.4
asciitree          0.3.3
attrs              21.2.0
beautifulsoup4     4.9.3
certifi            2021.5.30
cffi               1.14.6
cfgrib             0.9.9.0
cftime             1.5.0
charset-normalizer 2.0.4
click              8.0.1
click-plugins      1.1.1
cligj              0.7.2
docopt             0.6.2
eccodes            1.3.3
fasteners          0.16.3
findlibs           0.0.2
fsspec             2021.7.0
h5netcdf           0.11.0
h5py               3.4.0
idna               3.2
Jinja2             3.0.1
MarkupSafe         2.0.1
netCDF4            1.5.7
numcodecs          0.9.0
numpy              1.21.2
packaging          21.0
pandas             1.3.2
pip                21.2.4
pooch              1.5.1
pycparser          2.20
Pydap              3.2.2
pyparsing          2.4.7
python-dateutil    2.8.2
pytz               2021.1
rasterio           1.2.6
requests           2.26.0
scipy              1.7.1
setuptools         52.0.0.post20210125
six                1.16.0
snuggs             1.4.7
soupsieve          2.2.1
urllib3            1.26.6
WebOb              1.8.7
wheel              0.37.0
xarray             0.19.0
zarr               2.9.3
