lookup_linke_turbidity is slow #368
I ran line_profiler on a year of data at 1 minute resolution. The bottlenecks appear to be loading the file and the leap year list comprehension.

Not sure what to do about loading the file, but the leap year list comprehension can be replaced with a vectorized pandas operation.

pvlib 0.5:

```python
time = pd.DatetimeIndex(start='20170101', end='20171231', freq='1min')
latitude = 32.2
longitude = -110.9

%load_ext line_profiler
%lprun -f pvlib.clearsky.lookup_linke_turbidity pvlib.clearsky.lookup_linke_turbidity(time, latitude, longitude)
```
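The vectorized pandas replacement hinted at above could look like the following sketch. The thread does not name the exact pandas call; `DatetimeIndex.is_leap_year` is my assumption, and `pd.date_range` stands in for the older `pd.DatetimeIndex(start=..., end=...)` constructor:

```python
import pandas as pd

# same index as in the profiling snippet above
times = pd.date_range(start='2017-01-01', end='2017-12-31', freq='1min')

# vectorized leap-year flags: one boolean per timestamp, no Python-level loop
isleap = times.is_leap_year
```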
proposed changes
Turns out that the … Anyone interested in taking this on?
I must admit that my first suggestion is actually more of an indication of what I think the default behaviour should be, and the comment in the code about "might be slow" motivated me to include it here. The second suggestion improves the speed for the non-interpolated mode; could you profile that too? I suspect the bottleneck there is the …
Your … With all of the proposed changes, the difference in speed between …
If it helps, we can speed up the leap year calculation by a factor of 10 by writing our own vectorized test that avoids the per-timestamp `calendar.isleap` calls:

```python
import calendar
from datetime import datetime
from timeit import timeit

import numpy as np
import pandas as pd


def leap(year):
    # vectorized Gregorian leap-year rule
    return (np.mod(year, 4) == 0) & ((np.mod(year, 100) != 0) | (np.mod(year, 400) == 0))


idx = pd.date_range(start=datetime(1999, 1, 1), end=datetime(2005, 12, 31), freq='1h')

# baseline: per-element loop through calendar.isleap
isleap = [calendar.isleap(t.year) for t in idx]
print(timeit(stmt="isleap = [calendar.isleap(t.year) for t in idx]",
             number=1000,
             globals={'idx': idx, 'calendar': calendar}))

# vectorized alternative
yrs = np.asarray(idx.year)
alt_isleap = leap(yrs)
print(timeit(stmt="alt_isleap = leap(np.asarray(idx.year))",
             number=1000,
             globals={'idx': idx, 'np': np, 'leap': leap}))
```
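For reference, pandas also exposes a vectorized `is_leap_year` attribute on `DatetimeIndex` that matches the hand-written rule above. This cross-check is my addition, not part of the thread:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('1999-01-01', '2005-12-31', freq='1h')

# pandas computes the flags in one vectorized call for the whole index
isleap = idx.is_leap_year

# explicit Gregorian rule from the snippet above, for comparison
yrs = np.asarray(idx.year)
expected = (yrs % 4 == 0) & ((yrs % 100 != 0) | (yrs % 400 == 0))
assert np.array_equal(isleap, expected)
```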
Thanks @cwhanse. Use 3 ticks for code blocks; see the GitHub docs for code formatting. I incorporated your suggestion into the PR. I realized that the interpolation code is much worse for the case when the time series contains both leap years and non-leap years: 10 years of 1 minute data took ~180 seconds to complete due to the horrible performance of the list comprehension within the interpolation code.
This was closed by #369. API changes can be discussed in a new issue.
When using time series with millions of entries, `lookup_linke_turbidity` becomes a bottleneck.

My first suggestion is to set `interp_turbidity=False` by default.

My second suggestion is to replace the fragment:

with something like this:

The interpolation could be sped up by expanding `g` to 366 values and then indexing by `dayofyear`.

PS. I know this would make a perfect issue for my first pull request, but I'm afraid I'm not ready for that step! :)
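The "expand to 366 values, then index by dayofyear" idea can be sketched as follows. The monthly values `lt_monthly` and the mid-month interpolation grid are illustrative assumptions, not pvlib's actual data or code:

```python
import numpy as np
import pandas as pd

# hypothetical monthly Linke turbidity values for one site (12 values)
lt_monthly = np.array([3.0, 3.1, 3.3, 3.6, 3.9, 4.1,
                       4.2, 4.1, 3.8, 3.5, 3.2, 3.0])

# expand once to 366 daily values, anchoring each monthly value mid-month
month_days = np.array([31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
mid_doy = np.cumsum(month_days) - month_days / 2.0
daily = np.interp(np.arange(1, 367), mid_doy, lt_monthly, period=366)

# lookup for the whole time series is then a single fancy-index operation
times = pd.date_range('2016-01-01', '2016-12-31 23:59', freq='1min')
lt = daily[times.dayofyear - 1]
```

The per-timestamp interpolation cost is paid once (366 points) instead of once per entry, which is the source of the speedup for million-row indexes.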