-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Making the irradiation and weather DataFrame of the ModelChain more flexible #239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the nice start, @uvchik. I like the ideas. A few things to think about... Should dni_determination_method be a ModelChain argument rather than a prepare_inputs (or similar) argument? I think this change would make it more consistent with the existing API. The ModelChain init does have a transposition_model argument, though it's not actually used at this time. Should we get rid of the irradiance kwarg all together and insist that all of the data be put into weather? Or perhaps we should allow either, but set self.irradiance = self.weather[['dni', 'ghi', 'dhi']] if only weather is supplied. I think you'll need to check for Is your check_df function only going to check for keys/columns or is it going to consider nans as well? If it's only going to check for keys/columns then I don't know if separate function adds much over a inline try/except or set logic. In [3]: df = pd.DataFrame(columns=['ghi', 'dni'])
In [4]: columns = set(df.columns)
In [5]: {'ghi', 'dni'} <= columns and not {'dhi'} <= columns
Out[5]: True I'll try to get a fix for those forecast module errors up in the next few days so that they don't clutter your test reports. |
I just forgot the def run_model(self, times, irradiance=None, weather=None, **kwargs):
self.prepare_inputs(times, irradiance, weather, **kwargs)
def prepare_inputs(self, times, irradiance=None, weather=None, **kwargs):
self.prepare_irradiance(**kwargs)
def prepare_irradiance(self, dni_determination_method='dirint', **kwargs):
if dni_determination_method is something do something...
# The call:
my_chain.run_model(times, irradiance_df, dni_determination_method='erbs') We could use this structure for default values in other methods/functions, too.
For me that would be okay but I think we are not allowed to change the API in minor releases so we have to wait until >>> import pandas as pd
>>> a = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
>>> a
A B
0 1 4
1 2 5
2 3 6
>>> b = a
>>> b['C']=[7,8,9]
>>> a
A B C
0 1 4 7
1 2 5 8
2 3 6 9 You are right, I will use an inline method to check the columns. I like your example with the |
I'm hesitant to hide this model choice in **kwargs. To keep ModelChain as explicit as possible, I prefer something more like these two options:
Regarding the misnamed prepare_inputs, I've previously considered renaming prepare_inputs to prepare_intermediates. I've also considered removing the assumption of clear sky if irradiance=None. I didn't follow through only for lack of time, but I'd like to make these changes in 0.5. This may or may not change your opinion on how the proposed functionality should be implemented.
True. Whatever we decide to do, let's make it non-breaking for now and reconsider the breaking change later on.
My only concern was that people might be confused when they see non-irradiance data in the irradiance attribute. It's probably not a big deal. |
That changes a lot. What about defining an irradiation (or weather) class. Then we can deal with all that attributes defining an irradiation (or weather) object and the ModelChain will get its dni, dhi and ghi regardless how they have been created. The best thing is, that it is still possible to pass a weather DataFrame to the run_model because This might be a way to not overload the ModelChain but keep all functionalities like clearsky and my prepare_irradiation stuff. |
A weather class might be useful and I agree that it should be optional. Keep in mind that you'll need to support item lookup in addition to attribute lookup. The vast majority of pvlib uses item lookup, including ModelChain. I think it should be doable if you define one or more of the get* methods. I'm not sure how long it would take to develop this class, so I'd also be ok with following one of the approaches outlined above for now and then changing it later. Documentation and rigorous testing always takes a lot longer than I expect. My main concern is that the default behavior remains that an error is raised if a required irradiance value does not exist. |
Okay, so let us keep it simple by now and let us conclude.
Please point out with which statement you agree/disagree. |
1-3: I agree. 4: I don't particularly like the name, but I'm not opposed to it. 5: I agree. We should make this clear in the documentation, though. |
|
I would raise a word of caution about calculating the missing component from the two that are given among GHI, DNI and DHI. In real-world data they rarely add up perfectly. If you have only one of the three, ignorance is bliss and you can model the other two so they do add up perfectly. But if you have two, then the third one may end up being negative or some other unreasonable value. Also, dividing by cos(zenith) where zenith -> 90 is a sensitive operation. |
@wholmgren Thanks for you feedback. I soon will provide my second approach. Could take some weeks because I have some deadlines within the next three weeks. @adriesse Thank you for bringing up this point. I almost forgot about this point but it was my concern when I started issue 233. I often got global and diffuse irradiation measured on a horizontal surface and had to calculate DNI using them. I got negative values for I think the pvlib would be a could place to talk about these problems and offer tools to deal with it. What can we do to avoid this? Providing an average zenith? Providing a "middle" zenith for timestamp+/-30 minutes? What if it is caused by a measurement error, shall we cut small angles above the horizon? Of course we could leave it to the users but in my opinion the pvlib is a good place to discuss these problems and offer solutions. |
There are many tricky aspects to this radiation processing, so indeed I think that the users of these tools need to be knowledgable. It's ok with me to discuss here, but probably you want to have a larger group. For your specific example, you can keep DNI values somewhat sane by not allowing them to be greater than a value obtained from a clear-sky model. Calculating the sun angles for the midpoint of the averaging interval is common practice (I hope). For long time steps like one hour, you can split the intervals that includes sunrise/set into two parts (one before and one after the event). |
Ø Calculating the angles for the midpoint of the averaging interval is common practice (I hope). It ought to be common practice. SAM does it, as does PVsyst I think, but other applications? We ought to build this capability into pvlib I think, as part of model chain – a kwarg switch perhaps? |
I agree with all of the above, though it's up to @uvchik to decide the scope of this particular addition. I have slight preference for finishing this small, non-breaking, non-default-changing pull request, and then moving on to address the time resolutions in a separate issue and pull request. I am ok with addressing it all at once, too. I will request that we add detailed documentation with examples when we address the time resolution issue, though. |
I would suggest the following steps.
|
I like the plan. The code mostly looks good, but let me know when you're ready for a few line by line comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I made a handful of minor comments on the diff.
Please add a summary of the changes and your name or username to the next whatsnew file (you might need to create a new one).
I agree with you regarding appveyor.
@@ -316,6 +317,10 @@ def __init__(self, system, location, | |||
self.losses_model = losses_model | |||
self.orientation_strategy = orientation_strategy | |||
|
|||
self.weather = pd.DataFrame() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this a DataFrame instead of None (like the values below)? I'm not sure if one way is better than the other, but consistency might be more important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It saves one if-clause later on. I have check weather the DataFrame exists and if the specific column exists. Creating an empty DataFrame I do not need the first check.
Commit 1bff19f shows the consequences of this change.
Determine the missing irradiation columns. Only two of the following | ||
data columns (dni, ghi, dhi) are needed to calculate the missing data. | ||
|
||
This function is not save at the moment. Results can be too high or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
save --> safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
Parameters | ||
---------- | ||
times : datetime index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas.DataFrame | ||
Containing the missing column of the data sets passed with the | ||
weather DataFrame. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unneeded extra line here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.times = times | ||
self.solar_position = self.location.get_solarposition(self.times) | ||
icolumns = set(self.weather.columns) | ||
wrn_txt = "This function is not save at the moment.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
save --> safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better to write as
wrn_text = ("text" +
"text" +
...)
instead of using +=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
icolumns = set(self.weather.columns) | ||
wrn_txt = "This function is not save at the moment.\n" | ||
wrn_txt += "Results can be too high or negative.\n" | ||
wrn_txt += "Help to improve this function on github." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a space or \n after the .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The following part could be removed together with the irradiance | ||
# parameter at version v0.5 or v0.6. | ||
# **** Begin **** | ||
wrn_txt = "The irradiance parameter will be removed soon.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same += comment as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.times = times | ||
# Add columns that does not exist and overwrite existing columns | ||
# Maybe there is a more elegant way to do this. Any ideas? | ||
if weather is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be best for pvlib to only use a simple assignment. Users can do this in their own code before passing the parameter or making the assignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if irradiance is not None: | ||
warnings.warn(wrn_txt, FutureWarning) | ||
for column in irradiance.columns: | ||
self.weather[column] = irradiance.pop(column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you pop the column? I think this would mutate the input data, which is usually not a nice thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how it works now. This could be an example, but it is not working due to undefined input data. A working example would be too overloaded in my opinion. I could add the following code to the docstring using skip_doctest or just leave it as an information here. It just shows that you have to pass the weather data and the index only once. Examples
--------
>>> from pvlib.modelchain import ModelChain
>>> # my_weather containing 'dhi' and 'ghi'.
>>> mc = ModelChain(my_system, my_location)
>>> mc.complete_irradiance(my_datetime_index, my_weather)
>>> mc.run_model()
>>> # my_weather containing 'dhi', 'ghi' and 'dni'.
>>> mc = ModelChain(my_system, my_location)
>>> mc.run_model(my_datetime_index, my_weather) |
Last comment (hopefully). Using pandas.DataFrame, I do not understand why we pass the DatetimeIndex and the columns separately. In my applications it looks like this: mc = ModelChain(my_system, my_location)
mc.run_model(my_weather.index, my_weather) I think it is good to force the people to pass weather data with a working DatetimeIndex. If we do so we do not have to pass them separately. But this might be a new discussion. |
I think adding your simple example to the docstring would be useful. Just let me know if I should merge it now or wait for the change. Passing the times separately was needed to support the default clear sky and weather data. I am open to changing these behaviors in a separate issue/PR. |
|
||
* Adding a complete_irradiance method to the ModelChain to make it possible to calculate missing irradiation data from the existing columns [beta] (:issue:`239`) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding and fixing the typo in the ModelChain repr. You could add a note for that here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not clean to fix it within this PR but I was too lazy to do it separately. So I added this fix linking it to its commit. I guess there is no commit-role in your sphinx documentation similar to your issue-role, so it will not be a real link afterwards.
I hope everything is fine now 😄 If you find typos etc. you can fix them you directly or let me know. Otherwise feel free to merge. |
Great, thanks so much! I'll check out the next PR soon. |
The merge confused me but I think now everything is in order again (-> #250). |
This is the PR for #233. It is just a proposal. I am still not sure if this approach makes sense.
There are two changes:
ghi
.The new
tools.check_df
function (does not exist so far) should check if some columns in the DataFrame exist and some do not and returns True or False. If this function doesn't make sense we could change the if condition to:The prepare_irradiance method makes it more convenient to pass weather data to the ModelChain but it blows up the code. So what do you think?
This is just a proposal to get some feedback some work is still missing:
tools.check_df
function (@gnn Could you write an efficient way to do it?) or adept if clause@gnn could you have a look from the programming point of view?