Skip to content

Making the irradiation and weather DataFrame of the ModelChain more flexible #239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Oct 18, 2016

Conversation

uvchik
Copy link
Contributor

@uvchik uvchik commented Sep 1, 2016

This is the PR for #233. It is just a proposal. I am still not sure if this approach makes sense.

There are two changes:

  • The weather DataFrame can contain the irradiation columns and will be used instead of the irradiance DataFrame if no irradiance DataFrame is passed to the method.
  • The irradiance (or weather) DataFrame does not have to contain all types of irradiance (ghi, dni, dhi) but just two of them or only ghi.

The new tools.check_df function (does not exist so far) should check if some columns in the DataFrame exist and some do not and returns True or False. If this function doesn't make sense we could change the if condition to:

# Using the function
if tools.check_df(self.irradiance, ['col1', 'col2'], not_exist=['col3'])

# Using the normal if condition
if (self.irradiance['col1'] is None and 
    self.irradiance['col2'] is None and
    self.irradiance['col3'] is not None):

The prepare_irradiance method makes it more convenient to pass weather data to the ModelChain but it blows up the code. So what do you think?

This is just a proposal to get some feedback some work is still missing:

  • adapt documentation
  • add new docstring / adept docstring of existing methods
  • write / adapt tests (my problem is that I have already 111 errors with the actual master branch)
  • write tools.check_df function (@gnn Could you write an efficient way to do it?) or adept if clause
  • adapt "whatsnew"-file

@gnn could you have a look from the programming point of view?

@wholmgren
Copy link
Member

Thanks for the nice start, @uvchik. I like the ideas. A few things to think about...

Should dni_determination_method be a ModelChain argument rather than a prepare_inputs (or similar) argument? I think this change would make it more consistent with the existing API. The ModelChain init does have a transposition_model argument, though it's not actually used at this time.

Should we get rid of the irradiance kwarg all together and insist that all of the data be put into weather? Or perhaps we should allow either, but set self.irradiance = self.weather[['dni', 'ghi', 'dhi']] if only weather is supplied.

I think you'll need to check for is None rather than relying on the truthiness of your array-like DataFrames.

Is your check_df function only going to check for keys/columns or is it going to consider nans as well? If it's only going to check for keys/columns then I don't know if separate function adds much over a inline try/except or set logic.

In [3]: df = pd.DataFrame(columns=['ghi', 'dni'])

In [4]: columns = set(df.columns)

In [5]: {'ghi', 'dni'} <= columns and not {'dhi'} <= columns
Out[5]: True

I'll try to get a fix for those forecast module errors up in the next few days so that they don't clutter your test reports.

@uvchik
Copy link
Contributor Author

uvchik commented Sep 2, 2016

Should dni_determination_method be a ModelChain argument rather than a prepare_inputs (or similar) argument?

I just forgot the **kwargs parameter in the run_model method. I think this could be a good compromise. We can define our default stuff for normal users but allow advanced users to pass additional arguments directly with the run_model call.

def run_model(self, times, irradiance=None, weather=None, **kwargs):
    self.prepare_inputs(times, irradiance, weather, **kwargs)

def prepare_inputs(self, times, irradiance=None, weather=None, **kwargs):
    self.prepare_irradiance(**kwargs)

def prepare_irradiance(self, dni_determination_method='dirint', **kwargs):
    if  dni_determination_method is something do something...

# The call:
my_chain.run_model(times, irradiance_df, dni_determination_method='erbs') 

We could use this structure for default values in other methods/functions, too.

Should we get rid of the irradiance kwarg all together and insist that all of the data be put into weather?

For me that would be okay but I think we are not allowed to change the API in minor releases so we have to wait until 0.5. Defining self.irradiance = self.weather is no problem because it just sets a pointer so we will have one DataFrame with two names. In the long run one DataFrame would be enough for me.

>>> import pandas as pd
>>> a = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
>>> a
   A  B
0  1  4
1  2  5
2  3  6
>>> b = a
>>> b['C']=[7,8,9]
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

You are right, I will use an inline method to check the columns. I like your example with the set.

@wholmgren
Copy link
Member

I'm hesitant to hide this model choice in **kwargs. To keep ModelChain as explicit as possible, I prefer something more like these two options:

  1. Add a dni_determination_method keyword argument to the ModelChain constructor. The default value could be 'required' or 'supplied' and other values could be as you've described above. If dni is required and not supplied then an exception would be raised somewhere. If dni is supplied but dni_determination_method='erbs' or similar then the dni should be recalculated (possibly with a warning) or an exception should be raised.
  2. Create prepare_irradiance as you've proposed, but do not add it to the run_model or the misnamed prepare_inputs method (more on this later). Users that want code that is robust to varying irradiance inputs can add a irradiance = mc.prepare_irradiance(irradiance) line, and users that want code that breaks with unexpected input can skip it. I like code that breaks with unexpected input because it usually indicates that I've done something wrong somewhere else or that I don't understand all of my input data.

Regarding the misnamed prepare_inputs, I've previously considered renaming prepare_inputs to prepare_intermediates. I've also considered removing the assumption of clear sky if irradiance=None. I didn't follow through only for lack of time, but I'd like to make these changes in 0.5. This may or may not change your opinion on how the proposed functionality should be implemented.

For me that would be okay but I think we are not allowed to change the API in minor releases so we have to wait until 0.5.

True. Whatever we decide to do, let's make it non-breaking for now and reconsider the breaking change later on.

Defining self.irradiance = self.weather is no problem because it just sets a pointer so we will have one DataFrame with two names.

My only concern was that people might be confused when they see non-irradiance data in the irradiance attribute. It's probably not a big deal.

@wholmgren wholmgren added this to the 0.4.1 milestone Sep 2, 2016
@uvchik
Copy link
Contributor Author

uvchik commented Sep 5, 2016

I've also considered removing the assumption of clear sky if irradiance=None. I didn't follow through only for lack of time, but I'd like to make these changes in 0.5. This may or may not change your opinion on how the proposed functionality should be implemented.

That changes a lot. What about defining an irradiation (or weather) class. Then we can deal with all that attributes defining an irradiation (or weather) object and the ModelChain will get its dni, dhi and ghi regardless how they have been created.

The best thing is, that it is still possible to pass a weather DataFrame to the run_model because irradiation.dni (or weather.dni) works for DataFrames and will work for the object.

This might be a way to not overload the ModelChain but keep all functionalities like clearsky and my prepare_irradiation stuff.

@wholmgren
Copy link
Member

A weather class might be useful and I agree that it should be optional. Keep in mind that you'll need to support item lookup in addition to attribute lookup. The vast majority of pvlib uses item lookup, including ModelChain. I think it should be doable if you define one or more of the get* methods.

I'm not sure how long it would take to develop this class, so I'd also be ok with following one of the approaches outlined above for now and then changing it later. Documentation and rigorous testing always takes a lot longer than I expect. My main concern is that the default behavior remains that an error is raised if a required irradiance value does not exist.

@uvchik
Copy link
Contributor Author

uvchik commented Sep 6, 2016

Okay, so let us keep it simple by now and let us conclude.

  1. In the long run the irradiance parameter will be removed in favour of the weather parameter
  2. I will add a FutureWarning make users aware of this change
  3. The prepare_irradiance method will be optional mc.prepare_irradiance(weather)
  4. Do you like the name prepare_irradiance? It's more like filling or completing the weather data but you are the native speaker 😏
  5. Something we did not discussed by now: If weather is passed to the run_model it assigns the self.weather attribute. Afterwards we can pass a new pv-system and run the model again. In my opinion the self.weather attribute should be used if no new weather parameter is passed. Next time we keep the pv-system and just run the run_model again with a new weather DF. In my opinion it should overwrite the self.weather attribute.

Please point out with which statement you agree/disagree.

@wholmgren
Copy link
Member

1-3: I agree.

4: I don't particularly like the name, but I'm not opposed to it. check_and_fill_irradiance? I don't love that either. In any case, I would recommend keeping irradiance in the name and keeping this particular method focused on irradiance. An e.g. complete_weather_inputs method could be created in the future that calls the irradiance method.

5: I agree. We should make this clear in the documentation, though.

@dacoex
Copy link
Contributor

dacoex commented Sep 6, 2016

4: I don't particularly like the name, but I'm not opposed to it. check_and_fill_irradiance? I don't love that >either.
Alternative setup_irradiance or assign_irradiance_values

@adriesse
Copy link
Member

adriesse commented Sep 6, 2016

I would raise a word of caution about calculating the missing component from the two that are given among GHI, DNI and DHI. In real-world data they rarely add up perfectly. If you have only one of the three, ignorance is bliss and you can model the other two so they do add up perfectly. But if you have two, then the third one may end up being negative or some other unreasonable value. Also, dividing by cos(zenith) where zenith -> 90 is a sensitive operation.

@uvchik
Copy link
Contributor Author

uvchik commented Sep 7, 2016

@wholmgren Thanks for you feedback. I soon will provide my second approach. Could take some weeks because I have some deadlines within the next three weeks.

@adriesse Thank you for bringing up this point. I almost forgot about this point but it was my concern when I started issue 233.

I often got global and diffuse irradiation measured on a horizontal surface and had to calculate DNI using them. I got negative values for ghi-dhi > 0 if zenith > 90° and very(!) high values if 89.99° > zenith < 90°. I even got these problems using reanalysis data from a weather data set.

I think the pvlib would be a could place to talk about these problems and offer tools to deal with it.
One reason for this behaviour is the timestamp of the weather data set. This timestamp can stand for the average value of the previous hour or for the average value of the next hour. The pvlib uses this timestamp to calculate the position of the sun. In the evening of a day this might result into zenith > 90 (or slightly below 90°) while in the previous hour was some direct irradiation so that the average value of ghi-dhi is above zero.

What can we do to avoid this? Providing an average zenith? Providing a "middle" zenith for timestamp+/-30 minutes? What if it is caused by a measurement error, shall we cut small angles above the horizon?

Of course we could leave it to the users but in my opinion the pvlib is a good place to discuss these problems and offer solutions.

@adriesse
Copy link
Member

adriesse commented Sep 8, 2016

There are many tricky aspects to this radiation processing, so indeed I think that the users of these tools need to be knowledgable. It's ok with me to discuss here, but probably you want to have a larger group.

For your specific example, you can keep DNI values somewhat sane by not allowing them to be greater than a value obtained from a clear-sky model. Calculating the sun angles for the midpoint of the averaging interval is common practice (I hope). For long time steps like one hour, you can split the intervals that includes sunrise/set into two parts (one before and one after the event).

@cwhanse
Copy link
Member

cwhanse commented Sep 8, 2016

Ø Calculating the angles for the midpoint of the averaging interval is common practice (I hope).

It ought to be common practice. SAM does it, as does PVsyst I think, but other applications? We ought to build this capability into pvlib I think, as part of model chain – a kwarg switch perhaps?

@wholmgren
Copy link
Member

I agree with all of the above, though it's up to @uvchik to decide the scope of this particular addition. I have slight preference for finishing this small, non-breaking, non-default-changing pull request, and then moving on to address the time resolutions in a separate issue and pull request. I am ok with addressing it all at once, too. I will request that we add detailed documentation with examples when we address the time resolution issue, though.

@uvchik
Copy link
Contributor Author

uvchik commented Sep 12, 2016

I would suggest the following steps.

  1. I will warning if DNI is calculated from GHI and DHI, that this operation is not safe by now.
  2. Finishing this PR with tests, documentation...
  3. Let it merge by @wholmgren if everything works fine
  4. Open a new PR to create a new function "safly_calculate_dni_from_ghi_and_dhi" (this might not be the final name though 😏). Implementing all ideas to avoid strange values and control them by parameters.
  5. If the new function works we can discuss what would be a good default way to implement this in the ModelChain. Then we can remove the warning from step 1.

@wholmgren
Copy link
Member

I like the plan. The code mostly looks good, but let me know when you're ready for a few line by line comments.

Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I made a handful of minor comments on the diff.

Please add a summary of the changes and your name or username to the next whatsnew file (you might need to create a new one).

I agree with you regarding appveyor.

@@ -316,6 +317,10 @@ def __init__(self, system, location,
self.losses_model = losses_model
self.orientation_strategy = orientation_strategy

self.weather = pd.DataFrame()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make this a DataFrame instead of None (like the values below)? I'm not sure if one way is better than the other, but consistency might be more important.

Copy link
Contributor Author

@uvchik uvchik Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It saves one if-clause later on. I have check weather the DataFrame exists and if the specific column exists. Creating an empty DataFrame I do not need the first check.

Commit 1bff19f shows the consequences of this change.

Determine the missing irradiation columns. Only two of the following
data columns (dni, ghi, dhi) are needed to calculate the missing data.

This function is not save at the moment. Results can be too high or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save --> safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Parameters
----------
times : datetime index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatetimeIndex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.DataFrame
Containing the missing column of the data sets passed with the
weather DataFrame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded extra line here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.times = times
self.solar_position = self.location.get_solarposition(self.times)
icolumns = set(self.weather.columns)
wrn_txt = "This function is not save at the moment.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save --> safe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to write as

wrn_text = ("text" +
            "text" +
             ...)

instead of using +=

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

icolumns = set(self.weather.columns)
wrn_txt = "This function is not save at the moment.\n"
wrn_txt += "Results can be too high or negative.\n"
wrn_txt += "Help to improve this function on github."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a space or \n after the .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# The following part could be removed together with the irradiance
# parameter at version v0.5 or v0.6.
# **** Begin ****
wrn_txt = "The irradiance parameter will be removed soon.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same += comment as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.times = times
# Add columns that does not exist and overwrite existing columns
# Maybe there is a more elegant way to do this. Any ideas?
if weather is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be best for pvlib to only use a simple assignment. Users can do this in their own code before passing the parameter or making the assignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if irradiance is not None:
warnings.warn(wrn_txt, FutureWarning)
for column in irradiance.columns:
self.weather[column] = irradiance.pop(column)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you pop the column? I think this would mutate the input data, which is usually not a nice thing to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uvchik
Copy link
Contributor Author

uvchik commented Oct 17, 2016

This is how it works now. This could be an example, but it is not working due to undefined input data. A working example would be too overloaded in my opinion. I could add the following code to the docstring using skip_doctest or just leave it as an information here. It just shows that you have to pass the weather data and the index only once.

Examples
--------

>>> from pvlib.modelchain import ModelChain

>>> # my_weather containing 'dhi' and 'ghi'. 
>>> mc = ModelChain(my_system, my_location)
>>> mc.complete_irradiance(my_datetime_index, my_weather)
>>> mc.run_model()

>>> # my_weather containing 'dhi', 'ghi' and 'dni'. 
>>> mc = ModelChain(my_system, my_location)
>>> mc.run_model(my_datetime_index, my_weather)

@uvchik
Copy link
Contributor Author

uvchik commented Oct 17, 2016

Last comment (hopefully).

Using pandas.DataFrame, I do not understand why we pass the DatetimeIndex and the columns separately. In my applications it looks like this:

mc = ModelChain(my_system, my_location)
mc.run_model(my_weather.index, my_weather)

I think it is good to force the people to pass weather data with a working DatetimeIndex. If we do so we do not have to pass them separately.

But this might be a new discussion.

@wholmgren
Copy link
Member

I think adding your simple example to the docstring would be useful. Just let me know if I should merge it now or wait for the change.

Passing the times separately was needed to support the default clear sky and weather data. I am open to changing these behaviors in a separate issue/PR.


* Adding a complete_irradiance method to the ModelChain to make it possible to calculate missing irradiation data from the existing columns [beta] (:issue:`239`)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding and fixing the typo in the ModelChain repr. You could add a note for that here, too.

Copy link
Contributor Author

@uvchik uvchik Oct 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not clean to fix it within this PR but I was too lazy to do it separately. So I added this fix linking it to its commit. I guess there is no commit-role in your sphinx documentation similar to your issue-role, so it will not be a real link afterwards.

@uvchik
Copy link
Contributor Author

uvchik commented Oct 18, 2016

I hope everything is fine now 😄

If you find typos etc. you can fix them you directly or let me know. Otherwise feel free to merge.

@wholmgren wholmgren merged commit 23a4d1a into pvlib:master Oct 18, 2016
@wholmgren
Copy link
Member

Great, thanks so much! I'll check out the next PR soon.

@uvchik
Copy link
Contributor Author

uvchik commented Oct 18, 2016

The merge confused me but I think now everything is in order again (-> #250).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants