Skip to content
This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Issue/233 data validation #258

Merged
merged 9 commits into from
Oct 22, 2021
Merged

Issue/233 data validation #258

merged 9 commits into from
Oct 22, 2021

Conversation

peterdudfield
Copy link
Contributor

@peterdudfield peterdudfield commented Oct 21, 2021

Pull Request

Description

Add validation for

  • sun
  • satellite
  • nwp
  • pv
  • gsp

Fixes #233
#29

How Has This Been Tested?

Add some unittests to check the errors

  • No
  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@peterdudfield peterdudfield linked an issue Oct 21, 2021 that may be closed by this pull request
@peterdudfield peterdudfield marked this pull request as ready for review October 22, 2021 07:26
@peterdudfield peterdudfield requested review from jacobbieker, flowirtz and JackKelly and removed request for jacobbieker October 22, 2021 07:26
Copy link
Member

@jacobbieker jacobbieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments on maybe simplifying some of the asserts, but thanks for adding all the validation!

@@ -24,5 +24,5 @@ class NWP(DataSourceOutput):
@classmethod
def model_validation(cls, v):
""" Check that all values are not NaNs """
assert (v.data != np.nan).all(), "Some nwp data values are NaNs"
assert (~np.isnan(v.data)).all(), "Some nwp data values are NaNs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about checking for positive and negative infinity? with

Suggested change
assert (~np.isnan(v.data)).all(), "Some nwp data values are NaNs"
assert (~np.isfinite(v.data)).all(), "Some nwp data values are NaNs or infinite"

Copy link
Member

@JackKelly JackKelly Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree about using np.isfinite to test for NaNs, +inf and -inf. But, if we use np.isfinite, then we'll also need to remove the ~ and tweak the string 🙂

Suggested change
assert (~np.isnan(v.data)).all(), "Some nwp data values are NaNs"
assert (np.isfinite(v.data)).all(), "Some nwp data values are NaNs or infinities"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gona go with both isnan and isinf. I this leads to slightly more verbose error messages

For example
assert (~isnan(v.data)).all(), f"Some pv data values are NaNs"
assert (~isinf(v.data)).all(), f"Some pv data values are Infinite"

@@ -27,5 +27,6 @@ class Satellite(DataSourceOutput):
@classmethod
def model_validation(cls, v):
""" Check that all values are non negative """
assert (v.data != np.NaN).all(), f"Some satellite data values are NaNs"
assert (~np.isnan(v.data)).all(), f"Some satellite data values are NaNs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for NWP

Suggested change
assert (~np.isnan(v.data)).all(), f"Some satellite data values are NaNs"
assert (~np.isfinite(v.data)).all(), f"Some satellite data values are NaNs or infinite"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above 🙂

Suggested change
assert (~np.isnan(v.data)).all(), f"Some satellite data values are NaNs"
assert (np.isfinite(v.data)).all(), "Some nwp data values are NaNs"

@@ -21,5 +21,6 @@ class Topographic(DataSourceOutput):
@classmethod
def model_validation(cls, v):
""" Check that all values are non NaNs """
assert (v.data != np.NaN).all(), f"Some topological data values are NaNs"
assert (~np.isnan(v.data)).all(), f"Some topological data values are NaNs"
assert (v.data != np.Inf).all(), f"Some topological data values are Infinite"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly just combining the two asserts might be a bit simpler?

Suggested change
assert (v.data != np.Inf).all(), f"Some topological data values are Infinite"
assert (np.isfinite(v.data)).all(), f"Some topological data values are Infinite or NaNs"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, think I will keep them separate

Comment on lines 22 to 23
assert (~np.isnan(v.data)).all(), f"Some gsp data values are NaNs"
assert (v.data != np.Inf).all(), f"Some gsp data values are Infinite"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly combine the two asserts with

Suggested change
assert (~np.isnan(v.data)).all(), f"Some gsp data values are NaNs"
assert (v.data != np.Inf).all(), f"Some gsp data values are Infinite"
assert (np.isfinite(v.data)).all(), f"Some gsp data values are NaNs or infinite"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, think I will keep them separate

Comment on lines 35 to 36
assert (~np.isnan(v.data)).all(), f"Some pv data values are NaNs"
assert (v.data != np.Inf).all(), f"Some pv data values are Infinite"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, possibly combining the two might be a bit cleaner

Suggested change
assert (~np.isnan(v.data)).all(), f"Some pv data values are NaNs"
assert (v.data != np.Inf).all(), f"Some pv data values are Infinite"
assert (np.isfinite(v.data)).all(), f"Some pv data values are NaNs or infinite"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, think I will keep them separate

Comment on lines 26 to 28
assert (~np.isnan(v.elevation)).all(), f"Some elevation data values are NaNs"
assert (v.elevation != np.Inf).all(), f"Some elevation data values are Infinite"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly combine these two, or if we do want to keep them separate, changing it to

Suggested change
assert (~np.isnan(v.elevation)).all(), f"Some elevation data values are NaNs"
assert (v.elevation != np.Inf).all(), f"Some elevation data values are Infinite"
assert (~np.isnan(v.elevation)).all(), f"Some elevation data values are NaNs"
assert (~np.isinf(v.elevation)).all(), f"Some elevation data values are Infinite"

as I believe np.Inf only counts as positive infinity, and wouldn't check for negative infinity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, think I will keep them separate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, good idea!

Comment on lines 29 to 30
assert (~np.isnan(v.azimuth)).all(), f"Some azimuth data values are NaNs"
assert (v.azimuth != np.Inf).all(), f"Some azimuth data values are Infinite"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@peterdudfield
Copy link
Contributor Author

Looks good! Just a few comments on maybe simplifying some of the asserts, but thanks for adding all the validation!

Ill put np.isfinite

Yea good suggestions, I wanted to put but nan check and infinite check with a different error message, so it's easier to debug. I know that its twice as much work, but its seems like it might be worth it

@jacobbieker
Copy link
Member

jacobbieker commented Oct 22, 2021

Looks good! Just a few comments on maybe simplifying some of the asserts, but thanks for adding all the validation!

Ill put np.isfinite

Yea good suggestions, I wanted to put but nan check and infinite check with a different error message, so it's easier to debug. I know that its twice as much work, but its seems like it might be worth it

Okay, yeah, makes sense! I would then suggest using np.isinf instead of np.isfinite checks both NaN and infinities, while np.isinf checks only infinities

Copy link
Member

@JackKelly JackKelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@peterdudfield peterdudfield merged commit d64f242 into main Oct 22, 2021
@peterdudfield peterdudfield deleted the issue/233-data-validation branch October 22, 2021 08:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add data source validation Check for -1 values in sat data
3 participants