-
Notifications
You must be signed in to change notification settings - Fork 20
Iluise/develop/stac database #274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…nto iluise/develop/stac-database
Ilaria Luise seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iluise this is fantastic, thank you for putting this together. I have a few changes suggested that should be easy to do with search & replace.
Also, if you want, you can use jsonnet fmt
to nicely format all the jsonnet files
{ | ||
"description": "The data catalogue of the WeatherGenerator project", | ||
"id": "weathergen", | ||
"links": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this file still need to be here? it would be nice to only have jsonnet in this directory to cleanly separate the output from the jsonnet code
// for i, v in enumerate(ds.variables): print(f"\"{v}\": [{ds.statistics["minimum"][i]}, {ds.statistics["maximum"][i]}, | ||
// {ds.statistics["mean"][i]}, {ds.statistics["stdev"][i]}, | ||
// {ds.statistics_tendencies()["mean"][i]}, {ds.statistics_tendencies()["stdev"][i]}],") | ||
variables: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is not great with this format is that it is hard to update with new statistics: we can append to the list but we need to document the precise order of the fields. json is not great at dealing with NaN
. We can keep that array since you did all the work. If format change is necessary, we can have a more complex variables_v2
key.
"reanalysis" | ||
], | ||
providers: [ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to a common.jsonnet
where you put all the common variables such the common provider definition:
{
providers: {
copernicus: {
name: "Copernicus",...
},
ecmwf: {...}
},
hpc: {
leonardo: "leonardo",
hpc2020: "hpc2020",
...
}
}
The reason for also adding the HPCs is that we are going to also read this file from the programs. This ensures that the name of the HPCs exactly match the code names used in the code:
https://xxx/WeatherGenerator-private/-/blob/main/hpc/platform-env.py?ref_type=heads#L10
dataset_name: "cerra-rr-an-oper-0001-mars-5p5km-1984-2020-6h-v2-hmsi.zarr", | ||
type: "application/vnd+zarr", | ||
description: "Anemoi dataset", | ||
locations: ["HPC2020", "EWC", "MareNostrum5", "Leonardo"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above. These names do not match the code codenames. this will be easier to track with variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EWC is not officially registered, just add it in the list
"atmosphere", | ||
"reanalysis" | ||
], | ||
providers: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment about reusing teh provider
local fn = import 'functions.libsonnet'; | ||
|
||
// URL for hrefs | ||
local href_link = "https://raw.githubusercontent.com/ecmwf/WeatherGenerator/refs/heads/iluise/develop/stac-database/database/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking you could just have 2 href links and just generate 2x the files below?
local href_link = "https://raw.githubusercontent.com/ecmwf/WeatherGenerator/refs/heads/iluise/develop/stac-database/database/"; | ||
|
||
// TODO: improve this | ||
local era5v8 = import "era5_v8.jsonnet"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is all duplicate, let's try to combine. the only difference is the href link
], | ||
|
||
variables: { | ||
"quality_pixel_bitmask": ["TBA", "TBA", "TBA", "TBA", "TBA", "TBA"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be consistent with how to represent missing values? It is "NA" above, which is what I would pick
geometry: [-180, 180, -90, 90], | ||
|
||
ref_links: [ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need the links in this file? it should be generated for you.
what you could do otherwise is have a extra_ref_links
field that you merge with the one you generate
description: "Observation dataset", | ||
locations: ["HPC2020", "JSC"], | ||
size: "120GB", | ||
inodes: "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"NA"?
Description
Type of Change
Issue Number
#196
Code Compatibility
Code Performance and Testing
it doesn't touch the main code
Dependencies
Should I add jsonnet to the uv modules? In principle nobody should touch the jsonnets.
Documentation
Additional Notes