Normalization of keywords #77

Closed
rweigel opened this issue Mar 12, 2019 · 9 comments

Comments

@rweigel
Contributor

rweigel commented Mar 12, 2019

In dataset, we should use id instead of name. Same for bins.

URLs should be dataset=ID&parameters=IDs instead of id=ID&parameters=IDs.

time.min and time.max should be replaced with start and stop.

In catalog response, consider allowing the use of description with the note that this description should match that in the info response for that dataset. Possibly remove title from the catalog response as it is inconsistent.

May want to also allow catalog response to include additional information about a dataset, for example, startDate and endDate and cadence (in the case where info responses are large, clients may want the ability to get this information more quickly).

@jvandegriff
Collaborator

Action items:

  • Clarify the changes and list them out
  • Study existing clients for the difficulty of the changes:
      • IDL (Eric): seems not too hard
      • Java (Nand): same

@supervised
@rweigel
@jbfaden

@jvandegriff
Collaborator

jvandegriff commented May 11, 2020

Here's a breakdown of the current status of potentially confusing keywords, followed by proposed changes.

catalog

request: no parameters
response: datasets referred to with "id" and optional "title"
example: http://server/hapi/catalog

{
   "HAPI" : "2.1",
   "status": { "code": 1200, "message": "OK"},
   "catalog" : 
   [
      {"id": "ACE_MAG", "title": "ACE Magnetometer data"},
      {"id": "data/CRUISE/PLS"},
      {"id": "any_identifier_here"}
   ]
}
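
For clients, the practical consequence of the optional "title" is that a display label has to fall back to the "id". A minimal sketch of that, assuming the 2.1-style response shown above; the function name dataset_labels is hypothetical and not part of any HAPI client:

```python
import json

# Sample 2.1-style catalog response, based on the example above
# (the "title" key is quoted here so that the text is valid JSON).
CATALOG_JSON = """
{
   "HAPI": "2.1",
   "status": {"code": 1200, "message": "OK"},
   "catalog": [
      {"id": "ACE_MAG", "title": "ACE Magnetometer data"},
      {"id": "data/CRUISE/PLS"},
      {"id": "any_identifier_here"}
   ]
}
"""


def dataset_labels(catalog_text):
    """Return (id, label) pairs; the label falls back to the id when no title is given.

    Hypothetical helper for illustration only.
    """
    response = json.loads(catalog_text)
    return [(d["id"], d.get("title", d["id"])) for d in response["catalog"]]


labels = dataset_labels(CATALOG_JSON)
for dataset_id, label in labels:
    print(dataset_id, "->", label)
```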

info

request: parameter is "id" of dataset
response: top-level dataset info plus list of parameters; the "id" is not included; dates have labels of "startDate" and "stopDate"; parameters each have a "name"; bins also have a "name"
example: http://server/hapi/info?id=ACE_MAG_AND_PARTICLES

{  "HAPI": "2.1",
   "status": { "code": 1200, "message": "OK"},
   "startDate": "1998-001Z",
   "stopDate" : "2017-100Z",
   "parameters": [
       { "name": "Time",
         "type": "isotime",
         "units": "UTC",
         "fill": null,
         "length": 24 },
       { "name": "mag_GSE",
         "type": "double",
         "units": "nT",
         "fill": "-1e31",
         "size" : [3],
         "description": "hourly average Cartesian magnetic field in nT in GSE",
         "label": "B field in GSE"
       },
       { "name": "protons",
         "type": "double",
         "units": "1/(sec*ster*cm^2*keV)",
         "fill": "-1e31",
         "description": "hourly average protons in energies 15 to 500 keV",
         "label": "ACE EPAM protons",
         "size" : [8],
         "bins": [ {
             "name": "energy",
             "units": "keV",
             "centers": [ 18, 35, 70, 100, 150, 230, 335, 510]
          } ]
       }
   ]
}
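
To show where the "name" keyword matters on the client side, here is a rough sketch that reads an info response like the one above and works out how many data columns each parameter contributes. The assumption that a parameter without "size" is a scalar (one column) is my reading of the example, and column_counts is a hypothetical helper; the JSON is a trimmed copy of the response above.

```python
import json
from math import prod

# Trimmed copy of the info response above (descriptions and labels omitted).
INFO_JSON = """
{
   "HAPI": "2.1",
   "status": {"code": 1200, "message": "OK"},
   "startDate": "1998-001Z",
   "stopDate": "2017-100Z",
   "parameters": [
      {"name": "Time", "type": "isotime", "units": "UTC", "fill": null, "length": 24},
      {"name": "mag_GSE", "type": "double", "units": "nT", "fill": "-1e31", "size": [3]},
      {"name": "protons", "type": "double", "units": "1/(sec*ster*cm^2*keV)",
       "fill": "-1e31", "size": [8],
       "bins": [{"name": "energy", "units": "keV",
                 "centers": [18, 35, 70, 100, 150, 230, 335, 510]}]}
   ]
}
"""


def column_counts(info):
    """Map each parameter "name" to its number of columns.

    Assumes a missing "size" means a scalar parameter (one column).
    """
    return {p["name"]: prod(p.get("size", [1])) for p in info["parameters"]}


info = json.loads(INFO_JSON)
counts = column_counts(info)
print(counts)
```

Any rename of "name" in parameters or bins would break lookups like p["name"] in existing clients, which is why that part of the proposal is the more disruptive one.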

data

request: parameter is "id" of dataset with "time.min" and "time.max"; can also request just some parameters with the "parameters" keyword in the request URL
response: same content as info response
example: http://server/hapi/data?id=ACE_MAG_AND_PARTICLES&parameters=mag_GSE&time.min=2004-001Z&time.max=2005-001Z

Proposed Changes

The proposal is to change the info and data request format from this:

http://server/hapi/info?id=ACE_MAG_AND_PARTICLES
http://server/hapi/data?id=ACE_MAG_AND_PARTICLES&time.min=2004-001Z&time.max=2005-001Z&parameters=mag_GSE

to this

http://server/hapi/info?dataset=ACE_MAG_AND_PARTICLES
http://server/hapi/data?dataset=ACE_MAG_AND_PARTICLES&startDate=2004-001Z&stopDate=2005-001Z&parameters=mag_GSE

Also, the name keywords in the parameter and bins definitions will change to id.

@jvandegriff
Collaborator

jvandegriff commented May 11, 2020

There was brief discussion on today's telecon about this.

Everyone agrees on these:

  1. this is the last time we should make clean-up changes like this
  2. changing info?id=DATASET_ID to info?dataset=DATASET_ID is good
  3. change time.min and time.max to start and stop in the request; this is the same as what's in the info header for a time range, and it is the same as in SPASE

Less clear is what to do about the name fields in the metadata. SPASE uses name and id, but what we really have is called a parameterKey in SPASE. For dataset ids, SPASE uses the term productKey.

People can comment here about what they think is best for HAPI, and we will have a dedicated meeting later this week to resolve it.

@candeynasa

I think we should lean as much as possible to using the same terminology as SPASE: ProductKey for catalog ID/dataset name, ParameterKey for parameter name, ResourceName for catalog/dataset title. The idea is to have a common description of a dataset that can be used for all purposes. We should be able to run a HAPI server solely off the info in the SPASE descriptions. [For this reason, we should also add the URI Template scheme that we created to SPASE as a field.]

@rweigel
Contributor Author

rweigel commented May 18, 2020

I think we should split this discussion into two parts:

(1) Internal normalization (The point of the original issue)
(2) External normalization with SPASE (Bobby's point)

Before we try (2), I think we need to come to an agreement on (1). Item (2) is a more difficult discussion and I think it is better to do one thing at a time. I recall that very early on we decided to use just "id" and not parameterKey, etc. and I think we should find our notes about this decision before attempting to undo it.

@jvandegriff
Collaborator

jvandegriff commented May 18, 2020 via email

@jvandegriff
Collaborator

Because this turned into two separate issues, and the first is resolved, we are going to close the ticket for that aspect, and leave the second one unresolved. If anyone wants to suggest changes to bring keywords more fully aligned with SPASE, that could be a separate ticket, and this would likely require a lot of discussion.

We have agreed to change the request interface from this:
http://server/hapi/data?id=ACE_MAG&time.min=2004-001Z&time.max=2005-001Z&parameters=mag_GSE
to this
http://server/hapi/data?dataset=ACE_MAG&start=2004-001Z&stop=2005-001Z&parameters=mag_GSE

And similarly for the info request (change id to dataset).
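
For client authors, the agreed change amounts to renaming three query keys. A minimal sketch of rewriting an old-style request URL, using only the Python standard library; the function name modernize and the RENAMES table are hypothetical, and the choice of which characters to leave unescaped is an assumption made to keep the URLs readable:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Old request keyword -> new request keyword, per the agreement above.
RENAMES = {"id": "dataset", "time.min": "start", "time.max": "stop"}


def modernize(url):
    """Rewrite deprecated query keys in an info or data request URL.

    Hypothetical helper: keys not listed in RENAMES pass through unchanged.
    """
    parts = urlsplit(url)
    pairs = [(RENAMES.get(key, key), value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    # Leave "/", "," and ":" unescaped so dataset ids, parameter lists,
    # and times stay readable (an assumption, not a spec requirement).
    return urlunsplit(parts._replace(query=urlencode(pairs, safe="/,:")))


old = ("http://server/hapi/data?id=ACE_MAG&time.min=2004-001Z"
       "&time.max=2005-001Z&parameters=mag_GSE")
new = modernize(old)
print(new)
```

The same function covers the info request, since only the id key appears there.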

The older usage is still supported in the new spec version (3.0), but is deprecated. We will need to add / modify the error responses to warn about use of deprecated elements.

The spec still needs to be modified to reflect this significant change.

@jvandegriff
Collaborator

Should id in the catalog be changed to the more specific term dataset to be more consistent?

@rweigel
Contributor Author

rweigel commented Nov 16, 2020

Summary of general agreement

  • 3.0+ servers can allow info?id=... and info?dataset=...
  • 3.0+: both time.min=... and start=... are allowed
  • 3.0+: both time.max=... and stop=... are allowed

No

  • 3.0+ Allow either name or id in parameters.
  • 3.0+ Allow either name or id in bins.

In the documentation, warn that in 4.0, only start, stop, id versions will be allowed.
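
On the server side, one possible reading of the 3.0 agreement is to accept both spellings of the request keywords, map the old ones to the new ones, and collect a warning for each deprecated key. This is a sketch only: normalize_request is a hypothetical name, the warning text is made up, and rejecting a request that gives conflicting values for the old and new spelling is my assumption, since the thread does not say what should happen in that case.

```python
# Deprecated request keyword -> replacement keyword in 3.0.
DEPRECATED = {"id": "dataset", "time.min": "start", "time.max": "stop"}


def normalize_request(params):
    """Return (normalized_params, warnings) for a dict of request parameters.

    Hypothetical helper. Raises ValueError when the old and new spelling of
    the same keyword carry different values (an assumption, not spec text).
    """
    normalized = {}
    warnings = []
    for key, value in params.items():
        new_key = DEPRECATED.get(key, key)
        if new_key in normalized and normalized[new_key] != value:
            raise ValueError(f"conflicting values for '{new_key}'")
        if key in DEPRECATED:
            warnings.append(f"'{key}' is deprecated; use '{new_key}'")
        normalized[new_key] = value
    return normalized, warnings


params, notes = normalize_request(
    {"id": "ACE_MAG", "time.min": "2004-001Z", "stop": "2005-001Z"}
)
print(params)
print(notes)
```

The warnings list is where a server could hook in whatever deprecation notice the spec ends up defining.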

Continues to be raised periodically:

  • Make HAPI metadata use SPASE keywords.

This was decided upon early on. I suggest it should be proposed in another issue if there is more interest in it beyond discussing it on a telecon. Ideally, the issue description would include a discussion of what the decision was and why the decision is wrong.

@rweigel rweigel closed this as completed in ccb75f5 Dec 7, 2020