
Units issue #81


Closed
rweigel opened this issue Jul 1, 2019 · 16 comments

@rweigel
Contributor

rweigel commented Jul 1, 2019

Provide a suggested convention for units. Note that in some cases the units used are derived from historical data and can't be changed.

@jbfaden
Contributor

jbfaden commented Jul 1, 2019

Here is an Autoplot ticket that has links to several lists: https://sourceforge.net/p/autoplot/feature-requests/249/

@berniegsfc
Contributor

I can't find the source documents right now, but the project-specific CDF validation criteria files used by SKTEditor point to a "Magnetospheric Multiscale (MMS) Mission CDF File Format Guide" and an "MMS Units Of Measure" document. From these documents (Draft Version 1.7, 11/10/2014 and 1/27/2015, respectively) come the following validation criteria (regex patterns):

  • From Section 5.1.3.11, Attribute name="SI_CONVERSION"
    ' > |1e6>m^{-3} *|1.0e3>m/s *|0.0174532925>rad *|1.0e-9>Pa *|11604.50520>K *|1.0e-3>W/m^{2} *|1.0>J/K *|1.0e-3>V/m *|1.0>V *|1.0>(V/m)^{2}/Hz *|1.0e-3>m/s *|1.0e-3>W/m^{2} *|1.0e-9>T *|1.0e-18>T^{2}/Hz *|1.0e-9>A/m^{2} *|1.0e3>m *'
  • From Section 5.1.3.12, Attribute name="UNITS"
    'cm^{-3} *|km/s *|deg *|nPA *|eV *|mW/m^{2} *|J/K *|mV/m *|V *|(V/m)^{2}/Hz *|nT *|nT^{2}/Hz *|nA/m^{2} *|1/(cm^{2} s sr eV) *|eV/(cm^{2} s sr eV) *|km *|s^{-1}|\s'
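Read literally as a regular expression, entries like cm^{-3} would not work (^ is an anchor), so the SKTEditor pattern presumably treats them as literal strings. Here is a rough sketch of checking a UNITS value against that list, assuming each alternative is meant literally; nPA is kept exactly as written in the source.

```python
import re

# Allowed MMS UNITS values, transcribed from the Section 5.1.3.12 pattern above.
# Assumption: each alternative is a literal string, so characters such as
# ^ { } ( ) are escaped here instead of being read as regex syntax.
MMS_UNITS = [
    "cm^{-3}", "km/s", "deg", "nPA", "eV", "mW/m^{2}", "J/K", "mV/m", "V",
    "(V/m)^{2}/Hz", "nT", "nT^{2}/Hz", "nA/m^{2}", "1/(cm^{2} s sr eV)",
    "eV/(cm^{2} s sr eV)", "km", "s^{-1}",
]

# As in the original pattern: trailing spaces (" *") after every entry except
# the last one, plus a single whitespace character ("\s") on its own.
_alternatives = [re.escape(u) + " *" for u in MMS_UNITS[:-1]]
_alternatives += [re.escape(MMS_UNITS[-1]), r"\s"]
MMS_UNITS_RE = re.compile("|".join(_alternatives))


def is_mms_units(value):
    """True if the whole UNITS value matches one of the allowed alternatives."""
    return MMS_UNITS_RE.fullmatch(value) is not None
```

With this, is_mms_units("nT") and is_mms_units("km/s  ") pass, while is_mms_units("nanotesla") does not.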

And from the Panel on Radiation Belt Environment Modeling (PRBEM) CDF validation specification:

  • Position, UNITS = km
  • B_Calc, B_Eq, UNITS=nT
  • MLT, UNITS=hours
  • Alpha, Alpha_Eq, FPDU_Alpha, FPDU_Alpha_Eq, etc. UNITS=degrees
  • FPDO, FEDO, FADO, FIDO, FPDU, FEDU, FADU, FIDU UNITS=MeV^-1 cm^-2 s^-1 sr^-1
  • FPIO, FEIO, FAIO, FIIO, FPIU, FEIU, FAIU, FIIU UNITS=cm^-2 s^-1 sr^-1
  • FPDO_Energy, FPIO_Energy, FEDO_Energy, FEIO_Energy, FADO_Energy, FAIO_Energy, FIDO_Energy, FPDU_Energy, etc., UNITS=MeV
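Unlike the MMS list, the PRBEM rules tie a measurement type to one required unit string. A hypothetical sketch of such a check follows. The rules come from the list above. The regular expressions that generalize the "etc." entries (and the F + species + D/I + O/U naming pattern) are my assumptions.

```python
import re

# Hypothetical PRBEM-style rules: variable-name pattern -> required UNITS.
# The patterns that stand in for the "etc." entries are assumptions.
PRBEM_RULES = [
    (r"Position", "km"),
    (r"B_Calc|B_Eq", "nT"),
    (r"MLT", "hours"),
    (r"(?:\w+_)?Alpha(?:_Eq)?", "degrees"),
    (r"F[PEAI][DI][OU]_Energy", "MeV"),
    (r"F[PEAI]D[OU]", "MeV^-1 cm^-2 s^-1 sr^-1"),  # differential fluxes
    (r"F[PEAI]I[OU]", "cm^-2 s^-1 sr^-1"),         # integral fluxes
]


def expected_units(name):
    """Return the required UNITS for a variable name, or None if no rule applies."""
    for pattern, units in PRBEM_RULES:
        if re.fullmatch(pattern, name):
            return units
    return None


def check_prbem_units(name, units):
    """True if the variable has no rule or its UNITS equal the required string."""
    required = expected_units(name)
    return required is None or units == required
```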

@rweigel
Contributor Author

rweigel commented Jul 2, 2019

Another link:
https://www.unidata.ucar.edu/software/udunits/udunits-current/doc/udunits/udunits2.html has a large XML file containing definitions.

@rweigel
Contributor Author

rweigel commented Sep 20, 2019

Proposal:

Add an optional unitsConvention (or perhaps unitSymbols) attribute at the top level of info, and initially allow only udunits2 as a value.

If unitsConvention is given, the validator checks all units in the metadata against the list of allowed symbols in udunits2:

https://www.unidata.ucar.edu/software/udunits/udunits-current/udunits/udunits2-prefixes.xml (k, M, G, etc)

https://www.unidata.ucar.edu/software/udunits/udunits-current/udunits/udunits2-base.xml (m, kg, s, A, etc.)

https://www.unidata.ucar.edu/software/udunits/udunits-current/udunits/udunits2-derived.xml (rad, sr, Pa, etc.)

https://www.unidata.ucar.edu/software/udunits/udunits-current/udunits/udunits2-accepted.xml (eV, Hz, etc.)
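A validator could reduce these four files to sets of allowed symbols. Here is a sketch using the standard-library XML parser. It assumes, as I believe is the case in the udunits2 files, that every unit or prefix symbol (including alias symbols) sits in a <symbol> element. The sample below is a hand-written, abbreviated excerpt, not the real file contents.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for udunits2-base.xml (illustration only).
SAMPLE_BASE = """
<unit-system>
  <unit><base/><name><singular>meter</singular></name><symbol>m</symbol></unit>
  <unit><base/><name><singular>second</singular></name><symbol>s</symbol></unit>
</unit-system>
"""


def collect_symbols(xml_text):
    """Return the set of all <symbol> values found anywhere in the document."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter("symbol") if el.text}
```

Running this over the text of each downloaded file and taking the union would give the prefix and symbol sets the validator needs.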

To this list of allowed symbols, I suppose we would need to add /, ^, **, (, and ). The validator would also need to check for matching parentheses.
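Here is a rough sketch of that check: tokenize the string, accept a bare symbol or prefix+symbol, allow /, ^, **, (, ), *, numbers and whitespace, and track parenthesis depth. The prefix and symbol sets are tiny hypothetical stand-ins for the full udunits2 lists. The token rules are my own choice, not the udunits2 grammar.

```python
import re

# Hypothetical, heavily abbreviated stand-ins for the udunits2 prefix and symbol lists.
PREFIXES = {"k", "M", "G", "c", "m", "u", "n", "p"}
SYMBOLS = {"m", "g", "kg", "s", "A", "K", "rad", "sr", "Pa", "eV", "Hz", "T", "V", "W", "J"}

# Tokens: "**", the operators / ^ ( ) *, signed numbers (exponents),
# unit names, and whitespace (which separates multiplied units).
TOKEN = re.compile(r"\*\*|[/^()*]|[+-]?\d+(?:\.\d+)?|[A-Za-z_]+|\s+")


def is_known_symbol(tok):
    """A bare symbol, or a prefix followed by a symbol (e.g. "cm", "nT")."""
    if tok in SYMBOLS:
        return True
    return any(tok.startswith(p) and tok[len(p):] in SYMBOLS for p in PREFIXES)


def find_unit_problems(units):
    """Return a list of problems in a unit string; an empty list means it passed."""
    problems = []
    depth = 0
    pos = 0
    while pos < len(units):
        m = TOKEN.match(units, pos)
        if m is None:
            problems.append(f"unexpected character {units[pos]!r} at {pos}")
            pos += 1
            continue
        tok, pos = m.group(), m.end()
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at {m.start()}")
                depth = 0
        elif tok[0].isalpha() or tok[0] == "_":
            if not is_known_symbol(tok):
                problems.append(f"unknown symbol {tok!r}")
    if depth > 0:
        problems.append("unmatched '('")
    return problems
```

In this sketch cm^-2 s^-1 sr^-1 and (V/m)^2/Hz pass, radian is reported as an unknown symbol, and cm^{-1} is rejected because braces are not in the token set. Whether udunits2 itself accepts those forms is exactly the question raised later in this thread.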

This does not cover the more complex case of PRBEM, which mandates that certain types of measurements must use a given unit.

@rweigel
Contributor Author

rweigel commented Sep 24, 2019

Summary of discussion on telecon:

The use case is that a data provider wishes to communicate to the client that the units used in the metadata follow a certain convention.

The HAPI spec should support this, but we should keep the default behavior that unit strings do not need to follow a convention. Updating existing metadata to follow one would take a lot of work and be error-prone, and the resulting unit strings may not be what users of a given data product are used to. In addition, HAPI metadata is expected to include a SPASE_ID pointer to additional information in a SPASE record. SPASE does not constrain units, and we would not want a case where the units in the HAPI metadata for a parameter differ from those listed in the SPASE record.

The "units" issue is really three issues.

  1. A convention for unit strings, e.g., a radian unit is rad not radian and Celsius is C not deg_C. This is what the udunits2 XML files linked above cover.
  2. A convention for mathematical operations, e.g., ^ means exponentiation and log(x) means the natural log of x. This seems to be addressed by udunits2 because the software can convert between units. Ideally, we would have a document that describes how strings are interpreted - for example, is cm^-1 allowed or must it be cm^(-1) and is cm^{-1} allowed? udunits2 has a parser for this and I suggest that we follow the rules given by this parser (I was not able to find documentation on the rules, but they can be determined from this code).
  3. A convention of what units from 1. are used for certain measurements, e.g., for particle number fluxes, the units should be cm^-2 s^-1 sr^-1 and not km^-2 hour^-1 sr^-1.

I suggest that we start with 1. and 2. (and call the convention udunits2) and treat 3. as a separate issue.

If a data provider wants to define a convention other than udunits2, then they need a document listing all of the allowed unit strings and also a specification of what the allowed mathematical operations are.

@jbfaden
Contributor

jbfaden commented Sep 24, 2019

Given two quantities, the units string for each, and the units schema for these, I should be able to calculate the product of the two quantities, with meaningful and efficient units. By meaningful I mean a human can understand them, and by efficient I mean that the system knows that cm / (1/cm) -> cm^2. Also it should have the knowledge that (1/cm) * ( cm**-1 ) -> cm^-2.
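One minimal way to get that behavior is to represent a unit as a map from symbol to exponent, so that multiplication adds exponents and division subtracts them. This sketch assumes "1/cm" and "cm**-1" have both already been parsed to {"cm": -1}. Parsing, and conversion between different symbols such as cm and m, are left out.

```python
from collections import Counter


def multiply(a, b):
    """Units of a product: add exponents and drop symbols whose exponent is 0."""
    total = Counter(a)
    total.update(b)  # Counter.update adds counts, including negative ones
    return {sym: exp for sym, exp in total.items() if exp != 0}


def divide(a, b):
    """Units of a quotient: multiply by b with every exponent negated."""
    return multiply(a, {sym: -exp for sym, exp in b.items()})


def format_units(u):
    """Render as space-separated factors, e.g. {"km": 1, "s": -1} -> "km s^-1"."""
    if not u:
        return "1"
    return " ".join(sym if exp == 1 else f"{sym}^{exp}" for sym, exp in sorted(u.items()))


cm = {"cm": 1}
inv_cm = {"cm": -1}  # what both "1/cm" and "cm**-1" would parse to
print(format_units(divide(cm, inv_cm)))        # cm^2
print(format_units(multiply(inv_cm, inv_cm)))  # cm^-2
```

Keeping the original symbols (cm rather than converting to m) is what keeps the result readable for a human.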

@rweigel
Copy link
Contributor Author

rweigel commented Sep 24, 2019

@jbfaden not sure what your point is relative to the thread ... is it that with 1. and 2. a client has enough information to compute meaningful and efficient units? If yes, then yes, that is the motivation of 1. and 2.

@jbfaden
Contributor

jbfaden commented Sep 24, 2019

I just noticed that we never really stated a motivation for all this.

@rweigel
Contributor Author

rweigel commented Sep 24, 2019

I see. I'll add it as a motivation to the one mentioned in the telecon notes above.

@jvandegriff
Collaborator

jvandegriff commented Oct 7, 2019

In the 2019-10-07 telecon, we agreed to add unitsSchema as an optional attribute at the dataset level. (Each dataset can specify its units schema; individual parameters cannot.) The three that we know about right now are:

  1. UDUNITS2.2.26
  2. AstroPy units (current AstroPy is 3.2.1)
  3. the CUNITS conventions used in CDF files

The unitsSchema values for these would be as follows, with the CDF conventions split by project:

  1. udunits2
  2. astropy3
  3. cdf-mms
  4. cdf-cluster
  5. cdf-prbem

The spec should reference where to find out about these (link or enough to search with).

The spec should describe how to form a unitsSchema string so that it carries some level of versioning, but not necessarily the whole version identifier. Using udunits2.2.26 would mean more work for clients to recognize that it is essentially compatible with udunits2.2.35 or other releases.
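To illustrate why the coarser name is easier on clients, here is a hypothetical helper that reduces a fully versioned name to its family. The naming pattern it assumes (lowercase letters and hyphens, optional major version, optional ".N" parts) is mine, not something the spec defines.

```python
import re

# Assumed pattern: family name with optional major version, then optional ".N" parts.
FAMILY = re.compile(r"([a-z]+(?:-[a-z]+)*\d*)(?:\.\d+)*")


def schema_family(name):
    """Reduce e.g. "udunits2.2.26" to "udunits2"; raise ValueError if unrecognized."""
    m = FAMILY.fullmatch(name.lower())
    if m is None:
        raise ValueError(f"unrecognized schema name {name!r}")
    return m.group(1)


def same_family(a, b):
    """True if two schema names differ only in their minor version parts."""
    return schema_family(a) == schema_family(b)
```

Every client would need something like this if full versions were allowed, which is the argument for the spec fixing the level of versioning instead.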

It's better if the referenced units flavor has a validating mechanism, and all of the above have software that can validate unit strings. Items 1 and 2 (udunits2 and astropy3) each have independent validators, items 3 and 5 (cdf-mms and cdf-prbem) are validated by SKTEditor, and for item 4 (cdf-cluster) the QSAS tools check units.

@jvandegriff
Copy link
Collaborator

Lots of discussion about whether to use an enumeration for the allowed schema names.

For now, we will only allow these 5, and if we get requests for more, we will add them and consider changing away from a restricted list.

The validator could be augmented to look into the specific units and use the machine-accessible units spec to actually check that the data reports units correctly.
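Here is a hypothetical sketch of how that could look. It checks the enumerated unitsSchema value, then hands each parameter's units to a per-schema checker when one is available. The five names come from the list above. The checker registry and function names are illustrative assumptions, and the info layout (top-level "unitsSchema", a "parameters" list with "name" and "units") follows the dataset-level decision.

```python
ALLOWED_SCHEMAS = {"udunits2", "astropy3", "cdf-mms", "cdf-cluster", "cdf-prbem"}


def validate_info(info, checkers=None):
    """Return a list of problems. checkers maps a schema name to a function that
    returns True when a single units string is valid in that schema (hypothetical)."""
    schema = info.get("unitsSchema")
    if schema is None:
        return []  # optional attribute: no convention claimed, nothing to check
    if schema not in ALLOWED_SCHEMAS:
        return [f"unitsSchema {schema!r} is not one of {sorted(ALLOWED_SCHEMAS)}"]
    check = (checkers or {}).get(schema)
    if check is None:
        return []  # no machine-readable checker available for this schema
    problems = []
    for param in info.get("parameters", []):
        units = param.get("units")
        if units is not None and not check(units):
            problems.append(f"parameter {param.get('name')!r}: units {units!r} not valid in {schema}")
    return problems
```

Keeping the per-schema checkers pluggable would also fit a separate, community-run validation service.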

There could be a separate (community-based) service for units validation.

@jvandegriff
Collaborator

Does anyone know where to find the official documents for Cluster units and also for PRBEM units?

I'm going to include a table in the HAPI spec with current links and also info about the origin of each convention.
You can see the current table on this branch of the spec:
https://github.com/hapi-server/data-specification/blob/jvandegriff-unitsSchema-1/hapi-dev/HAPI-data-access-spec-dev.md#info

For Cluster, this page:
https://www.cosmos.esa.int/web/csa/software
references the QUnit 2.4.02 package (which comes from QSAS):
http://www.sp.ph.ic.ac.uk/csc-web/qunit.html

Perhaps we should reference QUnit instead of Cluster specifically?

For the PRBEM, so far I've only found the COSPAR home page:
https://craterre.onera.fr/prbem/home.html

@berniegsfc
Contributor

I have V1.2 (date of 2012 on my computer's file system) of "Panel on Radiation Belt Environment Modeling (PRBEM) Standard file format guidelines" but it has no URL to the "published" document. https://craterre.onera.fr/prbem/Standard_File_Format.pdf is V1.1.

@jvandegriff
Collaborator

Waiting to see if the MMS units have online documentation. PRBEM does not.

We will drop from the enum any option that does not have good online support.

@jbfaden
Contributor

jbfaden commented Jan 4, 2021

This link talking about MMS units was recently made public: https://lasp.colorado.edu/galaxy/display/mms/Units+of+Measure

@jvandegriff
Collaborator

Cluster Exchange Format (CEF) units are described succinctly in this file:

https://caa.esac.esa.int/documents/DS-QMW-TN-0010.pdf

which is referenced on this page:
https://www.cosmos.esa.int/web/csa/documentation

The relevant info from that document is this:
SI CONVERSION
Required for all data in science units. Text string of the form
number>SI unit
where number is the conversion factor to SI units. It is the factor that the variable must be multiplied by in order to turn it into SI units. The string SI unit is the standard unit that it converts to. For example the magnetic field for FGM may be in nT, and to convert to Tesla the value of SI CONVERSION should be 1.0e-9>T. For compound units the grammar will be of a standard form: distinct unit dimensions will be separated by space characters and powers (signed) will be preceded by the carat, ^.

Non-dimensional qualifiers, which do not appear in the SI units list, are to be enclosed in braces (). For example, m s^-1 or (number electrons) m^-3. Similarly (percent) and (ratio) would provide user information on dimensionless quantities. Non-integer powers are permitted, e.g. Hz^-0.5. SI units should be one of:

  • s (second)
  • kg (kilogram)
  • m (metre)
  • Hz (hertz)
  • A (ampere)
  • K (kelvin)
  • J (joule)
  • V (volt)
  • T (tesla)
  • Pa (pascal)
  • C (coulomb)
  • H (henry) [needed for mu_0]
  • F (farad) [needed for epsilon_0]
  • W (watt)
  • N (newton)
  • ohm
  • mho
  • rad (radian)
  • sr (steradian)
  • degree [alternative angle measure, not SI, but convenient and often used]
  • unitless [added for compliance with the MDD documentation, and used only when no units can be specified, e.g. Tpar/Tperp]
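Here is a rough sketch of reading values in this CEF form: split SI_CONVERSION at ">" and apply the factor, then parse a compound unit with space-separated factors, signed powers after ^, and parenthesized qualifiers. This is my reading of the quoted text, not an official parser. It applies only to the CEF grammar, so MMS-style strings such as (V/m)^{2}/Hz would be misread.

```python
import re

QUALIFIER = re.compile(r"\([^)]*\)")  # non-dimensional qualifiers, e.g. "(percent)"


def parse_si_conversion(value):
    """Split "number>SI unit" into (factor, unit); raises ValueError if malformed."""
    factor_text, sep, unit = value.partition(">")
    if not sep:
        raise ValueError(f"no '>' in SI_CONVERSION value {value!r}")
    return float(factor_text), unit.strip()


def to_si(x, si_conversion):
    """Multiply a value by the SI_CONVERSION factor, returning (value, SI unit)."""
    factor, unit = parse_si_conversion(si_conversion)
    return x * factor, unit


def parse_cef_unit(unit):
    """Parse e.g. "(number electrons) m^-3" into ({"m": -3.0}, ["number electrons"])."""
    qualifiers = [q[1:-1] for q in QUALIFIER.findall(unit)]
    powers = {}
    for token in QUALIFIER.sub(" ", unit).split():
        symbol, _, power = token.partition("^")
        power = power.replace("\u2212", "-")  # PDF text may use U+2212 for minus
        powers[symbol] = powers.get(symbol, 0) + (float(power) if power else 1)
    return powers, qualifiers
```

For the FGM example above, to_si(5.0, "1.0e-9>T") gives roughly (5e-09, "T").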
