Skip to content

Request: Reference types #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gauteh opened this issue Jun 11, 2020 · 10 comments
Open

Request: Reference types #98

gauteh opened this issue Jun 11, 2020 · 10 comments

Comments

@gauteh
Copy link

gauteh commented Jun 11, 2020

Hi,

while figuring out how to get the dimensions of a dataset I am trying to access STD_REF_OBJ's in attributes (using #65 with changes from @balintbalazs). A simple example of references are generated with this script:

from h5py import File
import numpy as np

# http://docs.h5py.org/en/stable/high/dims.html

f = File('dims_1d.h5', 'w')

f['x1'] = [1, 2]
f['x1'].make_scale('x1 name')
f['data'] = np.ones((2,), 'f')
f['data'].dims[0].attach_scale(f['x1'])

which gives the following structure (using h5dump -A dims_1d.h5):

HDF5 "dims_1d.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 800 /x1 )
         }
      }
   }
   DATASET "x1" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 8;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "x1 name"
         }
      }
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_I32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 1400 /data ,
               0
            }
         }
      }
   }
}
}

examples of how to de-reference the reference are here: https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5-examples/browse/1_10/C/H5T/h5ex_t_objrefatt.c with hdf5-reference: https://portal.hdfgroup.org/display/HDF5/H5R_DEREFERENCE. As far as I can see this requires some changes in hdf5-rust in order to access since I do not have direct access to the hid_t of my attribute (in this case). I could try to implement this, but it seems a bit tricky, either way it probably requires some dereferencing to get the actual object. In my case getting the name of the object would be enough at first.

Regards, Gaute

@gauteh
Copy link
Author

gauteh commented Jun 11, 2020

Seems I can use .id() on objects, never noticed before!

@JamesMc86
Copy link
Contributor

JamesMc86 commented Apr 2, 2024

I'm trying to understand if the description above offers a workaround. I have references in a dataset I need to access, but I think this ID getting will work on an attribute. Is that right? Is there any way to fake a reference data set right now?

If not I'm up for taking a swing at integrating this.

@mulimoen
Copy link
Collaborator

mulimoen commented Apr 3, 2024

I've taken a first shot at implementing this in a branch here [0]. Some notes follows

  • There are three reference types, two legacy and one new
  • Dataset region references are not implemented
  • Need a way to go from ObjectReference to an object (see StdReference for new ref.type to object)
  • Reference types created by dimension scales etc. are wrapped in VarLenArray, but this accepts only Copy types
  • References might be leaking?

Feel free to play around using this @JamesMc86

[0] https://github.com/mulimoen/hdf5-rust/tree/feature/refs2

@JamesMc86
Copy link
Contributor

This looks great and will meet all my requirements in its current format. Just sent you a PR to that branch with some integration tests I used during testing.

There are a couple of notes but none of this block what I need right now:

  • I'm not clear on the differentiation between StdReference and ObjectReference which I appreciate is differentiated at the HDF5 level so probably need to understand more of whether anything is actually different at the file level. Perhaps an easy conversion here
  • I can see you have also tried and worked with them in arrays. The lack of copy blocks these but I think you can probably only make them Clone using the HF5R_Copy method. Perhaps the VarLenArray type can have a clone based construction too.
  • Some helper methods for converting ReferencedObject to the target (with error) would be useful. This could be a TryFrom implementation.

I'm happy to work on some aspects that I can

For my use case these are all just refinements though and this looks great to me.

@JamesMc86
Copy link
Contributor

Ah, I just tried the clone implementation, but it actually requires a try_clone method, as it can fail.

@JamesMc86
Copy link
Contributor

So I've been getting my head around more of this at the HDF layer. Is it right to say the ObjectReference is the pre 1.12 type and then since 1.12 we have the standard reference type which is an opaque type over object, region or attribute references.

Therefore, for backward compatibility, we could have an ObjectReference as we do now, which could then wrap the standard reference for these in the newer versions.

@mulimoen
Copy link
Collaborator

mulimoen commented Apr 7, 2024

I think the different reference types are the old type as you suggest with separate Object and Data region references, which is replaced by a more general reference type H5R_ref_t (which we call StdReference but that is a very poor name). I think we need to support both types since the original issue has the old reference type (H5T_STD_REF_OBJECT).

Ah, I just tried the clone implementation, but it actually requires a try_clone method, as it can fail.
You can always panic in the code if the clone fails. We will have to figure out what to do with VarLenArray which contains references, but this can be delayed until a later time.

@JamesMc86
Copy link
Contributor

That sounds great. Yesterday, I did a bit more on the refactoring at https://github.com/WiresmithTech/hdf5-rust/tree/feature/refs2. I hope you are happy with me plodding away on it. I've separated the two types into their own modules so we can easily gate the new types on the HDF version number.

With the references I'm thinking:

  1. Even with the standard reference type, wrapping it as an Object reference will allow for a more ergonomic API when region references and attribute references are introduced.
  2. Keep the old reference type as you said.
  3. maybe (this is where I would like your advice) - alias them together so new code automatically uses the new reference as it seems to be compatible on disk and purely an API change. If we have made it so it automatically migrates then we don't need to expose both types.

So you end up with something like:

< v1.12

struct ObjectReference {...}

v1.12

struct StdReference{...}
struct ObjectReference(StdReference)

What do you feel about this? The alternatives I've thought about:

  • We could use ObjectReference and StdReference, but I think that will cause confusion.
  • We could use ObjectReference1 and ObjectReference2, consistent with how HDF5 names newer and older methods. This means they will co-exist and users will have to explicitly update their code to support the non-deprecated methods.
  • We could just skip the old ObjectReferences all together. They are deprecated by HDF5 but only available > v1.12.

@mulimoen
Copy link
Collaborator

mulimoen commented Apr 8, 2024

Great that you are working on this. There is a big feature space to explore. Some of the incompatibility can be resolved by using traits for dereference etc.

The reference types are not compatible on disk unfortunately, and we should keep both and make them both available for backwards compatibility with e.g. NetCDF files. Even though 1.10 (and 1.12) has no new planned releases, the old API is very unlikely to be removed.

@JamesMc86
Copy link
Contributor

Perfect - I'll assume we need both then and work accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants