Large data vectors break hdf5 #41


Closed
apatil opened this issue May 17, 2011 · 2 comments · Fixed by #115

Comments

apatil (Contributor) commented May 17, 2011

This line: https://github.com/pymc-devs/pymc/blob/master/pymc/database/hdf5.py#L416 causes the following error when an observed stochastic has a length of around 10k:

HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: H5Adeprec.c line 165 in H5Acreate1(): unable to create attribute
    major: Attribute
    minor: Unable to initialize object
  #1: H5A.c line 492 in H5A_create(): unable to create attribute in object header
    major: Attribute
    minor: Unable to insert object
  #2: H5Oattribute.c line 346 in H5O_attr_create(): unable to create new attribute in header
    major: Attribute
    minor: Unable to insert object
  #3: H5Omessage.c line 224 in H5O_msg_append_real(): unable to create new message
    major: Object header
    minor: No space available for allocation
  #4: H5Omessage.c line 1925 in H5O_msg_alloc(): unable to allocate space for message
    major: Object header
    minor: Unable to initialize object
  #5: H5Oalloc.c line 1135 in H5O_alloc(): object header message is too large
    major: Object header
    minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: H5A.c line 916 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type

huard commented May 17, 2011

Hi Anand,

I see two solutions here.

The 10,000 limit is probably defined by a parameter somewhere in PyTables. I looked quickly through the PyTables manual and did not find it, but I'm relatively confident it is there somewhere. It may be worth asking on their mailing list.

If the 10,000 limit cannot be changed, or if raising it causes performance problems, then we'll need to find a way to store these observed stochastics elsewhere in the group, for instance in an array. We'll also need a try/except statement in both the initialize function and in Database.init so that the objects are loaded back into memory regardless of whether they were stored as table attributes or as stand-alone arrays.

David
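A minimal sketch of the fallback David describes, using plain dicts to stand in for HDF5 groups. The helper names (store_observed, load_observed) and the 64 KiB threshold are illustrative assumptions, not PyMC's or PyTables' actual API:

```python
# Sketch: store small observed vectors as node attributes, large ones
# as stand-alone arrays, and load them back either way.

HEADER_LIMIT = 64 * 1024   # HDF5 caps an object header at 64 KiB
ITEM_SIZE = 8              # bytes per float64 element


def fits_in_header(values):
    """Rough check: would this vector fit in the object header?"""
    return len(values) * ITEM_SIZE < HEADER_LIMIT


def store_observed(group, name, values):
    """Store small vectors as attributes, large ones as arrays."""
    if fits_in_header(values):
        group.setdefault("_attrs", {})[name] = values    # attribute path
    else:
        group.setdefault("_arrays", {})[name] = values   # stand-alone array


def load_observed(group, name):
    """Load a value back regardless of how it was stored."""
    try:
        return group["_attrs"][name]
    except KeyError:
        return group["_arrays"][name]
```

With real PyTables nodes, the attribute branch would go through the node's AttributeSet and the array branch through something like File.create_array on the same group; the dict version above only demonstrates the control flow.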

scopatz commented Sep 5, 2011

This is not a PyTables issue; it is an HDF5 one. HDF5 limits the size of datatype headers to 64 kB, which in practice works out to roughly 1000 columns. I am currently looking for a workaround myself...
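A back-of-the-envelope check that connects this limit to the original report. Assumption: the 64 KiB object-header cap applies to the attribute's raw data, stored as float64 (8 bytes per element):

```python
# Why a ~10k-element float64 attribute overflows a 64 KiB object header.
HEADER_CAP = 64 * 1024   # bytes available in one HDF5 object header
FLOAT64 = 8              # bytes per float64 element

max_elements = HEADER_CAP // FLOAT64
print(max_elements)      # 8192 -- so a vector of ~10k elements is too big
```

This is consistent with the failure appearing at "around 10k": 10,000 elements need 80,000 bytes, well over the 65,536-byte cap.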
