Streamlines API #391
Conversation
As a straw man, I would suggest seeing whether it would be easy to apply these same classes to the PDB file format (#313). At first blush, the tractogram object might be able to handle all features of this file format as well. I have not looked at the code yet, but I see no mention of the affine transformation being stored together with the Tractogram object. Would that be a good idea? Maybe it's already there -- looking at the code next.
@arokem: at first glance, the PDB format seems easy to support with this streamlines API. Regarding the affine transformation, I struggled a lot with that: should we keep it, and if so, where do we keep it? Presently, the affine transformation is not stored in the `Tractogram` object. At first, I thought about creating a complex and generic header class. To make it easy to convert from one header's format to another, I've created a static class (some sort of converter). I hope it's clearer. Don't hesitate to ask questions, give suggestions, or comment on the code or on how the streamlines API works.
We need this in asap to reduce the memory usage of tractographies. @MarcCote is it hard to increase the coverage so we can get the green light for merging? Let us know if you need some help.
Does it really reduce memory that much? Fiber compression could probably help with that in the meantime, but the possibility to convert from one format to another for interoperability (visualization, extraction, ROI drawing in other tools, for example) would be really useful.
No, unfortunately even if you use compression you first need to load the initial streamlines before compressing them. When you do that you will still tie up around 2-3X more memory than the actual size of your dataset.
They can be loaded only one at a time for compressing, it's probably
@Garyfallidis okay, I found the problem: I wasn't adding the tests folder to NiBabel's packages in the setup.py. @samuelstjean, loading one streamline at a time will indeed reduce the memory, but the overhead @Garyfallidis is talking about will still be present.
I meant that if it's that bad, try to compress with the C++ version then.
I'm a bit confused about the errors I got on AppVeyor: MemoryError and PermissionError. @matthew-brett have you encountered those errors before?
```python
for ext, cls in nib.streamlines.FORMATS.items():
    with tempfile.NamedTemporaryFile(mode="w+b", suffix=ext) as f:
        with clear_and_catch_warnings(record=True, modules=[trk]) as w:
            nib.streamlines.save_tractogram(complex_tractogram, f.name)
```
I think the problem here (and with the other permission errors) is that Windows doesn't like you re-opening the NamedTemporaryFile.
What I tend to do is to use the `nibabel.tmpdirs.InTemporaryDirectory` context manager and then load / save with some filename. When you do this, you have to make sure all objects holding onto the file get deleted though, otherwise Windows (again) may refuse to delete the temporary file...
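A minimal standard-library sketch of that pattern (the class name `InTempDir` here is illustrative; nibabel's own version is `nibabel.tmpdirs.InTemporaryDirectory`):

```python
import os
import shutil
import tempfile

# Sketch of the pattern: enter a fresh temporary directory, do the load/save
# with plain filenames, then chdir back out so the whole tree can be removed.
class InTempDir:
    def __enter__(self):
        self._old_cwd = os.getcwd()
        self._tmpdir = tempfile.mkdtemp()
        os.chdir(self._tmpdir)
        return self._tmpdir

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Leave the directory *before* deleting it; a process whose cwd is
        # inside the tree (or any open file handle) blocks removal on Windows.
        os.chdir(self._old_cwd)
        shutil.rmtree(self._tmpdir, ignore_errors=True)

with InTempDir():
    with open("tractogram.trk", "wb") as f:
        f.write(b"\x00" * 1000)
    assert os.path.exists("tractogram.trk")  # usable while inside the context
```

The key difference from `NamedTemporaryFile` is that nothing re-opens an already-open file, which is what Windows objects to.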
Okay, I'll do that. Thanks.
Not sure what's happening with the unpickling - can you replicate it with a virtualenv locally?
I'm trying to install Python 2.6 and I'll check the unpickling error.
Almost there. There is only the MemoryError left; I don't understand what's happening there.
Yes, not sure what's happening there either. How big are the arrays it's trying to create? Could the process have run out of memory?
The shape of my 2D arrays is random but should be bounded between 10 and 50 rows with 3 columns, and I have 10 of those arrays. What's the memory limit on that machine?
I don't know the memory limit, but it can't be so low that it cannot make those arrays. Would you mind sending me your SSH public key by email, so I can let you trigger builds on the buildbots? It's easier to debug on those machines; I'll explain when I've got you set up.
I was wrong: I use a buffer of 128Mb when creating an `ArraySequence`.
Maybe adapt the buffer to system memory somehow?
That was it: I reduced it to 1Mb and now it works. Right now, the value is hardcoded as a constant. Is there a better way to automate that, or to make it more flexible?
There is the library psutil, which is cross-platform and allows one to get the available memory, but that would add an additional dependency for NiBabel.
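One dependency-free way to sketch this is to make the buffer size a clamped fraction of available memory. Here `pick_buffer_size`, its `fraction` heuristic, and the idea that `psutil.virtual_memory().available` would supply the input are all assumptions for illustration, not nibabel code:

```python
MEGABYTE = 1024 * 1024

def pick_buffer_size(available_bytes,
                     floor=1 * MEGABYTE,
                     ceiling=128 * MEGABYTE,
                     fraction=0.01):
    """Use a small fraction of available memory, clamped to [floor, ceiling].

    `available_bytes` could come from psutil.virtual_memory().available if the
    dependency were acceptable; otherwise fall back to a conservative default.
    """
    return int(min(ceiling, max(floor, available_bytes * fraction)))

# On a machine with plenty of free RAM the ceiling applies; on a starved CI
# box (like the AppVeyor workers above) it degrades gracefully to the floor.
print(pick_buffer_size(8 * 1024 * MEGABYTE) // MEGABYTE)
```

The clamp is the important part: it keeps the tests working on memory-starved machines without giving up throughput elsewhere.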
Maybe benchmark whether there's a difference between 1M and 128M?
On my computer (using an SSD), with a buffer of 4Mb it took 7.8 sec to read a tractogram file of 1.4Gb containing 500k streamlines. With 128Mb, it took 7.5 sec.
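A rough way to reproduce that comparison is a chunked-read micro-benchmark; everything below (file size, helper name) is illustrative rather than the actual nibabel benchmark, and the test file here is tiny compared with the 1.4Gb tractogram above:

```python
import os
import tempfile
import time

def time_chunked_read(path, buffer_size):
    # Read the whole file in buffer_size chunks and return the elapsed time.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(buffer_size):
            pass
    return time.perf_counter() - start

# Make a small throwaway file to read back.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))  # 4Mb of data

for mb in (1, 4, 8):
    elapsed = time_chunked_read(path, mb * 1024 * 1024)
    print("%dMb buffer: %.4fs" % (mb, elapsed))
os.remove(path)
```

With page-cached files and SSDs the differences are typically small, which is consistent with the 7.8 sec vs 7.5 sec numbers reported above.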
Now, it is all green. 👍 Is there something else to do? From now on, I think the best test we can have is using it in real-life applications. To be sure: is there anyone else who wants to have a look at this PR or play with it but hasn't had time? @arokem did you try to add support for the PDB file format? @MrBago any comments on the new framework?
I guess the speed difference should be smaller when you've got a normal hard drive - so 1M is probably fine. Any benefit from 10M?
Sorry, I meant 4M is probably fine. I guess the chances are that 10 or 8 won't make any difference.
10Mb was still causing a MemoryError, but no, it didn't make any difference in speed. Just tried 8Mb: no MemoryError, and the speed is about the same. I'll go for 8Mb then.
Still getting a memory error with 8M on AppVeyor? Back off to 4M again?
Guess so.
This PR seems ready to be merged. @matthew-brett would you like to have the honour of merging this in the next couple of days, if nobody else has something to add? After this is merged, together with @MarcCote we will start making the API changes in DIPY to accommodate the new streamlines API. If we can have this issue fixed for 0.11 it would be fantastic! Much luv to all and many thx to @MarcCote for the excellent work.
Test reset of file position when reading trackvis header. This revealed that we were using the wrong flag to `seek`.
Use ``assert_arr_dict_equal`` to test whether two headers are equal, rather than custom testing function.
Add tests for encoding / decoding numerical values in byte string fields. Refactor encoding. Update comments / docstrings to note new encoding with numbers as ASCII strings.
Add docstring for function giving RAS+ affine from trackvis header.
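The encoding change mentioned in the commit messages above (numbers stored as ASCII strings in fixed-width byte-string fields) could look roughly like this; `FIELD_LEN` and both helper names are hypothetical, not nibabel's API:

```python
# Trackvis-style headers keep some numeric values in fixed-width byte-string
# fields. One simple scheme is to encode the number as ASCII and NUL-pad it.
FIELD_LEN = 20  # hypothetical field width for this sketch

def encode_number(value, field_len=FIELD_LEN):
    ascii_bytes = repr(value).encode('ascii')
    if len(ascii_bytes) > field_len:
        raise ValueError("value does not fit in a %d-byte field" % field_len)
    # NUL-pad on the right, as C-style fixed-width string fields expect.
    return ascii_bytes.ljust(field_len, b'\x00')

def decode_number(field):
    # Strip the NUL padding and parse the remaining ASCII digits.
    return float(field.rstrip(b'\x00'))

field = encode_number(2.5)
assert len(field) == FIELD_LEN
assert decode_number(field) == 2.5
```

The round-trip property (decode of encode returns the original value) is exactly what the added tests check.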
```python
    return lazy_tractogram

@classmethod
def create_from(cls, data_func):
```
Maybe rename to `from_data_func`?
Agreed.
```python
# We make sure there is the right amount of data.
if (self.nb_elements is not None and
        value.nb_elements != self.nb_elements):
```
Did you think of a better name for `nb_elements` for the `ArraySequence`?
If we think about what `ArraySequence` means, it is an array of sequences. So calling `len(arr_seq)` gives you the number of sequences, and `arr_seq.nb_elements` is the total number of the sequences' items. What about `sequences_(items|elements)_count` or `nb_sequences_(items|elements)`?
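A toy illustration of the two counts being debated, assuming an `ArraySequence`-like container of three sequences (the variable names are just for this sketch):

```python
import numpy as np

# Three sequences of 2D points with different lengths but the same number of
# columns, like streamlines of 3D points.
sequences = [np.zeros((10, 3)), np.zeros((25, 3)), np.zeros((50, 3))]

nb_sequences = len(sequences)                 # what len(arr_seq) returns
nb_elements = sum(len(s) for s in sequences)  # total rows across sequences

print(nb_sequences)  # 3
print(nb_elements)   # 85
```

The naming question is which of these two numbers `nb_elements` should unambiguously refer to.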
After speaking with @jchoude: what about using the term `atom` to refer to a "row" of an `ArraySequence._data`? So, an `ArraySequence` is made of atoms which come from several sequences' elements. We could then replace `element` in the docstring with `atom` when we refer to a row of `_data`. `nb_elements` would become either `nb_atoms` or `atoms_count`.
No strong preference. How about `total_nb_rows`?
And use `row` to refer to an `element`?
Anyway - choose as you like for `nb_elements`, then I think we're ready to merge, if you agree.
Done
RF: small streamlines API PR edits
Try dropping the special cases for the constructors for sliceable dicts, and deal with None as an input value in the tractogram properties.
Suggestion: simplify constructor for sliceable dicts
OK - finally done - congratulations. Thanks too for your patience and good grace in responding to all the comments.
This is fantastic! Congratulations to @MarcCote and everyone else involved! Question: when might we see this in a released version of nibabel? In other words, what is the road map towards a 2.1 (assuming that's the next release)?
Fantastic! Good job @MarcCote and everyone!
Congrats @MarcCote and everyone for this titanic piece of work
😃 |
MRG: Support TCK streamlines file format. This PR adds support for MRtrix streamlines files to the Streamlines API (#391). Eventually, in a future PR, I will add some functions that allow easy conversion between all the different formats. For now, going from TRK to TCK is easy. See https://gist.github.com/MarcCote/ea6842cc4c3950f7596fc3c8a0be0154 Converting from TCK to TRK requires additional information from a reference anatomy. See https://gist.github.com/MarcCote/ebf0ae83a9202eb96d1dfbb949cd98af
This is take two at providing a general API to manage different streamlines file formats (the code can all be found in `nibabel.streamlines`). This is related to this somewhat outdated BIAP5. I initially wanted to make two PRs but, as @Garyfallidis pointed out, without the entire PR "it will make it difficult to try this with real data".

This new streamlines API is divided in two main parts:
- handling streamlines in memory (`Tractogram` and `ArraySequence` classes)
- handling streamlines on disk (`TractogramFile` class)

We use the term 'Tractogram' to refer to a set of streamlines, usually the output of a tractography algorithm.

ArraySequence

The `ArraySequence` class needs to be explained first as it is at the heart of the streamlines representation. It can be seen as a sort of jagged/ragged array storing a list of 2D arrays that have different lengths but the same number of dimensions (e.g. a list of lists of points). For efficiency, we store a list of streamlines contiguously in memory using what we call an `ArraySequence`. This was needed because @Garyfallidis and I realized that a list of numpy arrays has too much memory overhead when handling >1M streamlines (loading a tractogram of 2.5Gb could take up to 5Gb in RAM!).

An `ArraySequence` behaves exactly like a `list` of 2D `ndarray`s. That is, it supports appending, extending, indexing (slicing too), iterating, etc., just like a `list` object. For example, indexing an `ArraySequence` object will return a 2D `ndarray`, as one would expect, representing the 3D points of the selected streamline. Note that, while `ArraySequence` does not support deletion (i.e. no `del` nor `remove`), one can efficiently use slicing to achieve the same goal.

An `ArraySequence` object can be instantiated using any iterable yielding 2D arrays. Slicing an `ArraySequence` object returns a new `ArraySequence` object acting as a view (no memory duplication). Of course, one can use the `copy` method to force the memory duplication.

Tractogram

This part handles streamlines in memory using the Python class `Tractogram`. This class provides a mechanism to keep data associated with the points of every streamline (`data_per_point` property). It can also keep data associated with each streamline (`data_per_streamline` property).

Regarding the reference space the streamlines of a tractogram live in, we decided to follow the NiBabel convention: the streamlines are assumed to be in RAS+, points are expressed in millimetres, and the coordinate of a voxel (e.g. (0,0,0) or (1,2,3)) refers to the centre of that voxel.

A `Tractogram` object supports indexing, slicing and iterating. For example, indexing a `Tractogram` object will return a `TractogramItem` object, representing a single streamline with its associated data. Note that, while `Tractogram` does not support deletion (i.e. no `del` nor `remove`), one can efficiently use slicing to achieve the same goal. Again, slicing a `Tractogram` object returns a new `Tractogram` object acting as a view (no memory duplication).

Tractogram File

This part handles the saving and loading of `Tractogram` objects from/to disk. Right now, as a proof of concept, only the TRK file format is supported (support for TCK is already in progress and will be added in a following PR).

The `TractogramFile` class is abstract. Each file format should have its own class inheriting from `TractogramFile` (e.g., `TrkFile`).

Realistic example

Here is a small example of loading a TRK file, computing the length of each streamline, adding that information to the tractogram, and saving it back.

EDIT: Renamed `CompactList` to `ArraySequence`.
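The contiguous-storage idea behind `ArraySequence`, and the indexing behaviour described above, can be sketched in a few lines of numpy. This is a conceptual illustration with hypothetical names (`RaggedPoints` and its attributes), not nibabel's actual implementation:

```python
import numpy as np

# All points live in one (N, 3) buffer; per-streamline offsets and lengths
# recover each streamline as a *view* into that buffer, so indexing does not
# copy memory -- which is the point of storing streamlines contiguously.
class RaggedPoints:
    def __init__(self, arrays):
        self._lengths = np.array([len(a) for a in arrays], dtype=int)
        self._offsets = np.concatenate(([0], np.cumsum(self._lengths[:-1])))
        self._data = (np.concatenate(arrays, axis=0) if arrays
                      else np.empty((0, 3)))

    def __len__(self):
        # Number of streamlines, not number of points.
        return len(self._lengths)

    def __getitem__(self, idx):
        off, n = self._offsets[idx], self._lengths[idx]
        return self._data[off:off + n]  # a view into the shared buffer

streamlines = RaggedPoints([np.ones((10, 3)), np.ones((25, 3)), np.ones((50, 3))])
print(len(streamlines))         # 3 streamlines
print(streamlines[1].shape)     # (25, 3)
print(streamlines._data.shape)  # (85, 3) -- one contiguous block
```

A Python `list` of 1M small ndarrays pays per-object overhead for every streamline; one contiguous buffer plus two small integer arrays avoids exactly that, which is the 2.5Gb-file-taking-5Gb-of-RAM problem described above.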