Streamlines API #391


Merged — 139 commits merged into nipy:master on Jun 7, 2016

Conversation

@MarcCote (Contributor) commented Dec 3, 2015

This is take two at providing a general API to manage different streamlines file formats (the code can all be found in nibabel.streamlines). This is related to the somewhat outdated BIAP5. I initially wanted to make two PRs, but as @Garyfallidis pointed out, without the entire PR "it will make it difficult to try this with real data".

This new streamlines API is divided into two main parts:

  1. Representation of streamlines in memory (Tractogram and ArraySequence classes)
  2. Loading and saving streamlines from/to different file formats (TractogramFile class)

We use the term 'Tractogram' to refer to a set of streamlines, usually the output of a tractography algorithm.

[diagram: streamlines_api]

ArraySequence

The ArraySequence class needs to be explained first, as it is at the heart of the streamlines representation. It can be seen as a sort of jagged/ragged array storing a list of 2D arrays that have different lengths but the same number of dimensions (e.g. a list of lists of points). For efficiency, we store a list of streamlines contiguously in memory using what we call an ArraySequence. This was needed because @Garyfallidis and I realized that a list of NumPy arrays has too much overhead in terms of memory when handling >1M streamlines (loading a tractogram of 2.5 GB could take up to 5 GB of RAM!).
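The storage scheme can be sketched with plain NumPy. This is only an illustration of the idea (one contiguous buffer plus per-sequence offsets and lengths), not the actual ArraySequence implementation:

```python
import numpy as np

# Sketch of ArraySequence-style storage: all 2D arrays are
# concatenated row-wise into one contiguous buffer.
arrays = [np.random.rand(10, 3), np.random.rand(3, 3)]

data = np.concatenate(arrays, axis=0)         # one big (13, 3) buffer
lengths = np.array([len(a) for a in arrays])  # rows per sequence
offsets = np.concatenate([[0], np.cumsum(lengths)[:-1]])  # start row of each

def get(i):
    """Recover the i-th 2D array as a view into the shared buffer."""
    return data[offsets[i]:offsets[i] + lengths[i]]

assert np.array_equal(get(0), arrays[0])
assert np.array_equal(get(1), arrays[1])
```

Because each `get(i)` is a view into the shared buffer, the per-array Python object overhead is paid only once, which is what makes the contiguous layout so much cheaper for millions of short streamlines.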

An ArraySequence behaves exactly like a list of 2D ndarrays. That is, it supports appending, extending, indexing (including slicing), iterating, etc., just like a list object. For example, indexing an ArraySequence object returns a 2D ndarray, as one would expect, representing the 3D points of the selected streamline. Note that, while ArraySequence does not support deletion (i.e. no del nor remove), one can efficiently use slicing to achieve the same goal.

An ArraySequence object can be instantiated from any iterable yielding 2D arrays:

import numpy as np
import nibabel as nib
data = [np.random.rand(10, 3), np.random.rand(3, 3)]
seq = nib.streamlines.ArraySequence(data)

Slicing an ArraySequence object returns a new ArraySequence object acting as a view (no memory duplication). Of course, one can use the copy method to "force" the memory duplication.
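The view-versus-copy distinction, and the "delete via slicing" trick, mirror NumPy slicing semantics. A hedged sketch of the same ideas using a plain ndarray rather than an ArraySequence:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

view = arr[1:3]            # a view: no memory duplication
view[0, 0] = 99
assert arr[1, 0] == 99     # the change is visible through the original

dup = arr[1:3].copy()      # "force" the duplication
dup[0, 0] = -1
assert arr[1, 0] == 99     # original untouched this time

# No del/remove needed: keep everything except index 2 via indexing.
keep = [i for i in range(len(arr)) if i != 2]
smaller = arr[keep]
assert smaller.shape == (3, 3)
```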

Tractogram

This part handles streamlines in memory using the Python class Tractogram. This class provides a mechanism to keep data associated with the points of every streamline (the data_per_point property). It can also keep data associated with each streamline (the data_per_streamline property).

Regarding the reference space that the streamlines of a tractogram live in, we decided to follow the NiBabel convention: the streamlines are assumed to be in RAS+, points are expressed in millimetres, and the coordinate of a voxel (e.g. (0,0,0) or (1,2,3)) refers to the centre of that voxel.

A Tractogram object supports indexing, slicing, and iterating. For example, indexing a Tractogram object returns a TractogramItem object, representing a single streamline with its associated data. Note that, while Tractogram does not support deletion (i.e. no del nor remove), one can efficiently use slicing to achieve the same goal. Again, slicing a Tractogram object returns a new Tractogram object acting as a view (no memory duplication).

Here is a small example of instantiating a Tractogram object:

import numpy as np
import nibabel as nib
data = [np.random.rand(10, 3), np.random.rand(3, 3)]
tractogram = nib.streamlines.Tractogram(streamlines=data)
colors = [(1, 0, 0), (0, 1, 0)]
tractogram.data_per_streamline['colors'] = colors
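Per-point data works analogously, except that each entry must provide one value per point of the corresponding streamline. A minimal sketch of that shape invariant in plain NumPy (this is the constraint the data_per_point mechanism enforces, not the actual validation code):

```python
import numpy as np

streamlines = [np.random.rand(10, 3), np.random.rand(3, 3)]

# One RGB colour per *point*: each entry's length must match its streamline.
colors_per_point = [np.tile([1.0, 0.0, 0.0], (len(s), 1)) for s in streamlines]

for s, c in zip(streamlines, colors_per_point):
    assert len(c) == len(s)  # the per-point invariant
```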

Tractogram File

This part handles the saving and loading of Tractogram objects from/to disk. Right now, as a proof of concept, only the TRK file format is supported (support for TCK is already in progress and will be added in a follow-up PR).

The TractogramFile class is abstract. Each file format should have its own class inheriting TractogramFile (e.g., TrkFile).

Realistic example

Here is a small example of loading a TRK file, computing the length of each streamline, adding that information to the tractogram, and saving it back:

import nibabel as nib
trk = nib.streamlines.load('my_streamlines.trk')

# Compute the length of every streamline.
from dipy.tracking.streamline import length
lengths = length(trk.tractogram.streamlines)

# Attach the lengths as per-streamline data.
trk.tractogram.data_per_streamline['lengths'] = lengths

# Save the enriched tractogram back to disk.
new_trk = nib.streamlines.TrkFile(trk.tractogram, trk.header)
nib.streamlines.save(new_trk, "my_streamlines.trk")

EDIT: Renamed CompactList to ArraySequence

@MarcCote mentioned this pull request Dec 3, 2015
@arokem (Member) commented Dec 3, 2015

As a straw man, I would suggest seeing whether it would be easy to apply these same classes to the PDB file format (#313). At first blush, the Tractogram object might be able to handle all the features of that file format as well. I have not looked at the code yet, but I see no mention of the affine transformation being stored together with the Tractogram object. Would that be a good idea? Maybe it's already there -- looking at the code next.

@MarcCote (Contributor, Author) commented Dec 4, 2015

@arokem: at first glance, PDB seems easy to support with this streamlines API. Regarding the affine transformation, I struggled a lot with that: should we keep it? If so, where do we keep it? In the Tractogram object or in a separate TractogramHeader object?

Presently, the affine transformation is not stored in the Tractogram, and the streamlines are assumed to always be in RAS+mm (i.e. world space according to NIfTI). Not all formats contain information about the affine transformation (e.g. TCK from MRtrix), and sometimes it even depends on the software (e.g. VTK files output by Camino, MITK, etc.). For this reason, I think it makes more sense to keep that information in the TractogramFile object, which is responsible for loading/saving tractograms correctly.
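Mapping a format's native voxel coordinates into the assumed RAS+mm world space is a plain affine application. A hedged sketch (the affine values below are made up for illustration, not taken from any real header):

```python
import numpy as np

# Hypothetical voxel-to-RAS+mm affine: 2 mm isotropic voxels plus translation.
affine = np.array([[2.0, 0.0, 0.0, -90.0],
                   [0.0, 2.0, 0.0, -126.0],
                   [0.0, 0.0, 2.0, -72.0],
                   [0.0, 0.0, 0.0, 1.0]])

def apply_affine(aff, points):
    """Map an (N, 3) array of points through a 4x4 affine."""
    return points @ aff[:3, :3].T + aff[:3, 3]

voxel_points = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
world_points = apply_affine(affine, voxel_points)
assert np.allclose(world_points[0], [-90.0, -126.0, -72.0])
assert np.allclose(world_points[1], [-88.0, -122.0, -66.0])
```

A TractogramFile implementation that knows its format's affine can apply this once at load time, so everything downstream only ever sees RAS+mm points.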

At first, I thought about creating a complex and generic TractogramHeader class to store all information that could be shared across different file formats. In the end, it was too much trouble. So, I finally decided to add an attribute named header (a simple dict) to the TractogramFile class and stored the affine in it, along with other information provided by the file format.

To make it easy to convert from one header's format to another, I've created a static class (some sort of enum) named nib.streamlines.header.Field containing the most common header fields (e.g. Field.NB_STREAMLINES). So, for a given file format, whenever I read some header information that I know has a corresponding Field attribute, I use it as the key in the header dictionary of the TractogramFile object.
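A minimal sketch of such a static field class. Only `NB_STREAMLINES` is named in this thread; the other attribute names below are assumptions for illustration:

```python
class Field:
    """Common header fields shared across streamline file formats (sketch)."""
    NB_STREAMLINES = "nb_streamlines"
    VOXEL_SIZES = "voxel_sizes"    # assumed name, for illustration only
    DIMENSIONS = "dimensions"      # assumed name, for illustration only

# A format reader can then key its header dict by the shared field names,
# so converting between formats is just a dict lookup per common field.
header = {}
header[Field.NB_STREAMLINES] = 500000
header[Field.VOXEL_SIZES] = (2.0, 2.0, 2.0)

assert header["nb_streamlines"] == 500000
```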

I hope it's clearer. Don't hesitate to ask questions, give suggestions, or comment about the code or how the streamlines API works.

@Garyfallidis (Member):

We need this in ASAP to reduce the memory usage of tractographies. @MarcCote, is it hard to increase the coverage so we can get the green light for merging? Let us know if you need some help.

@samuelstjean (Contributor):

Does it really reduce memory that much? Fiber compression could probably help you with that in the meantime, but the possibility to convert from one format to another for interoperability (visualization, extraction, ROI drawing in other tools, for example) would be really useful.

@Garyfallidis (Member):

No, unfortunately, even if you use compression you first need to load the initial streamlines before compressing them. When you do that, you will still block around 2-3X more memory than the actual size of your datasets.

@samuelstjean (Contributor):

They can be loaded only one at a time for compressing; it's probably something with the implementation. That being said, I don't know if the original one and the Python rewrite handle loading in the same way though; might be worth asking François about that.

@MarcCote (Contributor, Author):

@Garyfallidis okay, I found the problem: I wasn't adding the tests folder to NiBabel's packages in setup.py.

@samuelstjean, loading one streamline at a time will indeed reduce the memory, but the overhead @Garyfallidis is talking about will still be present. Having a list of numpy.array objects is just not as memory efficient as concatenating all ndarrays into one big NumPy array (which is what nibabel.streamlines.CompactList does).
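The per-array overhead can be measured directly: every small ndarray carries its own object header on top of its data buffer, so thousands of tiny arrays cost more than one concatenated buffer holding the same payload. A rough sketch:

```python
import sys
import numpy as np

n, rows = 1000, 12  # many short streamlines of 12 points each
many = [np.zeros((rows, 3)) for _ in range(n)]  # list of small arrays
one = np.zeros((n * rows, 3))                   # same payload, one buffer

# getsizeof counts the per-object header (shape, strides, etc.)
# on top of the raw float64 data for arrays that own their memory.
many_bytes = sum(sys.getsizeof(a) for a in many)
one_bytes = sys.getsizeof(one)
assert many_bytes > one_bytes  # same data, extra per-object overhead
```

Note this undercounts the real cost of the list-of-arrays layout, since it ignores the Python list itself and allocator fragmentation, which is where the 2-3X blow-up on multi-gigabyte tractograms comes from.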

@samuelstjean (Contributor):

I meant that if it's that bad, try to compress with the C++ version, then work from there; that should get you going in the meantime. I can compress a whole 2 GB of streamlines down to 300 MB in about 7 minutes, so it's not too bad.

@MarcCote (Contributor, Author):

I'm a bit confused about the errors I got on AppVeyor: MemoryError and PermissionError. @matthew-brett, have you encountered these before? Also, I'm not sure what's causing the UnpicklingError with Python 2.6. I'm simply calling np.savez and np.load as usual.

for ext, cls in nib.streamlines.FORMATS.items():
    with tempfile.NamedTemporaryFile(mode="w+b", suffix=ext) as f:
        with clear_and_catch_warnings(record=True, modules=[trk]) as w:
            nib.streamlines.save_tractogram(complex_tractogram, f.name)
Review comment (Member):
I think the problem here (and with the other permission errors) is that Windows doesn't like you re-opening the NamedTemporaryFile.

What I tend to do is to use the nibabel.tmpdirs.InTemporaryDirectory context manager and then load / save with some filename. When you do this, you have to make sure all objects holding onto the file get deleted though, otherwise Windows (again) may refuse to delete the temporary file...
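The same pattern can be sketched with the standard library's `tempfile.TemporaryDirectory`, which mirrors what `nibabel.tmpdirs.InTemporaryDirectory` does: enter a fresh directory, work with ordinary filenames (avoiding the Windows re-open problem), and clean up on exit:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    prev = os.getcwd()
    os.chdir(tmpdir)  # InTemporaryDirectory also chdirs for you
    try:
        # Plain filename: opened and closed normally, so Windows is happy.
        with open("streamlines.trk", "wb") as f:
            f.write(b"\x00" * 16)
        assert os.path.exists("streamlines.trk")
        # Make sure no object still holds the file open before the
        # directory is removed, or Windows may refuse to delete it.
    finally:
        os.chdir(prev)  # leave before the directory is deleted

assert not os.path.exists(os.path.join(tmpdir, "streamlines.trk"))
```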

Reply (Contributor, Author):

Okay, I'll do that. Thanks.

@matthew-brett (Member):

Not sure what's happening with the unpickling - can you replicate with a virtualenv locally?

@MarcCote (Contributor, Author):

I'm trying to install Python 2.6 and will check the unpickling error.

@MarcCote (Contributor, Author):

Almost there. Only the MemoryError remains, and I don't understand what's happening there.

@matthew-brett (Member):

Yes, not sure what's happening there either. How big are the arrays it's trying to create there? Could the process have run out of memory? Could the shape be unexpected?

@MarcCote (Contributor, Author):

The shape of my 2D arrays is random but should be bounded between 10 and 50 rows with 3 columns, and I have 10 of those arrays:
data = [rng.rand(rng.randint(10, 50), 3) for _ in range(10)]

What's the memory limit on that machine?

@matthew-brett (Member):

I don't know the memory limit, but it can't be so low that it cannot make those arrays.

Would you mind sending me your ssh public key by email, so I can let you trigger builds on the buildbots? It's easier to debug on those machines, I'll explain when I've got you set up.

@MarcCote (Contributor, Author):

I was wrong: I use a buffer of 128 MB when creating a CompactList from a generator, to avoid allocating memory for each item of the generator. I'll try reducing it and check if it works.
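The buffering idea — fill one preallocated buffer from the generator and grow it by a fixed chunk only when full, instead of allocating per item — can be sketched like this. It is a simplification of the CompactList code, and the increment size here is just an illustrative constant:

```python
import numpy as np

GROW_MB = 4  # growth increment; the real code used a larger constant
ROWS_PER_GROW = (GROW_MB * 1024 * 1024) // (3 * 8)  # float64 (x, y, z) rows

def from_generator(gen):
    """Concatenate 2D arrays from `gen` into one buffer, growing in chunks."""
    data = np.empty((0, 3))
    end = 0
    for arr in gen:
        if end + len(arr) > len(data):  # buffer full: grow by one chunk
            grow = max(ROWS_PER_GROW, len(arr))
            data = np.concatenate([data, np.empty((grow, 3))])
        data[end:end + len(arr)] = arr
        end += len(arr)
    return data[:end]  # trim the unused tail

rng = np.random.RandomState(42)
arrays = [rng.rand(rng.randint(10, 50), 3) for _ in range(10)]
result = from_generator(iter(arrays))
assert np.array_equal(result, np.concatenate(arrays))
```

The trade-off discussed below is exactly the `GROW_MB` constant: too large and constrained machines (like the AppVeyor workers) hit MemoryError, too small and you reallocate more often.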

@matthew-brett (Member):

Maybe adapt buffer to system memory somehow?

@MarcCote (Contributor, Author):

That was it: I reduced it to 1 MB and now it works. Right now, the value is hardcoded as a constant. Is there a better way to automate that, or make it more flexible?

@MarcCote (Contributor, Author):

There is the psutil library, which is cross-platform and allows one to get the available memory, but that would add an additional dependency to NiBabel.

@matthew-brett (Member):

Maybe benchmark whether there's a difference between 1M and 128M?

@MarcCote (Contributor, Author):

On my computer (using an SSD), with a buffer of 4 MB it took 7.8 s to read a tractogram file of 1.4 GB containing 500k streamlines. With 128 MB, it took 7.5 s.
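A comparison like this can be timed with a small helper; a sketch only, with a stand-in workload since timings depend entirely on the disk and machine:

```python
import time
import numpy as np

def timed(fn, *args):
    """Return (seconds, result) for one call -- crude, but enough to compare."""
    t0 = time.perf_counter()
    out = fn(*args)
    return time.perf_counter() - t0, out

# Stand-in workload: reducing a large array instead of parsing a TRK file.
data = np.random.rand(1_000_000, 3)
elapsed, total = timed(np.sum, data)
assert elapsed >= 0.0 and np.isfinite(total)
```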

@MarcCote (Contributor, Author):

Now, it is all green. 👍

Is there something else to do? From now on, I think the best test we can have is using it in real-life applications. To be sure, is there anyone else who wants to have a look at this PR or play with it but hasn't had time? @arokem, did you try to add support for the PDB file format? @MrBago, any comments on the new framework?

@matthew-brett (Member):

I guess the speed difference should be smaller when you've got a normal hard drive -- so 1M is probably fine. Any benefit from 10M?

@matthew-brett (Member):

Sorry, I mean 4M is probably fine. I guess the chances are that 10 or 8 won't make any difference.

@MarcCote (Contributor, Author):

10 MB was still causing a MemoryError, but no, it didn't make any difference regarding speed. Just tried 8 MB: no MemoryError, speed relatively the same. I'll go for 8 MB then.

@matthew-brett (Member):

Still getting a memory error with 8M on AppVeyor? Back off to 4M again?

@MarcCote (Contributor, Author):

Guess so.

@Garyfallidis (Member):

This PR seems ready to be merged. @matthew-brett, would you like to have the honour of merging this in the next couple of days if nobody else has something to add? After this is merged, together with @MarcCote we will start making the API changes in DIPY to accommodate the new streamlines API. If we can have this fixed for 0.11 it would be fantastic! Much luv to all and many thx to @MarcCote for the excellent work.

Commits added:
- Test reset of file position when reading trackvis header. This revealed that we were using the wrong flag to `seek`.
- Use `assert_arr_dict_equal` to test whether two headers are equal, rather than a custom testing function.
- Add tests for encoding / decoding numerical values in byte string fields.
- Refactor encoding.
- Update comments / docstrings to note new encoding with numbers as ASCII strings.
- Add docstring for function giving RAS+ affine from trackvis header.
        return lazy_tractogram

    @classmethod
    def create_from(cls, data_func):
Review comment (Member):

Maybe rename to from_data_func ?

Reply (Contributor, Author):
Agreed.


# We make sure there is the right amount of data.
if (self.nb_elements is not None and
        value.nb_elements != self.nb_elements):
Review comment (Member):
Did you think of a better name for nb_elements for the ArraySequence?

Reply (Contributor, Author):
If we think about what ArraySequence means, it is an array of sequences. So calling len(arr_seq) gives you the number of sequences, and arr_seq.nb_elements is the total number of sequence items. What about sequences_(items|elements)_count or nb_sequences_(items|elements)?

Reply (Contributor, Author):

After speaking with @jchoude, what about using the term atom to refer to a "row" of an ArraySequence._data? An ArraySequence would then be made of atoms which come from several sequences' elements.

We could then replace element in the docstrings with atom when we refer to a row of _data. nb_elements would become either nb_atoms or atoms_count.

Reply (Member):
No strong preference. How about total_nb_rows?

Reply (Member):

And use row to refer to an element ?

Reply (Member):

Anyway -- choose as you like for nb_elements; then I think we're ready to merge, if you agree.

Reply (Contributor, Author):

Done

@coveralls:

Coverage increased (+0.4%) to 96.102% when pulling ce9b44a on MarcCote:streamlines_api into 83d3249 on nipy:master.

@coveralls:

Coverage increased (+0.4%) to 96.102% when pulling 8952c23 on MarcCote:streamlines_api into 83d3249 on nipy:master.

matthew-brett and others added 3 commits June 7, 2016:
- Try dropping the special cases for the constructors for sliceable dicts, and deal with None as an input value in the tractogram properties.
- Suggestion: simplify constructor for sliceable dicts.
@coveralls:

Coverage increased (+0.4%) to 96.098% when pulling 0324bce on MarcCote:streamlines_api into 83d3249 on nipy:master.

@coveralls:

Coverage increased (+0.4%) to 96.098% when pulling 0a22b93 on MarcCote:streamlines_api into 83d3249 on nipy:master.

@matthew-brett (Member):

OK - finally done - congratulations. Thanks too for your patience and good grace in responding to all the comments.

@matthew-brett merged commit 9959c74 into nipy:master Jun 7, 2016
@arokem (Member) commented Jun 7, 2016

This is fantastic! Congratulations to @MarcCote and everyone else involved!

Question: when might we see this in a released version of nibabel? In other words -- what is the road map towards a 2.1 (assuming that's the next release)?

@jchoude commented Jun 8, 2016

Fantastic! Good job @MarcCote and everyone!

@demianw (Contributor) commented Jun 8, 2016

Congrats @MarcCote and everyone for this titanic piece of work

@MarcCote (Contributor, Author) commented Jun 8, 2016

😃

@MarcCote deleted the streamlines_api branch June 8, 2016 11:39
matthew-brett added a commit that referenced this pull request Apr 18, 2017
MRG: Support TCK streamlines file format

This PR adds support for MRtrix streamlines files to the Streamlines API (#391).

Eventually, in a future PR, I will add some functions that allow easy conversion between all the different formats. For now, going from TRK to TCK is easy. See
https://gist.github.com/MarcCote/ea6842cc4c3950f7596fc3c8a0be0154

Converting from TCK to TRK requires additional information from a reference anatomy. See
https://gist.github.com/MarcCote/ebf0ae83a9202eb96d1dfbb949cd98af
@effigies mentioned this pull request Apr 25, 2019