New concatenate function to join tree sequences #3164
Comments
This would be very useful, I think. A few minor points. I don't see any point in having a default value for … I think it would be clearer if we make the pairwise "append" operation the basic unit here rather than the more complex …

So, if we focus on clearly defining the semantics of `tc1.append(ts2)`, then I think defining … as

```python
for tc in tcs:
    tc1.append(tc)
```

will be straightforward.
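To make the pairwise-append idea concrete, here is a minimal sketch using a hypothetical `ToyTables` class that models only the sequence length and site positions (no nodes, edges or mutations). Its `append` shifts the other collection's coordinates by the current sequence length, and `concatenate` is then just the loop from the comment above. None of these names are tskit API; they are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTables:
    """Hypothetical stand-in for a TableCollection: only the sequence length
    and the site positions are modelled."""

    sequence_length: float
    site_positions: list = field(default_factory=list)

    def append(self, other):
        """Join `other` onto the right-hand end of self, modifying self in place."""
        offset = self.sequence_length
        # Snapshot the shifted positions first, so appending a collection to itself is safe.
        shifted = [pos + offset for pos in other.site_positions]
        self.site_positions.extend(shifted)
        self.sequence_length += other.sequence_length
        return self


def concatenate(first, others):
    """Concatenate by repeated pairwise append onto `first` (modified in place)."""
    for tc in others:
        first.append(tc)
    return first


a = ToyTables(10, [1.0, 5.0])
b = ToyTables(20, [2.0])
c = ToyTables(5, [0.0, 4.5])
result = concatenate(a, [b, c])
print(result.sequence_length, result.site_positions)  # 35 [1.0, 5.0, 12.0, 30.0, 34.5]
```

A real `append` would also need to shift edge and migration intervals and remap node IDs, which is the part the discussion proposes to delegate to `union`.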
My thought was that we could then use
Yes, I wondered about that. But wouldn't this mean creating lots of intermediate tree sequences in the TreeSequence form?
I guess we could just tell people to work directly on the tables if that's a concern. I'm slightly worried whether
That sounds like a different operation, then. I think the operation you want is … We can define the TreeSequence equivalents later, on top of the TableCollection ops.
Yeah, the name is a bit overloaded. I guess we could just tell people who wanted to change the sequence length to use
Yep, this seems sensible. We can figure out any other operations on top of this, as you say. I think both should be exposed as TC and TS methods.
Great idea. Minor note: in the docstring for
Fixes tskit-dev#3164 Update python/tskit/tables.py
A number of people (including me) have been trying to join two tree sequences together end to end, e.g. when they were obtained from different chromosomes. I have messed this up myself (see my message in Slack: I forgot that `keep_intervals` can also change node IDs because it automatically simplifies), and I suspect others find this tricky too.

I propose two new main TableCollection methods, `shift` and `concatenate` (named after the numpy function, which joins multiple arrays together), and possibly an additional shortcut method, provisionally called `append`, which simply concatenates two TableCollections. These can be exposed in the normal way as TreeSequence methods.

As this essentially relies only on `union`, I think most of the behavioural edge cases and errors should be caught in that function. The helpful additions here are to (a) shift the coordinates of the other TC so that it fits after the current one, and (b) by default, only map the samples together.

I think it's helpful to have a function that doesn't require the user to wrap `other` in a list (i.e. `[other]`). But rather than having a separate `append` method, we could allow `concatenate` to take either a list of TableCollections or a single TableCollection? Or we could take all non-keyword arguments to be additional TableCollections.
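The default in (b), mapping only the samples together, could be expressed through the `node_mapping` argument of `TableCollection.union`, where entry *k* gives the node in `self` equivalent to node *k* of `other`, or `tskit.NULL` (-1) if that node should be added as new. The sketch below is pure Python; `sample_node_mapping` is a hypothetical helper, and pairing samples by their order in each sample list is an assumption about how such a default would match them.

```python
NULL = -1  # same value as tskit.NULL: "no equivalent node in self"


def sample_node_mapping(self_samples, other_samples, other_num_nodes):
    """Hypothetical helper: build a union-style node_mapping in which only the
    samples of `other` are identified with samples of `self` (paired by
    position); every other node of `other` maps to NULL, i.e. is added as new."""
    if len(self_samples) != len(other_samples):
        raise ValueError("both inputs must have the same number of samples")
    mapping = [NULL] * other_num_nodes
    for self_u, other_u in zip(self_samples, other_samples):
        mapping[other_u] = self_u
    return mapping


# `other` has 5 nodes, of which nodes 0-2 are samples.
print(sample_node_mapping([0, 1, 2], [0, 1, 2], 5))  # [0, 1, 2, -1, -1]
```

With real tree sequences and NumPy, the same idea is roughly `mapping = np.full(other.num_nodes, tskit.NULL)` followed by `mapping[other.samples()] = self.samples()`.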