Skip to content

R rolling join equivalent in Pandas #7546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
krismetha opened this issue Jun 22, 2014 · 7 comments
Closed

R rolling join equivalent in Pandas #7546

krismetha opened this issue Jun 22, 2014 · 7 comments
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@krismetha
Copy link

Hi,

In R, there is a way to perform a rolling join as shown below.

library("data.table")
DT <-
data.table(
x = rep(c("a", "b", "c"), each = 3),
y = c(1, 3, 6),
v = 1:9)
setkey(DT, x, y) # Necessary for following self join.
DT[J("a", 4:6)]
DT[J("a", 4:6), roll = TRUE]

The relevant output looks like this (J is shorthand for self join):

DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
DT[J("a", 4:6)] # v columns does not have "2".
x y v
1: a 4 NA
2: a 5 NA
3: a 6 3
DT[J("a", 4:6), roll = TRUE] # v column rolls "2" forward.
x y v
1: a 4 2
2: a 5 2
3: a 6 3

In Pandas, the closest that I could think of is to use Join then DataFrame,fillna(method='pad') . But this will not yield the above R rolling join result since the first 2 rows would be NA in the joined column (it starts with (a,3) and not (a,2))

I am just wondering whether there is an equivalent operation in Pandas for this?

Kind regards,
Kris

@jreback
Copy link
Contributor

jreback commented Jun 22, 2014

see #2996 and #5401

@krismetha
Copy link
Author

Hi,

I have taken a look at both of the above mentioned issues, but it is not really what this question is about.

Below is an equivalent code for Python using Pandas, as my first post was for R

import pandas as pd, numpy as np

df = pd.DataFrame( np.hstack(([['a']]_3 + [['b']]_3 +[['c']]*3 , [[1],[3],[6]] * 3 , [ [i] for i in range(1,10)])))
df = df.set_index([0,1])

df2 = pd.DataFrame( np.hstack(([['a']]*3 , [ [i] for i in range(4,7)] )))
df2 = df2.set_index([0,1])

image

image

image

So my question is whether there is an operator that is equivalent to R join argument roll = True above?

eg, something like df2.join(df, roll=True), and this gives:
image

the 2 in a,4 and a,5 comes from the fact that the original df.ix[a,3] is 2, which is the index right before df.ix[a,6] . In other words, this is a rolling join - rolling upward.

Looking at this another way, rolling join is similar to using join, then fillna(method='pad') but the slight difference is that the first few rows would still end up being NA, like the above case.

Thanks again,
Kris

@jreback
Copy link
Contributor

jreback commented Oct 20, 2015

closing as stale. pls reopen if still an issue.

@jreback jreback closed this as completed Oct 20, 2015
@Make42
Copy link

Make42 commented Apr 21, 2017

Rolling join is not join + fillna

A rolling join is not the same as a join and a fillna! That would only be the case if the key of the table that is joined against (in terms of data.table that would be the left table and a right-join) has equivalents in the main table. A data.table rolling join does not require this.

Solution in pandas:

There is a solution though in pandas. Let's assume your right data.table is table A and your left data.table is table B.

  1. Sort the table A and and B each by key.
  2. Add a column tag to A which are all 0 and a column tag to B that are all 1.
  3. Delete all columns except the key and tagfrom B (can be omitted, but it is clearer this way) and call the table B'. Keep B as an original - we are going to need it later.
  4. Concatenate A with B' to C and ignore the fact that the rows from B' has many NAs.
  5. Sort C by key.
  6. Make a new cumsum column with C = C.assign(groupNr = np.cumsum(C.tag))
  7. Using filtering (query) on tag get rid of all B'-rows.
  8. Add a running counter column groupNr to the original B (integers from 0 to N-1 or from 1 to N, depending on whether you want forward or backward rolling join).
  9. Join B with C on groupNr.

@PaluchowskiMatthew
Copy link

R newbie here. @Make42 Could you please provide short python code snippet with all steps or at least clarify step 7 because I am not sure I get it.

@Make42
Copy link

Make42 commented May 20, 2017

@PaluchowskiMatthew: I did an example for you. Maybe you can give me some points on stackoverflow: http://stackoverflow.com/a/43539437/4533188

@PaluchowskiMatthew
Copy link

@Make42 Awesome thanks! +1 on stackoverflow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants