Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How could I filter a df using a date column? #560

Open
thais-molica opened this issue Feb 9, 2023 · 2 comments
Open

How could I filter a df using a date column? #560

thais-molica opened this issue Feb 9, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@thais-molica
Copy link

Hi, everyone. I'm new to the data manipulation and I'm learning with danfojs.
I have a dataframe, like this:

const data = [{id: 000, timestamp: "2015-01-01T02:00:00.000Z"}]
const df = new DataFrame(data)

I would like to filter this dataframe using this column timestamp. I tried something like:

df.query([df['timestamp'] > "2015-01-01T02:00:00.000Z" && df['timestamp'] < "2015-06-01T02:00:00.000Z" ])

But it only returned the first row when it should've returned more data. I haven't found nothing like this in the documentarion. Could anybody help me, please?
Thanks

@thais-molica
Copy link
Author

Here's an example (not working)
I'm trying to follow this pandas tutorial

@kitfit-dave
Copy link
Contributor

I don't know what is going on with toDateTime and getting the date column into the Series or DataFrame (I can't make that work, I think it only supports string | number | boolean anyway), but you don't have to do that. Just note that DataFrame.query does not take a predicate or anything like that, it takes a Series (or array) that is as long as the Column or Series you are querying, with boolean values as a mask for whether that element is to be returned or filtered out. So you just need to make that mask, there may be other ways, but this is one way:

const mask = df.column('timestamp').apply((timestamp) => {
	// mask using string
	return timestamp >= '2015-02-01T00:00:00.000Z' && timestamp < '2015-04-01T00:00:00.000Z'
})
df.query(mask).print()

ISO format dates are in time order when they are in alphabetical order, so long as they are all in Zulu time, or the same timezone offset. If you are dealing with different tz offsets, or just want to do the date comparison for some other reason, you can do that like this:

const mask = df.column('timestamp').apply((timestamp) => {
	// or mask using date (consider timezones offset)
	const date = new Date(timestamp)
	return date >= new Date('2015-02-01T00:00:00.000Z') && date < new Date('2015-04-01T00:00:00.000Z')
})

i.e. if one of your rows had timestamp: '2015-04-01T01:00:00.000+10:00' that would become 2015-03-31T15:00:00.000Z when made into a Date, and it would be selected by the query

@risenW risenW added the bug Something isn't working label Apr 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants