pandas read_sql reads the entire table in to memory despite specifying chunksize #13168

Closed
jeetjitsu opened this issue May 13, 2016 · 4 comments
Labels
IO SQL to_sql, read_sql, read_sql_query

Comments

@jeetjitsu
Contributor

jeetjitsu commented May 13, 2016

I was trying to process a massive table in chunks, so I wanted to read the table in chunks and process each one.

When I tried reading the table with pandas.read_sql_table, I ran out of memory even though I had passed in the chunksize parameter.

I'm using mysqlclient on Python 3.

Code Sample, a copy-pastable example if possible

import pandas
import sqlalchemy

eng = sqlalchemy.create_engine("mysql+mysqldb://user:pass@localhost/db_name")
dframe = pandas.read_sql_table('table_name', eng, chunksize=100)

What I expected was for the function to return an iterator that lazily loads the data into memory. The documentation is not clear about this, nor have I found anything else on Google.

Any further information on this will be appreciated.
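For reference, `chunksize` does make the `read_sql_*` functions return an iterator of DataFrames; the behavior reported in this issue is that most drivers still buffer the full result client-side before the iterator yields anything. A minimal self-contained sketch of the intended iteration pattern, using an in-memory SQLite table rather than the MySQL setup above:

```python
import sqlite3

import pandas as pd

# Small in-memory table so the example is self-contained
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])

# With chunksize, read_sql_query returns an iterator of DataFrames,
# each holding at most `chunksize` rows
sizes = [len(chunk) for chunk in
         pd.read_sql_query("SELECT * FROM t", con, chunksize=100)]
print(sizes)  # [100, 100, 50]
```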

@jorisvandenbossche
Member

jorisvandenbossche commented May 13, 2016

This is a known issue and a limitation of most Python database drivers (not something pandas can solve), but it should be better documented (see #10693).

Similar issue: #12265 (with some additional explanation).

PRs to improve the docs are always welcome!
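Since the buffering happens in the driver, one workaround (not mentioned in this thread, and shown here only as a hedged configuration sketch) is to ask the driver for a server-side cursor so rows are streamed from the server instead of materialized up front. With mysqlclient/MySQLdb that is `SSCursor`; the connection string and table name are placeholders, and this requires a live MySQL server:

```python
import MySQLdb.cursors
import pandas
import sqlalchemy

# Ask mysqlclient for a server-side (unbuffered) cursor so the driver
# streams rows instead of fetching the entire result into memory.
eng = sqlalchemy.create_engine(
    "mysql+mysqldb://user:pass@localhost/db_name",  # placeholder DSN
    connect_args={"cursorclass": MySQLdb.cursors.SSCursor},
)

# Each chunk is now fetched lazily from the server.
for chunk in pandas.read_sql_table("table_name", eng, chunksize=100):
    ...  # per-chunk processing goes here
```

Note that with a server-side cursor the connection stays busy until the result is fully consumed, so the iterator should be exhausted (or the connection closed) promptly.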

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label May 13, 2016
@jorisvandenbossche jorisvandenbossche added this to the No action milestone May 13, 2016
@jorisvandenbossche jorisvandenbossche changed the title pandas read_from_sql reads the entire table in to memory despite specifying chunksize pandas read_sql reads the entire table in to memory despite specifying chunksize May 13, 2016
@jeetjitsu
Contributor Author

Thank you for the prompt reply. I'd be happy to add to the documentation. But to take this a little further: wouldn't it be possible to add this functionality on the pandas side with the LIMIT/OFFSET trick, especially when a SQLAlchemy connection has been passed in?
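The LIMIT/OFFSET trick suggested here can be sketched as a generator. `read_sql_in_chunks` is a hypothetical helper, not a pandas API, and the example uses an in-memory SQLite table so it is self-contained:

```python
import sqlite3

import pandas as pd

def read_sql_in_chunks(query, con, chunksize):
    """Yield DataFrames of at most `chunksize` rows by appending
    LIMIT/OFFSET to the query, so only one chunk is in memory at a time.
    Note: for correct pagination the query should have a stable ORDER BY,
    and large offsets can be slow since the server re-scans skipped rows."""
    offset = 0
    while True:
        chunk = pd.read_sql_query(
            f"{query} LIMIT {chunksize} OFFSET {offset}", con)
        if chunk.empty:
            break
        yield chunk
        offset += chunksize

# Demo against a small in-memory SQLite table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
chunks = list(read_sql_in_chunks("SELECT * FROM t ORDER BY x", con, chunksize=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```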

@jreback
Contributor

jreback commented Jul 9, 2016

@jorisvandenbossche close?

@jorisvandenbossche
Member

Yes, it's a doc issue, but that is already covered by #10693.
