pandas read_sql reads the entire table in to memory despite specifying chunksize #13168

Closed
jeetjitsu opened this issue May 13, 2016 · 4 comments
Labels
IO SQL to_sql, read_sql, read_sql_query

Comments

@jeetjitsu
Contributor

jeetjitsu commented May 13, 2016

I was trying to process a massive table in chunks, so I wanted to read the table in chunks and process each one.

When I tried reading the table with pandas.read_sql_table, I ran out of memory even though I had passed in the chunksize parameter.

I'm using mysqlclient on Python 3.

Code Sample, a copy-pastable example if possible

import pandas
import sqlalchemy

eng = sqlalchemy.create_engine("mysql+mysqldb://user:pass@localhost/db_name")
dframe = pandas.read_sql_table('table_name', eng, chunksize=100)

What I expected was for the function to return an iterator that lazily loads the data into memory. The documentation is not clear about this, nor have I found anything else on Google.

Any further information on this will be appreciated.
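For reference, `chunksize` does make the `read_sql_*` functions return an iterator of DataFrames; the behavior reported in this issue is that most drivers still buffer the full result client-side before the iterator yields anything. A minimal self-contained sketch of the intended iteration pattern, using an in-memory SQLite table rather than the MySQL setup above:

```python
import sqlite3

import pandas as pd

# Small in-memory table so the example is self-contained
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])

# With chunksize, read_sql_query returns an iterator of DataFrames,
# each holding at most `chunksize` rows
sizes = [len(chunk) for chunk in
         pd.read_sql_query("SELECT * FROM t", con, chunksize=100)]
print(sizes)  # [100, 100, 50]
```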

@jorisvandenbossche
Member

jorisvandenbossche commented May 13, 2016

This is a known issue and a limitation of most Python database drivers (not something pandas can solve), but it should be better documented (see #10693).

Similar issue: #12265 (with some additional explanation).

PRs to improve the docs are always welcome!
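Since the buffering happens in the driver, one workaround (not mentioned in this thread, and shown here only as a hedged configuration sketch) is to ask the driver for a server-side cursor so rows are streamed from the server instead of materialized up front. With mysqlclient/MySQLdb that is `SSCursor`; the connection string and table name are placeholders, and this requires a live MySQL server:

```python
import MySQLdb.cursors
import pandas
import sqlalchemy

# Ask mysqlclient for a server-side (unbuffered) cursor so the driver
# streams rows instead of fetching the entire result into memory.
eng = sqlalchemy.create_engine(
    "mysql+mysqldb://user:pass@localhost/db_name",  # placeholder DSN
    connect_args={"cursorclass": MySQLdb.cursors.SSCursor},
)

# Each chunk is now fetched lazily from the server.
for chunk in pandas.read_sql_table("table_name", eng, chunksize=100):
    ...  # per-chunk processing goes here
```

Note that with a server-side cursor the connection stays busy until the result is fully consumed, so the iterator should be exhausted (or the connection closed) promptly.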

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label May 13, 2016
@jorisvandenbossche jorisvandenbossche added this to the No action milestone May 13, 2016
@jorisvandenbossche jorisvandenbossche changed the title pandas read_from_sql reads the entire table in to memory despite specifying chunksize pandas read_sql reads the entire table in to memory despite specifying chunksize May 13, 2016
@jeetjitsu
Contributor Author

Thank you for the prompt reply. I'd be happy to add to the documentation. But to take this a little further: wouldn't it be possible to add this functionality on the pandas side with the LIMIT/OFFSET trick, especially when a SQLAlchemy connection has been passed in?
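The LIMIT/OFFSET trick suggested here can be sketched as a generator. `read_sql_in_chunks` is a hypothetical helper, not a pandas API, and the example uses an in-memory SQLite table so it is self-contained:

```python
import sqlite3

import pandas as pd

def read_sql_in_chunks(query, con, chunksize):
    """Yield DataFrames of at most `chunksize` rows by appending
    LIMIT/OFFSET to the query, so only one chunk is in memory at a time.
    Note: for correct pagination the query should have a stable ORDER BY,
    and large offsets can be slow since the server re-scans skipped rows."""
    offset = 0
    while True:
        chunk = pd.read_sql_query(
            f"{query} LIMIT {chunksize} OFFSET {offset}", con)
        if chunk.empty:
            break
        yield chunk
        offset += chunksize

# Demo against a small in-memory SQLite table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
chunks = list(read_sql_in_chunks("SELECT * FROM t ORDER BY x", con, chunksize=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```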

@jreback
Contributor

jreback commented Jul 9, 2016

@jorisvandenbossche close?

@jorisvandenbossche
Member

Yes, it's a doc issue, but that is already covered by #10693.
