
Add doc note on memory usage of read_sql with chunksize #10693


Open
jorisvandenbossche opened this issue Jul 28, 2015 · 3 comments
Labels
Docs, IO SQL (to_sql, read_sql, read_sql_query)

Comments

@jorisvandenbossche (Member) commented Jul 28, 2015

As chunksize typically does not reduce memory usage much by itself (which is a bit unexpected given the keyword's description), this is worth a note in the docs.

From some discussion on gitter: https://gitter.im/pydata/pandas?at=55b61bf952d85d450f404be1 (with @litaotao) and https://gitter.im/pydata/pandas?at=554609295edd84254582fb39 (with @twiecki)
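
For reference, a minimal sketch of the pattern under discussion (the connection URL, table name, and process function are placeholders). With a default client-side cursor, the DBAPI may fetch the entire result set before the first chunk is yielded, so chunksize bounds the size of each DataFrame but not peak memory:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection URL and table name.
engine = create_engine("postgresql://user:pass@localhost/mydb")

# chunksize makes read_sql return an iterator of DataFrames, but with a
# default client-side cursor the DBAPI may fetch the whole result set
# before the first chunk is yielded, so peak memory is not reduced.
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=10000):
    process(chunk)  # hypothetical per-chunk processing
```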

@jreback added the Docs and IO SQL labels Jul 28, 2015
@jorisvandenbossche added this to the Next Major Release milestone Jul 28, 2015
@zirmite commented Sep 24, 2015

It should give you a memory improvement if the DBAPI/engine is hooked up correctly, right?

When using it with sqlalchemy+psycopg2, you need to make sure that sqlengine.dialect.server_side_cursors == True so that the sqlengine.execute call is kept separate from the results.fetchmany call. This seems to be a Postgres-only issue and engine option, because of how Postgres handles cursors (perhaps?).
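
As a point of reference, a minimal sketch of enabling that flag at engine creation (the connection URL is hypothetical; server_side_cursors is a psycopg2-dialect option of create_engine in SQLAlchemy of this era):

```python
from sqlalchemy import create_engine

# Hypothetical connection URL; server_side_cursors makes the psycopg2
# dialect use named (server-side) cursors, so rows are fetched
# incrementally by fetchmany instead of all at once at execute time.
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/mydb",
    server_side_cursors=True,
)
assert engine.dialect.server_side_cursors  # the flag referred to above
```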

However, sqlalchemy's engine.execution_options API provides stream_results, which will try to execute without pre-buffering:

stream_results – Available on: Connection, statement. Indicate to the dialect that results should be “streamed” and not pre-buffered, if possible. This is a limitation of many DBAPIs. The flag is currently understood only by the psycopg2 dialect.

Maybe pandas could try to set that flag when chunksize is not None?
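
In the meantime this can be done by hand. A minimal sketch (connection URL, table name, and process are placeholders; stream_results is currently understood only by the psycopg2 dialect, per the docs quoted above):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")

# stream_results asks the dialect for a server-side cursor, so results
# are streamed to the client rather than pre-buffered at execute time.
with engine.connect() as conn:
    conn = conn.execution_options(stream_results=True)
    for chunk in pd.read_sql("SELECT * FROM big_table", conn, chunksize=10000):
        process(chunk)  # hypothetical per-chunk processing
```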

@dostabhi commented
Any update on this issue?

@MordorianGuy commented

Does anyone know which parameters should be passed to iterate over a ClickHouse query as a stream?
