
Add doc note on memory usage of read_sql with chunksize #10693


Open
jorisvandenbossche opened this issue Jul 28, 2015 · 3 comments
Labels
Docs, IO SQL (to_sql, read_sql, read_sql_query)

Comments

@jorisvandenbossche (Member) commented Jul 28, 2015

As chunksize typically does not reduce memory usage much by itself (which is a bit unexpected given the keyword's description), this is worth a note in the docs.

From some discussion on gitter: https://gitter.im/pydata/pandas?at=55b61bf952d85d450f404be1 (with @litaotao) and https://gitter.im/pydata/pandas?at=554609295edd84254582fb39 (with @twiecki)
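
For reference, a minimal sketch of the pattern under discussion (the connection URL, table name, and process function are placeholders). With a default client-side cursor, the DBAPI may fetch the entire result set before the first chunk is yielded, so chunksize bounds the size of each DataFrame but not peak memory:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection URL and table name.
engine = create_engine("postgresql://user:pass@localhost/mydb")

# chunksize makes read_sql return an iterator of DataFrames, but with a
# default client-side cursor the DBAPI may fetch the whole result set
# before the first chunk is yielded, so peak memory is not reduced.
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=10000):
    process(chunk)  # hypothetical per-chunk processing
```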

@jreback added the Docs and IO SQL labels Jul 28, 2015
@jorisvandenbossche added this to the Next Major Release milestone Jul 28, 2015
@zirmite commented Sep 24, 2015

It should give you a memory improvement if the DBAPI/engine is hooked up correctly, right?

When using it with sqlalchemy+psycopg2, you need to make sure that sqlengine.dialect.server_side_cursors == True so that the sqlengine.execute call is kept separate from the results.fetchmany call. This seems to be a Postgres-only issue and engine option, because of how Postgres handles cursors (perhaps?).
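
As a point of reference, a minimal sketch of enabling that flag at engine creation (the connection URL is hypothetical; server_side_cursors is a psycopg2-dialect option of create_engine in SQLAlchemy of this era):

```python
from sqlalchemy import create_engine

# Hypothetical connection URL; server_side_cursors makes the psycopg2
# dialect use named (server-side) cursors, so rows are fetched
# incrementally by fetchmany instead of all at once at execute time.
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/mydb",
    server_side_cursors=True,
)
assert engine.dialect.server_side_cursors  # the flag referred to above
```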

However, sqlalchemy's engine.execution_options API provides stream_results, which will try to execute without pre-buffering:

stream_results – Available on: Connection, statement. Indicate to the dialect that results should be “streamed” and not pre-buffered, if possible. This is a limitation of many DBAPIs. The flag is currently understood only by the psycopg2 dialect.

Maybe pandas could try to set that flag when chunksize is not None?
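
In the meantime this can be done by hand. A minimal sketch (connection URL, table name, and process are placeholders; stream_results is currently understood only by the psycopg2 dialect, per the docs quoted above):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")

# stream_results asks the dialect for a server-side cursor, so results
# are streamed to the client rather than pre-buffered at execute time.
with engine.connect() as conn:
    conn = conn.execution_options(stream_results=True)
    for chunk in pd.read_sql("SELECT * FROM big_table", conn, chunksize=10000):
        process(chunk)  # hypothetical per-chunk processing
```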

@dostabhi commented
Any update on this issue?

@MordorianGuy commented

Does anyone know which parameters should be passed to iterate over a ClickHouse query as a stream?
