
Enable server side cursors #40796

Closed
wants to merge 12 commits into from

Conversation

@J0 (Author) commented Apr 6, 2021

This pull request attempts to fix #35689. I read an article by @itamarst and decided to look further into the codebase. After noticing that the fix seemed to be simple, I decided to file a PR. The fix is naive and I'm not sure the true solution is as simple as I imagine it to be. There are probably many things I'm missing, as I haven't read the codebase in detail, so do let me know what else needs to be addressed.

@itamarst commented Apr 7, 2021

Thanks for doing this! You could perhaps do the stream_results=True conditionally, only if chunksize is set.
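The suggestion above could be sketched like this (a minimal illustration, not the PR's actual code; the helper name `execution_options_for` is invented for the demo):

```python
# Sketch of itamarst's suggestion: request a server-side cursor
# (SQLAlchemy's stream_results=True) only when the caller asked for
# chunked reads, so the non-chunked path keeps fetching in one go.

def execution_options_for(chunksize):
    """Return the execution options for a read_sql-style call."""
    # stream_results=True asks the driver for a server-side cursor,
    # so rows are fetched lazily instead of being loaded all at once.
    return {"stream_results": True} if chunksize is not None else {}


print(execution_options_for(1000))  # {'stream_results': True}
print(execution_options_for(None))  # {}
```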

@jreback (Contributor) left a comment

this doesn't seem to break anything, but it's certainly possible we are not exercising things. can you a) comment on what that parameter is doing (e.g. put a reference in the code to the article), b) come up with a test that fails currently and passes with this change, and c) add a whatsnew note (perf section) for 1.3?

would we ever need to turn this off?

@itamarst commented Apr 9, 2021

As context, this is a memory usage performance improvement (in some cases). You can see the article here (https://pythonspeed.com/articles/pandas-sql-chunking/) with some memory-usage graphs.

With streaming mode, rows are only fetched on demand. If you're using chunking, this is what you want, because otherwise you might try to load gigabytes of rows into memory and then dole them out in chunks, potentially running out of memory.

If you're not using chunking, though, streaming is probably a runtime performance hit, because there's a bunch more network latency involved, and you already know you want all the data, so why not just get it all in one go? (See #40847 for follow-up work that would use streaming cursors everywhere, but I don't think it should always be on.)
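The memory trade-off described above can be illustrated without a real database (all names here are invented for the demo): a client-side cursor materialises the full result set up front, while a server-side cursor behaves like a generator that hands rows over on demand.

```python
# Client-side fetch: peak memory is proportional to the whole result set.
def client_side_fetch(row_source):
    return list(row_source)


# Streaming fetch: only about `chunksize` rows are resident at a time,
# which is what pandas' chunked read_sql path wants.
def stream_in_chunks(row_source, chunksize):
    chunk = []
    for row in row_source:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:  # final partial chunk
        yield chunk


print(list(stream_in_chunks(range(10), 4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```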

@J0 (Author) commented Apr 9, 2021

Hey @itamarst and @jreback,

Thanks for the input! Will do; I'll address the comments and follow up shortly.

Update: I noted the PEP8 style error below and will address it tomorrow when I add:

  1. a reference in sql.py to the article
  2. a test that fails current and passes with the addition of the new parameter
  3. A whatsnew note for 1.3 under the perf section

@pep8speaks commented Apr 11, 2021

Hello @J0! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-07-02 04:15:17 UTC

@github-actions (bot) commented

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 19, 2021
@J0 (Author) commented May 19, 2021

Yes, I am still interested in working on this and will take a look at this on the weekend

@simonjayhawkins (Member) commented

> Yes, I am still interested in working on this and will take a look at this on the weekend

Thanks @J0 . closing as stale. ping when ready

@J0 (Author) commented Jun 9, 2021

Okay, will do. Sorry -- work got a bit busy. Will try to make time for it this weekend or over an evening. Thanks for your understanding.

@J0 (Author) commented Jun 12, 2021

Hey @simonjayhawkins,

I'm working on this right now -- any chance I could get this re-opened?

I've addressed the earlier comments and would like to see whether all the checks pass.

@J0 (Author) commented Jun 12, 2021

Thanks!

@J0 (Author) commented Jun 12, 2021

Oh dear, looks like the coverage tests aren't passing.

I'm not too sure of the best way to write a failing test for this, since the main change is in the option passed in.

Will think about this, but just wanted to check if @itamarst and @jreback had any suggestions.

@itamarst commented

My guess is the failure is unrelated: it timed out after 60 minutes. I would just push a meaningless change and see if it passes this time.

@J0 (Author) commented Jun 14, 2021

Looks like I need approval to run the workflows D: @simonjayhawkins, could I trouble you again to approve the workflow runs for this branch?

Thank you so much -- I'm aware you're probably quite busy.

@J0 (Author) commented Jun 22, 2021

Hey, does anyone have suggestions on how I might get the tests to pass? A little clueless at this point.

@itamarst commented

In a large project like this, it's possible (a) to have intermittent failures and (b) for things to break in ways that are unrelated to your code. So the general procedure is:

  1. Try rerunning with a meaningless commit.
  2. If you get the exact same failure, and it's clear that it's not related to your code, merge forward or rebase, depending on how the project does it.

@J0 (Author) commented Jul 2, 2021

Workflows :( Any chance of help with enabling the workflows again, @simonjayhawkins?

@J0 J0 requested a review from jreback July 12, 2021 02:59
@@ -1421,7 +1421,13 @@ def run_transaction(self):

     def execute(self, *args, **kwargs):
         """Simple passthrough to SQLAlchemy connectable"""
-        return self.connectable.execution_options().execute(*args, **kwargs)
+        if "chunksize" in kwargs:
@jreback (Contributor) commented on the diff

if you are going to do this, then chunksize should be an actual keyword with a default, and a doc-string.

are there any tests which actually evaluate this path?
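One way the review comment above could be addressed is sketched below. This is an illustration, not the merged code: the wiring through `self.connectable` is assumed from the diff, and the stub used in the usage example is invented.

```python
# Sketch of jreback's suggestion: make chunksize an explicit keyword
# with a default and a doc-string, rather than sniffing it out of
# **kwargs. `self.connectable` is assumed to be the SQLAlchemy
# engine/connection that pandas stores on its SQL wrapper class.

def execute(self, *args, chunksize=None, **kwargs):
    """Simple passthrough to SQLAlchemy connectable.

    Parameters
    ----------
    chunksize : int, optional
        If given, request a server-side cursor (``stream_results=True``)
        so rows are fetched lazily rather than all at once.
    """
    options = {"stream_results": True} if chunksize is not None else {}
    return self.connectable.execution_options(**options).execute(*args, **kwargs)
```

A test for this path could then assert, via a stub connectable, that `stream_results` is passed to `execution_options` exactly when `chunksize` is given.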

@mroeschke (Member) commented

Thanks for the PR, but it appears this PR has gone stale again. If you're interested in continuing further by merging master and addressing the comment, feel free to let the team know and we can reopen.

@mroeschke mroeschke closed this Sep 8, 2021
@J0 (Author) commented Jan 13, 2022

@mroeschke apologies for the delay. I now have time to patch this again -- by any chance could we reopen this? Or would it be better if I opened a new PR?

Lmk!

@mroeschke (Member) commented

@J0 it appears I cannot reopen this pull request, as our master branch was renamed to main recently. I would recommend opening a new PR with the change.

@J0 (Author) commented Jan 14, 2022

@mroeschke sure, sounds good, will open a new PR! :)

@J0 (Author) commented Feb 27, 2022

Re-opened at #46166

Labels: IO SQL (to_sql, read_sql, read_sql_query)

Development: Successfully merging this pull request may close this issue: ENH: Support PostgreSQL server-side cursors to prevent memory hog on large datasets

7 participants