-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable server side cursors #40796
Enable server side cursors #40796
Conversation
Thanks for doing this! You could perhaps only do the stream_results=True conditionally, if chunksize is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesnt' seems to break anything, but its certainly possible we are not exerciing things. can you a) comment on what that parameter is doing (e.g. put an reference in the code to the article), and b) if you can come up with a test that fails current and passes with this and c) would need a whatsnew note (perf section)? for 1.3.
would we ever need to turn this off?
As context, this is a memory usage performance improvement (in some cases). You can see the article here (https://pythonspeed.com/articles/pandas-sql-chunking/) with some memory-usage graphs. With streaming mode, rows are only fetched on demand. If you're using chunking, this is what you want, because otherwise you might try to load gigabytes of rows into memory and then dole them out in chunks, potentially running out of memory. If you're not using chunking, though, streaming is probably a runtime performance hint, because there's a bunch more network latency involved, and you already know you want all the data so why not just get it all in one go? (See #40847 for follow-up work that would use streaming cursors everywhere, but I don't think it should always be on). |
Thanks for the input! Will do, will address the comments and revert shortly. Update: I note the PEP8 style error below, will address it tomorrow when I add
|
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Yes, I am still interested in working on this and will take a look at this on the weekend |
Thanks @J0 . closing as stale. ping when ready |
Okay, will do. Sorry -- work got a bit busy. Will try to make time for it in this weekend or over an evening. Thanks for your understanding |
Hey! @simonjayhawkins, I'm working on this right now -- any chance I could get this re-opened? Have addressed the past issues and would like to see if all the checks are passing. |
Thanks! |
My guess is the failure is unrelated: it timed out after 60 minutes. I would just push a meaningless change and see if it passes this time. |
Looks like I need approval to run the workflows D: @simonjayhawkins could I trouble you again for help with requesting for approval to run the branch? Thank you so much -- am aware you probably are quite busy. |
Hey, does anyone have suggestions on how I might get the tests to pass? a little clueless at this point. |
In a large project like this, it's possible (a) to have intermittent failures (b) for things to break in ways that are unrelated to your code. So general procedure is:
|
Workflows :( any chance for help with enabling the workflows again? @simonjayhawkins |
@@ -1421,7 +1421,13 @@ def run_transaction(self): | |||
|
|||
def execute(self, *args, **kwargs): | |||
"""Simple passthrough to SQLAlchemy connectable""" | |||
return self.connectable.execution_options().execute(*args, **kwargs) | |||
if "chunksize" in kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are going to do this, then chunksize should be an actual keyword with a default, and a doc-string.
are there any tests which actually evaluate this path?
Thanks for the PR, but appears that this PR has gone stale again. If you're interested in continuing further by merging master and addressing the comment, feel free to let the team know and we can reopen. |
@mroeschke apologies for the delay. I now have time to patch this again -- by any chance could we reopen this? Or would it be better if I opened a new PR? Lmk! |
@J0 appears I cannot reopen this pull request as our master branch was renamed to main recently. I would recommend to open a new PR with the change. |
@mroeschke sure, sounds good, will open a new PR! :) |
Re-opened at #46166 |
This pull request attempts to fix #35689. I read an article by @itamarst and decided to look further into the codebase. After noticing that the fix semed to be simple I decided file a PR. The fix is naive and I'm not sure the true solution is as simple as I imagine it to be. There are probably many things I'm missing as I've not read the codebase in detail so do let me know what else needs to be addressed.