Don't Cache Index Pages Across Runs #2904
Note: We should make sure we get an in-memory cache (it could be unconditional) so that we don't get a lot slower for repeated calls in the same process.
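A minimal sketch of the per-process, in-memory cache suggested above: an index page is fetched at most once per process and reused unconditionally for repeated calls. The names (`fetch_index_page`, `_page_cache`, the injected `fetch` callable) are illustrative, not pip's actual API.

```python
# Per-process cache of index page bodies; it disappears when the
# process exits, so it cannot go stale across runs.
_page_cache = {}

def fetch_index_page(url, fetch):
    """Return the body for url, calling fetch(url) at most once per process."""
    if url not in _page_cache:
        _page_cache[url] = fetch(url)
    return _page_cache[url]
```

With this in place, resolving the same project repeatedly in one invocation costs a single HTTP request instead of one per lookup.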
I currently benefit hugely from the caching of things like /simple/foo/ when I run pip in test runs across 5+ Python versions on high-latency links. Would it be possible to solve the issues some other way? I don't fully understand them as described, though, so I'm not sure I can help design alternative answers yet. AIUI you're saying there are three issues:
The first one we could tackle by dropping some more diagnostics in the debug log, and then asking for that from the next reporter.

For the second, I think a last-invalidated RSS-or-similar feed from PyPI itself would be appropriate (e.g. it lists the API pages invalidated in the last, say, 12 minutes and the timestamp of their invalidation). We can compare that to the Date header on the pages as we access them, providing cheap invalidation (one small HTTP request-response per pip install/wheel invocation) across arbitrary pages. Basically the same strategy that has been used to make Squid ultra-responsive for accelerated sites for ages.

The third one I have no idea about :)
Oh, I should add: the absence of an invalidation feed on a site would be enough to disable using cached pages for it in this scheme, since 10 minutes is a long time for arbitrary freshness limits.
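The feed idea above can be sketched roughly as follows, under assumed data shapes: `feed` is a mapping of page path to invalidation timestamp (seconds) covering the recent window, `cached_date` is the timestamp from the cached page's Date header, and `feed=None` models a site that publishes no feed at all. All names here are hypothetical, not PyPI's or pip's real API.

```python
def cached_page_is_fresh(path, cached_date, feed):
    """Decide whether a cached index page may be reused.

    feed: {path: invalidation_timestamp} for recently changed pages,
          or None if the index publishes no invalidation feed.
    cached_date: timestamp from the cached response's Date header.
    """
    if feed is None:
        # No feed published: don't trust cached pages for this site.
        return False
    invalidated_at = feed.get(path)
    if invalidated_at is None:
        # Page not invalidated recently; the cached copy is still good.
        return True
    # Reusable only if the cached copy postdates the invalidation.
    return cached_date >= invalidated_at
```

One small fetch of the feed per pip invocation would then validate (or evict) every cached page at once, rather than revalidating each page individually.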
Ok, so that's a reasonable thing, and I think enough to reject the idea of not caching for a short time period. I'm going to split your ideas out into their own tickets and then go ahead and close this. Thanks!
Right now we cache pages like `/simple/foo/` for 10 minutes across all runs. However, we've had a number of problems with people either confused about why something they just published isn't available even though it showed up on PyPI (see #2901), or with people who just flat out manage to get pip to keep reusing the cache beyond its expiration (???? Not sure why).

The usefulness of this is somewhat limited: it will reduce HTTP requests and make them faster within a 10-minute window, but that doesn't seem like a very major benefit. It also has some other downsides, one of which is that for things like Travis we can have "cache churn", causing a new set of cached dependencies to need to be uploaded even though the only thing that has changed is the `/simple/foo/` pages, which won't be reused anyway unless the next build is < 10 minutes away.
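The 10-minute window described above amounts to a simple max-age check on each stored entry. A minimal sketch (illustrative only; pip's real cross-run caching is handled through the CacheControl library, not this function):

```python
MAX_AGE = 600  # seconds; the 10-minute window discussed in this issue

def cache_entry_usable(stored_at, now):
    """A cached /simple/<name>/ page is reused only within MAX_AGE
    seconds of when it was stored; after that a fresh fetch is needed."""
    return (now - stored_at) < MAX_AGE
```

This makes the trade-off concrete: within the window pip saves one request per index page, but any upload to PyPI inside that same window is invisible until the entry expires.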