Don't Cache Index Pages Across Runs #2904
Note: We should make sure we get an in-memory cache (it could be unconditional) so that we don't get a lot slower for repeated calls in the same process.
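A minimal sketch of the per-process, in-memory cache suggested above: an index page is fetched at most once per process and reused unconditionally for repeated calls. The names (`fetch_index_page`, `_page_cache`, the injected `fetch` callable) are illustrative, not pip's actual API.

```python
# Per-process cache of index page bodies; it disappears when the
# process exits, so it cannot go stale across runs.
_page_cache = {}

def fetch_index_page(url, fetch):
    """Return the body for url, calling fetch(url) at most once per process."""
    if url not in _page_cache:
        _page_cache[url] = fetch(url)
    return _page_cache[url]
```

With this in place, resolving the same project repeatedly in one invocation costs a single HTTP request instead of one per lookup.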
I currently benefit hugely from the caching of things like /simple/foo/ when I run pip in test runs across 5+ Python versions on high-latency links. Would it be possible to solve the issues some other way? I don't fully understand them as described, though, so I'm not sure I can help design alternative answers yet. AIUI you're saying there are three issues:
The first one we could tackle by dropping some more diagnostics in the debug log, and then asking for that from the next reporter.

For the second, I think a last-invalidated RSS-or-similar feed from PyPI itself would be appropriate (e.g. it lists the API pages invalidated in the last, say, 12 minutes and the timestamp of their invalidation). We can compare that to the Date header on the pages as we access them, providing cheap invalidation (one small HTTP request-response per pip install/wheel invocation) across arbitrary pages. Basically the same strategy that has been used to make Squid ultra-responsive for accelerated sites for ages.

The third one I have no idea about :)
Oh, I should add: the absence of an invalidation feed on a site would be enough to disable using cached pages for it in this scheme, since 10 minutes is a long time for arbitrary freshness limits.
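The feed idea above can be sketched roughly as follows, under assumed data shapes: `feed` is a mapping of page path to invalidation timestamp (seconds) covering the recent window, `cached_date` is the timestamp from the cached page's Date header, and `feed=None` models a site that publishes no feed at all. All names here are hypothetical, not PyPI's or pip's real API.

```python
def cached_page_is_fresh(path, cached_date, feed):
    """Decide whether a cached index page may be reused.

    feed: {path: invalidation_timestamp} for recently changed pages,
          or None if the index publishes no invalidation feed.
    cached_date: timestamp from the cached response's Date header.
    """
    if feed is None:
        # No feed published: don't trust cached pages for this site.
        return False
    invalidated_at = feed.get(path)
    if invalidated_at is None:
        # Page not invalidated recently; the cached copy is still good.
        return True
    # Reusable only if the cached copy postdates the invalidation.
    return cached_date >= invalidated_at
```

One small fetch of the feed per pip invocation would then validate (or evict) every cached page at once, rather than revalidating each page individually.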
Ok, so that's a reasonable thing, and I think enough to reject the idea of not caching for a short time period. I'm going to split your ideas out into their own tickets and then go ahead and close this. Thanks!
Right now we cache pages like `/simple/foo/` for 10 minutes across all runs. However, we've had a number of problems with people either confused about why something they just published isn't available even though it showed up on PyPI (see #2901), or with people who just flat out manage to get pip to keep reusing the cache beyond its expiration (???? Not sure why).

The usefulness of this is somewhat limited: it will reduce HTTP requests and make them faster within a 10-minute window, but that doesn't seem like a very major benefit. It also has some other downsides, one of which is that for things like Travis we can have "cache churn", causing a new set of cached dependencies to need to be uploaded even though the only thing that has changed is the `/simple/foo/` pages, which won't be reused anyway unless the next build is < 10 minutes away.
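The 10-minute window described above amounts to a simple max-age check on each stored entry. A minimal sketch (illustrative only; pip's real cross-run caching is handled through the CacheControl library, not this function):

```python
MAX_AGE = 600  # seconds; the 10-minute window discussed in this issue

def cache_entry_usable(stored_at, now):
    """A cached /simple/<name>/ page is reused only within MAX_AGE
    seconds of when it was stored; after that a fresh fetch is needed."""
    return (now - stored_at) < MAX_AGE
```

This makes the trade-off concrete: within the window pip saves one request per index page, but any upload to PyPI inside that same window is invisible until the entry expires.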