Commit b18f431

Merge pull request python#37 from joshuagl/joshuagl/bins
Point out that changes to number of bins are transparent to the client
2 parents f6f77d3 + 7a58323 commit b18f431

1 file changed, +23 -5 lines changed

pep-0458.txt

Lines changed: 23 additions & 5 deletions
@@ -451,14 +451,32 @@ Based on our findings as of the time this document was updated for implementatio
 (Oct 7 2019), PyPI SHOULD split all targets in the *bins* role by delegating
 them to 16,384 *bin-n* roles. Each *bin-n* role would sign for the PyPI targets whose
 hashes fall into that bin (see Figure 2). It was found__
-that this number of bins would result in a 12-17% metadata overhead for
-returning users, and a 148% overhead for new users who are installing
-pip for the first time.
+that this number of bins would result in a 12-17% metadata overhead
+(relative to the average size of downloaded packages) for returning users
+(assuming 256-byte target filenames for all packages), and a 148% overhead
+for new users who are installing pip for the first time.
 
 __ https://docs.google.com/spreadsheets/d/11_XkeHrf4GdhMYVqpYWsug6JNz5ZK6HvvmDZX0__K2I/edit?usp=sharing
 
-While it is possible to make TUF metadata more compact by representing it in a
-binary format, as opposed to the JSON text format, a sufficiently large
+This number of bins SHOULD increase when the metadata overhead for returning
+users exceeds 50%. Presently, this SHOULD happen when the number of targets
+increase at least 4x from over 2M to nearly 9M, at which point the metadata
+overhead for returning and new users would be around 49-54% (assuming 256-byte
+target filenames for all packages) and 185% respectively, assuming that the
+number of bins stay fixed. If the number of bins is increased, then the cost
+for all users would effectively be the cost for new users, because their cost
+would be dominated by the (once-in-a-while) cost of downloading the large
+number of delegations in the `bins` metadata. If the cost for new users
+should prove to be too much, then this subject SHOULD be revisited before
+that happens.
+
+Note that changes to the number of bins on the server are transparent to the
+client. The package manager will be required to download a fresh set of
+metadata, as though it were a new user, but this operation will not require any
+explicit code logic or user interaction in order to do so.
+
+It is possible to make TUF metadata more compact by representing it in a binary
+format, as opposed to the JSON text format. Nevertheless, a sufficiently large
 number of projects and distributions will introduce scalability challenges at
 some point, and therefore the *bins* role will still need delegations (as
 outlined in figure 2) in order to address the problem. The JSON format is an
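
The hunk above says that each *bin-n* role signs for the targets whose hashes fall into its bin. That assignment can be sketched roughly as follows. This is a minimal illustration, not PyPI's or TUF's actual implementation. It assumes SHA-256 over the target path, a power-of-two number of bins, and selection by the leading bits of the digest. `bin_index` and `NUM_BINS` are hypothetical names introduced here.

```python
import hashlib

# 16,384 bins is the figure given in the PEP text; everything else is assumed.
NUM_BINS = 16_384


def bin_index(target_path: str, num_bins: int = NUM_BINS) -> int:
    """Map a target path to a bin number in the range [0, num_bins).

    Assumption: bins are selected by the leading bits of the SHA-256
    digest of the target path, and num_bins is a power of two.
    """
    if num_bins < 1 or num_bins & (num_bins - 1):
        raise ValueError("num_bins must be a positive power of two")
    bits = num_bins.bit_length() - 1  # e.g. 14 bits for 16,384 bins
    if bits > 32:
        raise ValueError("this sketch supports at most 2**32 bins")
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a big-endian integer, keep the top bits.
    prefix = int.from_bytes(digest[:4], "big")
    return prefix >> (32 - bits)
```

In this sketch, raising the bin count from 16,384 to 65,536 splits every old bin into four new ones. A client would not need to know that: it would look up the responsible bin from the freshly downloaded delegations, which matches the PEP's point that the change needs no explicit client logic.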
