@@ -451,14 +451,32 @@ Based on our findings as of the time this document was updated for implementatio
(Oct 7 2019), PyPI SHOULD split all targets in the *bins* role by delegating
them to 16,384 *bin-n* roles. Each *bin-n* role would sign for the PyPI targets whose
hashes fall into that bin (see Figure 2). It was found__
- that this number of bins would result in a 12-17% metadata overhead for
- returning users, and a 148% overhead for new users who are installing
- pip for the first time.
+ that this number of bins would result in a 12-17% metadata overhead
+ (relative to the average size of downloaded packages) for returning users
+ (assuming 256-byte target filenames for all packages), and a 148% overhead
+ for new users who are installing pip for the first time.

__ https://docs.google.com/spreadsheets/d/11_XkeHrf4GdhMYVqpYWsug6JNz5ZK6HvvmDZX0__K2I/edit?usp=sharing

- While it is possible to make TUF metadata more compact by representing it in a
- binary format, as opposed to the JSON text format, a sufficiently large
+ This number of bins SHOULD increase when the metadata overhead for returning
+ users exceeds 50%. Presently, this SHOULD happen when the number of targets
+ increases at least 4x from over 2M to nearly 9M, at which point the metadata
+ overhead for returning and new users would be around 49-54% (assuming 256-byte
+ target filenames for all packages) and 185% respectively, assuming that the
+ number of bins stays fixed. If the number of bins is increased, then the cost
+ for all users would effectively be the cost for new users, because their cost
+ would be dominated by the (once-in-a-while) cost of downloading the large
+ number of delegations in the *bins* metadata. If the cost for new users
+ should prove to be too much, then this subject SHOULD be revisited before
+ that happens.
+
+ Note that changes to the number of bins on the server are transparent to the
+ client. The package manager will be required to download a fresh set of
+ metadata, as though it were a new user, but this operation will not require any
+ explicit code logic or user interaction in order to do so.
+
+ It is possible to make TUF metadata more compact by representing it in a binary
+ format, as opposed to the JSON text format. Nevertheless, a sufficiently large
number of projects and distributions will introduce scalability challenges at
some point, and therefore the *bins* role will still need delegations (as
outlined in figure 2) in order to address the problem. The JSON format is an
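
The hashed-bin delegation and the 50% re-binning threshold described in this hunk can be illustrated with a minimal Python sketch. It is not PyPI's or TUF's actual implementation. It assumes that a target is assigned to a bin by the SHA-256 digest of its path, that bins are identified by 4-hex-digit hash prefixes (so 65,536 prefixes are spread evenly over the 16,384 bins), and that overhead is measured as metadata bytes relative to the average package size. The names `bin_index`, `bin_prefixes` and `should_increase_bins` are hypothetical.

```python
import hashlib

NUM_BINS = 16_384   # number of bin-n roles quoted in the text (2**14)
PREFIX_LEN = 4      # assumption: 4 hex digits, i.e. 16**4 = 65,536 prefixes
PREFIXES_PER_BIN = 16 ** PREFIX_LEN // NUM_BINS  # 4 prefixes per bin


def bin_index(target_path: str) -> int:
    """Return the index of the (hypothetical) bin-n role for target_path."""
    # Assumption: the bin is chosen from the SHA-256 digest of the path.
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    prefix = int(digest[:PREFIX_LEN], 16)
    return prefix // PREFIXES_PER_BIN


def bin_prefixes(index: int) -> list:
    """Return the hex hash prefixes that bin number `index` would sign for."""
    start = index * PREFIXES_PER_BIN
    return [f"{p:0{PREFIX_LEN}x}" for p in range(start, start + PREFIXES_PER_BIN)]


def should_increase_bins(metadata_bytes: float,
                         avg_package_bytes: float,
                         threshold: float = 0.5) -> bool:
    """True when returning-user metadata overhead exceeds the threshold.

    The 50% default mirrors the threshold stated in the text; how the two
    byte counts are measured is left to the caller.
    """
    return metadata_bytes / avg_package_bytes > threshold


if __name__ == "__main__":
    path = "packages/example-1.0.tar.gz"  # hypothetical target path
    idx = bin_index(path)
    print(f"{path} falls into bin {idx}, prefixes {bin_prefixes(idx)}")
```

Because the bin is derived only from the hash of the target path, a client can locate the single *bin-n* role it needs without downloading the others, which is why the per-update overhead for returning users stays small until the number of targets per bin grows.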