Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add Principles for Package Repository Security #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Principles for Package Repository Security #37
Changes from 3 commits
1893108
7d517f7
ab4ceb7
b2e695b
04ac7ab
cfec1b7
c5065be
1ce0137
b42e807
c65f480
1146a6c
a77177a
05eb852
86b0642
f8bbc23
c53f265
771106f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another sort of package repository is one that doesn't have user accounts and doesn't allow uploads to it directly, but only mirrors another package repository. I think it would be good to have guidance around ensuring the mirror contains the same data as the source, and perhaps guidance around how the mirror removes content that the source has removed because it's malicious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sometimes differentiate between "registry" and "repository". A "repository" actually stores the package contents. A "registry" registers a set of packages, but depends on something else to actually store the data. Would it be useful to make this distinction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We settled on "package repositories" in this doc, with the clarification above:
We last talked about this naming distinction back in November (!) where we looked at how different ecosystems referred to themselves, and the conclusion we came to is that there isn't consistent terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "smaller" mean in this context, both in terms of what is being measured and what threshold of that measurement is considered to be smaller? That is, does this mean:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it's in terms of the amount of resources available to the registry to add/build these security capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, crates.io only allows logging in via OAuth with GitHub-- does this count as having user accounts or not? Some of these items don't apply in this case-- crates.io doesn't have passwords, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This decouples verification from the decision whether providing an email address should be a (non-)requirement.
If an email is provided it certainly makes sense to verify it. And for many people it's desirable, so it can also make sense to recommend that users provide a valid email address, but in the end there are valid reasons why one might not want to use one. And a user could always opt to sign up with a throwaway email anyway, so enforcing it at signup is not going to really ensure things anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add: the repository restricts or otherwise disallows using temporary email addresses, the same email address for multiple accounts, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm opposed to that suggestion. Sometimes I intentionally use temporary email addresses to make a pseudonymous account. Major email providers want to tie your email account to your phone number and such. To avoid that association I use a throwaway to confirm the email validation and then only interact with username+password.
Not being able to recover the account is a choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, the added burden of spam/malware that comes from allowing temporary email addresses vastly outweighs the benefit of pseudonymous publishing and has a direct effect on the overall security of the repository. A repository team that is overwhelmed with dealing with spam/malware is one that is unable to handle the needs of the legitimate user base, remove spam/malware, or security features in a timely fashion. It's a tradeoff, and the one that improves the security of the repository (which is the goal of this document) is to prohibit temporary email addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you consider temporary? A newly created account with a randomized address on a freemail provider that doesn't do phone verification is a temporary address for my purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically providers that a) generate email addresses on-demand b) don't require authentication (i.e. an "open inbox"). There is a long list that is maintained here based on that criteria: https://github.com/disposable-email-domains/disposable-email-domains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I have used domains in both the allow list and in the block list on that repo.
I think it would make sense to spell out the distinction. Broader definitions of "temporary email address" might include anything that's not tied to a person's identity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add here "and their account recovery procedures that attempts to reduce the likelihood of the package registry support team being susceptible to social engineering attacks" or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This item seems to be implying that there's also action a user can take if their account has been compromised, and I think it's worth making that explicit here-- that is, adding an item like "The package repository has a support team a user can notify if they suspect their account has been compromised. The support team has a SLA of X for locking and investigating the account".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think blanket requirements introduce friction and may be onerous for casual/new/poor developers that want to publish hobby projects and such.
Proper MFA (hardware tokens) costs money.
Instead individual packages should be labeled based on their level so that consumers who require higher levels can restrict them without burdening those who do not require this.
Generally the commercial/big corporate/government users shouldn't shape registries solely to their needs.
Being able to filter the content based on the criteria they're interested in should provide the same outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Authenticator apps are “proper” just like hardware keys, so i don’t think this is much of a consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Authenticator apps may sit on the same machine as the password manager or on devices that are generally more compromisable. So they provide a lower security level than dedicated hardware tokens.
AIUI webauthn also supports attestation, which means an authentication provider may choose to lock MFA to a set of trusted (and usually more expensive) vendors.
At $work we have to use physical tokens to comply with the security standards of some clients. I forsee such increasingly strict requirements will trickle down to what is deemed critical software infrastructure at some point.
Making "compliance levels" a per-package/dependency-tree instead of per-registry thing is a rather simple change that prevents a security ratchet raising the barrier of entry for hobbyists and less affluent developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the intent here is that MFA could either be hardware-based or TOTP. What we've seen first-hand at GitHub's MFA rollout is that this has substantially reduced account takeover while also providing global access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is not stated here and I am concerned about this wording contributing to increasingly strict future developments.
You're using the word requires here, in the context of an entire platform, not individual projects on that platform.
Today it's perhaps only software TOTP. Tomorrow it'll be a select set of expensive, proprietary hardware tokens. In a decade you can only get those tokens by registering with your national ID, perhaps indirectly by them being tied to a TPM in your phone.
We have seen this with telephony and SIMs. With phone booths it used to be anonymous. Then anyone could buy a SIM with cash, that got you a persistent globally trackable ID that was not directly tied to your name. Now in many countries you can only buy SIMs with an ID. All in the name of fighting crime, terrorism, money laundering or whatever.
With this recommendation and ongoing "critical software infrastructure" efforts in the EU I see this as a small step in that direction.
A step that is easy to mitigate by not making MFA a platform-wide requirement and instead something that individual projects or organizations can opt into. If a project wants to cater to high-assurance clients then they can require MFA from all their maintainers (similar to signed git commits for example) and proudly display their security posture so that their clients can trust the project.
Why would $GOV care that $HOST requires MFA for all users when $GOV only wants to use project $X from $HOST? This requirement just is overly broad. It impacts those who do not share the same values as $GOV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which still does not substantiate your claim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npm/feedback#588 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only talking about "risk to attack", which is not the same as "attack vector". I asked you repeatedly to substantiate your claim that this constitutes an attack vector. Information usable to triaging promising targets does not constitute an attack vector.
Anyway, I consider their risk-based argument lacking nuance/detailed consideration.
For one if the non-mfa population is somewhat large then this information does not steer an attacker towards any particular target. Analogously: Just because a bank in a city block has a safe it does not put any particular private home in the environment at risk of becoming a preferred target for burglars.
It also doesn't consider second-order effects such as pushing people towards using git-based dependencies (which are an option or even the norm in some package managers) if publishing procedures become too onerous for casual users and whether that would have a net-negative effect on security.
There could also be other approaches to mitigating the concern such as only publishing that information if several additional, optional security aspects are also fulfilled. This would dilute the accuracy of the signal.
Yet another option is dismissing this concern because the risk is minuscule. Knowing that some niche repos don't use MFA might not be all that useful for an attacker that's looking to compromise a high-value repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not interested in this argument devolving into pedantry; the fact remains that it is simply not viable to require package or org specific MFA, github and npm will be requiring it wholesale imminently, and this requirement is almost certain to remain as-is in this document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between an attack vector (which usually is something that needs to be fixed) and disclosure about security practices (which is a tradeoff, potentially a tiny one) is significant in my opinion, not pedantry.
I had to ask 3 times for you to even reply to this aspect only to dismiss it as pedantry. I would expect more precise discourse when it comes to designing security documentation for global software infrastructure.
Do you mean the requirement or publishing that information? The requirement itself is clearly viable and has been for years because that's what NPM is doing right now without the world going up in fire.
There might be reasons to change the policy, but they are likely subtle/marginal ones, that are not about fundamental viability of a policy but about tradeoffs between different viable policies.
Just because github chooses a particular approach does not mean it's the only possible approach.
Well there was a request for feedback, which is why I'm commenting.
Anyway, I will again reiterate my suggestion that package registries should list the "compliance level" of individual packages (or even individual artifacts) instead of chosing "levels" as a baseline for the entire registry.
By bundling several aspects of those levels into a compliance level it could alsoo discard bits of information an attacker can use. Instead of learning that the package maintainers use MFA specifically they will only learn that the maintainers do a whole bunch of things to reach said compliance level.
So here's a rephrased proposal:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, but also "has a support team to handle account recovery based on a defined process within [some SLO]"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that consistent internal process for account recovery that is robust to social engineering should be Level 2. Possibly also something about having enough forms of identity verification to enable proper recovery.
If there isn't robust recovery the robust MFA is useless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea that I think would go under "General Capabilities": a way to view the published artifact on the web without needing to download the package? And more advanced would be a way to view the differences between versions of the artifact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO another good "Level 1" item here would be:
(This is what PyPI does, and it's arguably a precursor step to the "event transparency log" noted under "Level 3".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like level 1 tbh. Immutability should be a core requirement for actively maintained packages.
I think there is flexibility here for low used packages (as per the npm policy)
Tombstoning for deletion isn't really neccessary for those packages with low usage imho, those with high usage should be able to be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add here "and documents the support team procedures and SLA for addressing reports of suspicious or malicious packages"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there maybe a better link that describes what this is and how to use it? A JSON blob is not super helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closest I could find is https://github.com/npm/registry/blob/master/docs/REPLICATE-API.md which appears to be older, not sure if there's anything more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea for another item under "CLI Tooling": guidance around how the CLI stores/uses/accesses API tokens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems overly broad, are we really just considering specific classes of CLI tooling, e.g. "installers", "builders", "publishers", etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed -- if the doc is meant to be taxonomical, it might make sense to clarify that tooling here is meant to be the "package installer" (based on context).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should tooling do when it encounters known malware? pypa/pip#5777
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would explicitly add "source registry" or "location" or similar here, to prevent repeats of issues such as https://nvd.nist.gov/vuln/detail/CVE-2020-36327
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, something like "pinned based on build/source provenance", but that would probably be Level 2, if not Level 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be expanded: produce SBOMs for what? (applications? libraries?) based on what? (a lock file? a source tree?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this seems like a mix of repository responsibilities (publish sufficiently detailed advisories that make it possible to determine usage) and CLI responsibilities (determine if a given source tree is actually using a vulnerable code path in a dependency with a known vulnerability).