Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prow accepted invalid OWNERS_ALIASES change and broke approvals #360

Open
tuminoid opened this issue Jan 17, 2025 · 21 comments · Fixed by #368 · May be fixed by #390
Open

Prow accepted invalid OWNERS_ALIASES change and broke approvals #360

tuminoid opened this issue Jan 17, 2025 · 21 comments · Fixed by #368 · May be fixed by #390
Assignees
Labels
area/plugins Issues or PRs related to prow's plugins for the hook component kind/bug Categorizes issue or PR as related to a bug.

Comments

@tuminoid
Copy link
Contributor

tuminoid commented Jan 17, 2025

Issue

We use OWNERS file that points to aliases listed in OWNERS_ALIASES.

We promoted all of our reviewers to approvers in this PR, which then caused Prow to stop accepting approves, such as here and here.

Expected outcome

When merging PR116, Prow should've labeled the PR as "do-not-merge/invalid-owners-file" and blocked it.

Actual outcome

Prow/tide did not complain, we merged it, and noticed that approving was broken. Had to manually merge a fix PR, which then solved the situation right away.

Looking at the verify-owners_test.go:

  1. It seems it should fail to empty groups. I'm suspecting there is loophole due usage of OWNERS_ALIASES, which is much less tested here.
  2. Another alternative is the use of {} to signal empty group, which is valid YAML, but maybe not expected by Prow?

Logs

There was no error logs in Prow, just regular comment/approval handling logs, that did not indicate OWNERS was broken. It just claimed "comment did not result in approval".

@tuminoid
Copy link
Contributor Author

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 20, 2025
@nikhilkumar13199
Copy link

Can I assign this to myself? Prima facie, seems like a missed edge case,

@tuminoid
Copy link
Contributor Author

Can I assign this to myself? Prima facie, seems like a missed edge case,

/assign @nikhilkumar13199

Any help would be appreciated, Prow is needing contributors!

@nikhilnishad-goog
Copy link
Contributor

nikhilnishad-goog commented Feb 3, 2025

I see possible problems. One thing is with {} and without {} seemed to be parsed different by go. Also, we are simply iterating over strings(usernames) in each alias in ParseAliasesConfig and not handling the case where an alias group may be empty (have no users in it). Ideally, we should not accept such empty cases and throw.

Image

Image

@tuminoid
Copy link
Contributor Author

tuminoid commented Feb 4, 2025

Thanks for taking a look!

Can you edit the PR description to include Fixes: #368 so the issue/PR get linked to each other?

@petr-muller
Copy link
Contributor

another potential test case for us here https://kubernetes.slack.com/archives/CDECRSC5U/p1738953760981009

https://github.com/kubernetes/test-infra/blob/c06008386f8cae94bf2bd975041e27b8cf12e126/OWNERS_ALIASES#L26

@BenTheElder
Copy link
Member

@nikhilnishad-goog
Copy link
Contributor

@petr-muller That is a yaml error, so it is already caught by the existing logic. But I will add this to my unit test cases in the PR.

@petr-muller
Copy link
Contributor

I have opened a revert #381 so we'll reopen the issue as well (I'll do that once the revert merges). #360 (comment) was actually a lead. It is a YAML error and it did fail to parse, but unfortunately verify-owners does not warn about it, because it actually only validates OWNERS_ALIASES if its parseable (it validates its content, but not its form). The plugin errors out without flagging the PR that changes OWNERS_ALIASES into something not parseable:

// If OWNERS_ALIASES file exists, get all aliases.
// If the file was modified, check for non trusted users in the newly added owners.
nonTrustedUsers, trustedUsers, repoAliases, err := nonTrustedUsersInOwnersAliases(ghc, log, triggerConfig, org, repo, r.Directory(), modifiedOwnerAliasesFile.Patch, ownerAliasesModified, skipTrustedUserCheck, filenames)
if err != nil {
return err
}

So the verify-owners plugin is the thing we need to fix the most - it should not error out when OWNERS_ALIASES is invalid, but should process the case.

Treatment of empty aliases is then a followup - {} is likely an error, but empty list should not be (unless root approvers: is set to an empty alias) IMO, or at least it must not be a hard error: there are existing repos with empty aliases, and we need to take care we don't break them.

@petr-muller
Copy link
Contributor

/reopen

@nikhilnishad-goog FYI :( Do want to follow up?

@k8s-ci-robot k8s-ci-robot reopened this Feb 19, 2025
@k8s-ci-robot
Copy link
Contributor

@petr-muller: Reopened this issue.

In response to this:

/reopen

@nikhilnishad-goog FYI :( Do want to follow up?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nikhilnishad-goog
Copy link
Contributor

@petr-muller Thanks for detecting and reverting.

Yes, I will raise another PR today or tomorrow, only for the case where -{} is used and approvers: is empty. Thinking about not proposing any radical changes in overall behavior.

@petr-muller
Copy link
Contributor

@nikhilnishad-goog I think the main change necessary will be for verify-owners plugin to properly handle the case when OWNERS_ALIASES is not valid - report it to the PR, not error out.

@nikhilnishad-goog
Copy link
Contributor

nikhilnishad-goog commented Feb 19, 2025

@petr-muller Among the following -
VALID CASES (but add a warning label "invalid_owners_file" from verify-owners.go)

aliases:
  alias-group:
aliases:
  alias-group: {}
aliases:
  alias-group: {alias1, alias2}

INVALID CASES -> (throw errors)

aliases:
  alias-group: "username1, username2"
aliases:
  alias-group: <some unexpected type>

For sure, we should error out for the last case when expanded.(type) is something unexpected, Here is how I propose we should handle (see code block below) - not error out for empty alias cases (which was causing problems for many since a lot users had empty aliases), but when it is an invalid case shown above. I propose we should error out for those cases but feel free to correct me if you think that is too harsh for users. I feel it can be harsh but it also feels like the right thing to do. Please guide.

switch v := expanded.(type) {
    case []interface{}:
    	// Convert []interface{} to []string
    	var members []string
    	for _, member := range v {
    		memberAsString, ok := member.(string)
    		if !ok {
    			return result, fmt.Errorf("unexpected type for alias group member: %T", member)
    		}
    		members = append(members, memberAsString)
    	}
    	result[github.NormLogin(alias)] = NormLogins(members)
    case string:
    	// Alias group must contain a list of members.
    	return result, fmt.Errorf("alias group '%s' must contain a list of members", alias)
    case map[string]interface{}:
    	// Handle Flow Style Mapping (Inline Dictionary/Object Syntax). Example - aliases: { alias-group: { alias1, alias2 } }
    	var members []string
    	for key := range v {
    		members = append(members, key)
    	}
    	result[github.NormLogin(alias)] = NormLogins(members)
    case nil:
    	// Handle empty alias group as an empty list. Examples - aliases: { alias-group: }
    	result[github.NormLogin(alias)] = sets.New[string]()
    default:
    	return result, fmt.Errorf("unexpected type for alias group: %T", v)
}

@petr-muller
Copy link
Contributor

petr-muller commented Mar 5, 2025

@nikhilnishad-goog sorry, I was not able to look at your PR yet, but leaving some high-level thoughts here

  1. Parsing of OWNERS_ALIASES files is generally done in two contexts - standard Prow functionality like approve plugin, which deals with existing OWNERS_ALIASES, and the verify-owners plugin, which deals with "candidate" OWNERS_ALIASES file, and I believe we need to treat these differently
  2. Whenever OWNERS_ALIASES is read for standard Prow functionality, we should treat it pretty liberally. There are existing Prow instances handling existing repos with a ton of existing OWNERS_ALIASES files, and our responsibility is not to (especially silently) break them if they work now. That means we should be pretty liberal - like, if there is a single invalid alias group, it should not prevent the whole file from being read. At worst such group should be treated as empty. That should not be a problem in most cases - except when such empty groups result in empty top-level approvers which is a problematic case - which needs to be detected and properly surfaced somehow at runtime.
  3. It could be helpful to somehow provide more transparency for PR authors so they can understand and discover unexpected cases. For example, the approve plugin adds a comment that lists the OWNERS files involved in the PR. Maybe that comment could (in a details block) list the alias groups involved in the files, with a brief summary (N logins, empty group, invalid group)?
  4. The second context where OWNERS_ALIASES is read is the verify-owners plugin, which validates candidate OWNERS/OWNERS_ALIASES files when they change. I think we can be a bit stricter here, but it is not acceptable to just fail to parse the file silently. When that happens, the plugin should warn the PR through the label (that's its job). And even better, it should warn about about specific problems with specific groups, rather than just saying the file failed to parse.
  5. I think we'll need to simply allow and tolerate the plain empty aliases, even in verify-owners plugin (not even warn about them). It's a fairly widely used pattern and starting to fail on it would be hostile (this has an exception where an empty group leads to empty top-level approvers, which must be detected and reported as invalid). The aliases groups look like YAML lists, so I'd allow a alias-group: [] case too (in addition to plain alias-group:), but not the other cases (alias-group: {} etc) .
  6. I think even being more strict in verify-owners would be better with a transition period where the plugin would only flagged (with an invalid label) groups that are newly invalid, but not if they existed previously and were invalid before. We could only warn about these with a comment, and eventually make the behavior stricter.

I think this all leads to the state where parsing method for OWNERS_ALIASES always returns all groups it could successfully read (possibly also those that not, and treat them as empty), and a list of errors encountered. In standard operation. the errors would be just logged and Prow should try to work as much as possible. In verify-owners, the errors would result in a invalid-owners-file label and a comment. Simple empty groups should not be an error, unless this results in empty top-level approvers.

WDYT?

@tuminoid
Copy link
Contributor Author

tuminoid commented Mar 6, 2025

WDYT?

I agree. Well summarized all-around IMO. I had similar thoughts during review of the initial PR too.

@nikhilnishad-goog
Copy link
Contributor

@petr-muller @tuminoid Thanks a lot for the in-detail explanation. I think I agree on all points.

As I understand it, right now, standard prow functionality should never throw errors except when the YAML couldn't be unmarshalled at all. That makes sense, i will update my PR so that the errors discussed in my previous comments...

INVALID CASES -> (throw errors)

aliases:
  alias-group: "username1, username2"
aliases:
  alias-group: <some unexpected type>

... are no longer breaking for any case of expanded.(type). Instead we will only add labels and comment, from within the repoowners.go itself.

I think I can update the github client interface in repoowners.go to include the method for adding labels, comments, etc.

Consequently, both standard prow functionality (like apporve plugin) and verify-owners will not see new errors (right now the only failure they expect from repoowners.go is when unmarshalling fails). Only some added labels and comments.

@petr-muller
Copy link
Contributor

I think I can update the github client interface in repoowners.go to include the method for adding labels, comments, etc.

That should not be necessary, I don't think that's something repowners.go should do. It's essentially a library code for working with OWNERS files. That code is the used in different contexts, like the approve plugin or the verify-owners plugin - that context knows why it is trying to read the OWNERS (or aliases) files and what it wants to do for various classes of errors. The repowoners.go should just provide reasonable interfaces that provide its callers with necessary data so they can do the useful logic.

@nikhilnishad-goog
Copy link
Contributor

Ahh, I see. Makes more sense now.

But now this problem gets challenging, repoowners.go has the ParseAliasesConfig function that only returns (RepoAliases, error). I want to propagate the information that is closer to a warning and not an error about empty groups, and other conditions I labelled as INVALID CASES -> (throw errors) earlier. This propagated information I can then use in approver.go plugin or verify-owners.go plugin to generate warning labels.

Any ideas on how that can be achieved, without making a change in ParseAliasesConfig function signature? I do not want to be put in a situation where I need to make cascading changes because of changed function signature.

@petr-muller
Copy link
Contributor

Hard to achieve without changing the function signature, but the function is not used at too many places so the toil does not seem to be that big, The callsites will need to change anyway, because they will need to process the enhanced error reporting data?

Hypothetically you could implement a rich type that implements error, stuff the necessary data inside and return that as a standard error, then cast it to the rich type at callsite using errors.As to gain access to the data inside.

@petr-muller
Copy link
Contributor

/area plugins

@k8s-ci-robot k8s-ci-robot added the area/plugins Issues or PRs related to prow's plugins for the hook component label Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/plugins Issues or PRs related to prow's plugins for the hook component kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
6 participants