Ensure authz operation overrides transient authz headers #61621

Merged

Conversation

albertzaharovits
Contributor

@albertzaharovits albertzaharovits commented Aug 27, 2020

AuthorizationService#authorize uses the thread context to carry the result of the authorization as transient headers.
The listener argument to the authorize method must necessarily observe the header values.

But we've learned that the TransportService carries over the transient headers from the thread context to the locally executed action handlers (unlike the remotely executed action handlers). This becomes problematic when the authorization is invoked multiple times, e.g. because SecurityActionFilter#apply is effectively invoked in the same thread context when a parent action uses the TransportService to execute the child action locally.

The desired outcome is that the authorization transient headers of the child action supersede the ones of the parent action.
This PR is the first step in this direction; it removes a specific transient header (AuthorizationServiceField#INDICES_PERMISSIONS_KEY) before calling AuthorizationService#authorize which would fill in the header with the new value.
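
The problem and this first step can be illustrated with a small, self-contained sketch. It is not the Elasticsearch implementation: `ToyThreadContext`, `StaleHeaderDemo`, and the `authorize` helper are hypothetical names, and the sketch assumes the header is only written when absent (the `putTransientIfNonExisting` behaviour mentioned in the review). Only the header name `_indices_permissions` is taken from the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a thread context; NOT the real Elasticsearch ThreadContext.
final class ToyThreadContext {
    private final Map<String, Object> transients = new HashMap<>();

    // Assumed "put only if absent" semantics: an already present value wins
    // and the new value is silently dropped.
    void putTransientIfNonExisting(String key, Object value) {
        transients.putIfAbsent(key, value);
    }

    Object getTransient(String key) {
        return transients.get(key);
    }

    // Hypothetical removal hook, standing in for the first step described above.
    void removeTransient(String key) {
        transients.remove(key);
    }
}

final class StaleHeaderDemo {
    static final String INDICES_PERMISSIONS_KEY = "_indices_permissions";

    // Hypothetical authorize: stores the computed permissions as a transient header,
    // optionally clearing any value left behind by an earlier authorization.
    static void authorize(ToyThreadContext ctx, String permissions, boolean clearFirst) {
        if (clearFirst) {
            ctx.removeTransient(INDICES_PERMISSIONS_KEY);
        }
        ctx.putTransientIfNonExisting(INDICES_PERMISSIONS_KEY, permissions);
    }

    // Authorizes a parent and then a child action in the same context and
    // returns the header value that the child action would observe.
    static Object childObserves(boolean clearFirst) {
        ToyThreadContext ctx = new ToyThreadContext();
        authorize(ctx, "parent-permissions", clearFirst);
        authorize(ctx, "child-permissions", clearFirst);
        return ctx.getTransient(INDICES_PERMISSIONS_KEY);
    }
}
```

In this toy model, `StaleHeaderDemo.childObserves(false)` returns the parent's value, while `childObserves(true)` returns the child's value because the header is cleared before each authorization.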

Co-authored-by: Tim Vernum [email protected]

@albertzaharovits albertzaharovits added the :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC label Aug 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-security (:Security/Authorization)

@elasticmachine elasticmachine added the Team:Security Meta label for security team label Aug 27, 2020
@ywangd
Member

ywangd commented Aug 31, 2020

I still need to look closer at the tests, but I think the changes overall look good to me.

I had a slight concern about performance, since IndicesAccessControl is now always discarded before every authorization attempt. But then I realised we are already doing this, just incorrectly: IndicesAccessControl is always calculated for every authorization attempt (so there is no performance difference), and today the new one is thrown away, which is the source of the recent issues. The changes are hence really beneficial, since nothing is discarded silently. 👍

Member

@ywangd ywangd left a comment

LGTM. I left some suggestions. None of them is critical enough to prevent approval.

A bit more discussion that is not directly related to this PR: I am now a bit skeptical about other usages of putTransientIfNonExisting. I cannot point to any concrete problems, but the fact that we silently discard the computed result is a bit worrying. Maybe we could add debug logging for when it happens.

@albertzaharovits
Contributor Author

A bit more discussion that is not directly related to this PR: I am now a bit skeptical about other usages of putTransientIfNonExisting. I cannot point to any concrete problems, but the fact that we silently discard the computed result is a bit worrying. Maybe we could add debug logging for when it happens.

I agree, this is not strictly correct. The approach in this PR is not something that I wholeheartedly endorse. I think it's subtly wrong that only a single authz header takes the value from the latest authz operation, while the other authz headers do not; taken together as a whole, the authz headers do not correctly describe the authorisation outcome.

My personal preference would be to stash all the authorisation headers (i.e. including ORIGINATING_ACTION_KEY and AUTHORIZATION_INFO_KEY) using some form of SecurityContext#stashContext .

But I'm waiting to see what @jaymode thinks is the best option, as it's now gotten into more of a thread context issue (i.e. the lack of a suitable API) than a security issue.

@ywangd
Member

ywangd commented Sep 1, 2020

it's now gotten into more of a thread context issue (i.e. the lack of a suitable API) than a security issue.

Yes. The simple remove method puts the responsibility on the caller to back up the original value. Since SecurityActionFilter already stores the threadContext, it is not an issue for now, but it could be prone to programming errors in future usages. It calls for a method that stores the current threadContext and optionally removes certain headers in one go.

@albertzaharovits
Contributor Author

This PR is now completely changed by introducing the SecurityContext#stashAuthorizationContext method, that stashes the three transient authorization headers. I've confirmed this approach with @jaymode .
Thanks for reviewing @ywangd , but this now obviously needs a new pass.

@albertzaharovits albertzaharovits changed the title Authz overrides index permission in thread context Ensure authz operation overrides transient authz headers Sep 2, 2020
Member

@jaymode jaymode left a comment

Do you think we can add a method to ThreadContext that also accepts a list of transient names to filter? So something like newStoredContext(boolean, List<String>) so that we can drop the public removeTransient method?

@albertzaharovits
Contributor Author

As discussed with @jaymode , I've redone it such that, upon restore, ALL the headers are reverted (with the possible exception of response headers). The new method is a newStoredContext variant that in addition permits clearing up the specified transient headers.

The previous implementation, with the partial stash, was confusing.
The current implementation is simpler, but authorization now reverts non-authorization headers as well (though not for the listener). This is only a theoretical difference, as there is nothing left to do (that would require a certain context) after authorization has completed and its listener has been called.
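
The stash-and-clear behaviour described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the real `ThreadContext`: `StashingContext`, its nested `StoredContext`, and `StashDemo` are hypothetical, response headers (and the boolean argument seen in the diff) are left out, and `putTransient` is assumed to reject overwriting an existing key, which is why clearing matters.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Toy model only; the real ThreadContext is considerably more involved.
final class StashingContext {
    private Map<String, Object> transients = new HashMap<>();

    // close() without a checked exception, so try-with-resources needs no catch.
    interface StoredContext extends AutoCloseable {
        @Override
        void close();
    }

    // Assumption of this sketch: an existing transient cannot be overwritten.
    void putTransient(String key, Object value) {
        if (transients.putIfAbsent(key, value) != null) {
            throw new IllegalArgumentException("value for key [" + key + "] already present");
        }
    }

    Object getTransient(String key) {
        return transients.get(key);
    }

    // Snapshots all headers, clears the listed ones from the working copy,
    // and restores the full snapshot when the returned handle is closed.
    StoredContext newStoredContext(Collection<String> transientsToClear) {
        final Map<String, Object> snapshot = transients;
        final Map<String, Object> working = new HashMap<>(snapshot);
        working.keySet().removeAll(transientsToClear);
        transients = working;
        return () -> transients = snapshot;
    }
}

final class StashDemo {
    static final String INDICES_PERMISSIONS_KEY = "_indices_permissions";
    static final String AUTHZ_INFO_KEY = "_authz_info";

    // Returns {value seen inside the child scope, value seen after the scope closes}.
    static Object[] run() {
        StashingContext ctx = new StashingContext();
        ctx.putTransient(INDICES_PERMISSIONS_KEY, "parent-permissions");
        Object insideScope;
        try (StashingContext.StoredContext ignore =
                 ctx.newStoredContext(Arrays.asList(INDICES_PERMISSIONS_KEY, AUTHZ_INFO_KEY))) {
            // The parent's value was cleared, so the child can set its own.
            ctx.putTransient(INDICES_PERMISSIONS_KEY, "child-permissions");
            insideScope = ctx.getTransient(INDICES_PERMISSIONS_KEY);
        }
        return new Object[] { insideScope, ctx.getTransient(INDICES_PERMISSIONS_KEY) };
    }
}
```

In this sketch the child scope sees its own value, and once the scope closes every header, authorization-related or not, reverts to the parent's snapshot.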

Member

@jaymode jaymode left a comment

LGTM. It would be good to also have @ywangd or @tvernum give this a review too

Member

@ywangd ywangd left a comment

I have some theoretical concerns. It is possible that they are complete nonsense. But given how important ThreadContext is, I'll just let them out :)

It's mainly about the two additional authorization headers, ORIGINATING_ACTION and AUTHORIZATION_INFO. Compared to INDICES_PERMISSIONS, they are less specific to the particular request that is being executed.

INDICES_PERMISSIONS is tightly coupled to the current request. In fact, it is erroneous to reuse it across requests. This was the problem and we are trying to fix it. But I am not sure the same thing can be said for ORIGINATING_ACTION and AUTHORIZATION_INFO.

For ORIGINATING_ACTION, it seems to make sense that the same value would apply to multiple subsequent requests, i.e. one request causes another. Its value is also checked in AuthorizationUtils#shouldReplaceUserWithSystem, and I am not sure whether the change would lead to any subtle but important differences there.

For AUTHORIZATION_INFO, the change here makes it mutable across requests. Currently, when the parent action is authorized, the same Role object is applied to any child actions. This is true even when relevant roles are modified in the middle of the requests. With this PR, it is possible that the Role object will be different for child actions. If a child action then gets denied, I am not sure whether this would lead to any inconsistency. That said, since transients are cleared across nodes, it may not be an issue at all.

Another theoretical thing is about multiple TransportInterceptor or ActionFilter instances. Currently, any extra interceptors/filters run with the parent transients. With this PR, they will run with child transients. If these extra interceptors/filters somehow rely on parent transients to work, there will be issues. Again, this is purely theoretical and is not something to be worried about today, since no other interceptors/filters check security info.

*/
try (ThreadContext.StoredContext ignore = threadContext.newStoredContext(false, AuthorizationServiceField.ALL_AUTHORIZATION_KEYS)) {
// prior to doing any authorization lets set the originating action in the context only
threadContext.putTransient(AuthorizationServiceField.ORIGINATING_ACTION_KEY, action);
Member

Is this the right semantics? I understand this is the reason why RestSqlSecurityIT needs to be updated. Technical details aside, if a parent action invokes a child action, should the "originating action" still be the parent action? The change here makes it the child action. If it's always the child action, why does it need to be called the "originating" action?

Contributor Author

It's just a naming thing. On the other hand, wouldn't it be odd to store only the parent action name, no matter the child nesting level, and only when the request doesn't cross the node boundary?

@albertzaharovits
Contributor Author

Thanks for reviewing @ywangd .

For ORIGINATING_ACTION, it seems to make sense that the same value would apply to multiple subsequent requests, i.e. one request causes another. Its value is also checked in AuthorizationUtils#shouldReplaceUserWithSystem, and I am not sure whether the change would lead to any subtle but important differences there.

AuthorizationUtils#shouldReplaceUserWithSystem is the only place where we make use of this header (apart from the audit logs). The use case for that condition is when a context is marked as a system context. My assessment is that there would be no difference in behaviour, but it's hard to tell for sure.

For AUTHORIZATION_INFO, the change here makes it mutable across requests. Currently, when the parent action is authorized, the same Role object is applied to any child actions. This is true even when relevant roles are modified in the middle of the requests. With this PR, it is possible that the Role object will be different for child actions. If a child action then gets denied, I am not sure whether this would lead to any inconsistency. That said, since transients are cleared across nodes, it may not be an issue at all.

Yes this is not an issue, IMO.

@albertzaharovits
Contributor Author

Another theoretical thing is about multiple TransportInterceptor or ActionFilter instances. Currently, any extra interceptors/filters run with the parent transients. With this PR, they will run with child transients. If these extra interceptors/filters somehow rely on parent transients to work, there will be issues. Again, this is purely theoretical and is not something to be worried about today, since no other interceptors/filters check security info.

This is only theoretical, agreed. Whenever we make use of these transient headers in another place, we unfortunately need to ensure that the value is set correctly; this is the drawback of "global" variables, like thread locals.

@ywangd
Member

ywangd commented Sep 10, 2020

AuthorizationUtils#shouldReplaceUserWithSystem is the only place where we make use of this header (apart from the audit logs). The use case for that condition is when a context is marked as a system context. My assessment is that there would be no difference in behaviour, but it's hard to tell for sure.

I am puzzled by the comments in AuthorizationUtils#shouldReplaceUserWithSystem:

// we have a internal action being executed by a user other than the system user, lets verify that there is a
// originating action that is not a internal action. We verify that there must be a originating action as an
// internal action should never be called by user code from a client

My reading is that the code wants to differentiate between the "current action" and the "originating action". The logic (simplified) is: if the "originating action" is not an internal action, execute the "current action" as the system user. So I think the following is a valid scenario, since the "originating action" is the external_action of step (1):

User invokes external_action (1) -> internal_action (2) -> internal_action (3) executed as system user

With the changes in this PR, the above is no longer valid, since the "originating action" will be "internal_action" at step (2). It will not be a security issue, since the change is more restrictive. But I am not sure whether it could cause subtle failures somewhere. Is it for TransportClient? Or maybe the logic guarantees switching to the system user at step (2), which makes step (3) irrelevant.

Most of its code and comments are from 2016 and 2017, so I lack the context to fully understand them. I am OK with it if it looks fine to @jaymode.

@jaymode
Member

jaymode commented Sep 10, 2020

I am puzzled by the comments in AuthorizationUtils#shouldReplaceUserWithSystem:

Sorry for the lack of clarity. This is an ugly piece of the code that I'd love for the team to revisit and see if we can improve. If my comments below make it clearer, maybe the code comment should be updated in a separate PR. That said, the logic is intended to be:

User invokes external_action (1) -> external_action authorized for user to execute -> external_action execution triggers internal_action (2) -> execute internal_action as system

The following is what we do not want to happen:

User invokes internal_action (1) -> execute internal_action as system

Is it for TransportClient?

Yes a lot of this complexity existed because of the transport client.

I think this could be an issue for cases where we do need to switch based on the originating action. @albertzaharovits I suggest leaving the ORIGINATING_ACTION out of the headers that get removed.
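
The intended decision, as described in this thread, might be sketched like this. It is a heavy simplification and not the actual `AuthorizationUtils#shouldReplaceUserWithSystem`: the class name, the `shouldRunAsSystem` helper, and the rule that internal action names start with `internal:` are assumptions made for illustration, and checks on the user itself are omitted.

```java
// Simplified sketch of the rule discussed above; NOT the real AuthorizationUtils code.
final class OriginatingActionSketch {

    // Hypothetical convention for this sketch: internal action names start with "internal:".
    static boolean isInternal(String action) {
        return action != null && action.startsWith("internal:");
    }

    // An internal action is executed as the system user only when it was triggered
    // by an already authorized, non-internal originating action.
    static boolean shouldRunAsSystem(String currentAction, String originatingAction) {
        if (!isInternal(currentAction)) {
            return false; // external actions keep running as the calling user
        }
        // No originating action means the user invoked the internal action directly.
        return originatingAction != null && !isInternal(originatingAction);
    }
}
```

Under these assumptions, an internal action triggered by an external action such as `indices:data/read/search` runs as system, while an internal action with no originating action does not. If the originating action were overwritten with the internal action of step (2), the action at step (3) would no longer run as system, which is the concern raised earlier in the review.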

@albertzaharovits
Contributor Author

I think this could be an issue for cases where we do need to switch based on the originating action. @albertzaharovits I suggest leaving the ORIGINATING_ACTION out of the headers that get removed.

Thank you @ywangd and @jaymode ! I must admit that I've only just realised that the method shouldReplaceUserWithSystem has both action and originatingAction as two separate variables...
Until now, every time I looked at the method, I thought they were the same thing.

As much as it hurts me to revert to the behaviour of "leaking" the ORIGINATING_ACTION header across action contexts, I believe this is the correct behaviour 😢

Thank you @ywangd for the vigilance 👍

@albertzaharovits albertzaharovits merged commit 4b7160d into elastic:master Sep 15, 2020
@albertzaharovits albertzaharovits deleted the authz_action_overrides_privs branch September 15, 2020 10:55
albertzaharovits added a commit that referenced this pull request Sep 15, 2020
AuthorizationService#authorize uses the thread context to carry the result of the
authorisation as transient headers. The listener argument to the `authorize` method
must necessarily observe the header values. This PR makes it so that
the authorisation transient headers (`_indices_permissions` and `_authz_info`, but
NOT `_originating_action_name`) of the child action override the ones of the parent action.

Co-authored-by: Tim Vernum [email protected]
Labels
>bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v6.8.13 v7.9.2 v7.10.0 v8.0.0-alpha1
6 participants