Skip to content

Service Accounts - Fleet integration #70724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Mar 31, 2021

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Mar 23, 2021

This PR implements rest of the pieces needed for Fleet integration, including

  • Get service account role descriptor for authorization
  • API for creating service account token and storing in the security index
  • API for list tokens for a service account
  • New named privilege for manage service account
  • Mandate HTTP TLS for both service account auth and service account related APIs
  • Tests for API key related operations using service account

PS: I am happy to break this PR into smaller ones according to the dev roadmap if it helps. I raised a single PR because (1) it is helpful to leverage the CI utilities of a PR ; (2) the meat of the changes is only a few classes, e.g. IndexServiceAccountTokenStore. Many of the request/response/action classes are largely boilerplate.

@ywangd ywangd added >enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.13.0 labels Mar 23, 2021
@ywangd ywangd requested a review from tvernum March 23, 2021 12:00
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Mar 23, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@ywangd ywangd marked this pull request as draft March 23, 2021 12:35
@ywangd ywangd force-pushed the service-account-fleet-integration branch from 7f708f5 to 92aea02 Compare March 26, 2021 02:06
Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look

.field("name", serviceAccountToken.getQualifiedName())
.field("creation_time", clock.instant().toEpochMilli())
.field("enabled", true)
.startObject("creator")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.filter(QueryBuilders.termQuery("doc_type", SERVICE_ACCOUNT_TOKEN_DOC_TYPE))
.must(QueryBuilders.prefixQuery("name", accountId.asPrincipal()));
final SearchRequest request = client.prepareSearch(SECURITY_MAIN_ALIAS)
.setScroll(DEFAULT_KEEPALIVE_SETTING.get(getSettings()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to scroll here, or should we place a limit on how many tokens there can be for a service account?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say we should replace this and a few other places with the current best practice of "PIT + search_after". I need to check the details, but I think PIT is sharable across different requests. Given we need it in places, e.g. users, application privileges, it is not much overhead to have it here as well. It can be a follow up PR and I can raise an issue for it.
Now, it is also possible to cap it as you suggested. But I am not very convinced because:

  1. It feels arbitrary. Yes it is unlikely to have many service tokens. But it does not inherently says otherwise either.
  2. It introduces extra code complexity, e.g. you need query the current count before creating a new one. There can also be complicated concurrent issues, e.g. two requests at the same time, one tries to create and the other tries to delete. We can potential solve these issues, but the effort is not really justified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that scrolls are a limited resource, and we overuse them.
What we really want is semantics that only create a scroll (or PIT etc) if there were more documents than could be returned in a single request, but I don't know of anything like that.

A regular PIT could work, but 99% of the time it's heavier than we need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether this could be a feature request for the search team. It is not possible to be perfect unless directly tackled on the shard level. Anyway, I think this worth to be its own separate task and can wait till the current one is over?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it can wait. But I know @albertzaharovits ran into some cases recently where security was over-using scrolls to the detriment of the overall cluster, so I'm extra conscious of them now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion, I raised an issue for replacing scroll with PIT + search_after #71032

@ywangd ywangd marked this pull request as ready for review March 26, 2021 11:41
@ywangd
Copy link
Member Author

ywangd commented Mar 26, 2021

Added more comprehensive tests. This PR is now formal.

@ywangd ywangd requested a review from tvernum March 26, 2021 12:48
Comment on lines +75 to +79
builder.endObject().field("file_tokens").startObject();
for (TokenInfo info : tokenInfosBySource.getOrDefault(TokenInfo.TokenSource.FILE, List.of())) {
info.toXContent(builder, params);
}
builder.endObject().endObject();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to output this section if there aren't any? I would have assumed the "file_tokens" field would only exist if there were file tokens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more explicit to have an empty list instead of not shown at all, similar to how the roles field in the GetUser response.


public class CreateServiceAccountTokenResponse extends ActionResponse implements ToXContentObject {

private final boolean created;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this field now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about removing it but decided to keep it in case we need to update a service account token in future. But it is a weak reason and I am happy to remove it.

.filter(QueryBuilders.termQuery("doc_type", SERVICE_ACCOUNT_TOKEN_DOC_TYPE))
.must(QueryBuilders.prefixQuery("name", accountId.asPrincipal()));
final SearchRequest request = client.prepareSearch(SECURITY_MAIN_ALIAS)
.setScroll(DEFAULT_KEEPALIVE_SETTING.get(getSettings()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that scrolls are a limited resource, and we overuse them.
What we really want is semantics that only create a scroll (or PIT etc) if there were more documents than could be returned in a single request, but I don't know of anything like that.

A regular PIT could work, but 99% of the time it's heavier than we need.

public TlsRuntimeCheck(Settings settings) {
this.settings = settings;
this.httpTlsEnabled = XPackSettings.HTTP_SSL_ENABLED.get(settings);
this.transportTlsEnabled = XPackSettings.TRANSPORT_SSL_ENABLED.get(settings);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to check TransportTLS here.

  1. It requires transport TLS even when in dev mode
  2. It means you need to set transport.ssl.enabled even if using single node discovery.
  3. It duplicates checks that are done elsewhere

Since security already has checks for transport TLS (outside of trial / dev mode) I don't think it's worth duplicating them here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense I'll remove it and make it only check the HTTP interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "security already has checks for transport TLS", do you mean TLSLicenseBootstrapCheck which errors out if the cluster is in non-trial production? Also I assume the transport TLS check in TransportNodesReloadSecureSettingsAction is redundant as well?

}

public void checkTlsThenExecute(Consumer<Exception> exceptionConsumer, String featureName, Runnable andThen) {
if (false == httpTlsEnabled || false == transportTlsEnabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enforces TLS even when bound to localhost, and if bootstrap checks are off.
I think that's too extreme.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to have the localhost check etc as a separate task, i.e. it will be strict first and relaxed before beta or GA. But we can also tackle it now.

Using the current token service bootstrap check as a reference, I am implementing the logic as follows:

  • If xpack.security.http.ssl.enabled=true, all good.
  • If security is disabled, skip check (this should not happen, but added for completeness)
  • If all binding addresses are local, skip check.
  • If discovery type is "single-node", skip check.
  • Otherwise throw error.

@ywangd ywangd requested a review from tvernum March 29, 2021 12:08
public void checkTlsThenExecute(Consumer<Exception> exceptionConsumer, String featureName, Runnable andThen) {
// If security is enabled, but TLS is not enabled for the HTTP interface
if (securityEnabled && false == httpTlsEnabled) {
if (false == initialized.get()) {
Copy link
Member Author

@ywangd ywangd Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to this initialized AtomicBoolean is org.elasticsearch.common.MemoizedSupplier. But I think AtomicBoolean could be slightly more performant.

@ywangd
Copy link
Member Author

ywangd commented Mar 31, 2021

@tvernum The rename of elastic/fleet-server service account touched quite a few files and bumped the number of changed files (with rather trivial changes). I hope it does not disturb the review process too much. Please let me know if you have any concerns. Thanks!

@tvernum
Copy link
Contributor

tvernum commented Mar 31, 2021

I think it would have been better not to do that in this PR - but please don't change it now.

This PR is already too big and touches too many things - if it simply added the index based tokens it could have been merged by now but it's been stalled due to unrelated parts like the TLS requirement.

Omnibus PRs almost always slow things down rather than speeding them up.

final boolean boundToLocal = Arrays.stream(transport.boundAddress().boundAddresses())
.allMatch(b -> b.address().getAddress().isLoopbackAddress())
&& transport.boundAddress().publishAddress().address().getAddress().isLoopbackAddress();
this.enforce = false == boundToLocal && false == singleNodeDiscovery;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate that we need to duplicate this check all over the place, but I don't think we should solve it here.

Copy link
Member Author

@ywangd ywangd Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right about this. It was long and hard for me to come up with this solution because:

  • TransportService is not available in createComponents and I realised much later that I can hook into the TransportFactory logic to directly get access to the Transport.
  • Having it done once and passed everywhere else requires quite a lot of changes including potentially change the createComponents method signature and it will be very ugly for this PR.

I will raise an issue for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #71091

exceptionConsumer.accept(new ElasticsearchException(message.getFormattedMessage()));
return;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you, @jkakavas and I have a chat about this logic and whether it's really what we want before we GA service accounts.

I'm conscious that we're making up the detail of the requirements on the fly, and we ought to be more intentional about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should definitely chat about it. With the security on by default project, we need have a written down spec for how it should behave at runtime.

@Override
public List<Route> routes() {
return List.of(
new Route(GET, "/_security/service/{namespace}/{service}/credential")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you previously mentioned that you preferred /service_account/ and {service-name}.
Do you still? Should we discuss?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still slightly prefer /service_account/ over /service/ ({service} is fine). The main reason is that I don't know whether in the future we would need a more accurate usage of /service/, e.g. configures some service within the security domain, say PUT /_security/service/builtin_directory/config. But it may not be an issue at all because it can simply just be PUT /_security/builtin_directory/config. So I think it is a minor personal preference and I am happy to stick to the current design.

@ywangd ywangd merged commit bceb5fb into elastic:master Mar 31, 2021
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Apr 8, 2021
This PR implements rest of the pieces needed for Fleet integration, including:

* Get service account role descriptor for authorization
* API for creating service account token and storing in the security index
* API for list tokens for a service account
* New named privilege for manage service account
* Mandate HTTP TLS for both service account auth and service account related
  APIs
* Tests for API key related operations using service account
ywangd added a commit that referenced this pull request Apr 9, 2021
* Service Accounts - Initial bootstrap plumbing for essential classes (#70391)

This PR is the initial effort to add essential classes for service accounts to
lay down the foundation of future works.  The classes are wired in places, but
not yet been used. Also intentionally left out the actual credential store
implementation. It is a good first commit which does not bring in too many
changes.

* Service Accounts - New CLI tool for managing file tokens (#70454)

This is the second PR for service accounts. It adds a new CLI tool
elasticsearch-service-tokens to manage file tokens. The file tokens are stored
in the service_tokens file under the config directory. Out of the planned create,
remove and list sub-commands, this PR only implements the create function since
it is the most important one. The other two sub-commands will be handled in
separate PRs.

* Service Accounts - Authentication with file tokens (#70543)

This the 3rd PR for service accounts. It adds support for authentication with
file tokens. It also adds a cache for performance so that expensive pbkdf2
hashing does not have to be performed on every request. Adding a cache comes
with its own housekeeping work around invalidation. This PR ensures that cache
gets invalidated when underlying token file is changed. It does not implement
APIs for active invalidation. It will be handled in a separate PR after the API
token is in place.

* [Test] Service Account - fix test assumption

* [Test] Service Accounts - handle token names with leading hyphen (#70983)

The CLI tool needs an option terminator (--) for another option names that
begin with a hyphen. Otherwise it errors out with message of "not recognized
option". The service account token name can begin with a hyphen. Hence we need
to use -- when it is the case. An example of equivalent command line is
./bin/elasticsearch-service-tokens create elastic/fleet -- -lead-with-hyphen.

* Service Accounts - Fleet integration (#70724)

This PR implements rest of the pieces needed for Fleet integration, including:

* Get service account role descriptor for authorization
* API for creating service account token and storing in the security index
* API for list tokens for a service account
* New named privilege for manage service account
* Mandate HTTP TLS for both service account auth and service account related
  APIs
* Tests for API key related operations using service account

* [Test] Service Accounts - Remove colon from invalid token name generator (#71099)

The colon character is interpreted as the separate between token name and token
secret. So if a token name contains a colon, it is in theory invalid. But the
parser takes only the part before the colon as the token name and thus consider
it as a valid token name. Subsequent authentication will still fail. But for
tests, this generates a different exception and fails the expectation. This PR
removes the colon char from being used to generate invalid token names for
simplicity.

* Fix for 7.x quirks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Security Meta label for security team v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants