-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Service Accounts - Fleet integration #70724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Service Accounts - Fleet integration #70724
Conversation
Pinging @elastic/es-security (Team:Security) |
7f708f5
to
92aea02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a quick look
...va/org/elasticsearch/xpack/core/security/action/service/CreateServiceAccountTokenAction.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/core/security/action/service/CreateServiceAccountTokenRequest.java
Show resolved
Hide resolved
...main/java/org/elasticsearch/xpack/security/authc/service/IndexServiceAccountsTokenStore.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/xpack/security/authc/service/IndexServiceAccountsTokenStore.java
Outdated
Show resolved
Hide resolved
.field("name", serviceAccountToken.getQualifiedName()) | ||
.field("creation_time", clock.instant().toEpochMilli()) | ||
.field("enabled", true) | ||
.startObject("creator") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
.filter(QueryBuilders.termQuery("doc_type", SERVICE_ACCOUNT_TOKEN_DOC_TYPE)) | ||
.must(QueryBuilders.prefixQuery("name", accountId.asPrincipal())); | ||
final SearchRequest request = client.prepareSearch(SECURITY_MAIN_ALIAS) | ||
.setScroll(DEFAULT_KEEPALIVE_SETTING.get(getSettings())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to scroll here, or should we place a limit on how many tokens there can be for a service account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to say we should replace this and a few other places with the current best practice of "PIT + search_after". I need to check the details, but I think PIT is sharable across different requests. Given we need it in places, e.g. users, application privileges, it is not much overhead to have it here as well. It can be a follow up PR and I can raise an issue for it.
Now, it is also possible to cap it as you suggested. But I am not very convinced because:
- It feels arbitrary. Yes it is unlikely to have many service tokens. But it does not inherently says otherwise either.
- It introduces extra code complexity, e.g. you need query the current count before creating a new one. There can also be complicated concurrent issues, e.g. two requests at the same time, one tries to create and the other tries to delete. We can potential solve these issues, but the effort is not really justified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that scrolls are a limited resource, and we overuse them.
What we really want is semantics that only create a scroll (or PIT etc) if there were more documents than could be returned in a single request, but I don't know of anything like that.
A regular PIT could work, but 99% of the time it's heavier than we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this could be a feature request for the search team. It is not possible to be perfect unless directly tackled on the shard level. Anyway, I think this worth to be its own separate task and can wait till the current one is over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it can wait. But I know @albertzaharovits ran into some cases recently where security was over-using scrolls to the detriment of the overall cluster, so I'm extra conscious of them now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, I raised an issue for replacing scroll with PIT + search_after #71032
…security/action/service/CreateServiceAccountTokenAction.java Co-authored-by: Tim Vernum <[email protected]>
Added more comprehensive tests. This PR is now formal. |
builder.endObject().field("file_tokens").startObject(); | ||
for (TokenInfo info : tokenInfosBySource.getOrDefault(TokenInfo.TokenSource.FILE, List.of())) { | ||
info.toXContent(builder, params); | ||
} | ||
builder.endObject().endObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to output this section if there aren't any? I would have assumed the "file_tokens" field would only exist if there were file tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more explicit to have an empty list instead of not shown at all, similar to how the roles
field in the GetUser response.
...elasticsearch/xpack/core/security/action/service/CreateServiceAccountTokenResponseTests.java
Outdated
Show resolved
Hide resolved
...rg/elasticsearch/xpack/core/security/action/service/GetServiceAccountTokensRequestTests.java
Show resolved
Hide resolved
.../main/java/org/elasticsearch/xpack/security/authc/service/FileServiceAccountsTokenStore.java
Outdated
Show resolved
Hide resolved
|
||
public class CreateServiceAccountTokenResponse extends ActionResponse implements ToXContentObject { | ||
|
||
private final boolean created; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this field now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about removing it but decided to keep it in case we need to update a service account token in future. But it is a weak reason and I am happy to remove it.
.filter(QueryBuilders.termQuery("doc_type", SERVICE_ACCOUNT_TOKEN_DOC_TYPE)) | ||
.must(QueryBuilders.prefixQuery("name", accountId.asPrincipal())); | ||
final SearchRequest request = client.prepareSearch(SECURITY_MAIN_ALIAS) | ||
.setScroll(DEFAULT_KEEPALIVE_SETTING.get(getSettings())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that scrolls are a limited resource, and we overuse them.
What we really want is semantics that only create a scroll (or PIT etc) if there were more documents than could be returned in a single request, but I don't know of anything like that.
A regular PIT could work, but 99% of the time it's heavier than we need.
public TlsRuntimeCheck(Settings settings) { | ||
this.settings = settings; | ||
this.httpTlsEnabled = XPackSettings.HTTP_SSL_ENABLED.get(settings); | ||
this.transportTlsEnabled = XPackSettings.TRANSPORT_SSL_ENABLED.get(settings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to check TransportTLS here.
- It requires transport TLS even when in dev mode
- It means you need to set
transport.ssl.enabled
even if using single node discovery. - It duplicates checks that are done elsewhere
Since security already has checks for transport TLS (outside of trial / dev mode) I don't think it's worth duplicating them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense I'll remove it and make it only check the HTTP interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say "security already has checks for transport TLS", do you mean TLSLicenseBootstrapCheck which errors out if the cluster is in non-trial production? Also I assume the transport TLS check in TransportNodesReloadSecureSettingsAction is redundant as well?
} | ||
|
||
public void checkTlsThenExecute(Consumer<Exception> exceptionConsumer, String featureName, Runnable andThen) { | ||
if (false == httpTlsEnabled || false == transportTlsEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enforces TLS even when bound to localhost, and if bootstrap checks are off.
I think that's too extreme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to have the localhost check etc as a separate task, i.e. it will be strict first and relaxed before beta or GA. But we can also tackle it now.
Using the current token service bootstrap check as a reference, I am implementing the logic as follows:
- If
xpack.security.http.ssl.enabled=true
, all good. - If security is disabled, skip check (this should not happen, but added for completeness)
- If all binding addresses are local, skip check.
- If discovery type is "single-node", skip check.
- Otherwise throw error.
...rg/elasticsearch/xpack/security/rest/action/service/RestCreateServiceAccountTokenAction.java
Show resolved
Hide resolved
Co-authored-by: Tim Vernum <[email protected]>
public void checkTlsThenExecute(Consumer<Exception> exceptionConsumer, String featureName, Runnable andThen) { | ||
// If security is enabled, but TLS is not enabled for the HTTP interface | ||
if (securityEnabled && false == httpTlsEnabled) { | ||
if (false == initialized.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to this initialized
AtomicBoolean is org.elasticsearch.common.MemoizedSupplier
. But I think AtomicBoolean could be slightly more performant.
I think it would have been better not to do that in this PR - but please don't change it now. This PR is already too big and touches too many things - if it simply added the index based tokens it could have been merged by now but it's been stalled due to unrelated parts like the TLS requirement. Omnibus PRs almost always slow things down rather than speeding them up. |
final boolean boundToLocal = Arrays.stream(transport.boundAddress().boundAddresses()) | ||
.allMatch(b -> b.address().getAddress().isLoopbackAddress()) | ||
&& transport.boundAddress().publishAddress().address().getAddress().isLoopbackAddress(); | ||
this.enforce = false == boundToLocal && false == singleNodeDiscovery; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate that we need to duplicate this check all over the place, but I don't think we should solve it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right about this. It was long and hard for me to come up with this solution because:
TransportService
is not available increateComponents
and I realised much later that I can hook into the TransportFactory logic to directly get access to theTransport
.- Having it done once and passed everywhere else requires quite a lot of changes including potentially change the
createComponents
method signature and it will be very ugly for this PR.
I will raise an issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised #71091
exceptionConsumer.accept(new ElasticsearchException(message.getFormattedMessage())); | ||
return; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you, @jkakavas and I have a chat about this logic and whether it's really what we want before we GA service accounts.
I'm conscious that we're making up the detail of the requirements on the fly, and we ought to be more intentional about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should definitely chat about it. With the security on by default project, we need have a written down spec for how it should behave at runtime.
@Override | ||
public List<Route> routes() { | ||
return List.of( | ||
new Route(GET, "/_security/service/{namespace}/{service}/credential") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you previously mentioned that you preferred /service_account/
and {service-name}
.
Do you still? Should we discuss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still slightly prefer /service_account/
over /service/
({service}
is fine). The main reason is that I don't know whether in the future we would need a more accurate usage of /service/
, e.g. configures some service within the security domain, say PUT /_security/service/builtin_directory/config
. But it may not be an issue at all because it can simply just be PUT /_security/builtin_directory/config
. So I think it is a minor personal preference and I am happy to stick to the current design.
This PR implements rest of the pieces needed for Fleet integration, including: * Get service account role descriptor for authorization * API for creating service account token and storing in the security index * API for list tokens for a service account * New named privilege for manage service account * Mandate HTTP TLS for both service account auth and service account related APIs * Tests for API key related operations using service account
* Service Accounts - Initial bootstrap plumbing for essential classes (#70391) This PR is the initial effort to add essential classes for service accounts to lay down the foundation of future works. The classes are wired in places, but not yet been used. Also intentionally left out the actual credential store implementation. It is a good first commit which does not bring in too many changes. * Service Accounts - New CLI tool for managing file tokens (#70454) This is the second PR for service accounts. It adds a new CLI tool elasticsearch-service-tokens to manage file tokens. The file tokens are stored in the service_tokens file under the config directory. Out of the planned create, remove and list sub-commands, this PR only implements the create function since it is the most important one. The other two sub-commands will be handled in separate PRs. * Service Accounts - Authentication with file tokens (#70543) This the 3rd PR for service accounts. It adds support for authentication with file tokens. It also adds a cache for performance so that expensive pbkdf2 hashing does not have to be performed on every request. Adding a cache comes with its own housekeeping work around invalidation. This PR ensures that cache gets invalidated when underlying token file is changed. It does not implement APIs for active invalidation. It will be handled in a separate PR after the API token is in place. * [Test] Service Account - fix test assumption * [Test] Service Accounts - handle token names with leading hyphen (#70983) The CLI tool needs an option terminator (--) for another option names that begin with a hyphen. Otherwise it errors out with message of "not recognized option". The service account token name can begin with a hyphen. Hence we need to use -- when it is the case. An example of equivalent command line is ./bin/elasticsearch-service-tokens create elastic/fleet -- -lead-with-hyphen. * Service Accounts - Fleet integration (#70724) This PR implements rest of the pieces needed for Fleet integration, including: * Get service account role descriptor for authorization * API for creating service account token and storing in the security index * API for list tokens for a service account * New named privilege for manage service account * Mandate HTTP TLS for both service account auth and service account related APIs * Tests for API key related operations using service account * [Test] Service Accounts - Remove colon from invalid token name generator (#71099) The colon character is interpreted as the separate between token name and token secret. So if a token name contains a colon, it is in theory invalid. But the parser takes only the part before the colon as the token name and thus consider it as a valid token name. Subsequent authentication will still fail. But for tests, this generates a different exception and fails the expectation. This PR removes the colon char from being used to generate invalid token names for simplicity. * Fix for 7.x quirks
This PR implements rest of the pieces needed for Fleet integration, including
PS: I am happy to break this PR into smaller ones according to the dev roadmap if it helps. I raised a single PR because (1) it is helpful to leverage the CI utilities of a PR ; (2) the meat of the changes is only a few classes, e.g.
IndexServiceAccountTokenStore
. Many of the request/response/action classes are largely boilerplate.