Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid extra allocations in RestGetAliasesAction #126177

Merged

Conversation

JeremyDahlgren
Copy link
Contributor

@JeremyDahlgren JeremyDahlgren commented Apr 3, 2025

When no explicit aliases are provided in the call there is no need to collect the index names or aliases into HashSets if they won't be used.

Also fixed where the index name was being added for each entry in the alias list for the index.

Noticed this while reading the code after looking through a support issue. With the number of aliases seen, this change would avoid 40K HashSet.add() calls (from JSON response in RestGetAliasesAction, but not the _cat API which goes through RestAliasAction). The JSON and _cat paths both use TransportGetAliasesAction. It looks like there may be places to reduce allocations down in there as well.

Based on the comments in RestGetAliasesAction, and reviewing the code in TransportGetAliasesAction and ProjectMetadata.findAliasInfo(), it might be the case that this filtering code in RestGetAliasesAction is no longer needed?

@JeremyDahlgren JeremyDahlgren added >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. labels Apr 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

@JeremyDahlgren JeremyDahlgren force-pushed the get-alias-action-reduce-alloc branch 2 times, most recently from 0fd9816 to 733b053 Compare April 3, 2025 01:34
When no explicit aliases are provided in the call there is no need
to collect the index names or aliases into HashSets if they won't be
used.

Also fixed where the index name was being added for each loop of the
alias list.
@JeremyDahlgren JeremyDahlgren force-pushed the get-alias-action-reduce-alloc branch from 733b053 to ec666bf Compare April 3, 2025 02:13
Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the intention 👍
I think beyond performance we should also take the chance to simplify the code.
I've left a suggestion to look into removing aliasesExplicitlyRequested. With that out, we can refine the code that avoids computing returnedAliasNames that you're proposing.

if (aliasesExplicitlyRequested) {
// only display indices that have aliases
indicesToDisplay.add(cursor.getKey());
if (aliasesExplicitlyRequested || requestedAliases.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've peered into the original code...
Does aliasesExplicitlyRequested have any influence? I think we can remove it (consider it either true or false in the original code).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be true even when requestedAliases.length == 0 in the case _all is used for the name param. In prepareRequest() it uses request.paramAsStringArrayOrEmptyIfAll("name"). I originally wanted to eliminate aliasesExplicitlyRequested, but some tests broke even when the RestGetAliasesActionTests passed. An example was:

ClientYamlTestSuiteIT > test {yaml=indices.get_alias/10_basic/Get aliases via /_alias/_all} FAILED
    java.lang.AssertionError: Failure at [indices.get_alias/10_basic:70]: field [test_index_3] has a true value but it shouldn't

I added a new unit test case in RestGetAliasesActionTests that uses _all to cover this scenario.

I also refactored to simplify the code and just use aliasesExplicitlyRequested. Please let me know how this looks to you.

.stream()
.flatMap(entry -> entry.getValue().stream())
.forEach(dataStreamAlias -> returnedAliasNames.add(dataStreamAlias.getName()));
}

// compute explicitly requested aliases that have are not returned in the result
final SortedSet<String> missingAliases = new TreeSet<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move the code that computes the missingAliases into a separate function.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should move more of this logic into a TransportSomethingAction rather than doing a bunch of looping over things to build a GetAliasesResponse which then needs a bunch more processing to convert into the final form. This whole thing is a pure function of cluster state. But anyway this looks like a reasonable change to make to me.

.flatMap(entry -> entry.getValue().stream())
.forEach(dataStreamAlias -> returnedAliasNames.add(dataStreamAlias.getName()));
if (aliasesExplicitlyRequested) {
responseAliasMap.entrySet().stream().filter(entry -> entry.getValue().isEmpty() == false).forEach(entry -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streams are themselves pretty expensive in terms of both allocations and general CPU usage. Could we just use a plain old-fashioned loop here?

Comment on lines 95 to 96
dataStreamAliases.entrySet()
.stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well destreamify this one here too.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (couple of nonblocking nits)

|| Metadata.ALL.equals(requestedAliases[j].substring(1))) {
// aliases[i] is excluded by aliases[j]
break;
for (final AliasMetadata aliasMetadata : cursor.getValue()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
for (final AliasMetadata aliasMetadata : cursor.getValue()) {
for (final AliasMetadata aliasMetadata : aliases) {

// aliases[i] is not in the result set
missingAliases.add(requestedAliases[i]);

for (final Map.Entry<String, List<DataStreamAlias>> entry : dataStreamAliases.entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we don't need the keys here right?

Suggested change
for (final Map.Entry<String, List<DataStreamAlias>> entry : dataStreamAliases.entrySet()) {
for (final List<DataStreamAlias> dataStreamAliases : dataStreamAliases.values()) {

@JeremyDahlgren JeremyDahlgren merged commit 7929743 into elastic:main Apr 7, 2025
17 checks passed
mridula-s109 pushed a commit that referenced this pull request Apr 7, 2025
When no explicit aliases are provided in the call there is no need
to collect the index names or aliases into HashSets if they won't be
used. Also fixed where the index name was being added for each
loop of the alias list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants