-
Notifications
You must be signed in to change notification settings - Fork 25.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid extra allocations in RestGetAliasesAction
#126177
Avoid extra allocations in RestGetAliasesAction
#126177
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hi @JeremyDahlgren, I've created a changelog YAML for you. |
0fd9816
to
733b053
Compare
When no explicit aliases are provided in the call there is no need to collect the index names or aliases into HashSets if they won't be used. Also fixed where the index name was being added for each loop of the alias list.
733b053
to
ec666bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the intention 👍
I think beyond performance we should also take the chance to simplify the code.
I've left a suggestion to look into removing aliasesExplicitlyRequested
. With that out, we can refine the code that avoids computing returnedAliasNames
that you're proposing.
if (aliasesExplicitlyRequested) { | ||
// only display indices that have aliases | ||
indicesToDisplay.add(cursor.getKey()); | ||
if (aliasesExplicitlyRequested || requestedAliases.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've peered into the original code...
Does aliasesExplicitlyRequested
have any influence? I think we can remove it (consider it either true or false in the original code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can be true
even when requestedAliases.length == 0
in the case _all
is used for the name
param. In prepareRequest()
it uses request.paramAsStringArrayOrEmptyIfAll("name")
. I originally wanted to eliminate aliasesExplicitlyRequested
, but some tests broke even when the RestGetAliasesActionTests
passed. An example was:
ClientYamlTestSuiteIT > test {yaml=indices.get_alias/10_basic/Get aliases via /_alias/_all} FAILED
java.lang.AssertionError: Failure at [indices.get_alias/10_basic:70]: field [test_index_3] has a true value but it shouldn't
I added a new unit test case in RestGetAliasesActionTests
that uses _all
to cover this scenario.
I also refactored to simplify the code and just use aliasesExplicitlyRequested
. Please let me know how this looks to you.
.stream() | ||
.flatMap(entry -> entry.getValue().stream()) | ||
.forEach(dataStreamAlias -> returnedAliasNames.add(dataStreamAlias.getName())); | ||
} | ||
|
||
// compute explicitly requested aliases that have are not returned in the result | ||
final SortedSet<String> missingAliases = new TreeSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the code that computes the missingAliases
into a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should move more of this logic into a TransportSomethingAction
rather than doing a bunch of looping over things to build a GetAliasesResponse
which then needs a bunch more processing to convert into the final form. This whole thing is a pure function of cluster state. But anyway this looks like a reasonable change to make to me.
.flatMap(entry -> entry.getValue().stream()) | ||
.forEach(dataStreamAlias -> returnedAliasNames.add(dataStreamAlias.getName())); | ||
if (aliasesExplicitlyRequested) { | ||
responseAliasMap.entrySet().stream().filter(entry -> entry.getValue().isEmpty() == false).forEach(entry -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Streams are themselves pretty expensive in terms of both allocations and general CPU usage. Could we just use a plain old-fashioned loop here?
dataStreamAliases.entrySet() | ||
.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May as well destreamify this one here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (couple of nonblocking nits)
|| Metadata.ALL.equals(requestedAliases[j].substring(1))) { | ||
// aliases[i] is excluded by aliases[j] | ||
break; | ||
for (final AliasMetadata aliasMetadata : cursor.getValue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
for (final AliasMetadata aliasMetadata : cursor.getValue()) { | |
for (final AliasMetadata aliasMetadata : aliases) { |
// aliases[i] is not in the result set | ||
missingAliases.add(requestedAliases[i]); | ||
|
||
for (final Map.Entry<String, List<DataStreamAlias>> entry : dataStreamAliases.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we don't need the keys here right?
for (final Map.Entry<String, List<DataStreamAlias>> entry : dataStreamAliases.entrySet()) { | |
for (final List<DataStreamAlias> dataStreamAliases : dataStreamAliases.values()) { |
When no explicit aliases are provided in the call there is no need to collect the index names or aliases into HashSets if they won't be used. Also fixed where the index name was being added for each loop of the alias list.
When no explicit aliases are provided in the call there is no need to collect the index names or aliases into
HashSet
s if they won't be used.Also fixed where the index name was being added for each entry in the alias list for the index.
Noticed this while reading the code after looking through a support issue. With the number of aliases seen, this change would avoid 40K
HashSet.add()
calls (from JSON response inRestGetAliasesAction
, but not the_cat
API which goes through RestAliasAction). The JSON and_cat
paths both useTransportGetAliasesAction
. It looks like there may be places to reduce allocations down in there as well.Based on the comments in
RestGetAliasesAction
, and reviewing the code inTransportGetAliasesAction
andProjectMetadata.findAliasInfo()
, it might be the case that this filtering code inRestGetAliasesAction
is no longer needed?