
Commit 34b6383

dmeiss authored and Christoph Büscher committed
Edits to text of Profile API documentation (#38742)
Minor edits of text.
1 parent 251ced7 commit 34b6383

1 file changed: +24 -24 lines changed

docs/reference/search/profile.asciidoc

Lines changed: 24 additions & 24 deletions
@@ -204,16 +204,16 @@ by a unique ID
 
 Because a search request may be executed against one or more shards in an index, and a search may cover
 one or more indices, the top level element in the profile response is an array of `shard` objects.
-Each shard object lists it's `id` which uniquely identifies the shard. The ID's format is
+Each shard object lists its `id` which uniquely identifies the shard. The ID's format is
 `[nodeID][indexName][shardID]`.
 
 The profile itself may consist of one or more "searches", where a search is a query executed against the underlying
-Lucene index. Most Search Requests submitted by the user will only execute a single `search` against the Lucene index.
+Lucene index. Most search requests submitted by the user will only execute a single `search` against the Lucene index.
 But occasionally multiple searches will be executed, such as including a global aggregation (which needs to execute
 a secondary "match_all" query for the global context).
 
 Inside each `search` object there will be two arrays of profiled information:
-a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc
+a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc.
 
 There will also be a `rewrite` metric showing the total time spent rewriting the query (in nanoseconds).
 
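For orientation, a profile response with the structure just described is shaped roughly as follows (abridged; the shard ID reuses the placeholder value that appears later on this page, and timings are shown as zero placeholders):

[source,js]
--------------------------------------------------
{
   "profile": {
      "shards": [
         {
            "id": "[P6-vulHtQRWuD4YnubWb7A][test][0]",
            "searches": [
               {
                  "query": [ ... ],
                  "rewrite_time": 0,
                  "collector": [ ... ]
               }
            ],
            "aggregations": [ ... ]
         }
      ]
   }
}
--------------------------------------------------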

@@ -344,12 +344,12 @@ The meaning of the stats are as follows:
 `build_scorer`::
 
 This parameter shows how long it takes to build a Scorer for the query. A Scorer is the mechanism that
-iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?).
+iterates over matching documents and generates a score per-document (e.g. how well does "foo" match the document?).
 Note, this records the time required to generate the Scorer object, not actually score the documents. Some
 queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.
 {empty} +
 {empty} +
-This may also showing timing associated with caching, if enabled and/or applicable for the query
+This may also show timing associated with caching, if enabled and/or applicable for the query
 
 `next_doc`::
 
@@ -369,7 +369,7 @@ The meaning of the stats are as follows:
 
 `matches`::
 
-Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is
+Some queries, such as phrase queries, match documents using a "two-phase" process. First, the document is
 "approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous
 (and expensive) process. The second phase verification is what the `matches` statistic measures.
 {empty} +
@@ -384,7 +384,7 @@ The meaning of the stats are as follows:
 
 `score`::
 
-This records the time taken to score a particular document via it's Scorer
+This records the time taken to score a particular document via its Scorer
 
 `*_count`::
 Records the number of invocations of the particular method. For example, `"next_doc_count": 2,`
@@ -394,7 +394,7 @@ The meaning of the stats are as follows:
 ==== `collectors` Section
 
 The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector"
-which is responsible for coordinating the traversal, scoring and collection of matching documents. Collectors
+which is responsible for coordinating the traversal, scoring, and collection of matching documents. Collectors
 are also how a single query can record aggregation results, execute unscoped "global" queries, execute post-query
 filters, etc.
 
@@ -422,16 +422,16 @@ Looking at the previous example:
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 
 We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting"
-`Collector` used by Elasticsearch. The `reason` field attempts to give a plain english description of the class name. The
+`Collector` used by Elasticsearch. The `reason` field attempts to give a plain English description of the class name. The
 `time_in_nanos` is similar to the time in the Query tree: a wall-clock time inclusive of all children. Similarly, `children` lists
 all sub-collectors. The `CancellableCollector` that wraps `SimpleTopScoreDocCollector` is used by Elasticsearch to detect if the current
 search was cancelled and stop collecting documents as soon as it occurs.
 
-It should be noted that Collector times are **independent** from the Query times. They are calculated, combined
+It should be noted that Collector times are **independent** from the Query times. They are calculated, combined,
 and normalized independently! Due to the nature of Lucene's execution, it is impossible to "merge" the times
 from the Collectors into the Query section, so they are displayed in separate portions.
 
-For reference, the various collector reason's are:
+For reference, the various collector reasons are:
 
 [horizontal]
 `search_sorted`::
@@ -457,7 +457,7 @@ For reference, the various collector reason's are:
 `search_multi`::
 
 A collector that wraps several other collectors. This is seen when combinations of search, aggregations,
-global aggs and post_filters are combined in a single search.
+global aggs, and post_filters are combined in a single search.
 
 `search_timeout`::
 
@@ -473,7 +473,7 @@ For reference, the various collector reason's are:
 `global_aggregation`::
 
 A collector that executes an aggregation against the global query scope, rather than the specified query.
-Because the global scope is necessarily different from the executed query, it must execute it's own
+Because the global scope is necessarily different from the executed query, it must execute its own
 match_all query (which you will see added to the Query section) to collect your entire dataset
 
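As a sketch of the `global_aggregation` case (the aggregation name `all_likes` and the `avg` sub-aggregation are illustrative), a profiled request along these lines is what causes the extra `match_all` query mentioned above to appear in the Query section:

[source,js]
--------------------------------------------------
GET /twitter/_search
{
  "profile": true,
  "query": {
    "term": { "user": "test" }
  },
  "aggs": {
    "all_likes": {
      "global": {},
      "aggs": {
        "avg_likes": { "avg": { "field": "likes" } }
      }
    }
  }
}
--------------------------------------------------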

@@ -648,9 +648,9 @@ And the response:
 // TESTRESPONSE[s/\.\.\.//]
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 // TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/]
-<1> The ``"aggregations"` portion has been omitted because it will be covered in the next section
+<1> The `"aggregations"` portion has been omitted because it will be covered in the next section
 
-As you can see, the output is significantly verbose from before. All the major portions of the query are
+As you can see, the output is significantly more verbose than before. All the major portions of the query are
 represented:
 
 1. The first `TermQuery` (user:test) represents the main `term` query
@@ -662,14 +662,14 @@ The Collector tree is fairly straightforward, showing how a single CancellableCo
 
 ==== Understanding MultiTermQuery output
 
-A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex and fuzzy
+A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex, and fuzzy
 queries. These queries emit very verbose responses, and are not overly structured.
 
 Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query `b*`, it technically
 can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations,
-so Lucene rewrites the query in context of the segment being evaluated. E.g. one segment may contain the tokens
+so Lucene rewrites the query in context of the segment being evaluated, e.g., one segment may contain the tokens
 `[bar, baz]`, so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the
-token `[bakery]`, so query rewrites to a single TermQuery for "bakery".
+token `[bakery]`, so the query rewrites to a single TermQuery for "bakery".
 
 Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean
 "lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you
@@ -729,7 +729,7 @@ GET /twitter/_search
 // TEST[s/_search/_search\?filter_path=profile.shards.aggregations/]
 // TEST[continued]
 
-Which yields the following aggregation profile output
+This yields the following aggregation profile output:
 
 [source,js]
 --------------------------------------------------
@@ -797,7 +797,7 @@ Which yields the following aggregation profile output
 
 From the profile structure we can see that the `my_scoped_agg` is internally being run as a `LongTermsAggregator` (because the field it is
 aggregating, `likes`, is a numeric field). At the same level, we see a `GlobalAggregator` which comes from `my_global_agg`. That
-aggregation then has a child `LongTermsAggregator` which from the second terms aggregation on `likes`.
+aggregation then has a child `LongTermsAggregator` which comes from the second term's aggregation on `likes`.
 
 The `time_in_nanos` field shows the time executed by each aggregation, and is inclusive of all children. While the overall time is useful,
 the `breakdown` field will give detailed stats about how the time was spent.
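As a rough sketch (keys as typically reported by the aggregation profiler; the numbers are placeholders), an aggregation's `breakdown` object looks something like:

[source,js]
--------------------------------------------------
"breakdown": {
   "initialize": 0,
   "initialize_count": 1,
   "collect": 0,
   "collect_count": 0,
   "build_aggregation": 0,
   "build_aggregation_count": 0,
   "reduce": 0,
   "reduce_count": 0
}
--------------------------------------------------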
@@ -859,7 +859,7 @@ The meaning of the stats are as follows:
 ==== Performance Notes
 
 Like any profiler, the Profile API introduces a non-negligible overhead to search execution. The act of instrumenting
-low-level method calls such as `collect`, `advance` and `next_doc` can be fairly expensive, since these methods are called
+low-level method calls such as `collect`, `advance`, and `next_doc` can be fairly expensive, since these methods are called
 in tight loops. Therefore, profiling should not be enabled in production settings by default, and should not
 be compared against non-profiled query times. Profiling is just a diagnostic tool.
 
@@ -871,11 +871,11 @@ not have a drastic effect compared to other components in the profiled query.
 ==== Limitations
 
 - Profiling currently does not measure the search fetch phase nor the network overhead
-- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node or
-additional work like e.g. building global ordinals (an internal data structure used to speed up search)
+- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node, or
+additional work such as building global ordinals (an internal data structure used to speed up search)
 - Profiling statistics are currently not available for suggestions, highlighting, `dfs_query_then_fetch`
 - Profiling of the reduce phase of aggregation is currently not available
 - The Profiler is still highly experimental. The Profiler is instrumenting parts of Lucene that were
 never designed to be exposed in this manner, and so all results should be viewed as a best effort to provide detailed
-diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures or
+diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or
 other bugs, please report them!
