Commit 86a6523

dmeiss authored and Christoph Büscher committed
Edits to text of Profile API documentation (#38742)
Minor edits of text.
1 parent 51b4c98 commit 86a6523

1 file changed

+24
-24
lines changed

docs/reference/search/profile.asciidoc

Lines changed: 24 additions & 24 deletions
@@ -189,16 +189,16 @@ by a unique ID

 Because a search request may be executed against one or more shards in an index, and a search may cover
 one or more indices, the top level element in the profile response is an array of `shard` objects.
-Each shard object lists it's `id` which uniquely identifies the shard. The ID's format is
+Each shard object lists its `id` which uniquely identifies the shard. The ID's format is
 `[nodeID][indexName][shardID]`.

 The profile itself may consist of one or more "searches", where a search is a query executed against the underlying
-Lucene index. Most Search Requests submitted by the user will only execute a single `search` against the Lucene index.
+Lucene index. Most search requests submitted by the user will only execute a single `search` against the Lucene index.
 But occasionally multiple searches will be executed, such as including a global aggregation (which needs to execute
 a secondary "match_all" query for the global context).

 Inside each `search` object there will be two arrays of profiled information:
-a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc
+a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc.

 There will also be a `rewrite` metric showing the total time spent rewriting the query (in nanoseconds).

@@ -325,12 +325,12 @@ The meaning of the stats are as follows:
 `build_scorer`::

 This parameter shows how long it takes to build a Scorer for the query. A Scorer is the mechanism that
-iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?).
+iterates over matching documents and generates a score per-document (e.g. how well does "foo" match the document?).
 Note, this records the time required to generate the Scorer object, not actually score the documents. Some
 queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.
 {empty} +
 {empty} +
-This may also showing timing associated with caching, if enabled and/or applicable for the query
+This may also show timing associated with caching, if enabled and/or applicable for the query

 `next_doc`::

@@ -350,7 +350,7 @@ The meaning of the stats are as follows:

 `matches`::

-Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is
+Some queries, such as phrase queries, match documents using a "two-phase" process. First, the document is
 "approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous
 (and expensive) process. The second phase verification is what the `matches` statistic measures.
 {empty} +
@@ -365,7 +365,7 @@ The meaning of the stats are as follows:

 `score`::

-This records the time taken to score a particular document via it's Scorer
+This records the time taken to score a particular document via its Scorer

 `*_count`::
 Records the number of invocations of the particular method. For example, `"next_doc_count": 2,`
@@ -375,7 +375,7 @@ The meaning of the stats are as follows:
 ==== `collectors` Section

 The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector"
-which is responsible for coordinating the traversal, scoring and collection of matching documents. Collectors
+which is responsible for coordinating the traversal, scoring, and collection of matching documents. Collectors
 are also how a single query can record aggregation results, execute unscoped "global" queries, execute post-query
 filters, etc.

@@ -403,16 +403,16 @@ Looking at the previous example:
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]

 We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting"
-`Collector` used by Elasticsearch. The `reason` field attempts to give a plain english description of the class name. The
+`Collector` used by Elasticsearch. The `reason` field attempts to give a plain English description of the class name. The
 `time_in_nanos` is similar to the time in the Query tree: a wall-clock time inclusive of all children. Similarly, `children` lists
 all sub-collectors. The `CancellableCollector` that wraps `SimpleTopScoreDocCollector` is used by Elasticsearch to detect if the current
 search was cancelled and stop collecting documents as soon as it occurs.

-It should be noted that Collector times are **independent** from the Query times. They are calculated, combined
+It should be noted that Collector times are **independent** from the Query times. They are calculated, combined,
 and normalized independently! Due to the nature of Lucene's execution, it is impossible to "merge" the times
 from the Collectors into the Query section, so they are displayed in separate portions.

-For reference, the various collector reason's are:
+For reference, the various collector reasons are:

 [horizontal]
 `search_sorted`::
@@ -438,7 +438,7 @@ For reference, the various collector reason's are:
 `search_multi`::

 A collector that wraps several other collectors. This is seen when combinations of search, aggregations,
-global aggs and post_filters are combined in a single search.
+global aggs, and post_filters are combined in a single search.

 `search_timeout`::

@@ -454,7 +454,7 @@ For reference, the various collector reason's are:
 `global_aggregation`::

 A collector that executes an aggregation against the global query scope, rather than the specified query.
-Because the global scope is necessarily different from the executed query, it must execute it's own
+Because the global scope is necessarily different from the executed query, it must execute its own
 match_all query (which you will see added to the Query section) to collect your entire dataset


@@ -621,9 +621,9 @@ And the response:
 // TESTRESPONSE[s/\.\.\.//]
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 // TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/]
-<1> The ``"aggregations"` portion has been omitted because it will be covered in the next section
+<1> The `"aggregations"` portion has been omitted because it will be covered in the next section

-As you can see, the output is significantly verbose from before. All the major portions of the query are
+As you can see, the output is significantly more verbose than before. All the major portions of the query are
 represented:

 1. The first `TermQuery` (user:test) represents the main `term` query
@@ -635,14 +635,14 @@ The Collector tree is fairly straightforward, showing how a single CancellableCo

 ==== Understanding MultiTermQuery output

-A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex and fuzzy
+A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex, and fuzzy
 queries. These queries emit very verbose responses, and are not overly structured.

 Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query `b*`, it technically
 can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations,
-so Lucene rewrites the query in context of the segment being evaluated. E.g. one segment may contain the tokens
+so Lucene rewrites the query in context of the segment being evaluated, e.g., one segment may contain the tokens
 `[bar, baz]`, so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the
-token `[bakery]`, so query rewrites to a single TermQuery for "bakery".
+token `[bakery]`, so the query rewrites to a single TermQuery for "bakery".

 Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean
 "lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you
@@ -702,7 +702,7 @@ GET /twitter/_search
 // TEST[s/_search/_search\?filter_path=profile.shards.aggregations/]
 // TEST[continued]

-Which yields the following aggregation profile output
+This yields the following aggregation profile output:

 [source,js]
 --------------------------------------------------
@@ -770,7 +770,7 @@ Which yields the following aggregation profile output

 From the profile structure we can see that the `my_scoped_agg` is internally being run as a `LongTermsAggregator` (because the field it is
 aggregating, `likes`, is a numeric field). At the same level, we see a `GlobalAggregator` which comes from `my_global_agg`. That
-aggregation then has a child `LongTermsAggregator` which from the second terms aggregation on `likes`.
+aggregation then has a child `LongTermsAggregator` which comes from the second term's aggregation on `likes`.

 The `time_in_nanos` field shows the time executed by each aggregation, and is inclusive of all children. While the overall time is useful,
 the `breakdown` field will give detailed stats about how the time was spent.
@@ -832,7 +832,7 @@ The meaning of the stats are as follows:
 ==== Performance Notes

 Like any profiler, the Profile API introduces a non-negligible overhead to search execution. The act of instrumenting
-low-level method calls such as `collect`, `advance` and `next_doc` can be fairly expensive, since these methods are called
+low-level method calls such as `collect`, `advance`, and `next_doc` can be fairly expensive, since these methods are called
 in tight loops. Therefore, profiling should not be enabled in production settings by default, and should not
 be compared against non-profiled query times. Profiling is just a diagnostic tool.

@@ -844,11 +844,11 @@ not have a drastic effect compared to other components in the profiled query.
 ==== Limitations

 - Profiling currently does not measure the search fetch phase nor the network overhead
-- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node or
-additional work like e.g. building global ordinals (an internal data structure used to speed up search)
+- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node, or
+additional work such as building global ordinals (an internal data structure used to speed up search)
 - Profiling statistics are currently not available for suggestions, highlighting, `dfs_query_then_fetch`
 - Profiling of the reduce phase of aggregation is currently not available
 - The Profiler is still highly experimental. The Profiler is instrumenting parts of Lucene that were
 never designed to be exposed in this manner, and so all results should be viewed as a best effort to provide detailed
 diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or
-diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures or
+diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or
 other bugs, please report them!
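
Reader's note, not part of the commit: the documentation edited here describes a shard `id` of the form `[nodeID][indexName][shardID]` and a collector tree whose entries carry `reason`, `time_in_nanos`, and `children`. The following is a minimal Python sketch of how a client might unpack both. The function names, the `name` key, and the sample values are assumptions made for illustration, and the regular expression assumes that no component of the id contains a `]` character.

```python
import re

# Assumed layout, taken from the documentation text: "[nodeID][indexName][shardID]".
# Assumption: none of the three components contains a "]" character.
_SHARD_ID = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]$")


def parse_shard_id(shard_id):
    """Split a profile shard id into (node_id, index_name, shard_number)."""
    match = _SHARD_ID.match(shard_id)
    if match is None:
        raise ValueError(f"unexpected shard id format: {shard_id!r}")
    node_id, index_name, shard_number = match.groups()
    return node_id, index_name, int(shard_number)


def flatten_collectors(collectors, depth=0):
    """Yield (depth, name, reason, time_in_nanos) for every collector in the tree.

    Each time_in_nanos is inclusive of the collector's children, so the
    values are reported as-is and never summed across levels.
    """
    for collector in collectors:
        yield (
            depth,
            collector.get("name"),  # "name" key is an assumption
            collector.get("reason"),
            collector.get("time_in_nanos"),
        )
        # Leaf collectors may omit "children" entirely.
        yield from flatten_collectors(collector.get("children", []), depth + 1)


# Hypothetical sample data: the timings and reason strings are illustrative only.
sample_collectors = [
    {
        "name": "CancellableCollector",
        "reason": "search_cancelled",
        "time_in_nanos": 2000,
        "children": [
            {
                "name": "SimpleTopScoreDocCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 1500,
            }
        ],
    }
]

if __name__ == "__main__":
    print(parse_shard_id("[P6-vulHtQRWuD4YnubWb7A][test][0]"))
    for depth, name, reason, nanos in flatten_collectors(sample_collectors):
        print("  " * depth + f"{name} ({reason}): {nanos} ns")
```

Because collector times are calculated independently from query times, a sketch like this reports them separately and does not try to merge them into the query tree.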
